Introduce the notion of statistical power.
Explain the notion of effect size.
Demonstrate the relationship between sample size and type II error.
When conducting a hypothesis test, there are four possible scenarios to consider:

Reject H0 when H0 is true: Type I error.
Fail to reject H0 when H0 is true: correct decision.
Reject H0 when H0 is false: correct decision.
Fail to reject H0 when H0 is false: Type II error.
The significance level (denoted by α) previously defined tells us the probability of making a Type I error.
What about the probability of making a Type II Error?
H0: μ = μ0
versus
HA: μ ≠ μ0
For the null hypothesis, there is a single value for the parameter.
For the alternative, there is a range of values for the parameter.
The point is that computing probabilities for Type II errors is more complicated than computing probabilities for Type I errors.
Statistical power is defined to be 1 − β, where β is the probability of making a Type II error, that is, failing to reject a false null hypothesis.
Statistical power is important because it permits us to choose an appropriate sample size for detecting a desired effect size before collecting data.
An effect size is the degree to which the value of a parameter differs from the value specified by the null hypothesis.
H0: p = 0.5
versus
HA: p ≠ 0.5
Question: Suppose that the coin truly is not fair. How much do we care if p = 0.51 as opposed to p = 0.5?
This illustrates the concept of effect size.
As another example, suppose we are testing whether a certain drug reduces the duration of headaches. We could use a two-sample t-test to compare the difference in average headache duration between a treatment group and a control group. How much does it matter if the medication reduces the duration of a headache by only two minutes, especially if the drug is very expensive?
If you go looking for an effect, you will find one.
As our previous examples show, just because there is an effect doesn't mean that it matters in practice.
It is also the case that small effects are easier to detect with large sample sizes.
Important: You should always determine an appropriate sample size before you collect data and especially before you conduct a hypothesis test. A power analysis is the statistical technique used to do so.
Over the next few slides, we will illustrate why you need to conduct a power analysis before collecting data and explain the logic behind power analyses.
p-hacking occurs if an analysis is repeated, for example with varying sample sizes, until an effect or significance is observed.
p-hacking is bad statistical practice and should always be avoided.
To illustrate p-hacking, we will do the following:
Take two samples of size 5 from normal distributions with the same mean and variance (so we know the null hypothesis is true). Use a two-sample t-test to compute a p-value.
Do the same thing for samples of size 6, 7, 8, ... to 100.
Plot the p-values versus the sample size.
You will see that as long as we keep increasing the sample size, we will eventually reject a null hypothesis we know to be true.
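The demonstration described above can be sketched in R as follows (the seed, distribution parameters, and plotting details are illustrative choices, not part of the original slides):

```r
# p-hacking demonstration: for each sample size n = 5, ..., 100,
# draw two samples from the SAME normal distribution (so H0 is true)
# and record the p-value of a two-sample t-test.
set.seed(1)  # for reproducibility

n_values <- 5:100
p_values <- sapply(n_values, function(n) {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = 0, sd = 1)
  t.test(x, y)$p.value
})

# Plot the p-values versus the sample size; the dashed line marks
# the usual 0.05 significance threshold.
plot(n_values, p_values, type = "b",
     xlab = "Sample size", ylab = "p-value")
abline(h = 0.05, col = "red", lty = 2)
```

Because each test has a 5% chance of a Type I error, scanning across many sample sizes will eventually produce a p-value below 0.05 even though the null hypothesis is true.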
We have seen that it is essential to choose an appropriate sample size in advance of any data collection and definitely before conducting a hypothesis test.
So, how do we choose a sample size?
A typical method is to use a power analysis as follows:
Decide an effect size and a significance level.
Take samples of a fixed size from distributions that differ by the effect size chosen in the first step.
Compute the probability β of failing to reject the false null hypothesis.
Compute the power as 1 − β.
A statistical test is typically considered "powerful" if its power is at least 0.8 (80%).
You can also repeat this process to obtain the power as a function of sample size, from which you can determine the smallest sample size required to get a sufficiently powerful test.
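The steps above can be carried out by simulation in R. This is a minimal sketch; the effect size, significance level, sample size, and number of replications are illustrative choices:

```r
# Simulation-based power analysis for a two-sample t-test.
set.seed(1)
alpha  <- 0.05   # significance level
effect <- 0.5    # difference in means we want to detect (effect size)
n      <- 50     # sample size per group
reps   <- 1000   # number of simulated experiments

# In each replication the true means differ by `effect`, so H0 is false.
rejected <- replicate(reps, {
  x <- rnorm(n, mean = 0,      sd = 1)
  y <- rnorm(n, mean = effect, sd = 1)
  t.test(x, y)$p.value < alpha
})

# The proportion of rejections estimates the power, 1 - beta.
power <- mean(rejected)
power
```

Wrapping this in a loop over several values of n gives power as a function of sample size, from which the smallest n achieving power 0.8 can be read off.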
Power analysis can be done via simulation, and usually this is the best way to compute power.
For simple tests such as a t-test, R has built-in functions for computing power. For example,
power.t.test
power.prop.test
Let's see some examples of power analysis.
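For instance, power.t.test can compute the power of a two-sample t-test for a given sample size, or solve for the sample size needed to reach a desired power (the numbers below are illustrative):

```r
# Power of a two-sample t-test with n = 50 per group,
# true difference in means 0.5, sd 1, alpha 0.05.
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)

# Smallest sample size per group for 80% power:
# leave n unspecified and set power = 0.8 instead.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
```

power.prop.test works analogously for comparing two proportions.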
It is important to choose an appropriate sample size in advance, that is, before collecting data.
Power analysis is the statistical procedure that can be used to determine a good sample size.
Repeating an analysis with varying sample sizes until observing an effect or significance is p-hacking, and this should be avoided.
Our next topic in the course is linear regression.
Linear regression is a fundamental method for statistical modeling. Much of advanced statistics builds on linear regression, so it is important to become proficient in the practice of linear regression.
It is also important to have some understanding of what is going on "under the hood" in the construction and analysis of linear models. This is taken up in Chapter 8 of the textbook. To get started, please view the video on the next slide.