class: center, middle, inverse, title-slide

.title[
# MATH 204 Introduction to Statistics
]
.subtitle[
## Lecture 13: Intro to Statistical Power
]
.author[
### JMG
]

---

## Goals for Lecture

- Introduce the notion of statistical **power**.

--

- Explain the notion of **effect size**.

--

- Demonstrate the relationship between sample size and Type II error.

---

## Power Video
---

## Recall Error Types

When conducting a hypothesis test, there are four possible scenarios to consider:

.center[
<img src="https://www.dropbox.com/s/p2xl0gksodav0oy/TestingErrorsTable.png?raw=1" width="80%" style="display: block; margin: auto;" />
]

--

- The **significance level** (denoted by `\(\alpha\)`) defined previously tells us the probability of making a Type I error.

--

- What about the probability of making a Type II error?

---

## Type II Probabilities

- Consider a hypothesis test for

`$$H_{0}:\ \mu = \mu_{0}$$`

versus

`$$H_{\text{A}}:\ \mu \neq \mu_{0}$$`

--

- Under the null hypothesis, the parameter takes a single value.

--

- Under the alternative, the parameter can take any value in a range.

--

- Consequently, computing the probability of a Type II error is more complicated than computing the probability of a Type I error: the Type II error probability depends on which alternative value of the parameter is actually true.

---

## Statistical Power

- Statistical power is defined to be `\(1-\beta\)`, where `\(\beta\)` is the probability of making a Type II error, that is, of failing to reject a false null hypothesis.

--

- Statistical power is important because it lets us choose an appropriate sample size for detecting a desired effect size **before collecting data**.

--

- An **effect size** is the degree to which the true value of a parameter differs from the value specified by the null hypothesis.

---

## Effect Size Examples

- Suppose we want to know whether a coin is fair. That is, we test

`$$H_{0}:\ p = 0.5$$`

versus

`$$H_{\text{A}}:\ p \neq 0.5$$`

--

- **Question:** Suppose that the coin truly is not fair. How much do we care if `\(p=0.51\)` as opposed to `\(p=0.5\)`?

--

- This illustrates the concept of effect size.

--

- As another example, suppose we are testing whether a certain drug reduces the duration of headaches. We could use a two-sample t-test to compare the average length of headaches between a treatment and a control group. How much does it matter if the medication reduces the duration of a headache by only two minutes?
Especially if the drug is very expensive, such a small reduction may not be worth it in practice.

---

## Sample Size and Effects

- If you go looking for an effect with a large enough sample, you will find one.

--

- As our previous examples show, just because an effect exists doesn't mean that it matters in practice.

--

- It is also the case that small effects are easier to detect with large sample sizes.

--

- **Important:** You should always determine an appropriate sample size before you collect data, and especially before you conduct a hypothesis test. A power analysis is the statistical technique used to do so.

--

- Over the next few slides, we will illustrate why you need to conduct a power analysis before collecting data and explain the logic behind power analyses.

---

## p-hacking

- p-hacking occurs when you conduct a statistical analysis, fail to detect an effect, collect more data, redo the analysis, and continue in this way until you find an effect.

--

- p-hacking is bad statistical practice and should always be avoided.

--

- To illustrate p-hacking, we will do the following:

    - Take samples of size 5 from each of two normal distributions with the same mean and variance (so we know the null hypothesis is true). Use a two-sample t-test to compute a p-value.

    - Do the same thing for samples of size 6, 7, 8, ..., up to 100.

    - Plot the p-values versus the sample size.

--

- You will see that as long as we keep increasing the sample size, we will eventually reject a null hypothesis that we know to be true.

---

## p-hacking Simulation

<img src="index_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Determining Sample Size

- We have seen that it is essential to choose an appropriate sample size in advance of any data collection, and definitely before conducting a hypothesis test.

--

- So, how do we choose a sample size?

--

- A typical method is to use a power analysis as follows:

    - Decide on an effect size and a significance level.
    - Simulate a sample (or samples) of a fixed size from a distribution (or distributions) that differs from the null by the effect size chosen in the first step.

    - Compute the probability `\(\beta\)` of failing to reject the (false) null hypothesis.

    - Compute the power as `\(1-\beta\)`.

- A statistical test is typically considered "powerful" if its power is at least `\(0.8\)` (80%).

--

- You can also repeat this process for a range of sample sizes to obtain power as a function of sample size, from which you can determine the smallest sample size that gives a sufficiently powerful test.

---

## Practical Power Analysis

- Power analysis can be done via simulation, and this is usually the best way to compute power.

--

- For simple tests such as a t-test, R has built-in functions for computing power. For example,

    - `power.t.test`

    - `power.prop.test`

--

- Let's see some examples of power analysis.

---

## Summary

- It is important to choose an appropriate sample size in advance, that is, before collecting data.

--

- Power analysis is the statistical procedure that can be used to determine a good sample size.

--

- Repeating an analysis with ever-larger sample sizes until an effect or significance appears is p-hacking, and it should be avoided.

---

## Linear Regression

- Our next topic in the course is [linear regression](https://en.wikipedia.org/wiki/Linear_regression).

--

- Linear regression is a fundamental method for statistical modeling. Much of advanced statistics builds on linear regression, so it is important to become proficient in its practice.

--

- It is also important to have some understanding of what is going on "under the hood" in the construction and analysis of linear models. This is taken up in Chapter 8 of the textbook. To get started, please view the video on the next slide.

---

# Intro to Regression Video
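---

## Sketch: Estimating Power by Simulation

The power analysis steps outlined earlier can be sketched in code. The lecture's built-in R helpers are `power.t.test` and `power.prop.test`; what follows is instead a minimal simulation-based sketch in Python (assuming NumPy and SciPy are available; the function name `simulate_power` and the parameter values are illustrative, not taken from the lecture).

```python
import numpy as np
from scipy import stats

def simulate_power(n, effect_size, alpha=0.05, reps=2000, seed=1):
    """Estimate the power of a two-sample t-test by simulation.

    effect_size is the true difference in means measured in
    standard-deviation units, since both groups use sd = 1.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=n)          # "control" sample
        y = rng.normal(effect_size, 1.0, size=n)  # "treatment" sample
        _, p = stats.ttest_ind(x, y)              # two-sample t-test
        if p < alpha:
            rejections += 1
    # Proportion of simulations that (correctly) rejected the null:
    return rejections / reps

# Power grows with sample size for a fixed effect size:
for n in (20, 50, 100):
    print(n, simulate_power(n, effect_size=0.5))
```

Repeating this over a grid of sample sizes and taking the smallest `n` whose estimated power reaches 0.8 implements the sample-size determination described above.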
---

## Notes
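---

## Sketch: The p-hacking Simulation

The p-hacking simulation described earlier (two-sample t-tests on same-distribution samples of sizes 5 through 100) can be sketched as follows. This is an illustrative Python sketch, not the R code used to produce the lecture's plot; the seed and printout are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05
pvals = []

# Both groups come from the SAME normal distribution, so the null
# hypothesis is true by construction and every rejection is a
# Type I error.
for n in range(5, 101):                # sample sizes 5, 6, ..., 100
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(0.0, 1.0, size=n)
    _, p = stats.ttest_ind(x, y)       # two-sample t-test
    pvals.append(p)

false_rejections = sum(p < alpha for p in pvals)
print(f"{false_rejections} of {len(pvals)} tests rejected a true null")
```

Plotting `pvals` against the sample sizes reproduces the kind of figure shown on the p-hacking simulation slide: test often enough and some p-values fall below 0.05 purely by chance.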