MATH 204 Introduction to Statistics

Lecture 13: Intro to Statistical Power

JMG

Goals for Lecture

  • Introduce the notion of statistical power.

  • Explain the notion of effect size.

  • Demonstrate the relationship between sample size and type II error.

Power Video

Recall Error Types

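When conducting a hypothesis test, there are four possible scenarios to consider:

  • The null hypothesis is true and we fail to reject it (correct decision).

  • The null hypothesis is true and we reject it (Type I error).

  • The null hypothesis is false and we reject it (correct decision).

  • The null hypothesis is false and we fail to reject it (Type II error).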

  • The significance level (denoted by $\alpha$), defined previously, gives the probability of making a Type I error when the null hypothesis is true.

  • What about the probability of making a Type II error?

Type II Probabilities

  • Consider a hypothesis test for

$H_0: \mu = \mu_0$

versus

$H_A: \mu \neq \mu_0$

  • For the null hypothesis, there is a single value for the parameter.

  • For the alternative, there is a range of values for the parameter.

  • The point is that computing the probability of a type II error is more complicated than computing the probability of a type I error, because it depends on which value in the alternative is the true one.

Statistical Power

  • Statistical power is defined to be $1 - \beta$, where $\beta$ is the probability of making a type II error, that is, failing to reject a false null hypothesis.

  • Statistical power is important because it lets us choose, before collecting data, a sample size that is appropriate for detecting a desired effect size.

  • An effect size measures how far the true value of a parameter lies from the value specified by the null hypothesis.
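
To make the definition concrete, here is a minimal Python sketch that computes $\beta$ and the power for one specific alternative. It assumes a two-sided one-sample z-test with known standard deviation, a simplification chosen for illustration. The function name `z_test_power` and all numbers are hypothetical and not taken from the lecture.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Effect size (mu_true - mu0) measured in standard errors of the sample mean.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    # beta = P(fail to reject) = P(-z_crit < Z + shift < z_crit), Z standard normal.
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta

# Hypothetical numbers: null mean 0, true mean 0.5, sigma 1, n = 30.
print(z_test_power(mu0=0, mu_true=0.5, sigma=1, n=30))
```

When `mu_true` equals `mu0` the function returns $\alpha$, since rejecting is then a type I error; the power rises as the effect size or the sample size grows.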

Effect Size Examples

  • Suppose we want to know if a coin is fair or not. That is, we test

$H_0: p = 0.5$

versus

$H_A: p \neq 0.5$

  • Question: Suppose that the coin truly is not fair. How much do we care if $p = 0.51$ as opposed to $p = 0.5$?

  • This illustrates the concept of effect size.

  • As another example, suppose we are testing whether a certain drug reduces the duration of headaches. We could use a two-sample t-test to compare the average length of headaches between a treatment group and a control group. How much does it matter if the medication reduces the duration of a headache by only two minutes, especially if the drug is very expensive?
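
The coin example can be made quantitative. The sketch below is an illustration, not part of the lecture: it assumes 100 flips and a symmetric exact binomial rejection region, and the helper names are hypothetical. It compares how often the test detects a coin with $p = 0.51$ versus one with $p = 0.6$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def coin_test_power(n, p_true, alpha=0.05):
    """Power of a symmetric exact test of H0: p = 0.5 based on n flips."""
    # Grow the rejection region {X <= c or X >= n - c} while its
    # probability under H0 stays at or below alpha.
    c, size = -1, 0.0
    while c + 1 < n / 2:
        bigger = size + 2 * binom_pmf(c + 1, n, 0.5)
        if bigger > alpha:
            break
        c, size = c + 1, bigger
    # Power is the probability of landing in that region when p = p_true.
    return sum(binom_pmf(k, n, p_true)
               for k in range(n + 1) if k <= c or k >= n - c)

for p in (0.51, 0.6):
    print(p, round(coin_test_power(100, p), 3))
```

With the same number of flips, the larger departure from 0.5 is detected far more often than the smaller one, which is the sense in which effect size drives power.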

Sample Size and Effects

  • If you go looking for an effect, you will find one.

  • As our previous examples show, just because there is an effect doesn't mean that it matters in practice.

  • It is also the case that small effects are easier to detect with large sample sizes.

  • Important: You should always determine an appropriate sample size before you collect data and especially before you conduct a hypothesis test. A power analysis is the statistical technique used to do so.

  • Over the next few slides, we will illustrate why you need to conduct a power analysis before collecting data and explain the logic behind power analyses.
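
As a rough illustration of how sample size interacts with a small effect, the following sketch uses the normal approximation to the two-sided one-proportion z-test to approximate the power for detecting a coin with $p = 0.51$ at several sample sizes. The function name and the chosen sample sizes are assumptions made for this example.

```python
from statistics import NormalDist

def coin_power_approx(n, p_true, p0=0.5, alpha=0.05):
    """Approximate power of the two-sided z-test of H0: p = p0 with n flips."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se_null = (p0 * (1 - p0) / n) ** 0.5          # standard error assumed by the test
    se_true = (p_true * (1 - p_true) / n) ** 0.5  # actual standard error of p-hat
    # We fail to reject when p-hat lands within z_crit * se_null of p0.
    upper = (p0 + z_crit * se_null - p_true) / se_true
    lower = (p0 - z_crit * se_null - p_true) / se_true
    beta = std_normal.cdf(upper) - std_normal.cdf(lower)
    return 1 - beta

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(coin_power_approx(n, p_true=0.51), 3))
```

The power climbs toward 1 as the number of flips grows, so with enough data even a practically negligible bias will be flagged as significant.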

p-hacking

  • p-hacking occurs if

    • You conduct a statistical analysis, fail to detect an effect, collect more data, redo the analysis, and continue this until you find an effect.
  • p-hacking is bad statistical practice and should always be avoided.

  • To illustrate p-hacking, we will do the following:

    • Take a sample of size 5 from each of two normal distributions with the same mean and variance (so we know the null hypothesis is true). Use a two-sample t-test to compute a p-value.

    • Do the same thing for samples of size 6, 7, 8, ... to 100.

    • Plot the p-values versus the sample size.

  • You will see that as long as we keep increasing the sample size, we will eventually reject a null hypothesis we know to be true.

p-hacking Simulation
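
A minimal sketch of this kind of experiment in Python is given here. It is a variant of the procedure described above, under two stated assumptions: data are added one observation per group at a time and the test is rerun after each addition, and a two-sample z-test with known unit variance is swapped in for the t-test so that only the standard library is needed. The function name, seed, and number of simulations are hypothetical choices.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_start=5, n_max=100, alpha=0.05,
                                n_sims=2000, seed=1):
    """Share of simulated studies that reach p < alpha at some sample size,
    even though both groups come from the same N(0, 1) distribution."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_sims):
        sum_x = sum(rng.gauss(0, 1) for _ in range(n_start))
        sum_y = sum(rng.gauss(0, 1) for _ in range(n_start))
        n = n_start
        while True:
            # Two-sample z statistic with known unit variance in each group.
            z = (sum_x / n - sum_y / n) / (2 / n) ** 0.5
            p_value = 2 * (1 - std_normal.cdf(abs(z)))
            if p_value < alpha:
                hits += 1  # a false positive: the null hypothesis is true
                break
            if n == n_max:
                break
            # Not significant yet: collect one more observation per group.
            sum_x += rng.gauss(0, 1)
            sum_y += rng.gauss(0, 1)
            n += 1
    return hits / n_sims

print(peeking_false_positive_rate())
```

A single test at a fixed sample size rejects a true null hypothesis about 5% of the time; retesting after every new observation pushes that rate well above the nominal level.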

Determining Sample Size

  • We have seen that it is essential to choose an appropriate sample size in advance of any data collection and definitely before conducting a hypothesis test.

  • So, how do we choose a sample size?

  • A typical method is to use a power analysis as follows:

    • Decide on an effect size and a significance level.

    • Take a sample (or samples) of a fixed size from a distribution (or distributions) whose parameter differs from the null value by the effect size chosen in the first step.

    • Compute the probability $\beta$ of failing to reject the false null hypothesis.

    • Compute the power as $1 - \beta$.

    • A statistical test is typically considered "powerful" if its power is at least 0.8 (80%).

  • You can also repeat this process to obtain the power as a function of sample size, and from that determine the smallest sample size that gives a sufficiently powerful test.
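
The steps above can be sketched as a small simulation. This is one possible reading of the procedure, not the lecture's own code: it assumes two groups with known standard deviation, uses a two-sided two-sample z-test in place of a t-test, and estimates $\beta$ as the proportion of simulated studies that fail to reject. The function names, the effect size of 0.5, and the grid of sample sizes are all hypothetical.

```python
import random
from statistics import NormalDist

def simulated_power(n, effect, sigma=1.0, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the power of a two-sided two-sample z-test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    failures = 0
    for _ in range(n_sims):
        # Draw both groups from the alternative: means differ by `effect`.
        control = [rng.gauss(0, sigma) for _ in range(n)]
        treated = [rng.gauss(effect, sigma) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / se
        if abs(z) <= z_crit:
            failures += 1  # failed to reject a false null hypothesis
    beta = failures / n_sims
    return 1 - beta

def smallest_n(effect, target=0.8, n_values=range(10, 201, 10)):
    """First per-group sample size on the grid whose estimated power meets the target."""
    for n in n_values:
        if simulated_power(n, effect) >= target:
            return n
    return None

print(smallest_n(effect=0.5))
```

Because the power is estimated from a finite number of simulated studies, the answer carries some Monte Carlo noise; increasing `n_sims` or using a finer grid of sample sizes tightens it.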

Practical Power Analysis

  • Power analysis can be done via simulation and usually this is the best way to compute power.

  • For simple tests such as a t-test, R has built-in functions for computing power. For example,

    • power.t.test

    • power.prop.test

  • Let's see some examples of power analysis.
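
For a rough sense of what such a calculation involves, the sketch below applies the standard normal-approximation formula for the per-group sample size of a two-sided two-sample comparison of means. It is not a reimplementation of R's power.t.test, which works with the t distribution and so typically reports slightly larger sample sizes. The argument names loosely mirror that function's `delta`, `sd`, `sig.level`, and `power`, and the function `n_per_group` itself is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd=1.0, sig_level=0.05, power=0.8):
    """Approximate per-group sample size needed to detect a mean difference delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - sig_level / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)              # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)

# Hypothetical inputs: detect a difference of half a standard deviation.
print(n_per_group(delta=0.5))
```

Halving the effect size roughly quadruples the required sample size, because `delta` enters the formula squared.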

Summary

  • It is important to choose an appropriate sample size in advance, that is, before collecting data.

  • Power analysis is the statistical procedure that can be used to determine a good sample size.

  • Repeating an analysis with increasing sample sizes until an effect or significance is observed is p-hacking, and it should be avoided.

Linear Regression

  • Our next topic in the course is linear regression.

  • Linear regression is a fundamental method for statistical modeling. Much of advanced statistics builds on linear regression, so it is important to become proficient in the practice of linear regression.

  • It is also important to have some understanding of what is going on "under the hood" in the construction and analysis of linear models. This is taken up in Chapter 8 of the textbook. To get started, please view the video on the next slide.

Intro to Regression Video
