class: center, middle, inverse, title-slide

.title[
# MATH 204 Introduction to Statistics
]
.subtitle[
## Lecture 13: Intro to Statistical Power
]
.author[
### JMG
]

---

## Goals for Lecture

- Introduce the notion of statistical **power**.

--

- Explain the notion of **effect size**.

--

- Demonstrate the relationship between sample size and Type II error.

---

## Power Video
---

## Recall Error Types

When conducting a hypothesis test, there are four possible scenarios to consider:

.center[
<img src="https://www.dropbox.com/s/p2xl0gksodav0oy/TestingErrorsTable.png?raw=1" width="80%" style="display: block; margin: auto;" />
]

--

- The **significance level** (denoted by `\(\alpha\)`) defined previously tells us the probability of making a Type I error.

--

- What about the probability of making a Type II error?

---

## Type II Probabilities

- Consider a hypothesis test for

`$$H_{0}:\ \mu = \mu_{0}$$`

versus

`$$H_{\text{A}}:\ \mu \neq \mu_{0}$$`

--

- Under the null hypothesis, the parameter takes a single value.

--

- Under the alternative, the parameter can take any value in a range.

--

- Consequently, computing the probability of a Type II error is more complicated than computing the probability of a Type I error: the Type II error probability depends on which alternative value of the parameter is actually true.

---

## Statistical Power

- Statistical power is defined to be `\(1-\beta\)`, where `\(\beta\)` is the probability of making a Type II error, that is, of failing to reject a false null hypothesis.

--

- Statistical power is important because it lets us choose an appropriate sample size for detecting a desired effect size **before collecting data**.

--

- An **effect size** is the degree to which the true value of a parameter differs from the value specified by the null hypothesis.

---

## Effect Size Examples

- Suppose we want to know whether a coin is fair. That is, we test

`$$H_{0}:\ p = 0.5$$`

versus

`$$H_{\text{A}}:\ p \neq 0.5$$`

--

- **Question:** Suppose that the coin truly is not fair. How much do we care if `\(p=0.51\)` as opposed to `\(p=0.5\)`?

--

- This illustrates the concept of effect size.

--

- As another example, suppose we are testing whether a certain drug reduces the duration of headaches. We could use a two-sample t-test to compare the average length of headaches between a treatment and a control group. How much does it matter if the medication reduces the duration of a headache by only two minutes?
Especially if the drug is very expensive, such a small reduction may not be worth it in practice.

---

## Sample Size and Effects

- If you go looking for an effect with a large enough sample, you will find one.

--

- As our previous examples show, just because an effect exists doesn't mean that it matters in practice.

--

- It is also the case that small effects are easier to detect with large sample sizes.

--

- **Important:** You should always determine an appropriate sample size before you collect data, and especially before you conduct a hypothesis test. A power analysis is the statistical technique used to do so.

--

- Over the next few slides, we will illustrate why you need to conduct a power analysis before collecting data and explain the logic behind power analyses.

---

## p-hacking

- p-hacking occurs when you conduct a statistical analysis, fail to detect an effect, collect more data, redo the analysis, and continue in this way until you find an effect.

--

- p-hacking is bad statistical practice and should always be avoided.

--

- To illustrate p-hacking, we will do the following:

    - Take samples of size 5 from each of two normal distributions with the same mean and variance (so we know the null hypothesis is true). Use a two-sample t-test to compute a p-value.

    - Do the same thing for samples of size 6, 7, 8, ..., up to 100.

    - Plot the p-values versus the sample size.

--

- You will see that as long as we keep increasing the sample size, we will eventually reject a null hypothesis that we know to be true.

---

## p-hacking Simulation

<img src="index_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Determining Sample Size

- We have seen that it is essential to choose an appropriate sample size in advance of any data collection, and definitely before conducting a hypothesis test.

--

- So, how do we choose a sample size?

--

- A typical method is to use a power analysis as follows:

    - Decide on an effect size and a significance level.
    - Simulate a sample (or samples) of a fixed size from a distribution (or distributions) that differs from the null by the effect size chosen in the first step.

    - Compute the probability `\(\beta\)` of failing to reject the (false) null hypothesis.

    - Compute the power as `\(1-\beta\)`.

- A statistical test is typically considered "powerful" if its power is at least `\(0.8\)` (80%).

--

- You can also repeat this process for a range of sample sizes to obtain power as a function of sample size, from which you can determine the smallest sample size that gives a sufficiently powerful test.

---

## Practical Power Analysis

- Power analysis can be done via simulation, and this is usually the best way to compute power.

--

- For simple tests such as a t-test, R has built-in functions for computing power. For example,

    - `power.t.test`

    - `power.prop.test`

--

- Let's see some examples of power analysis.

---

## Summary

- It is important to choose an appropriate sample size in advance, that is, before collecting data.

--

- Power analysis is the statistical procedure that can be used to determine a good sample size.

--

- Repeating an analysis with ever-larger sample sizes until an effect or significance appears is p-hacking, and it should be avoided.

---

## Linear Regression

- Our next topic in the course is [linear regression](https://en.wikipedia.org/wiki/Linear_regression).

--

- Linear regression is a fundamental method for statistical modeling. Much of advanced statistics builds on linear regression, so it is important to become proficient in its practice.

--

- It is also important to have some understanding of what is going on "under the hood" in the construction and analysis of linear models. This is taken up in Chapter 8 of the textbook. To get started, please view the video on the next slide.

---

# Intro to Regression Video
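---

## Sketch: Estimating Power by Simulation

The power analysis steps outlined earlier can be sketched in code. The lecture's built-in R helpers are `power.t.test` and `power.prop.test`; what follows is instead a minimal simulation-based sketch in Python (assuming NumPy and SciPy are available; the function name `simulate_power` and the parameter values are illustrative, not taken from the lecture).

```python
import numpy as np
from scipy import stats

def simulate_power(n, effect_size, alpha=0.05, reps=2000, seed=1):
    """Estimate the power of a two-sample t-test by simulation.

    effect_size is the true difference in means measured in
    standard-deviation units, since both groups use sd = 1.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, size=n)          # "control" sample
        y = rng.normal(effect_size, 1.0, size=n)  # "treatment" sample
        _, p = stats.ttest_ind(x, y)              # two-sample t-test
        if p < alpha:
            rejections += 1
    # Proportion of simulations that (correctly) rejected the null:
    return rejections / reps

# Power grows with sample size for a fixed effect size:
for n in (20, 50, 100):
    print(n, simulate_power(n, effect_size=0.5))
```

Repeating this over a grid of sample sizes and taking the smallest `n` whose estimated power reaches 0.8 implements the sample-size determination described above.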
---

## Notes
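---

## Sketch: The p-hacking Simulation

The p-hacking simulation described earlier (two-sample t-tests on same-distribution samples of sizes 5 through 100) can be sketched as follows. This is an illustrative Python sketch, not the R code used to produce the lecture's plot; the seed and printout are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05
pvals = []

# Both groups come from the SAME normal distribution, so the null
# hypothesis is true by construction and every rejection is a
# Type I error.
for n in range(5, 101):                # sample sizes 5, 6, ..., 100
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(0.0, 1.0, size=n)
    _, p = stats.ttest_ind(x, y)       # two-sample t-test
    pvals.append(p)

false_rejections = sum(p < alpha for p in pvals)
print(f"{false_rejections} of {len(pvals)} tests rejected a true null")
```

Plotting `pvals` against the sample sizes reproduces the kind of figure shown on the p-hacking simulation slide: test often enough and some p-values fall below 0.05 purely by chance.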