MATH 204 Introduction to Statistics

class: center, middle, inverse, title-slide

.title[
# MATH 204 Introduction to Statistics
]
.subtitle[
## Lecture 11: Foundations for Inference
]
.author[
### JMG
]

---

## Goals for Lecture

We introduce the basic concepts for statistical inference:

- parameter estimates, or point estimates
  
--

- confidence intervals
  
--

- tests of statistical hypotheses
  
--

- Note that all of these concepts are covered in Chapter 5 of the textbook. The videos included in the next few slides are recommended.

---

## Central Limit Theorem Video

- You are encouraged to watch this video on point estimates and the central limit theorem.

---

## Confidence Intevals Video

- You are encouraged to watch this video on confidence intervals.

---

## Hypothesis Testing Video

- You are encouraged to watch this video on hypothesis testing.

---

## Statistical Inference

- The goal of statistical inference is to use sample data to infer some information about a population.

- There is a particular flow of logic that we will follow in this course as we conduct various specific statistical analyses.

- The goal of this lecture is to explain this flow of logic and to define the important terms and concepts that we use to talk about statistical inference.

---

## Some Motivation

- Random variables and probability distributions provide models for random processes and help us to compute the probability of outcomes for a random process.

- For example, if we toss a coin 100 times and we know that the coin is fair, then we can model this process by a binomial random variable and use a binomial distribution with `$n=100$` and `$p=\frac{1}{2}$` to compute the probability of getting some number of heads, say `$k$`, out of 100 tosses.

- There is a kind of inverse question though. Given a coin or type of coin, how do we know if it is a fair coin or not. It seems reasonable to assume that the number of heads `$k$` out of `$n$` tosses follows a binomial distribution for some value `$p$` for the probability of getting heads, but what specific value of `$p$`?

- What we can do is to take a sample, that is, to flip the coin some number of times and then take the proportion `$\hat{p}$` of the heads that appear out of the total number of tosses. Then, we expect `$\hat{p}$` to be a good indicator of the value of `$p$`. In other words, we use sample data and a sample statistic to estimate the value of an unknown population parameter.

---

## Estimation and Hypothesis Testing

- So, we use data to estimate parameter values.

- However, each time we repeat our estimation procedure we are likely to obtain a different estimate `$\hat{p}$` due to natural variability. Thus, our estimate `$\hat{p}$` is really a random variable called the **sample proportion**.

- It's not enough to have just an estimate, we also need to know or estimate how much variability to reasonably expect for our estimate `$\hat{p}$` as a random variable.

- Furthermore, one can ask a related but slightly different question. Maybe we don't require an accurate estimate for the probability `$p$` of getting heads, maybe all we want to know is if the coin is fair ( `$p = \frac{1}{2}$` ) or not ( `$p \neq  \frac{1}{2}$` ). Addressing this type of question is known as hypothesis testing.

- In statistical hypothesis testing, we state two mutually exclusive hypotheses (*e.g.*, the coin is fair or it is not). Then we collect evidence in the form of sample data. Finally, we ask how likely is it to observe the sample data if we assume that  one of the specific mutually exclusive hypotheses is in fact true.

---

## Next Steps

- In order to get a feeling for statistical inference, we look at a particular example situation that we hope is fairly intuitive to understand.

- In the context of our example, we study

- point estimation, then
  
--

- confidence intervals, and finally
 
--

- hypothesis testing.
 
--

- We will start with a computational example to get a feel for what is going on.

---

## A Computational Example

- Suppose we want to estimate the probability of getting heads (success) `$p$` for a coin. In order to do so, we start by tossing the coin 100 times, adding up the number of heads, and dividing the total number of tosses `$n=100$`. Then we get an observed sample proportion:

`$$\hat{p} = \frac{\text{num. heads after 100 tosses}}{100}$$`

- For example, suppose we get `$\hat{p} = \frac{47}{100}=0.47$`.

- Let's repeat this process a large number of times and record the outcomes. The first few rows of our data looks as follows.

```
## # A tibble: 6 × 1
##   p_estimate
##        <dbl>
## 1       0.48
## 2       0.43
## 3       0.51
## 4       0.43
## 5       0.5 
## 6       0.48
```

---

## Histogram of Data for Example

- Let's look at a histogram of our data.

- This histogram provides insight into what the distribution of the sample proportion `$\hat{p}$` might be. We refer to this distribution as **the sampling distribution of the sample proportion**. It turns out to be very close to normal, but with what `$\mu$` and `$\sigma$`?

---

## The Central Limit Theorem for Proportion

- The **central limit theorem** for the sample proportion says the following:

> For a sample of size `$n$`, the sampling distribution for the sample proportion is very close to `$N(\mu=p,\sigma=\sqrt{\frac{p(1-p)}{n}})$`, where `$p$` is the true value of the population proportion. This is provided the sample size `$n$` is sufficiently large. 
--

- In the last lecture, we saw that the sampling distribution of the sample mean is also very close to normal. Thus, there is also a central limit theorem for the sample mean.

- The point here is that, when the central limit theorem applies, we can use a normal distribution to assess the uncertainty in our point estimates. For example, to estimate **standard error** and obtain confidence intervals.

- Furthermore, when the central limit theorem applies, we can use a normal distribution for conducting hypothesis tests. This will be discussed later.

- Before we go on, let's explore the concepts of standard error and confidence intervals.

---

## Standard Error

- The standard deviation of a sampling distribution is called **standard error**.

- For example, the standard deviation of the sample proportion `$\hat{p}$` would be the standard error for `$\hat{p}$`. By the central limit theorem, the standard error for `$\hat{p}$` is (very nearly)

`$$SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}},$$`
where `$p$` is the true population proportion.

- The central limit theorem is a powerful theoretical result. However, in practice we do not know the value for the true population proportion `$p$`. If we did, there would be no need for inference.

- In practice, we use the "plug-in principle" to estimate `$SE_{\hat{p}}$` by substituting our estimate `$\hat{p}$` for the true population proportion to obtain an estimate

`$$SE_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$`

---

## When Does CLT Apply?

- For a proportion, the central limit theorem (CLT) applies when the sample size `$n$` is such that

`$$np \geq 10, \ \text{ and } \ n(1-p) \geq 10$$`

- Let's consider Example 5.3 from the textbook. In this case our sampling distribution for the sample proportion is binomial with `$p=0.88$` and `$n=1000$`, then `$np =$` 880, and `$n(1-p) =$` 120. Both of these values are greater or equal to 10 so the CLT applies.

- Now that we know that the CLT applies, we can use it to compute the theoretical mean and standard error. The theoretical mean is `$\mu_{\hat{p}} = 0.88$`, and the standard error is 
`$$SE_{\hat{p}} = \sqrt{\frac{0.88(1-0.88)}{1000}}=0.01$$`

---

# Confidence Intervals for a Proportion

- The sample proportion `$\hat{p}$` provides a plausible value for the population proportion `$p$`. However, there is some standard error associated with it.

- When stating an estimate for the population proportion, it is better practice to provide a plausible *range of values* instead of just a point estimate alone.

- The standard error represents the standard deviation of the point estimate, and when the central limit theorem applies, the point estimate closely follows a normal distribution. In a normal distribution, 95% of the data is within 1.96 standard deviations of the mean.

- In the case when the central limit theorem applies, we call the formula

`$$\text{point estimate} \pm 1.96 \times SE$$`
a **95% confidence interval**.

- We will work out some examples together soon. Before doing so, let's try to understand the meaning of 95%.

---

## Meaning of 95 Percent CI

---

## Confidence Intervals for a Normal r.v.

- If `$X\sim N(\mu,\sigma)$` is a normal random variable, and if `$z^{\ast}$` is the `$Z$`-score value for `$X$` such that `$\theta$` % of the area under the standard normal density curve falls between `$-z^{\ast}$` and `$z^{\ast}$`, then a ** `$\theta$` % confidence interval** for `$X$` is

`$$\mu \pm z^{\ast} \sigma$$`

- We call `$z^{\ast} \sigma$` the **margin of error**

- For example, suppose we know that `$X\sim N(0.887,0.01)$`. Let's construct a 90% CI for `$X$`. To get `$z^{\ast}$`, we do

.panelset[
.panel[.panel-name[Result]

```
## [1] -1.644854
```
]

.panel[.panel-name[Code]

```r
(z_ast <- qnorm(0.05))
```
]
]

Then the confidence interval is

.panelset[
.panel[.panel-name[Result]

```
## [1] 0.8705515 0.9034485
```

]

.panel[.panel-name[Code]

```r
(CI <- 0.887 + c(1,-1)*z_ast*0.01)
```
]
]

---

## Break for Examples

- Let's take a break from the slides and work some examples for the concepts that we have covered so far.

---

## Hypothesis Testing

- Consider the question, is a coin fair? Then there are two mutually exclusive competing hypothesis:

- The coin is fair, that is, the probability of success `$p=\frac{1}{2}$`.
  
--

- The coin is not fair, that is, the probability of success `$p \neq \frac{1}{2}$`.
  
--

- We refer to the first hypothesis (*i.e.*, the coin is fair) as the **null hypothesis** and denote it by `$H_{0}$`.

- We refer to the second hypothesis (*i.e.*, the coin is not fair) as the **alternative hypothesis** and denote it by `$H_{A}$`.

- Typically, the null hypothesis corresponds to a situation of "no difference", or "no effect". The null hypothesis is the status quo.

---

## Testing Procedure

- In hypothesis testing we collect evidence, and make a decision whether to reject the null hypothesis or fail to reject the null hypothesis. Note that failing to reject the null hypothesis is not the same as accepting it as true.

- Failing to reject the null hypothesis only means that the evidence we have collected does not provide sufficient support for rejecting the null hypothesis.

- We also do not accept an alternative hypothesis although some people do use that phrase in hypothesis testing.

- How do we decide, based on our data whether to reject or fail to reject the null hypothesis?

- We use an appropriate probability distribution that corresponds to the null hypothesis (that is, a **null distribution**) and use it to compute the probability of observing a **test statistic** value that is as or more extreme that the value of our test statistic obtained from the data. 
  
--

- If we obtain from the last step a probability value that is sufficiently small then we reject the null hypothesis, otherwise, we fail to reject the null hypothesis.

---

## Example

- Suppose we toss a coin 100 times and get `$54$` heads. We want to test the hypothesis

- `$H_{0}:$` the coin is fair ( `$p= \frac{1}{2}$` ), versus
  
  - `$H_{A}:$` the coin is not fair ( `$p\neq \frac{1}{2}$` ).
  
--

- An appropriate probability distribution corresponding to the null hypothesis is a binomial distribution with size `$n=100$` and probability of success `$p=\frac{1}{2}$`.

- Now we compute the probability for getting 54 or more heads, or 54 or more tails (which is 46 or fewer heads):

.panelset[
.panel[.panel-name[Result]

```
## [1] 0.4841184
```

]

.panel[.panel-name[Code]

```r
pbinom(46,100,0.5) + (1-pbinom(53,100,0.5))
```
]
]

- This tells us that under the null hypothesis (that is, the assumption that the coin is fair) we expect to observe an outcome that is as or more extreme than getting 54 heads about 48% of the time.

---

## Rejecting the Null Hypothesis

- In the example on the last slide, a result that we expect to occur about 48% of the time when assuming the null hypothesis is not a rare event. So in that example we would fail to reject the null hypothesis.

- We only reject the null hypothesis when assuming the null hypothesis results in a very small probability of observing an outcome such as what is observed in our data.

- How small of a probability is small enough, or how rare of an event should lead us to reject the null hypothesis?

- Since we are dealing with inherently random processes, it is possible to make  an incorrect decision in a statistical hypothesis test, that is, an error. There are two particular types of error. The question we need to ask ourselves when setting a decision criteria for hypothesis testing is, what kind of error are we most concerned with avoiding?

---

## Decision Errors

- Here is an analogy to think about. After a criminal trial sometimes an innocent person is convicted and sometimes an guilty person walks free. If we are more strict about how strong evidence must be for a conviction, then more guilty persons will go free. On the other hand, if we are more lax regarding how strong evidence must be for a conviction, then more innocent persons will be convicted.

- When conducting a hypothesis test, there are four possible scenarios to consider:

.center[
<img src="https://www.dropbox.com/s/p2xl0gksodav0oy/TestingErrorsTable.png?raw=1" width="80%" />
]

- In a US court, the defendant is either innocent ( `$H_{0}$` ) or guilty ( `$H_{A}$` ). What does a Type 1 Error represent in this context? What does a Type 2 Error represent?

---

## Significance Level

- As a principle, we do not reject `$H_{0}$` unless we have strong evidence.

- The **significance level** (denoted by `$\alpha$`) specifies our level of comfort in making a Type 1 Error, that is, with incorrectly rejecting the null hypothesis. The lower the significance level, the more conservative we are being with regard to our decision to reject the null hypothesis.

- A common value for a significance level is `$\alpha = 0.05$`. This says, that for those cases where the null hypothesis is actually true, we do not want to incorrectly reject `$H_{0}$` more than 5% of the time.

> So, given hypotheses `$H_{0}$` and `$H_{A}$`, we will only reject the null hypothesis at a level of significance `$\alpha$` if under the assumption that the null hypothesis is true, the probability of observing a test statistic value that is as or more extreme that what we observe from the data is at most `$\alpha$`.
---

## An Example

- Suppose we toss a coin 100 times and get `$65$` heads. We want to test the hypothesis

- `$H_{0}:$` the coin is fair ( `$p= \frac{1}{2}$` ), versus
  
  - `$H_{A}:$` the coin is not fair ( `$p\neq \frac{1}{2}$` ).
  
--

- A probability distribution corresponding to the null hypothesis is a binomial distribution with size `$n=100$` and probability of success `$p=\frac{1}{2}$`.

- Now we compute the probability for getting 65 or more heads, or 65 or more tails (which is 35 or fewer heads):

.panelset[
.panel[.panel-name[Result]

```
## [1] 0.003517642
```

]

.panel[.panel-name[Code]

```r
pbinom(35,100,0.5) + (1-pbinom(64,100,0.5))
```
]
]

- This tells us that under the null hypothesis (that is, the assumption that the coin is fair) we expect to observe an outcome that is as or more extreme than getting 65 heads about 0.35% of the time. If `$\alpha=0.05$`, then we would reject the null hypothesis at the 0.05 significance level.

---

## Another Approach

- Reconsider the question of whether we have a fair coin. Instead of counting the number `$k$` of heads out of `$n$` tosses, we could compute the proportion `$\hat{p} = \frac{k}{n}$`. The point here is that if the sample size `$n$` is large enough, then the test statistic `$\hat{p}$`, which is the sample proportion has (very nearly) a `$N\left(p,\sqrt{\frac{p(1-p)}{n}}\right)$` distribution by the CLT.

- Again, suppose we obtain `$\hat{p} = \frac{54}{100}=0.54$`. Then

`$$n\hat{p} = 54 \geq 10, \ \text{ and } \ n(1-p)=46 \geq 10$$`

so the CLT applies.

- A 95% confidence interval is obtained using a normal distribution as follows:

.panelset[
.panel[.panel-name[Result]

```
## [1] 0.4423141 0.6376859
```

]

.panel[.panel-name[Code]

```r
(CI <- 0.54 + 1.96*c(-1,1)*sqrt((0.54*(1-0.54))/100))
```
]
]

- Our null hypothesis `$H_{0}$` is `$p=0.5$`. Since 0.5 is well within the CI `$(0.44,0.63)$` we fail to reject the null hypothesis.

---

## p-values

- Continuing from the last slide, we can use a normal distribution to obtain a so-called **p-value**. (We will define this carefully soon.)

- We begin by assuming the null hypothesis holds. In this case, applying the CLT gives null distribution as (very nearly) `$N\left(\mu=0.5,\sigma=\sqrt{\frac{0.5(1-0.5)}{100}}\right)$`.

- Next, we use the null distribution to compute the probability of observing a value that is as or more extreme than `$\hat{p}=0.54$` under the assumption that the null hypothesis is true. In order to do this, we first need to obtain a `$Z$`-score:

`$$z = \frac{0.54 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{100}}} \approx 0.8$$`

- Now we compute the tail areas under the normal curve lying to the left of `$-0.8$` and to the right of `$0.8$`. (This is an example of a two-tailed test.)

---

## Continuing

.panelset[
.panel[.panel-name[Result]

```
## [1] 0.4237108
```

]

.panel[.panel-name[Code]

```r
pnorm(-0.8) + (1-pnorm(0.8))
```
]
]

- We obtained that about 42% of the time we expect to observe a proportion of heads that is as or more extreme than 0.54 under the assumption that the true proportion is `$p=0.5$`.

- This leads us to fail to reject the null hypothesis.

- In the last slide, we obtained a p-value of about 0.86.

> The p-value is the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis were true. We typically use a summary statistic (a so-call test statistic) of the data, *e.g.*, sample proportion, to help compute the p-value and evaluate the hypotheses. 
--

- We compare a p-value with a significance level to decide whether to reject or fail to reject the null hypothesis.

---

## More Examples

- We take a break from the slides to work out some more examples of testing hypotheses.

---

## Summary

- When the central limit theorem applies, it provides us with a very accurate estimate for the sampling distribution of a test statistic such as the sample proportion or the sample mean.

- In such cases, we know, or can easily estimate via the plug-in principle, values for the mean and standard error of the sampling distribution in order to obtain point estimates and confidence intervals.

- A hypothesis test states a null ( `$H_{0}$` ) and alternative hypothesis ( `$H_{A}$` ), and then collects evidence in the form of sample data in order to decide whether to reject or fail to reject `$H_{0}$`.

- One sets a confidence level `$\alpha$`, then computes a p-value and compares it  with `$\alpha$`. One rejects the null hypothesis at confidence level `$\alpha$` only if  the p-value is less than `$\alpha$`.

- Over the next few lectures, we will see the methods of inferential statistics summarized here in a variety of different but commonly occurring situations.

---

## Next Time

- In the next lecture, we discuss inference for a single proportion and the difference of two proportions. To get a head start, please watch this video: