
MATH 204 Introduction to Statistics

Lecture 12: Further Inference

JMG


Goals for Lecture

  • In this lecture, we introduce statistical inference for

    • a single proportion (6.1)

    • a difference of two proportions (6.2)

    • one-sample means (7.1)

    • paired data (7.3)

    • a difference of two means (7.3)

  • After we discuss these topics along with power calculations and simple ordinary least squares regression (7.4 and Chapter 8), we will return to discuss goodness of fit (6.3) and testing for independence (6.4).

Learning Objectives

  • After this lecture, you should be able to conduct and apply point estimates, interval estimates, and hypothesis tests for

    • proportions, differences of proportions, one-sample means, paired data, and differences of two means.

  • This includes knowing when to use each type of hypothesis test, and

  • which R command(s) are appropriate to use.

  • Many of our interval estimates and hypothesis tests rely on a central limit theorem, so we quickly review these first.

Central Limit Theorems

  • Recall our CLT for sample proportions:

When observations are independent and the sample size is sufficiently large, the sampling distribution for the sample proportion $\hat{p}$ is approximately normal with mean $\mu_{\hat{p}} = p$ and standard error $SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$.

  • We also have a CLT for the sample mean:

When we collect samples of sufficiently large size $n$ from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution for the sample mean $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu$ and standard error $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

  • Later we will return to discuss precisely what we mean by "samples of sufficiently large size."
  • We use sampling distributions to obtain confidence intervals and to conduct hypothesis tests.
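A quick simulation can illustrate the CLT for a sample proportion. This is a sketch; the population proportion p = 0.3 and sample size n = 100 are arbitrary choices for illustration:

```r
set.seed(1)                    # for reproducibility
p <- 0.3; n <- 100             # hypothetical population proportion and sample size
# draw 10,000 sample proportions, each computed from a sample of size n
p_hats <- rbinom(10000, size = n, prob = p) / n
mean(p_hats)                   # close to p
sd(p_hats)                     # close to the theoretical standard error
sqrt(p * (1 - p) / n)          # theoretical standard error sqrt(p(1-p)/n)
```

The histogram of `p_hats` is approximately normal, as the theorem predicts.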

Videos for CLT

  • Central limit theorems are reviewed in the videos included on the next few slides.

CLT Video 1

CLT Video 2

CLT Video 3

CI for a Proportion

  • Once you've determined a one-proportion CI would be helpful for an application, there are four steps to constructing the interval:

    • Identify the point estimate $\hat{p}$ and the sample size $n$, and determine what confidence level you want.

    • Verify the conditions to ensure $\hat{p}$ is nearly normal, that is, check that $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$.

    • If the conditions hold, estimate the standard error $SE$ using $\hat{p}$, find the appropriate $z^{\star}$, and compute $\hat{p} \pm z^{\star} \times SE$.

    • Interpret the CI in the context of the problem.

Proportion CI Example

  • Consider the following problem:

Is parking a problem on campus? A randomly selected group of 89 faculty and staff are asked whether they are satisfied with campus parking or not. Of the 89 individuals surveyed, 23 indicated that they are satisfied. What is the proportion $p$ of faculty and staff that are satisfied with campus parking?

  • Explain why this problem is suitable for a one-proportion CI. Is the CLT applicable? If so, obtain a 95% CI for the point estimate.
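Following the four steps above, a sketch of the computation in R (the success-failure check here uses $\hat{p}$, as described earlier):

```r
p_hat <- 23/89
n <- 89
c(n * p_hat, n * (1 - p_hat))         # success-failure condition: both should be at least 10
SE <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z_95 <- qnorm(0.975)                  # z-value for a 95% CI
(CI <- p_hat + z_95 * c(-1, 1) * SE)  # 95% CI
```

Note this is the normal-approximation (Wald) interval; `prop.test` reports a slightly different (Wilson score) interval for the same data.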

Proportion CI Video

  • Confidence intervals for a proportion are reviewed in this video:

Hypothesis Test for a Proportion

  • Once you've determined a one-proportion hypothesis test is the correct test for a problem, there are four steps to completing the test:

    • Identify the parameter of interest, list the hypotheses, identify the significance level, and identify $\hat{p}$ and $n$.

    • Verify conditions to ensure $\hat{p}$ is nearly normal under $H_0$. For one-proportion hypothesis tests, use the null value $p_0$ to check the conditions $np_0 \geq 10$ and $n(1-p_0) \geq 10$.

    • If the conditions hold, compute the standard error, again using $p_0$, compute the $Z$-score, and identify the p-value.

    • Evaluate the hypothesis test by comparing the p-value to the significance level $\alpha$, and provide a conclusion in the context of the problem.

Hypotheses for One Proportion Tests

  • The null hypothesis for a one-proportion test is typically stated as $H_0: p = p_0$, where $p_0$ is a hypothesized value for $p$.

  • The corresponding alternative hypothesis is then one of

    • $H_A: p \neq p_0$ (two-sided),

    • $H_A: p < p_0$ (one-sided, less than), or

    • $H_A: p > p_0$ (one-sided, greater than).

Proportion Hypothesis Test Example

Is parking a problem on campus? A randomly selected group of 89 faculty and staff are asked whether they are satisfied with campus parking or not. Of the 89 individuals surveyed, 23 indicated that they are satisfied.

  • Let's study this problem using a hypothesis testing framework.
  • Here is relevant R code:

p_hat <- 23/89
n <- 89
p0 <- 0.5
(c(n*p0, n*(1-p0))) # success-failure condition
## [1] 44.5 44.5
SE <- sqrt((p0*(1-p0))/n)
(z_val <- (p_hat - p0)/SE)
## [1] -4.557991
(p_value <- 2*(pnorm(z_val)))
## [1] 5.164528e-06

R Command(s) for Proportion Hypothesis Test

  • In cases where the CLT applies, we can use a normal distribution to directly compute p-values with the pnorm function.

  • Alternatively, R has a built-in function prop.test that automates one-proportion hypothesis testing.

  • So, in our parking problem example we would use

prop.test(23, 89, p = 0.5, correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 23 out of 89, null probability 0.5
## X-squared = 20.775, df = 1, p-value = 5.165e-06
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.1788154 0.3580294
## sample estimates:
## p
## 0.258427

Problems to Practice

  • Let's take a minute to do some practice problems.

  • Try problems 6.7 and 6.13 from the textbook.

Problems Involving Difference of Proportions

  • Consider the following problem:

An online poll on January 10, 2017 reported that 97 out of 120 people in Virginia between the ages of 18 and 29 believe that marijuana should be legal, while 84 out of 111 who are 30 and over held this belief. Is there a difference between the proportion of young people who favor marijuana legalization as compared to people who are older?

  • Think about how this problem is different from problems for a single proportion. The basic point here is that there are two groups, and we want to compare proportions across the two groups.
  • Data for the problem stated above might look as follows (only the first few rows are shown).

## # A tibble: 6 × 2
##   support_legalize age_group
##   <chr>            <fct>
## 1 Yes              >30
## 2 No               18-29
## 3 Yes              >30
## 4 Yes              >30
## 5 Yes              >30
## 6 Yes              18-29
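With data in this long format, the group counts needed for inference can be tabulated directly. A sketch using base R; the data frame and column names mirror the tibble shown above and are assumptions about how the data are stored:

```r
# a small mock data set in the same shape as the tibble above
poll <- data.frame(
  support_legalize = c("Yes", "No", "Yes", "Yes", "Yes", "No"),
  age_group = c(">30", "18-29", ">30", ">30", ">30", "18-29")
)
table(poll$age_group, poll$support_legalize)  # counts of Yes/No within each age group
```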

Plotting Grouped Proportion Data

  • We can use grouped bar plots or a mosaic plot to visualize data related to grouped proportions:


Sampling Distribution for Difference of Proportions

The difference $\hat{p}_1 - \hat{p}_2$ can be modeled using a normal distribution when

  • The data are independent within and between the two groups. Generally this is satisfied if the data come from two independent random samples or if the data come from a randomized experiment.

  • The success-failure condition holds for both groups, where we check successes and failures in each group separately.

  • When these conditions are satisfied, the standard error of $\hat{p}_1 - \hat{p}_2$ is

$$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},$$

where $p_1$ and $p_2$ represent the population proportions, and $n_1$ and $n_2$ represent the sample sizes.

CI for Difference of Proportions

  • To construct a CI for a difference of proportions, we can apply the formulas

$$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}, \qquad \text{CI} = \text{point estimate} \pm z^{\star} \times SE$$

  • If necessary, we can use the plug-in principle to estimate the standard error:

$$SE \approx \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

CI for Difference of Proportions Examples

  • Let's obtain a 90% CI for the difference of proportions in the marijuana legalization question. Here is the relevant R code:

n1 <- 120
n2 <- 111
p1 <- 97/n1
p2 <- 84/n2
p_diff <- p1 - p2
(c(n1*p1, n1*(1-p1), n2*p2, n2*(1-p2))) # success-failure conditions
## [1] 97 23 84 27
SE <- sqrt(((p1*(1-p1))/n1) + ((p2*(1-p2))/n2))
z_90 <- -qnorm(0.05) # find the appropriate z-value for a 90% CI
(CI <- p_diff + z_90 * c(-1,1) * SE) # 90% CI
## [1] -0.03775321 0.14090636

  • Let's try some more examples together.

Hypotheses for Difference of Proportion Tests

  • The null hypothesis for a difference of proportions test is typically stated as $H_0: p_1 - p_2 = p_0$, where $p_0$ is a hypothesized value for the difference $p_1 - p_2$.

  • The corresponding alternative hypothesis is then one of

    • $H_A: p_1 - p_2 \neq p_0$ (two-sided),

    • $H_A: p_1 - p_2 < p_0$ (one-sided, less than), or

    • $H_A: p_1 - p_2 > p_0$ (one-sided, greater than).

  • When $p_0 = 0$ we need to use the pooled proportion.

Pooled Proportion

  • When the null hypothesis is that the proportions are equal, use the pooled proportion $\hat{p}_{\text{pooled}}$, where

$$\hat{p}_{\text{pooled}} = \frac{\text{number of "successes"}}{\text{number of cases}} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$$

  • The pooled proportion $\hat{p}_{\text{pooled}}$ is used to check the success-failure condition and to estimate the standard error.

Hypothesis Test for Difference of Proportions

  • Let's apply the hypothesis testing framework to this problem:

An online poll on January 10, 2017 reported that 97 out of 120 people in Virginia between the ages of 18 and 29 believe that marijuana should be legal, while 84 out of 111 who are 30 and over held this belief. Is there a difference between the proportion of young people who favor marijuana legalization as compared to people who are older?

n1 <- 120; n2 <- 111
p1 <- 97/n1; p2 <- 84/n2; p_diff <- p1 - p2
p_pooled <- (p1*n1 + p2*n2)/(n1 + n2)
(c(n1*p_pooled, n1*(1-p_pooled), n2*p_pooled, n2*(1-p_pooled))) # success-failure, both groups
## [1] 94.02597 25.97403 86.97403 24.02597
SE <- sqrt((p_pooled*(1-p_pooled))/n1 + (p_pooled*(1-p_pooled))/n2)
(z_val <- (p_diff - 0.0)/SE)
## [1] 0.9510127
(p_value <- 2*(1-pnorm(z_val)))
## [1] 0.3415979

R Commands for Difference of Proportion Test

  • The R command prop.test also works for testing a difference of proportions.

  • For example, we could solve the marijuana legalization problem with the following:

prop.test(c(97, 84), c(120, 111), correct = FALSE)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(97, 84) out of c(120, 111)
## X-squared = 0.90443, df = 1, p-value = 0.3416
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.05486643 0.15801958
## sample estimates:
## prop 1 prop 2
## 0.8083333 0.7567568

Hypothesis Test for Difference of Proportions Examples

  • Let's work some more problems together.

Inference for a Sample Mean

  • A potato chip manufacturer claims that there are on average 32 chips per bag for their brand. How can we tell if this is an accurate claim?

  • One approach is to take a sample of, say, 25 bags of chips of the particular brand and compute the sample mean number of chips per bag.

  • This type of problem is inference for a sample mean.

  • In principle, we know the sampling distribution for the sample mean is very nearly normal, provided the conditions for the CLT hold.

  • One question is, what are the appropriate conditions to check to make sure the CLT holds for the sample mean?

  • When the conditions for the CLT for the sample mean hold, the expression for the standard error involves the population standard deviation ($\sigma$), which we rarely know in practice. So, how do we estimate the standard error for the sample mean?

  • We address these questions in the next few slides.

Inference for a Mean Video

  • You are encouraged to watch this video on inference for a mean.

Conditions for CLT for Sample Mean

  • Two conditions are required to apply the CLT for a sample mean $\bar{x}$:

Independence: The sample observations must be independent. Simple random samples from a population are independent, as are data from a random process like rolling a die or tossing a coin.

Normality: When a sample is small, we also require that the sample observations come from a normally distributed population. This condition can be relaxed for larger sample sizes.

  • The normality condition is vague, but there are some approximate rules that work well in practice.

Practical Check for Normality

  • $n < 30$: If the sample size $n$ is less than 30 and there are no clear outliers, then we can assume the data come from a nearly normal distribution.

  • $n \geq 30$: If the sample size $n$ is at least 30 and there are no particularly extreme outliers, then we can assume the sampling distribution of $\bar{x}$ is nearly normal, even if the underlying distribution of individual observations is not.

  • Let's consider example 7.1 from the textbook to illustrate these practical rules.

Estimating Standard Error

  • When samples are independent and the normality condition is met, the sampling distribution for the sample mean $\bar{x}$ is (very nearly) normal with mean $\mu_{\bar{x}} = \mu$ and standard error $SE_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where $\mu$ is the true mean and $\sigma$ is the true standard deviation of the population from which the samples are taken.

  • In practice we do not know the true values of the population parameters $\mu$ and $\sigma$. But we can use the plug-in principle to obtain estimates:

$$\mu_{\bar{x}} \approx \bar{x}, \qquad \text{and} \qquad SE_{\bar{x}} \approx \frac{s}{\sqrt{n}},$$

where $s$ is the sample standard deviation.
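As a quick illustration of the plug-in principle, a sketch in R; the sample of bag counts is simulated and purely hypothetical:

```r
set.seed(42)                                  # for reproducibility
chips <- round(rnorm(25, mean = 32, sd = 2))  # hypothetical sample of 25 bag counts
x_bar <- mean(chips)                          # plug-in estimate of the population mean
SE <- sd(chips) / sqrt(length(chips))         # estimated standard error s/sqrt(n)
c(x_bar, SE)
```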

T-score

  • If we take $n$ samples from a normal distribution $N(\mu, \sigma)$, then the quantity

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

will follow a standard normal distribution $N(0,1)$.

  • On the other hand, the quantity

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

will not quite be $N(0,1)$, especially if $n$ is small.

  • We call the quantity in the last formula a T-score.

T-Score Video

  • Let's watch this video on T-scores.

t-distribution

  • T-scores follow a t-distribution with degrees of freedom $df = n - 1$, where $n$ is the sample size.

  • We will use t-distributions for inference, that is, for obtaining confidence intervals and conducting hypothesis tests.

  • Let's get a feel for t-distributions.

Plotting t-Distributions

  • We can easily plot density curves for t-distributions.

gf_dist("t",df=19)

Degrees of Freedom Parameter

  • This plot shows three different t density functions for three different degrees of freedom:

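A plot of this kind can be reproduced with a few lines of base R; the degrees of freedom chosen below (1, 5, and 30) are illustrative and not necessarily those in the original figure:

```r
# Overlay three t density curves with different degrees of freedom,
# plus the standard normal for reference
curve(dnorm(x), from = -4, to = 4, lty = 2, ylab = "density")
curve(dt(x, df = 1),  add = TRUE, col = "red")
curve(dt(x, df = 5),  add = TRUE, col = "blue")
curve(dt(x, df = 30), add = TRUE, col = "darkgreen")
# Smaller df gives heavier tails; df = 30 is already close to N(0,1)
```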

A Computational Experiment

  • This plot shows a histogram of T-score values obtained by computing the sample mean with n=8n=8 from N(0,1)N(0,1). We overlay a density curve for both N(0,1)N(0,1) and a t-distribution with df=7df=7:

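The experiment can be reproduced along the following lines; the number of replications (1000) and the random seed are arbitrary choices:

```r
# Repeatedly draw n = 8 values from N(0,1) and compute the T-score
set.seed(1)
t_scores <- replicate(1000, {
  x <- rnorm(8)            # sample of size n = 8
  mean(x)/(sd(x)/sqrt(8))  # T-score with mu = 0
})
hist(t_scores, breaks = 40, freq = FALSE)
curve(dnorm(x), add = TRUE, lty = 2)          # N(0,1) overlay
curve(dt(x, df = 7), add = TRUE, col = "red") # t with df = n - 1 = 7
```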

Area Under a t-Distribution

  • We can compute areas under t-distribution density curves in the same way we did for areas under normal distribution density curves. For example, the area to the left of $-1$ under a t density curve with $df = 5$ is computed by

pt(-1.0,df=5)
## [1] 0.1816087

Middle 95% Under a t Density

  • How do we find the middle 95% of area under a t-distribution? This is an important question because it relates to constructing a 95% confidence interval for the sample mean.

  • Suppose we have a t-distribution with degrees of freedom $df = 10$. To find the value $t^{\ast}$ so that 95% of the area under the density curve lies between $-t^{\ast}$ and $t^{\ast}$, we use the qt command, for example,

(t_ast <- -qt(0.05/2,df=10))
## [1] 2.228139

  • We can check our answer:

pt(t_ast,df=10) - pt(-t_ast,df=10)
## [1] 0.95

  • Note that you always have to specify the appropriate value for the degrees of freedom df, given by $df = n - 1$, where $n$ is the sample size.

One-Sample t CI

Based on a sample of $n$ independent and nearly normal observations, a confidence interval for the population mean is

$$\bar{x} \pm t^{\ast}_{df} \times \frac{s}{\sqrt{n}},$$

where $n$ is the sample size, $\bar{x}$ is the sample mean, and $s$ is the sample standard deviation.

  • We determine the appropriate value for $t^{\ast}_{df}$ with the R command

-qt((1.0-confidence_level)/2,df=n-1)

One-Sample Mean CI Example

  • Suppose we take a random sample of 13 observations from a normally distributed population and determine the sample mean is 8 with a sample standard deviation of 2.5. Then to find a 90% confidence interval, we would do the following:

n <- 13
x_bar <- 8
s <- 2.5
SE <- s/sqrt(n)
t_ast <- -qt((1.0-0.9)/2,df=n-1)
(CI <- x_bar + t_ast * c(-1,1)*SE)
## [1] 6.764206 9.235794

A 95% CI would be

t_ast <- -qt((1.0-0.95)/2,df=n-1)
(CI <- x_bar + t_ast * c(-1,1)*SE)
## [1] 6.489265 9.510735

CI for Single Mean Summary

Once you have determined a one-mean confidence interval would be helpful for an application, there are four steps to constructing the interval:

  • Identify $n$, $\bar{x}$, and $s$, and determine what confidence level you wish to use.

  • Verify the conditions to ensure $\bar{x}$ is nearly normal.

  • If the conditions hold, approximate $SE$ by $\frac{s}{\sqrt{n}}$, find $t^{\ast}_{df}$, and construct the interval.

  • Interpret the confidence interval in the context of the problem.

One-Mean Hypothesis Testing

  • The null hypothesis for a one-mean test is typically stated as $H_0: \mu = \mu_0$, where $\mu_0$ is the null value for the population mean.

  • The corresponding alternative hypothesis is then one of

    • $H_A: \mu \neq \mu_0$ (two-sided),

    • $H_A: \mu < \mu_0$ (one-sided less than), or

    • $H_A: \mu > \mu_0$ (one-sided greater than).

One-Mean Hypothesis Test Procedure

  • Once you have determined a one-mean hypothesis test is the correct procedure, there are four steps to completing the test:

    • Identify the parameter of interest, list out hypotheses, identify the significance level, and identify $n$, $\bar{x}$, and $s$.

    • Verify conditions to ensure $\bar{x}$ is nearly normal.

    • If the conditions hold, approximate $SE$ by $\frac{s}{\sqrt{n}}$, compute the T-score using

    $$T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$

    and compute the p-value.

    • Evaluate the hypothesis test by comparing the p-value to the significance level $\alpha$, and provide a conclusion in the context of the problem.

A Simple Example

  • We want to know if the following data, stored in a vector x, comes from a normal distribution with mean 0:

## [1] -0.36047565 -0.03017749 1.75870831 0.27050839 0.32928774 1.91506499
## [7] 0.66091621 -1.06506123 -0.48685285 -0.24566197

  • We can apply a hypothesis test corresponding to $H_0: \mu = 0$ versus $H_A: \mu \neq 0$ as follows:

n <- length(x); x_bar <- mean(x); s <- sd(x)
alpha <- 0.05
SE <- s/sqrt(n)
mu_0 <- 0.0
(T <- (x_bar - mu_0)/SE)
## [1] 0.9105232
(p_value <- 2*(1-pt(T,df=n-1)))
## [1] 0.3862831

R Command for One-Sample t-test

  • There is a built-in R command, t.test, that will conduct a hypothesis test for a single mean for us.

  • For example, we can solve our previous problem using

t.test(x,mu=0.0)
##
## One Sample t-test
##
## data: x
## t = 0.91052, df = 9, p-value = 0.3863
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.4076704 0.9569217
## sample estimates:
## mean of x
## 0.2746256

  • Let's work some examples together.

Paired Data

Two sets of observations are paired if each observation in one set has a special correspondence or connection with exactly one observation in the other data set.

  • Common examples of paired data correspond to "before" and "after" trials.

  • For example, does a particular study technique work well for increasing one's exam score? To test this, we can ask 25 people to take an exam and record their scores, then we can ask those same people to try the study technique before taking another similar exam.

  • As another example, suppose it is claimed that among the general population of adults in the US, the average length of the left foot is longer than the average length of the right foot. To test this, we can select 32 people, record the measurement of the left foot of everyone in one column, then record the measurement of the right foot of everyone in a second column. We must make sure that each row of the resulting data frame corresponds with only one person.

Paired Data Example

  • Perhaps the data from our last example looks as follows:

## # A tibble: 6 × 3
##   left_foot right_foot foot_diff
##       <dbl>      <dbl>     <dbl>
## 1      7.83       8.27    -0.437
## 2      7.93       8.26    -0.332
## 3      8.47       8.25     0.221
## 4      8.02       8.21    -0.185
## 5      8.04       8.17    -0.127
## 6      8.51       7.98     0.533

  • Notice that we have added a column that is the difference of the left foot measurement minus the right foot measurement.

Hypothesis Test for Paired Data

  • How can we set up a hypothesis testing framework for the foot measurement question?

  • Basically, we can apply statistical inference to the difference column in the data.

  • The typical null hypothesis for paired data is that the average difference between the measurements is 0. We write this as

$$H_0: \mu_d = 0.$$

Paired Hypothesis Test Procedure

  • Once you have determined a paired hypothesis test is the correct procedure, there are four steps to completing the test:

    • Determine the significance level, the sample size $n$, the mean of the differences $\bar{d}$, and the corresponding standard deviation $s_d$.

    • Verify the conditions to ensure that $\bar{d}$ is nearly normal.

    • If the conditions hold, approximate $SE$ by $\frac{s_d}{\sqrt{n}}$, compute the T-score using

    $$T = \frac{\bar{d}}{s_d/\sqrt{n}},$$

    and compute the p-value.

    • Evaluate the hypothesis test by comparing the p-value to the significance level $\alpha$, and provide a conclusion in the context of the problem.

First Example

  • Consider our foot measurement data. The sample size is $n = 32$ and a boxplot shows that there are no extreme outliers. Now we compute the necessary quantities:

n <- 32
d_bar <- mean(foot_df$foot_diff)
s_d <- sd(foot_df$foot_diff)
(t_val <- d_bar/(s_d/sqrt(n)))
## [1] -0.6907872
(p_val <- 2*pt(t_val,df=n-1))
## [1] 0.4948396

  • Here we fail to reject the null hypothesis at the 0.05 significance level. That is, the data does not provide sufficient evidence for rejecting the null hypothesis that there is no difference in the length of the left foot versus the right foot.

R Command for Paired Hypothesis Test

  • Again, we can use the t.test function. However, now we must use two sets of data and add the paired=TRUE argument.

  • For example, to test the hypothesis for the foot data, one would use

t.test(foot_df$left_foot,foot_df$right_foot,paired = TRUE)
##
## Paired t-test
##
## data: foot_df$left_foot and foot_df$right_foot
## t = -0.69079, df = 31, p-value = 0.4948
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.18623480 0.09199711
## sample estimates:
## mean difference
## -0.04711885

  • Let's work some more examples together.

Difference of Two Means

  • Inference for paired data compares two different (but related) measurements on the same population. For example, we might want to study the difference in how individuals sleep before and after consuming a large amount of caffeine.

  • On the other hand, inference for a difference of two means compares the same measurement on two different populations. For example, we may want to study any difference between how caffeine affects the sleep of individuals with high blood pressure compared to those that do not have high blood pressure.

  • We can still use a t-distribution for inference for a difference of two means, but we must compute the two sample means and standard deviations separately for estimating the standard error.

We proceed with the details.

Confidence Intervals for a Difference in Means

  • The t-distribution can be used for inference when working with the standardized difference of two means if

    • The data are independent within and between the two groups, e.g., the data come from independent random samples or from a randomized experiment.

    • We check the outlier rules of thumb for each group separately.

  • The standard error may be computed as

$$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
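In practice $\sigma_1$ and $\sigma_2$ are replaced by the sample standard deviations $s_1$ and $s_2$. Here is a minimal "by hand" sketch of a 95% CI for a difference of means; all summary statistics below are made up for illustration:

```r
# Hypothetical summary statistics for two independent groups
n1 <- 25; x_bar1 <- 10.2; s1 <- 2.1
n2 <- 30; x_bar2 <- 9.1;  s2 <- 2.8

# Plug-in standard error for the difference of means
SE <- sqrt(s1^2/n1 + s2^2/n2)

# 95% CI using the smaller of the two degrees of freedom (here n1 - 1 = 24)
t_ast <- -qt((1-0.95)/2, df = min(n1,n2) - 1)
(CI <- (x_bar1 - x_bar2) + t_ast * c(-1,1) * SE)
```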

Hypothesis Tests for Difference of Means

  • Hypothesis tests for a difference of two means work in a very similar fashion to what we have seen before.

  • To conduct a test for a difference of two means "by hand", use the smaller of the two degrees of freedom.

  • Use the t.test function without the paired = TRUE argument.
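A minimal sketch, assuming two hypothetical independent samples x and y:

```r
# Two hypothetical independent samples
set.seed(7)
x <- rnorm(20, mean = 10, sd = 2)  # group 1
y <- rnorm(25, mean = 9,  sd = 2)  # group 2

# Two-sample t-test (no paired = TRUE): H_0: mu_x - mu_y = 0
t.test(x, y)
```

Note that by default t.test uses the Welch approximation for the degrees of freedom, which generally differs from the smaller-of-the-two rule used for hand calculations.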

Examples of Inference for Difference of Means

  • Let's look at some examples and work some problems together.

Statistical Power

  • Our next topic is statistical power; see the following video for an introduction.

