MATH 204 Introduction to Statistics

Lecture 2: Sampling

JMG

Goals for Lecture

  • Introduce basic concepts related to data collection and sampling (textbook Chapter 1):

    • Observational vs. experimental studies (textbook sections 1.2.5, 1.3.4, and 1.4)

    • Populations and samples (textbook section 1.3.1)

    • Issue of bias (textbook section 1.3.3)

    • Sampling strategies (textbook section 1.3.5)

Research Methods

  • The first step in conducting research is to identify topics or questions that are to be investigated.

  • The next step is to collect data.

    • Good research practices dictate a careful consideration of how data is collected.
  • It is important to distinguish observational studies from experimental studies.

    • In an observational study the data collection process does not interfere with the subjects of the study.

    • An experimental study involves a manipulation of the study subjects.

Observational Studies

  • Suppose that we want to know what University of Scranton students do or do not like about the buildings on campus. We could survey current students with questions that ask for ratings on different aspects of campus buildings.

    • The data collected through such a survey is an example of observational data. Why?
  • Drawing causal conclusions from observational data is generally not recommended, because an observed association may be explained by other factors that were not controlled.

Experimental Studies

  • Suppose that we want to know whether caffeine consumption influences the exam performance of University of Scranton students. To study this, we give every student in one section of BIOL 141 a certain dose of caffeine at the mid-term exam, ask every student in another section of BIOL 141 to refrain from consuming caffeine before the exam, and record the mid-term exam scores of both sections.

    • This is an example of experimental data. Why?

    • Note that there is an experimental group (i.e., those consuming caffeine) and a control group (i.e., those refraining from caffeine).

  • In an experimental study there are often both explanatory and response variables.

  • In our example, the mid-term exam score is the (which) variable while caffeine consumption is the (which) variable.
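
As a rough illustration, the records from such a study could be laid out and summarized as below. This is only a sketch: the scores are invented placeholder values, not real data, and the field names `section`, `caffeine`, and `score` are hypothetical.

```python
from statistics import mean

# Hypothetical records, one per student. Scores are invented for illustration.
records = [
    {"section": "A", "caffeine": True, "score": 78},
    {"section": "A", "caffeine": True, "score": 85},
    {"section": "A", "caffeine": True, "score": 91},
    {"section": "B", "caffeine": False, "score": 74},
    {"section": "B", "caffeine": False, "score": 88},
    {"section": "B", "caffeine": False, "score": 81},
]


def mean_score(rows, caffeine):
    """Average exam score for the group with the given caffeine status."""
    return mean(r["score"] for r in rows if r["caffeine"] == caffeine)


experimental_mean = mean_score(records, caffeine=True)   # experimental group
control_mean = mean_score(records, caffeine=False)       # control group
difference = experimental_mean - control_mean

print(f"experimental: {experimental_mean:.2f}")
print(f"control:      {control_mean:.2f}")
print(f"difference:   {difference:.2f}")
```

Each record pairs the condition that the researcher set with the outcome that was measured afterwards, which is the structure the two kinds of variables describe.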

Populations

  • In the context of statistics (and research that intends to use statistical methods for data analysis), a population is the overall group (which may be real or hypothetical) that is the focus of a research question.

    • In our examples for both an observational and an experimental study, our research question is about the population of students at the University of Scranton.
  • Populations of interest tend to be very large. If a population is small, then each member of the population can be observed directly and (inferential) statistics is not needed.

    • Typically, it is somewhere between inconvenient and impossible to collect data for every case or member in a population.

    • As examples, think about the observational study on UofS building preferences and the experimental study on UofS student exam performance and caffeine consumption. In either case, would it be feasible to survey or observe every student?

Samples

  • Researchers use statistics to analyze data collected from a sample to make conclusions about a population from which the data were sampled.

  • Here are some examples:

    • A factory makes a lot of items (the population) but randomly selects a few of those items to test for quality.

    • Data from current patients in a hospital (a sample) is used to make conclusions about future patients (the population).

  • Why would data from a laboratory experiment typically be a sample?
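
The factory example can be mimicked with a small simulation. This is a sketch under made-up assumptions: the population size (10,000 items), the defect probability (0.03), the sample size (200), and the seed are all hypothetical choices, not values from the lecture.

```python
import random

rng = random.Random(204)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 items, each defective with probability 0.03.
population = [rng.random() < 0.03 for _ in range(10_000)]

# Sample: 200 items chosen at random without replacement.
sample = rng.sample(population, k=200)

# In practice the population rate is unknown; only the sample rate is observed.
population_rate = sum(population) / len(population)
sample_rate = sum(sample) / len(sample)

print(f"population defect rate: {population_rate:.3f}")
print(f"sample defect rate:     {sample_rate:.3f}")
```

The sample rate is used as an estimate of the population rate. The two numbers will generally be close but not identical.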

Bias

  • One common goal of statistics is to use a sample to make generalizations about the population to which the sample belongs.

    • One would expect that if such a goal is to be achieved, then the sample needs to be representative of the population.
  • A sample that for one reason or another fails to be representative of the target population is said to be a biased sample.

    • Suppose that we want to know about the majors of University of Scranton students who take a statistics course. We conduct a study in which we ask every student enrolled in MATH 204 in Fall 2022 what their major is. Is this a good strategy? Why or why not?
  • In sampling for statistical purposes, one should always seek to randomly select a sample from a population in order to reduce the risk of bias.
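
To see how a non-random sample can misrepresent a population, consider the following sketch. Everything in it is an assumption made for illustration: the course labels `STAT-X` and `STAT-Y`, the majors, and all enrollment counts are invented and do not describe actual University of Scranton enrollment.

```python
import random
from collections import Counter

rng = random.Random(2022)  # fixed seed so the sketch is reproducible

# Hypothetical enrollment as (course, major) pairs. Course labels other than
# MATH 204 and all counts are invented for illustration.
enrollment = (
    [("MATH 204", "Biology")] * 60
    + [("MATH 204", "Nursing")] * 40
    + [("STAT-X", "Business")] * 150
    + [("STAT-Y", "Psychology")] * 100
    + [("STAT-Y", "Biology")] * 50
)


def major_shares(students):
    """Fraction of the given students in each major."""
    counts = Counter(major for _, major in students)
    return {major: n / len(students) for major, n in counts.items()}


# Potentially biased sample: everyone enrolled in a single course.
one_course = [s for s in enrollment if s[0] == "MATH 204"]

# Random sample of the same size drawn from all statistics students.
random_sample = rng.sample(enrollment, k=len(one_course))

print("population:   ", major_shares(enrollment))
print("MATH 204 only:", major_shares(one_course))
print("random sample:", major_shares(random_sample))
```

In this made-up population, the single-course sample contains no Business or Psychology majors at all. The random sample can include students from every course.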

Sampling Techniques

  • We describe four techniques for random sampling.

    • Simple random sampling - selects sample members individually in some way that is random, like drawing names from a hat.

    • Stratified random sampling - groups similar individuals into strata and then randomly samples from each stratum. For example, group students into 4 cohorts, then randomly select 10 students from each cohort.

    • Cluster sampling - individuals are grouped into clusters and then a sample of clusters is randomly chosen. Consider all of the different classes running at the University of Scranton in Fall 2022 (the clusters); randomly choose ten of these classes and include all students from each chosen class.

    • Multistage sampling - individuals are grouped into clusters, a sample of clusters is randomly chosen, then a random sample of individuals from each chosen cluster is selected. Consider all of the different classes running at the University of Scranton in Fall 2022 (the clusters), randomly choose ten of these classes, and finally select at random three students from each of the ten chosen classes.

  • The next four slides provide visual representations of each sampling method.
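
The four techniques described above can also be sketched in code. This is a minimal illustration, not a definitive implementation: the roster (40 classes of 25 students), the way cohorts are assigned, and the function names are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical roster: 40 classes of 25 students. Each student belongs to one
# class (the cluster) and one of 4 cohorts (the stratum).
students = [
    {"id": f"c{c}-s{s}", "class": c, "cohort": s % 4 + 1}
    for c in range(40)
    for s in range(25)
]


def group_by(pop, key):
    """Split the population into lists of individuals sharing the same key."""
    groups = {}
    for person in pop:
        groups.setdefault(person[key], []).append(person)
    return groups


def simple_random(pop, n):
    """Draw n individuals directly from the whole population."""
    return rng.sample(pop, n)


def stratified(pop, key, n_per_stratum):
    """Randomly sample n_per_stratum individuals from every stratum."""
    sample = []
    for stratum in group_by(pop, key).values():
        sample.extend(rng.sample(stratum, n_per_stratum))
    return sample


def cluster(pop, key, n_clusters):
    """Randomly choose clusters and keep everyone in the chosen clusters."""
    groups = group_by(pop, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [p for c in chosen for p in groups[c]]


def multistage(pop, key, n_clusters, n_per_cluster):
    """Randomly choose clusters, then randomly sample within each one."""
    groups = group_by(pop, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [p for c in chosen for p in rng.sample(groups[c], n_per_cluster)]


srs = simple_random(students, 30)
strat = stratified(students, "cohort", 10)     # 4 cohorts x 10 students
clus = cluster(students, "class", 10)          # 10 whole classes
multi = multistage(students, "class", 10, 3)   # 10 classes x 3 students
print(len(srs), len(strat), len(clus), len(multi))  # 30 40 250 30
```

Note that cluster sampling keeps every student in the chosen classes, so its sample size depends on the class sizes. The other three methods fix the number of individuals drawn.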

Simple Random Sampling [figure]

Stratified Random Sampling [figure]

Cluster Sampling [figure]

Multistage Sampling [figure]

Which Sampling?

  • How do we decide which sampling method to use?

  • It is unlikely that any sampling method will be perfect for a particular situation. Always keep in mind that the goal is to reduce bias and obtain a sample that is as representative as possible.

  • Of course, there are sampling methods besides what we have discussed here. The point is to be aware of problems that arise for sampling and to think through a study design in order to minimize these problems as much as possible.

Summary

In this lecture, we introduced the essential concepts of data collection:

  • observational vs. experimental data

  • populations and samples

  • bias and sampling techniques

Next Time

  • In the next lecture, we introduce the basics of R that will be used in Chapter 2 on summarizing data.

  • Good references for R as we will use it in this course include:
