Introduce basic concepts related to data collection and sampling (textbook Chapter 1):
Observational vs. experimental studies (textbook sections 1.2.5, 1.3.4, and 1.4)
Populations and samples (textbook section 1.3.1)
Issue of bias (textbook section 1.3.3)
Sampling strategies (textbook section 1.3.5)
The first step in conducting research is to identify topics or questions that are to be investigated.
The next step is to collect data.
It is important to distinguish observational studies from experimental studies.
In an observational study the data collection process does not interfere with the subjects of the study.
An experimental study involves a manipulation of the study subjects.
Suppose that we want to know what University of Scranton students do or do not like about the buildings on campus. We could survey current students with questions that ask for ratings on different aspects of campus buildings.
Drawing causal conclusions from observational data is generally not recommended, because the groups being compared may differ in ways other than the variable of interest.
Suppose that we want to know if caffeine consumption influences the exam performance of University of Scranton students. To study this, we give all of the students in one section of BIOL 141 a certain dose of caffeine at the mid-term exam, ask all of the students in another section of BIOL 141 to refrain from consuming caffeine before the mid-term exam, and record the mid-term exam scores of both sections.
This is an example of experimental data. Why?
Note that there is an experimental group (i.e., those consuming caffeine) and a control group (i.e., those refraining from caffeine).
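In an experimental study there are often both explanatory and response variables. Here, caffeine consumption is the explanatory variable and the mid-term exam score is the response variable.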
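As a rough illustration of how explanatory and response variables show up in data, the sketch below stores one record per student and compares the mean response across the two groups. The scores are invented for illustration only, and the function name `mean_score_by_group` is hypothetical.

```python
from statistics import mean

# Hypothetical records: (group, mid-term score).
# "group" plays the role of the explanatory variable,
# the score plays the role of the response variable.
# These numbers are made up and do not come from any real study.
records = [
    ("caffeine", 78), ("caffeine", 85), ("caffeine", 81),
    ("control", 74), ("control", 88), ("control", 79),
]

def mean_score_by_group(rows):
    """Return the mean response (score) for each level of the explanatory variable."""
    scores = {}
    for group, score in rows:
        scores.setdefault(group, []).append(score)
    return {group: mean(values) for group, values in scores.items()}

group_means = mean_score_by_group(records)
print(group_means)
```

A difference between the two group means is only a starting point; deciding whether it reflects a real effect is a question for later material.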
In the context of statistics (and research that intends to use statistical methods for data analysis), a population is the overall group (which may be real or hypothetical) that is the focus of a research question.
In statistical work, populations tend to be very large: if a population is small, then each member can be observed directly and (inferential) statistics is not needed.
Typically, it is somewhere between inconvenient and impossible to collect data for every case or member in a population.
As examples, think about the observational study on UofS building preferences and the experimental study on UofS student exam performance and caffeine consumption. In either case, would it be feasible to survey or observe every student?
Researchers use statistics to analyze data collected from a sample to make conclusions about a population from which the data were sampled.
Here are some examples:
A factory makes a lot of items (the population) but randomly selects a few of those items to test for quality.
Data from current patients in a hospital (a sample) are used to make conclusions about future patients (the population).
Why would data from a laboratory experiment typically be a sample?
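The factory example above can be sketched as a small simulation. The population size, the number of defective items, and the sample size are all assumptions chosen for illustration; in practice the population defect rate would be unknown, which is why a sample is taken in the first place.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 items, exactly 300 of which are defective (True).
population = [True] * 300 + [False] * 9_700
rng.shuffle(population)

# Test only a small random sample instead of every item.
sample = rng.sample(population, 200)

population_rate = sum(population) / len(population)  # usually unknown in practice
sample_rate = sum(sample) / len(sample)              # what we can actually measure

print(f"population defect rate: {population_rate:.3f}")
print(f"sample defect rate:     {sample_rate:.3f}")
```

The sample rate will usually be close to, but not exactly equal to, the population rate; quantifying that gap is what inferential statistics is for.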
One common goal of statistics is to use a sample to make generalizations about the population to which the sample belongs.
A sample that for one reason or another fails to be representative of the target population is said to be a biased sample.
In sampling for statistical purposes, one should always seek to randomly select a sample from a population in order to reduce the risk of bias.
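To see why a non-random sample can mislead, here is a minimal simulation based on the campus building survey. Everything about the population is an assumption made for illustration: the counts of resident and commuter students, the 1 to 10 rating scale, and the premise that residents tend to rate buildings higher than commuters.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 2,000 students with building ratings on a 1-10 scale.
# Assumption for illustration: residents tend to give higher ratings than commuters.
residents = [("resident", rng.randint(6, 10)) for _ in range(1200)]
commuters = [("commuter", rng.randint(1, 7)) for _ in range(800)]
population = residents + commuters

def mean_rating(students):
    """Average rating over a list of (status, rating) pairs."""
    return mean(rating for _, rating in students)

# Biased sample: survey only residents (for example, by polling in a dorm lobby).
biased_sample = rng.sample(residents, 100)

# Random sample: every student has the same chance of being selected.
random_sample = rng.sample(population, 100)

print(f"population mean rating:    {mean_rating(population):.2f}")
print(f"biased sample mean rating: {mean_rating(biased_sample):.2f}")
print(f"random sample mean rating: {mean_rating(random_sample):.2f}")
```

Because the biased sample leaves out commuters entirely, its mean overstates the population mean no matter how many residents are surveyed, while the random sample lands near the population value.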
We describe four techniques for random sampling.
Simple random sampling - selects sample members individually in some way that is random, like drawing names from a hat.
Stratified random sampling - groups similar individuals into strata and then randomly samples from each stratum. For example, group students into 4 cohorts, then randomly select 10 students from each cohort.
Cluster sampling - data are binned into clusters and then a sample of clusters is randomly chosen. Consider all of the different classes running at the University of Scranton in Fall 2022 (the clusters); randomly choose ten of these classes and include all students from each chosen class.
Multistage sampling - data are binned into clusters, a sample of clusters is randomly chosen, then a sample of individuals from each chosen cluster is randomly selected. Consider all of the different classes running at the University of Scranton in Fall 2022 (the clusters); randomly choose ten of these classes, and finally select at random three students from each of the ten chosen classes.
The next four slides provide visual representations of each sampling method.
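The four sampling techniques described above can also be sketched in code. This is a minimal illustration, not a survey-grade implementation: the roster of 40 classes with 25 students each is hypothetical, each student is assumed to belong to exactly one class (real students take several), and all function names are made up for this example.

```python
import random

rng = random.Random(2022)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 40 classes of 25 students, each student in one of 4 cohorts.
COHORTS = ["first-year", "sophomore", "junior", "senior"]
students = [
    {"id": class_id * 100 + seat, "class_id": class_id, "cohort": COHORTS[seat % 4]}
    for class_id in range(40)
    for seat in range(25)
]

def simple_random_sample(population, n):
    """Draw n individuals directly, like drawing names from a hat."""
    return rng.sample(population, n)

def stratified_sample(population, key, n_per_stratum):
    """Group individuals into strata by `key`, then sample within every stratum."""
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(population, key, n_clusters):
    """Randomly pick whole clusters and keep everyone in the picked clusters."""
    cluster_ids = sorted({person[key] for person in population})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [person for person in population if person[key] in picked]

def multistage_sample(population, key, n_clusters, n_per_cluster):
    """Randomly pick clusters, then randomly pick individuals within each one."""
    cluster_ids = sorted({person[key] for person in population})
    chosen = []
    for cluster_id in rng.sample(cluster_ids, n_clusters):
        members = [p for p in population if p[key] == cluster_id]
        chosen.extend(rng.sample(members, n_per_cluster))
    return chosen

srs = simple_random_sample(students, 40)
stratified = stratified_sample(students, "cohort", 10)   # 10 from each of 4 cohorts
clustered = cluster_sample(students, "class_id", 10)     # all students in 10 classes
multistage = multistage_sample(students, "class_id", 10, 3)  # 3 from each of 10 classes

print(len(srs), len(stratified), len(clustered), len(multistage))  # prints: 40 40 250 30
```

Note how the cluster sample is much larger than the multistage sample even though both start from ten classes: cluster sampling keeps every student in a chosen class, while multistage sampling keeps only three.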
How do we decide which sampling method to use?
It is unlikely that any sampling method will be perfect for a particular situation. Always keep in mind that the goal is to reduce bias and obtain a sample that is as representative as possible.
Of course, there are sampling methods besides what we have discussed here. The point is to be aware of problems that arise for sampling and to think through a study design in order to minimize these problems as much as possible.
In this lecture, we introduced the essential concepts of data collection:
observational vs. experimental data
populations and samples
bias and sampling techniques
In the next lecture, we introduce the basics of R that will be used in Chapter 2 on summarizing data.
Good references for R as we will use it in this course include:
The swirl course on R programming.
The Intro to R blog by Jenny Sloane