class: center, middle, inverse, title-slide

.title[
# MATH 204 Introduction to Statistics
]
.subtitle[
## Lecture 5: Relationships Between Numerical Variables
]
.author[
### JMG
]

---

## Goals for Lecture

* Introduce graphical and numerical summaries for pairs of numerical data. Textbook sections 2.1.1 and 8.1.4.

--

* Introduce scatterplots for pairs of numerical variables.

--

* Introduce the correlation concept.

---

## Scatterplots

- A **scatterplot** provides a case-by-case view of data for two *numerical* variables.

--

- Scatterplots are useful for exploratory purposes to assess whether two variables might have an association.

---

## Scatterplot Example

For example, in the `epa2021` data set we can try to assess any association between engine size and gas mileage:

.panelset[
.panel[.panel-name[Scatterplot]

<img src="index_files/figure-html/eg-scatter-1.png" style="display: block; margin: auto;" />

]
.panel[.panel-name[R Code]

```r
gf_point(hwy_mpg ~ engine_displacement, data = epa2021)
```

]
]

---

## Thinking About Scatterplots

A scatterplot provides a case-by-case view of data for two numerical variables.

--

- **Question:** What does the scatterplot on the previous slide reveal about the data? How is the plot useful? How are scatterplots useful in general?

--

- **Question:** Based on the scatterplot on the previous slide, do you think there is any degree of association between the two variables? If so, is the association positive or negative?

---

## Models

Statistical models help us assess variable associations. For example, the curve shown in the following plot is obtained by fitting a type of statistical model to the data:

<img src="index_files/figure-html/eg_model-1.png" style="display: block; margin: auto;" />

This model suggests that the relationship between engine size and gas mileage is **nonlinear** since the curve deviates noticeably from a straight line.

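---

## Models: A Sketch in R

The idea on the previous slide can be tried out on made-up data. The sketch below is only an illustration: the data are simulated (not `epa2021`), the variable names `size` and `mileage` and the helper `rmse` are hypothetical, and `loess` is just one kind of smooth-curve model, not necessarily the one used for the plot. It fits a straight line and a smooth curve to the same curved data and compares the typical size of their errors.

```r
# Simulated data (hypothetical; not the epa2021 data)
set.seed(204)
size <- runif(200, min = 1, max = 6)
mileage <- 60 / size + rnorm(200, sd = 2)

# A straight-line model and a smooth-curve (loess) model for the same data
line_fit <- lm(mileage ~ size)
curve_fit <- loess(mileage ~ size)

# Typical size of the errors: root mean squared residual (hypothetical helper)
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
line_rmse <- rmse(line_fit)
curve_rmse <- rmse(curve_fit)

print(c(line = line_rmse, curve = curve_rmse))
```

When the underlying relationship is curved, the smooth curve should leave smaller errors than the straight line, which is one sign that the association is nonlinear.
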
---

## Linear Associations

- It is difficult to assess and summarize general associations between numerical variables.

--

- It is easier to assess and summarize the possibility of a particular type of association.

--

- The simplest type of association to look for between two numerical variables is a **linear** association.

--

- A linear association is one in which one variable either increases or decreases at a *constant* rate with respect to another variable.

---

## Example

<img src="index_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---

## Example With Lines

<img src="index_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Correlation

- **Correlation**, which always takes values between -1 and 1, is a statistic that describes the strength of the **linear** relationship between two variables. Correlation is denoted by `\(R\)`.

--

- In R, correlation is computed with the `cor` command. For example, the correlation between the engine size and gas mileage variables in the `epa2021` data set is computed as

```r
cor(epa2021$engine_displacement, epa2021$hwy_mpg)
```

```
## [1] -0.707205
```

--

- The plot in the next slide shows several scatterplots, each with its corresponding correlation value.

---

## Correlation Illustrations

.center[
<img src="https://www.dropbox.com/s/jbgilwujdebjwwf/posNegCorPlots.png?raw=1" width="75%" />
]

---

## Strongly Related Variables with Weak Correlations

- It is important to note that two variables may have a strong association even if their correlation is relatively weak. This is because correlation measures **linear** association and variables may have a strong **nonlinear** association.

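As a rough illustration of this point, the sketch below uses made-up values (not data from the course). The names `x`, `y_curved`, and `y_linear` are hypothetical. Here `y_curved` is completely determined by `x`, yet its correlation with `x` should be essentially zero because the relationship is a parabola rather than a line.

```r
# Hypothetical values, symmetric about zero (not from any course data set)
x <- seq(-3, 3, by = 0.5)

# y_curved depends on x exactly, but through a nonlinear (quadratic) rule
y_curved <- x^2

# y_linear depends on x through a straight-line rule
y_linear <- 2 * x + 1

# Correlation only measures the linear part of each relationship
r_curved <- cor(x, y_curved)
r_linear <- cor(x, y_linear)

print(round(r_curved, 3))
print(round(r_linear, 3))
```
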
--

.center[
<img src="https://www.dropbox.com/s/rmujuxuoihdg5xb/corForNonLinearPlots.png?raw=1" width="100%" />
]

---

## Summary

In this lecture, we

--

- introduced scatterplots as a visual display of the relationship between two numerical variables, and

- discussed correlation as a numerical measure of the strength of a linear relationship between two numerical variables.

---

## Next Time

Before next time, watch this video corresponding to textbook section 2.2.

---

## Notes