Ch. 12 Linear Regression

Sections covered: 12.1, 12.2, 12.5

12.1 The Simple Linear Regression Model

12.2 Estimating Model Parameters

Formulas to know from p. 498:

\(b_1 = \dfrac{\sum(x_i -\overline{x})(y_i - \overline{y})}{\sum(x_i - \overline{x})^2} = \frac{S_{xy}}{S_{xx}}\) and \(b_0 = \overline{y} - b_1 \overline{x}\)

Formula to know from p. 502:

\(SSE = \sum(y_i - \hat{y}_i)^2\)

Formulas to know from p. 504:

\(SST = \sum(y_i - \overline{y})^2\) and \(r^2 = 1 - \frac{SSE}{SST}\)

Formulas to know from p. 505:

\(SSR = \sum(\hat{y}_i - \overline{y})^2\) and \(SSE + SSR = SST\)
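A short worked step, added here as a study aid, shows how the p. 504 and p. 505 formulas fit together. Substituting \(SST - SSE = SSR\) into the definition of \(r^2\) gives an equivalent form:

\[r^2 = 1 - \frac{SSE}{SST} = \frac{SST - SSE}{SST} = \frac{SSR}{SST}\]

So \(r^2\) can be read as the share of the total variation in \(y\) that the fitted line accounts for.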

Resources

Interactive Visualization: Linear Regression. Try fitting the least squares line to a set of random data and check your answer (and another one).

Video: Regression I: What is regression? | SSE, SSR, SST | R-squared | Errors (ε vs. e)

Textbook p. 507 #17

Researchers fitted a simple linear regression model to explain how \(Y=\) porosity (%) is related to \(X=\) unit weight (pcf) in concrete specimens. Consider the following representative data:

x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)

Use R to find:

Model coefficients, \(b_0\) and \(b_1\):

mod <- lm(y ~ x)
mod$coefficients
## (Intercept)           x 
## 118.9099168  -0.9047307

(\(b_0\) is listed under (Intercept) and \(b_1\) is listed under x.)
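As a cross-check, the same coefficients can be reproduced from the p. 498 formulas. The sketch below is illustrative only; the names Sxy, Sxx, b1, and b0 are chosen here to mirror the formulas and do not come from the textbook's code.

```r
# Porosity data from the exercise (repeated so this block runs on its own)
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)

# S_xy and S_xx as they appear in the slope formula
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

b1 <- Sxy / Sxx                # slope
b0 <- mean(y) - b1 * mean(x)   # intercept

c(b0 = b0, b1 = b1)

# Compare with lm(); unname() drops the coefficient labels before comparing
all.equal(c(b0, b1), unname(coef(lm(y ~ x))))
```

The hand-computed values should agree with the lm() output up to floating-point rounding.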

Residuals

mod$residuals
##          1          2          3          4          5          6          7 
## -0.5415817  0.4583527  1.0059218 -0.5226590 -0.7513055 -0.6037364  0.3343057 
##          8          9         10         11         12         13         14 
##  0.9342401 -0.3896101  1.6818091 -0.1325141  0.7484321 -1.7754181 -0.9039989 
##         15 
##  0.4577621

Rounded to two decimal places:

round(mod$residuals, 2)
##     1     2     3     4     5     6     7     8     9    10    11    12    13 
## -0.54  0.46  1.01 -0.52 -0.75 -0.60  0.33  0.93 -0.39  1.68 -0.13  0.75 -1.78 
##    14    15 
## -0.90  0.46
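Each residual is the observed value minus the fitted value, \(e_i = y_i - \hat{y}_i\). The following sketch (variable names y_hat and e are illustrative) rebuilds the residuals that way and checks a standard property of least squares fits with an intercept: the residuals sum to zero, apart from floating-point error.

```r
# Porosity data from the exercise (repeated so this block runs on its own)
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)

mod <- lm(y ~ x)

y_hat <- fitted(mod)   # fitted values from the least squares line
e <- y - y_hat         # residuals computed by hand

# Should match the residuals stored in the model object
all.equal(unname(e), unname(resid(mod)))

# Sum of residuals: zero up to floating-point error
sum(e)
```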

SSR, SSE, SST

SSR (regression sum of squares):

SSR <- sum((mod$fitted.values - mean(y))^2)
SSR
## [1] 426.6185

SSE (error sum of squares):

SSE <- sum(mod$residuals^2)
SSE
## [1] 11.43883

SST (total sum of squares):

SST <- sum((y - mean(y))^2)
SST
## [1] 438.0573

(Check that SSR + SSE = SST)

SSR + SSE
## [1] 438.0573

1. What proportion of observed variation in porosity can be attributed to the approximate linear relationship between unit weight and porosity?

Method #1: SSR/SST

SSR/SST
## [1] 0.9738874

Method #2: \(r^2\)

cor(x, y)^2
## [1] 0.9738874
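Two further routes to the same number may be useful as a check: the p. 504 formula \(r^2 = 1 - SSE/SST\), and the r.squared component that summary() reports for an lm fit. This is a sketch; the names r2_formula and r2_summary are made up for the comparison.

```r
# Porosity data from the exercise (repeated so this block runs on its own)
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)

mod <- lm(y ~ x)

SSE <- sum(resid(mod)^2)        # error sum of squares
SST <- sum((y - mean(y))^2)     # total sum of squares

r2_formula <- 1 - SSE / SST             # p. 504 formula
r2_summary <- summary(mod)$r.squared    # value reported by summary()

c(formula = r2_formula, summary = r2_summary)
```

Both values should equal the SSR/SST and cor(x, y)^2 results above, so roughly 97% of the observed variation in porosity is attributed to the linear relationship with unit weight.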

12.5 Correlation

Skip: p. 530 “Inferences About the Population Correlation Coefficient” to the end of the section.

Resources

Interactive visualization: Correlation Coefficient (add and remove points)

Interactive visualization: Interpreting Correlations

R

Sample correlation coefficient \(r\)

# Example 12.15, p. 528
x <- c(2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1)
y <- c(1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63)

cor(x, y)
## [1] 0.3472602
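To see what cor() computes, the sample correlation can be written as \(r = \dfrac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}}\), where \(S_{yy} = \sum(y_i - \overline{y})^2\). This is the standard definition rather than a formula quoted from these notes, and the variable names below are illustrative.

```r
# Example 12.15 data (repeated so this block runs on its own)
x <- c(2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1)
y <- c(1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63)

# Sums of squares and cross-products about the means
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)

r <- Sxy / sqrt(Sxx * Syy)   # sample correlation coefficient by hand
r

# Should agree with the built-in function
all.equal(r, cor(x, y))
```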