# Ch. 12 Linear Regression

Sections covered: 12.1, 12.2, 12.5

## 12.2 Estimating Model Parameters

Formulas to know from p. 498:

$$b_1 = \dfrac{\sum(x_i -\overline{x})(y_i - \overline{y})}{\sum(x_i - \overline{x})^2} = \frac{S_{xy}}{S_{xx}}$$ and $$b_0 = \overline{y} - b_1 \overline{x}$$
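These formulas translate directly into R. Below is a minimal sketch with made-up illustrative data (the `x` and `y` values are not from the textbook), checking the hand formulas against `lm()`:

```r
# Hypothetical data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

Sxy <- sum((x - mean(x)) * (y - mean(y)))  # S_xy
Sxx <- sum((x - mean(x))^2)                # S_xx
b1 <- Sxy / Sxx                            # slope
b0 <- mean(y) - b1 * mean(x)               # intercept
c(b0, b1)
## [1] 2.2 0.6
coef(lm(y ~ x))  # same values from lm()
```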

Formula to know from p. 502:

$$SSE = \sum(y_i - \hat{y_i})^2$$

Formulas to know from p. 504:

$$SST = \sum(y_i - \overline{y})^2$$ and $$r^2 = 1 - \frac{SSE}{SST}$$

Formulas to know from p. 505:

$$SSR = \sum(\hat{y_i} - \overline{y})^2$$ and $$SSE + SSR = SST$$
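The identity $SSE + SSR = SST$ and the $r^2$ formula can be verified directly from the definitions. A minimal sketch with made-up illustrative data (not from the textbook):

```r
# Hypothetical data for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

mod <- lm(y ~ x)
yhat <- mod$fitted.values

SSE <- sum((y - yhat)^2)        # error sum of squares
SST <- sum((y - mean(y))^2)     # total sum of squares
SSR <- sum((yhat - mean(y))^2)  # regression sum of squares

c(SSE + SSR, SST)  # the identity SSE + SSR = SST
## [1] 6 6
1 - SSE/SST        # r^2
## [1] 0.6
```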

### Resources

Interactive visualization: Linear Regression. Try fitting the least squares line to a set of random data and check your answer (and another one).

Video: Regression I: What is regression? | SSE, SSR, SST | R-squared | Errors (ε vs. e) [contributed by Lance J.]

### R

Calculating slope and intercept for a sample of (x, y) pairs (p. 498 formulas)

```r
# Example 12.8, p. 503
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
lm(y ~ x)  # lm = linear model
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##     3.62091     -0.01471
```
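Once the model is stored, the coefficients can be extracted with `coef()` and predictions for new x values made with `predict()`. A sketch using the Example 12.8 data (the new x value of 50 is just an illustration):

```r
# Data from Example 12.8, p. 503
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
mod <- lm(y ~ x)

coef(mod)                                   # named vector: (Intercept), x
predict(mod, newdata = data.frame(x = 50))  # predicted y at x = 50 (about 2.89)
```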

Predicted values:

```r
mod <- lm(y ~ x)
round(mod$fitted.values, 2)
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14
## 3.44 3.18 3.09 3.03 2.96 2.78 2.71 2.64 2.58 2.47 2.25 2.24 2.15 2.08
```

Residuals:

```r
round(mod$residuals, 2)
##     1     2     3     4     5     6     7     8     9    10    11    12    13
## -0.14  0.02  0.31 -0.03 -0.16  0.12 -0.01 -0.04 -0.08  0.13 -0.05 -0.24  0.15
##    14
##  0.02
```

SSE, SSR

```r
anova(mod)  # anova = analysis of variance
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## x          1 2.29469 2.29469  104.92 2.762e-07 ***
## Residuals 12 0.26246 0.02187
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The first row under “Sum Sq” is the SSR, and the second row under “Sum Sq” is the SSE:

SSE = 0.2624565

SSR = 2.2946864

SST = SSE + SSR = 0.2624565 + 2.2946864 = 2.5571
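The Sum Sq column can also be pulled out of the ANOVA table programmatically. Because the column name contains a space, it must be quoted after `$`. A sketch using the Example 12.8 data:

```r
# Data from Example 12.8, p. 503
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
mod <- lm(y ~ x)

ss <- anova(mod)$"Sum Sq"  # column name has a space, so quote it
SSR <- ss[1]               # regression row (x)
SSE <- ss[2]               # residuals row
SSE + SSR                  # SST, matches 2.5571 above
```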

Coefficient of determination $$r^2$$

```r
# Example 12.4, 12.9
x <- c(132, 129, 120, 113.2, 105, 92, 84, 83.2, 88.4, 59, 80, 81.5, 71, 69.2)
y <- c(46, 48, 51, 52.1, 54, 52, 59, 58.7, 61.6, 64, 61.4, 54.6, 58.8, 58)

mod <- lm(y ~ x)
```

```r
SSR <- anova(mod)$"Sum Sq"[1]
SST <- anova(mod)$"Sum Sq"[1] + anova(mod)$"Sum Sq"[2]
SSR/SST
## [1] 0.7907602
```

Or (simply):

```r
cor(x, y)^2
## [1] 0.7907602
```

(See section 12.5)

## 12.5 Correlation

Skip: “Inferences About the Population Correlation Coefficient” (p. 530) to end of section.

### Resources

Interactive visualization: Correlation Coefficient (add and remove points)

Interactive visualization: Interpreting Correlations [contributed by Dario G.]

### R

Sample correlation coefficient $$r$$

```r
# Example 12.15, p. 528
x <- c(2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1)
y <- c(1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63)
cor(x, y)
## [1] 0.3472602
```

## Practice Exercises

1. (Least squares line) Researchers employed a least squares analysis in studying how $$Y=$$ porosity (%) is related to $$X=$$ unit weight (pcf) in concrete specimens. Consider the following representative data:

```r
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
```

(Textbook 12.17)

1. Obtain the equation of the estimated regression line.

```r
lm(y ~ x)
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##    118.9099      -0.9047
```

$$\hat{y} = 118.91 - 0.9047x$$

2. Calculate the residuals corresponding to the first two observations.

```r
mod <- lm(y ~ x)
round(mod$residuals, 2)
##     1     2     3     4     5     6     7     8     9    10    11    12    13
## -0.54  0.46  1.01 -0.52 -0.75 -0.60  0.33  0.93 -0.39  1.68 -0.13  0.75 -1.78
##    14    15
## -0.90  0.46
```

Or alternatively, use R as a calculator

```r
pred <- 118.9099 - 0.9047*x
res <- y - pred
res[1]
## [1] -0.5446
res[2]
## [1] 0.45527
```
3. Calculate a point estimate of $$\sigma$$.

```r
sig2 <- sum((res)^2)/(length(x) - 2)
sqrt(sig2)
## [1] 0.938042
```
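The same estimate is available directly from the fitted model as the residual standard error, `summary(mod)$sigma`. A sketch with the exercise data; the value agrees with the hand calculation up to rounding of the coefficients:

```r
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
mod <- lm(y ~ x)

summary(mod)$sigma  # residual standard error = point estimate of sigma (about 0.938)
```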
4. What proportion of observed variation in porosity can be attributed to the approximate linear relationship between unit weight and porosity?

```r
cor(x, y)^2
## [1] 0.9738874
```
5. Calculate the SSE and SST.

```r
anova(mod)  # anova = analysis of variance
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## x          1 426.62  426.62  484.84 1.125e-11 ***
## Residuals 13  11.44    0.88
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
SSE <- anova(mod)$"Sum Sq"[2]
SST <- anova(mod)$"Sum Sq"[1] + anova(mod)$"Sum Sq"[2]
c(SSE, SST)
## [1]  11.43883 438.05733
```

Or alternatively, use R as a calculator. Notice that the same results are produced.

```r
SSE1 <- sum((mod$residuals)^2)
SST1 <- sum((y - mean(y))^2)
SSR1 <- sum((mod$fitted.values - mean(y))^2)
c(SSE1, SST1, SSE1 + SSR1)
## [1]  11.43883 438.05733 438.05733
```