# Ch. 12 Linear Regression

Sections covered: 12.1, 12.2, 12.5

## 12.2 Estimating Model Parameters

### Resources

Interactive visualization: Linear Regression. Try fitting the least squares line to a set of random data and check your answer (and another one).

Video: Regression I: What is regression? | SSE, SSR, SST | R-squared | Errors (ε vs. e) [contributed by Lance J.]

### R

Calculating the slope and intercept for a sample of (x, y) pairs (p. 498 formulas):

```r
# Example 12.8, p. 503
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
lm(y ~ x)  # lm = linear model
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##     3.62091     -0.01471
```
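As a check, the same estimates can be computed by hand from the p. 498 formulas, $$\hat{\beta}_1 = S_{xy}/S_{xx}$$ and $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$; a quick sketch using R as a calculator:

```r
# Slope and intercept via the p. 498 formulas (same data as Example 12.8)
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
Sxy <- sum((x - mean(x)) * (y - mean(y)))  # S_xy
Sxx <- sum((x - mean(x))^2)                # S_xx
b1 <- Sxy/Sxx                 # slope estimate
b0 <- mean(y) - b1*mean(x)    # intercept estimate
c(b0, b1)  # matches the lm() coefficients above
```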

Predicted values:

```r
mod <- lm(y ~ x)
round(mod$fitted.values, 2)
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14
## 3.44 3.18 3.09 3.03 2.96 2.78 2.71 2.64 2.58 2.47 2.25 2.24 2.15 2.08
```

Residuals:

```r
round(mod$residuals, 2)
##     1     2     3     4     5     6     7     8     9    10    11    12    13
## -0.14  0.02  0.31 -0.03 -0.16  0.12 -0.01 -0.04 -0.08  0.13 -0.05 -0.24  0.15
##    14
##  0.02
```

SSE and SSR:

```r
anova(mod)  # anova = analysis of variance
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value       Pr(>F)
## x          1 2.29469 2.29469  104.92 0.0000002762 ***
## Residuals 12 0.26246 0.02187
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The first row under “Sum Sq” is the SSR, and the second row under “Sum Sq” is the SSE:

SSE = 0.2624565

SSR = 2.2946864

SST = SSE + SSR = 0.2624565 + 2.2946864 = 2.5571429
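This identity can be verified in R by pulling both sums of squares out of the ANOVA table (the `"Sum Sq"` column holds the regression row first, the residual row second); a small sketch:

```r
# Check SST = SSE + SSR for the Example 12.8 fit
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
mod <- lm(y ~ x)
ss  <- anova(mod)$"Sum Sq"   # c(SSR, SSE)
SSR <- ss[1]
SSE <- ss[2]
SST <- sum((y - mean(y))^2)  # total sum of squares, computed directly
all.equal(SSR + SSE, SST)
##  TRUE
```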

Coefficient of determination $$r^2$$:

```r
# Example 12.4, 12.9
x <- c(132, 129, 120, 113.2, 105, 92, 84, 83.2, 88.4, 59, 80, 81.5, 71, 69.2)
y <- c(46, 48, 51, 52.1, 54, 52, 59, 58.7, 61.6, 64, 61.4, 54.6, 58.8, 58)
mod <- lm(y ~ x)
```

```r
SSE <- anova(mod)$"Sum Sq"[2]
SST <- anova(mod)$"Sum Sq"[1] + anova(mod)$"Sum Sq"[2]
1 - (SSE/SST)
##  0.7907602
```

Or (simply):

```r
cor(x, y)^2
##  0.7907602
```

(See section 12.5.)

## 12.5 Correlation

Skip: "Inferences About the Population Correlation Coefficient" (p. 530) to end of section.

### Resources

Interactive visualization: Correlation Coefficient (add and remove points)

Interactive visualization: Interpreting Correlations [contributed by Dario G.]

### R

Sample correlation coefficient $$r$$:

```r
# Example 12.15, p. 528
x <- c(2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1)
y <- c(1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63)
cor(x, y)
##  0.3472602
```

## Practice Exercises

1. **(Least squares line)** Researchers employed a least squares analysis in studying how $$Y=$$ porosity (%) is related to $$X=$$ unit weight (pcf) in concrete specimens. Consider the following representative data (Textbook 12.17):

```r
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
```

1. Obtain the equation of the estimated regression line.

```r
lm(y ~ x)
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##    118.9099      -0.9047
```

$$\hat{y} = 118.91 - 0.9047x$$

1. Calculate the residuals corresponding to the first two observations.

```r
mod <- lm(y ~ x)
round(mod$residuals, 2)
##     1     2     3     4     5     6     7     8     9    10    11    12    13
## -0.54  0.46  1.01 -0.52 -0.75 -0.60  0.33  0.93 -0.39  1.68 -0.13  0.75 -1.78
##    14    15
## -0.90  0.46
```

Or alternatively, use R as a calculator:

```r
pred <- 118.9099 - 0.9047*x  # using the rounded coefficients
res <- y - pred
res[1]
##  -0.5446
res[2]
##  0.45527
```
1. Calculate a point estimate of $$\sigma$$.

```r
sig2 <- sum(res^2)/(length(x) - 2)
sqrt(sig2)
##  0.938042
```
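Equivalently, the residual standard error reported by `summary()` gives this point estimate directly from the fitted model (the data are restated so the chunk is self-contained; the value agrees with the hand computation up to the rounding of the coefficients, so no exact output is shown):

```r
# Residual standard error s = sqrt(SSE/(n - 2)) from the fitted model
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
mod <- lm(y ~ x)
summary(mod)$sigma  # point estimate of sigma
```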
1. What proportion of observed variation in porosity can be attributed to the approximate linear relationship between unit weight and porosity?

```r
cor(x, y)^2
##  0.9738874
```
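The same proportion is available as `summary(mod)$r.squared`, which for simple linear regression equals `cor(x, y)^2` (data restated so the chunk is self-contained):

```r
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
mod <- lm(y ~ x)
summary(mod)$r.squared  # coefficient of determination r^2
```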
1. Calculate the SSE and SST.

```r
anova(mod)  # anova = analysis of variance
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value           Pr(>F)
## x          1 426.62  426.62  484.84 0.00000000001125 ***
## Residuals 13  11.44    0.88
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
```r
SSE <- anova(mod)$"Sum Sq"[2]
SST <- anova(mod)$"Sum Sq"[1] + anova(mod)$"Sum Sq"[2]
c(SSE, SST)
##  11.43883 438.05733
```

Or alternatively, use R as a calculator. Notice that the same results are produced:

```r
SSE1 <- sum((mod$residuals)^2)
SST1 <- sum((y - mean(y))^2)
SSR1 <- sum((mod$fitted.values - mean(y))^2)
c(SSE1, SST1, SSE1 + SSR1)
##  11.43883 438.05733 438.05733
```