Applied Linear Regression (STAT 235)
Time and location
 Mondays, Wednesdays, Fridays 10:00–10:50 (January 22 – April 29).
 NICELY 322
Textbook and software
Applied Linear Regression,
by Sanford Weisberg, 4th Edition.
(Make sure to check the accompanying website for the book.)
We will be using R throughout this course.
 Installing R.
 Installing RStudio (open source version; recommended).
 Installing the alr4 package
(includes the data used in the book).
Instructor
 Siamak Taati
 Email: siamak dot taati at gmail dot com
 Office: Bliss Hall 312B
Office hours
 Tuesdays and Wednesdays 14:00–15:30
Course log
 [23-04-2020] A new video posted (see below).
 [20-04-2020] A new video posted (see below).
 [14-04-2020]
Homework assignment 5 posted (see below).
 [14-04-2020] A new video posted (see below).
 [08-04-2020] A new video posted (see below).
 [22-03-2020] A new video posted (see below).
 [20-03-2020] A new video posted (see below).
 [17-03-2020] Two new videos posted (see below).
 [17-03-2020] The deadline for the 4th assignment is set (see below).
 [17-03-2020] The time and format of the midterm exam updated (see below).
 [09-03-2020] The time and location of the midterm exam posted (see below).
 [09-03-2020] Two new videos posted (see below).
 [09-03-2020] The missing data file for homework assignment 4 is added (see below).
 [06-03-2020] Two new videos posted (see below).
 [05-03-2020] Since the classes are suspended in order to prevent the spread of the novel coronavirus,
we will have to continue the course remotely.
For now, I plan to record videos on the material and post them here.
The first two videos are now posted (see below). More videos will appear gradually as I prepare them.
 [01-03-2020] Homework assignment 4 posted (see below).
 [01-03-2020] Note: Due to illness, the class on Monday, Mar 2 is cancelled.
 [01-03-2020] The two scripts
(fuel and
water)
used in the class on Friday, Feb 28.
 [26-02-2020] Homework assignment 3 updated (added optional question).
 [26-02-2020] The two scripts
(fuel and
girls)
used in the class on Wednesday, Feb 26.
 [23-02-2020] The script (fuel > prediction)
for the class on Monday, Feb 24.
 [23-02-2020] Homework assignment 3 posted (see below).
 [16-02-2020] The script (fuel)
for the classes on Monday, Feb 17 and Wednesday, Feb 19.
 [11-02-2020] The script (UN)
for the class on Wednesday, Feb 12.
 [09-02-2020] The extensions [script1,
script2,
script3] of the last three scripts
for the class on Monday, Feb 10.
 [06-02-2020] Three scripts [script1,
script2,
script3] for the class on Friday, Feb 7.
 [06-02-2020] Rest of homework assignment 2 posted (with updated deadline; see below).
 [29-01-2020] Part of homework assignment 2 posted (see below).
 [28-01-2020] The script
for the class on Wednesday, Jan 29.
 [27-01-2020] Homework assignment 1 posted (see below).
 [26-01-2020] The script
and the datasets [Heights, Forbes]
for the class on Monday, Jan 27.
(The datasets are from the textbook.)
 [23-01-2020] Note: Due to unforeseen circumstances, the class on Friday, Jan 24 is cancelled.
Videos
 The effect of dropping regressors (section 4.2)
 Lurking variables (section 4.3.1)
 Idealized scenario: normal population (section 4.4)
 Review of multivariate normal distribution
(For some reason the simulation in R was not recorded.
Here is a script for the simulation. Do try it on your own.)
 More on the coefficient of determination $R^2$ (section 4.5,
the script for the experiment)
 Complex regressors: factors (section 5.1,
script)
 Complex regressors: more on factors (sections 5.1 and 5.2,
script,
data)
 Complex regressors: polynomial regression (section 5.3,
script1, script2)
 Complex regressors: splines (section 5.4,
script)
 Principal components (section 5.5,
script)
 Testing linear models: F-test (section 6.1,
forbesscript,
simpsonscript,
simpsondata)
 More on testing: Wald tests (section 6.5)
 Weighted least squares regression (section 7.1,
script1, script2)
 Gauss-Markov theorem: optimality of OLS/WLS
Homework
Note: The questions marked with * are optional. You can hand in your assignments
electronically or on paper.
 Assignment 1 [due: Wed, Feb 5]: Problems 1.1, 1.2 and 1.5* from chapter 1.
 Assignment 2 [due: Mon, Feb 17]:
 (Review) Find two random variables that are not independent but are uncorrelated.
 (Review) Let $X_1,X_2,\ldots,X_n$ be (possibly dependent but) uncorrelated samples from
a distribution with mean $\mu$ and variance $\sigma^2$.
Verify that the sample variance $s^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \overline{X})^2$ is
an unbiased estimator for $\sigma^2$.
Is the sample standard deviation $s=\sqrt{s^2}$ an unbiased estimator of $\sigma=\sqrt{\sigma^2}$?
 (*) Consider the linear model $Y=\beta_0 + \beta_1 X + \varepsilon$ for a pair $(X, Y)$ of random variables,
where $\varepsilon$ denotes the error in the prediction of $Y$ from $X$.
Under the assumption that $\mathbb{E}[\varepsilon\,|\,X=x]=0$ for each $x$, identify the parameters
$\beta_0$ and $\beta_1$ in terms of (the statistics of) $X$ and $Y$.
 Problems 2.2, 2.3*, 2.7, 2.8, 2.9*, 2.18, 2.19, 2.20 from chapter 2.
 Assignment 3 [due: Wed, Mar 4]:
 Problems 3.2, 3.4, 3.6 from chapter 3.
 (*) In Section 3.1 of the book, it is claimed that the OLS estimate
for the coefficient of $X_2$ in the regression of $Y$ on $X_1$ and $X_2$
coincides with the slope of the added-variable plot
of $Y$ against $X_2$ adjusted for $X_1$. Justify this claim!
Namely, consider
 (model 1) the OLS regression of $Y$ on $X_1$,
 (model 2) the OLS regression of $X_2$ on $X_1$,
 (model 3) the OLS regression of the residuals of model 1 on the residuals of model 2.
As we saw in class, the combination of these three models
leads to a linear model for $Y$ in terms of $X_1$ and $X_2$ (model 4).
Observe that the coefficient of $X_2$ in model 4 is simply the slope in model 3.
Argue that model 4 coincides with the OLS regression of $Y$ on $X_1$ and $X_2$.
Hint: Show that the vector of residuals in model 4 is orthogonal
to the column space of the data matrix $\mathbf{X}$, and recall that this property
is equivalent to the OLS criterion.
 Assignment 4 [due: Wed, Mar 25]:
Problems 4.1*, 4.2 [data file], 4.6, 4.8, 4.10, 4.11*, 4.12 from chapter 4.
 Assignment 5 [due: April 29]:
 Problems 5.2, 5.8, 5.10, 5.12, 6.3*, 6.10, 6.14.
 Suppose that $\underline{Y}=(Y_1,Y_2,\ldots,Y_m)^\mathsf{T}$ is a vector of jointly normal
random variables with mean $0$ and covariance matrix $\Sigma$.
Show that the quadratic form $\underline{Y}^\mathsf{T}\Sigma^{-1}\underline{Y}$
has distribution $\chi^2(m)$.
 (*) Show that the two definitions of the $F$-statistic for testing linear models
discussed in the videos on testing are equivalent.
With different notation, these are (6.3) and (6.21) in the book, with a particular choice
of matrix $\mathbf{L}$.
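For the quadratic-form problem in Assignment 5, a quick simulation can serve as a sanity check before you attempt the proof. The sketch below is only an illustration, not a solution: it is written in Python with the standard library only (the same experiment is a few lines in R), and the $2\times 2$ covariance matrix, the sample size and the seed are arbitrary choices made for this example. It draws $\underline{Y}\sim N(0,\Sigma)$, computes $\underline{Y}^\mathsf{T}\Sigma^{-1}\underline{Y}$, and compares the empirical mean, variance and median with those of $\chi^2(2)$, namely $2$, $4$ and $2\ln 2$.

```python
import math
import random


def simulate_quadratic_form(n_draws=20000, seed=235):
    """Return n_draws values of Y^T Sigma^{-1} Y, where Y ~ N(0, Sigma)."""
    rng = random.Random(seed)
    # Hypothetical covariance matrix Sigma = [[a, b], [b, c]],
    # chosen only for illustration (any positive definite matrix would do).
    a, b, c = 2.0, 0.6, 1.0
    det = a * c - b * b
    # Cholesky factor L of Sigma (Sigma = L L^T); Y = L Z turns independent
    # standard normals Z into normals with covariance matrix Sigma.
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    values = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1 = l11 * z1
        y2 = l21 * z1 + l22 * z2
        # Sigma^{-1} = (1/det) * [[c, -b], [-b, a]], written out explicitly.
        q = (c * y1 * y1 - 2.0 * b * y1 * y2 + a * y2 * y2) / det
        values.append(q)
    return values


values = simulate_quadratic_form()
n = len(values)
mean_q = sum(values) / n
var_q = sum((q - mean_q) ** 2 for q in values) / (n - 1)
# The median of chi^2(2) is 2*ln(2), so about half the draws should fall below it.
frac_below_median = sum(q <= 2.0 * math.log(2.0) for q in values) / n

print(f"mean     = {mean_q:.3f}  (chi^2(2): 2)")
print(f"variance = {var_q:.3f}  (chi^2(2): 4)")
print(f"fraction below 2*ln(2) = {frac_below_median:.3f}  (chi^2(2): 0.5)")
```

Agreement of these three summaries is of course not a proof, but trying a different $\Sigma$ and seeing that nothing changes is a useful hint about why the result holds.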
Grading
 60% Homework assignments
 20% Midterm
(originally scheduled for Wednesday, March 18, at 18:00, in NICELY 321, pending the status of the university; now rescheduled as below)
 Time: Monday, March 30 at 12pm till Wednesday, April 1 at 6pm
 Given the current situation, the exam will be a take-home exam.
 The exam questions will be sent to you by email on Monday, and you are asked to
submit your answers as a single pdf file (again by email) before the deadline on Wednesday.
 The exam will be about the first 5 chapters of the book and the related material
which we discussed in the class.
 20% Final
 Time: Friday, May 15 at 12pm till Sunday, May 17 at 6pm
 The exam will again be a take-home exam, in the same fashion as the midterm.
