Applied Linear Regression (STAT 235)


Time and location

  • Mondays, Wednesdays, Fridays 10:00–10:50 (January 22 – April 29).
  • NICELY 322

Textbook and software

Applied Linear Regression, by Sanford Weisberg, 4th Edition.
(Make sure to check the accompanying website for the book.)

We will be using R throughout this course.
  • Installing R.
  • Installing RStudio (open source version; recommended).
  • Installing the alr4 package (includes the data used in the book).
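
As a minimal sketch of the last step (assuming a working R installation and internet access to a CRAN mirror), the alr4 package can be installed and checked from the R console roughly as follows. The Heights dataset is used here only as an illustration.

```r
# Install alr4 from CRAN only if it is not already installed
# (assumption: a CRAN mirror is reachable from this machine).
if (!requireNamespace("alr4", quietly = TRUE)) {
  install.packages("alr4")
}

# Load the package; this makes the textbook datasets available.
library(alr4)

# Quick check: show the first few rows of one of the textbook datasets.
head(Heights)
```

If the last command prints a small table, the package and its data are ready to use.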

Instructor

  • Siamak Taati
  • Email: siamak dot taati at gmail dot com
  • Office: Bliss Hall 312B

Office hours

  • Tuesdays and Wednesdays 14:00–15:30

Course log

  • [23-04-2020] A new video posted (see below).
  • [20-04-2020] A new video posted (see below).
  • [14-04-2020] Homework assignment 5 posted (see below).
  • [14-04-2020] A new video posted (see below).
  • [08-04-2020] A new video posted (see below).
  • [22-03-2020] A new video posted (see below).
  • [20-03-2020] A new video posted (see below).
  • [17-03-2020] Two new videos posted (see below).
  • [17-03-2020] The deadline for the 4th assignment is set (see below).
  • [17-03-2020] The time and format of the midterm exam updated (see below).
  • [09-03-2020] The time and location of the midterm exam posted (see below).
  • [09-03-2020] Two new videos posted (see below).
  • [09-03-2020] The missing data file for homework assignment 4 is added (see below).
  • [06-03-2020] Two new videos posted (see below).
  • [05-03-2020] Since the classes are suspended in order to prevent the spread of the novel coronavirus, we will have to continue the course remotely. For now, I plan to record videos on the material and post them here. The first two videos are now posted (see below). More videos will appear gradually as I prepare them.
  • [01-03-2020] Homework assignment 4 posted (see below).
  • [01-03-2020] Note: Due to illness, the class on Monday, Mar 2 is cancelled.
  • [01-03-2020] The two scripts (fuel and water) used in the class on Friday, Feb 28.
  • [26-02-2020] Homework assignment 3 updated (added optional question).
  • [26-02-2020] The two scripts (fuel and girls) used in the class on Wednesday, Feb 26.
  • [23-02-2020] The script (fuel -> prediction) for the class on Monday, Feb 24.
  • [23-02-2020] Homework assignment 3 posted (see below).
  • [16-02-2020] The script (fuel) for the classes on Monday, Feb 17 and Wednesday, Feb 19.
  • [11-02-2020] The script (UN) for the class on Wednesday, Feb 12.
  • [09-02-2020] The extensions [script1, script2, script3] of the last three scripts for the class on Monday, Feb 10.
  • [06-02-2020] Three scripts [script1, script2, script3] for the class on Friday, Feb 7.
  • [06-02-2020] Rest of homework assignment 2 posted (with updated deadline; see below).
  • [29-01-2020] Part of homework assignment 2 posted (see below).
  • [28-01-2020] The script for the class on Wednesday, Jan 29.
  • [27-01-2020] Homework assignment 1 posted (see below).
  • [26-01-2020] The script and the datasets [Heights, Forbes] for the class on Monday, Jan 27.
    (The datasets are from the textbook.)
  • [23-01-2020] Note: Due to unforeseen circumstances, the class on Friday, Jan 24 is cancelled.

Videos


Homework

Note: The questions marked with * are optional. You can hand in your assignments electronically or on paper.
  • Assignment 1 [due: Wed, Feb 5]: Problems 1.1, 1.2 and 1.5* from chapter 1.
  • Assignment 2 [due: Mon, Feb 17]:
    • (Review) Find two random variables that are not independent but are uncorrelated.
    • (Review) Let $X_1,X_2,\ldots,X_n$ be (possibly dependent but) uncorrelated samples from a distribution with mean $\mu$ and variance $\sigma^2$. Verify that the sample variance $s^2 = \frac{1}{n-1}\sum_{k=1}^n (X_k - \overline{X})^2$ is an unbiased estimator for $\sigma^2$. Is the sample standard deviation $s=\sqrt{s^2}$ an unbiased estimator of $\sigma=\sqrt{\sigma^2}$?
    • (*) Consider the linear model $Y=\beta_0 + \beta_1 X + \varepsilon$ for a pair $(X, Y)$ of random variables, where $\varepsilon$ denotes the error in the prediction of $Y$ from $X$. Under the assumption that $\mathbb{E}[\varepsilon\,|\,X=x]=0$ for each $x$, identify the parameters $\beta_0$ and $\beta_1$ in terms of (the statistics of) $X$ and $Y$.
    • Problems 2.2, 2.3*, 2.7, 2.8, 2.9*, 2.18, 2.19, 2.20 from chapter 2.
  • Assignment 3 [due: Wed, Mar 4]:
    • Problems 3.2, 3.4, 3.6 from chapter 3.
    • (*) In Section 3.1 of the book, it is claimed that the OLS estimate for the coefficient of $X_2$ in the regression of $Y$ on $X_1$ and $X_2$ coincides with the slope of the added-variable plot of $Y$ against $X_2$ adjusted for $X_1$. Justify this claim!
      Namely, consider
      1. (model 1) the OLS regression of $Y$ on $X_1$,
      2. (model 2) the OLS regression of $X_2$ on $X_1$,
      3. (model 3) the OLS regression of the residuals of model 1 on the residuals of model 2.
      As we saw in class, the combination of these three models leads to a linear model for $Y$ in terms of $X_1$ and $X_2$ (model 4). Observe that the coefficient of $X_2$ in model 4 is simply the slope in model 3. Argue that model 4 coincides with the OLS regression of $Y$ on $X_1$ and $X_2$.
      Hint: Show that the vector of residuals in model 4 is orthogonal to the column space of the data matrix $\mathbf{X}$, and recall that this property is equivalent to the OLS criterion.
  • Assignment 4 [due: Wed, Mar 25]: Problems 4.1*, 4.2 [data file], 4.6, 4.8, 4.10, 4.11*, 4.12 from chapter 4.
  • Assignment 5 [due: Wed, Apr 29]:
    • Problems 5.2, 5.8, 5.10, 5.12, 6.3*, 6.10, 6.14 from chapters 5 and 6.
    • Suppose that $\underline{Y}=(Y_1,Y_2,\ldots,Y_m)^\mathsf{T}$ is a vector of jointly normal random variables with mean $0$ and covariance matrix $\Sigma$. Show that the quadratic form $\underline{Y}^\mathsf{T}\Sigma^{-1}\underline{Y}$ has distribution $\chi^2(m)$.
    • (*) Show that the two definitions of the $F$-statistic for testing linear models, as discussed in the videos on testing, are equivalent. With different notation, these are (6.3) and (6.21) in the book, with a particular choice of matrix $\mathbf{L}$.

Grading

  • 60% Homework assignments
  • 20% Midterm (originally scheduled for Wednesday, March 18, at 18:00, in NICELY 321; since rescheduled as follows)
    • Time: Monday, March 30 at 12pm till Wednesday, April 1 at 6pm
    • Given the current situation, the exam will be a take-home exam.
    • The exam questions will be sent to you by email on Monday, and you are asked to submit your answers as a single pdf file (again by email) before the deadline on Wednesday.
    • The exam will cover the first 5 chapters of the book and the related material that we discussed in class.
  • 20% Final
    • Time: Friday, May 15 at 12pm till Sunday, May 17 at 6pm
    • The exam will again be a take-home exam, in the same format as the midterm.

Last Update: May 12, 2020