Multiple linear regression (MLR)

Interaction Terms

Prof. Eric Friedlander

Oct 02, 2024

Announcements

  • Update eval: false to eval: true at the tops of HW’s 03, 04, 05.
  • Resubmit PDFs from 03 and 04
  • Open AE-10

Computational setup

# load packages
library(tidyverse)
library(broom)
library(mosaic)
library(ISLR2)
library(patchwork)
library(knitr)
library(kableExtra)
library(scales)

# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 16))

Last time

Data: Credit Cards

The data is from the Credit data set in the ISLR2 R package. It is a simulated data set of 400 credit card customers.

Rows: 400
Columns: 11
$ Income    <dbl> 14.891, 106.025, 104.593, 148.924, 55.882, 80.180, 20.996, 7…
$ Limit     <dbl> 3606, 6645, 7075, 9504, 4897, 8047, 3388, 7114, 3300, 6819, …
$ Rating    <dbl> 283, 483, 514, 681, 357, 569, 259, 512, 266, 491, 589, 138, …
$ Cards     <dbl> 2, 3, 4, 3, 2, 4, 2, 2, 5, 3, 4, 3, 1, 1, 2, 3, 3, 3, 1, 2, …
$ Age       <dbl> 34, 82, 71, 36, 68, 77, 37, 87, 66, 41, 30, 64, 57, 49, 75, …
$ Education <dbl> 11, 15, 11, 11, 16, 10, 12, 9, 13, 19, 14, 16, 7, 9, 13, 15,…
$ Own       <fct> No, Yes, No, Yes, No, No, Yes, No, Yes, Yes, No, No, Yes, No…
$ Student   <fct> No, Yes, No, No, No, No, No, No, No, Yes, No, No, No, No, No…
$ Married   <fct> Yes, Yes, No, No, Yes, No, No, No, No, Yes, Yes, No, Yes, Ye…
$ Region    <fct> South, West, West, West, South, South, East, West, South, Ea…
$ Balance   <dbl> 333, 903, 580, 964, 331, 1151, 203, 872, 279, 1350, 1407, 0,…

Variables

Features (another name for predictors)

  • Income: Annual income (in 1000’s of US dollars)
  • Rating: Credit Rating

Outcome

  • Limit: Credit limit

Multiple linear regression

The multiple linear regression model assumes

\[ Y|X_1, X_2, \ldots, X_p \sim N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p, \sigma_{\epsilon}^2) \]
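One way to see what this assumption says is to simulate data that satisfies it and check that `lm()` recovers the coefficients. The sketch below is illustrative only: the parameter values, sample size, and predictor ranges are made up and have nothing to do with the Credit data.

```r
# Simulate from Y | X1, X2 ~ N(b0 + b1*X1 + b2*X2, sigma^2), then refit.
set.seed(1)                 # arbitrary seed, for reproducibility only
n  <- 500
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)

# Hypothetical "true" parameters (assumed for this sketch)
b0    <- 2
b1    <- 1.5
b2    <- -0.8
sigma <- 1

# Conditional mean at each (x1, x2), plus normal noise with constant variance
mu <- b0 + b1 * x1 + b2 * x2
y  <- rnorm(n, mean = mu, sd = sigma)

sim_fit <- lm(y ~ x1 + x2)
coef(sim_fit)               # estimates should land close to 2, 1.5, -0.8
```

The estimates will not match the true values exactly; the gap shrinks as `n` grows or `sigma` gets smaller.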

Multiple linear regression

At any combination of the predictors, the mean value of the response \(Y\) is

\[ \mu_{Y|X_1, \ldots, X_p} = \beta_0 + \beta_1 X_{1} + \beta_2 X_2 + \dots + \beta_p X_p \]

Using multiple linear regression, we can estimate the mean response for any combination of predictors

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_{1} + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_p X_{p} \]

Multiple linear regression (MLR)

Based on our analysis goals, we will use a multiple linear regression model of the following form

\[ \begin{aligned}\hat{\text{Limit}} ~ = \hat{\beta}_0 & + \hat{\beta}_1 \text{Rating} + \hat{\beta}_2 \text{Income} \end{aligned} \]

Model fit

lim_fit <- lm(Limit ~ Rating + Income, data = Credit)

tidy(lim_fit) |>
  kable(digits = 3)
term         estimate  std.error  statistic  p.value
(Intercept)  -532.471     24.173    -22.028    0.000
Rating         14.771      0.096    153.124    0.000
Income          0.557      0.423      1.316    0.189

Model equation

\[ \begin{aligned}\hat{\text{Limit}} = -532.471 &+ 14.771 \times \text{Rating}\\ &+ 0.557 \times \text{Income} \end{aligned} \]
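As a quick check on how the fitted equation is used, the sketch below plugs one customer into it by hand. The coefficients are copied from the fitted-model table (rounded to three digits); the customer, with Rating = 500 and Income = 50 (that is, $50,000), is hypothetical.

```r
# Coefficients copied from the tidy(lim_fit) table, rounded to 3 digits
b0       <- -532.471
b_rating <- 14.771
b_income <- 0.557

# Hypothetical customer (not a row taken from the Credit data)
rating <- 500
income <- 50    # Income is recorded in 1000s of US dollars

# Estimated mean credit limit for this combination of predictors
pred_limit <- b0 + b_rating * rating + b_income * income
pred_limit      # 6880.879
```

With the fitted object available, `predict(lim_fit, newdata = data.frame(Rating = 500, Income = 50))` should give nearly the same value; it can differ slightly because it uses the unrounded coefficients.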

Visualizing Model

Interaction terms

Interaction terms

  • Sometimes the relationship between a predictor variable and the response depends on the value of another predictor variable.
  • This is an interaction effect.
  • To account for this, we can include interaction terms in the model.
  • We want a model of the form:

\[ \begin{aligned}\hat{\text{Limit}} ~ = \hat{\beta}_0 & + \hat{\beta}_1 \text{Rating} + \hat{\beta}_2 \text{Income} + \hat{\beta}_3\text{Rating}\times\text{Income} \end{aligned} \]

Interpreting interaction terms

  • What the interaction means: The effect of annual income on the credit limit depends on the borrower's credit rating
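Regrouping the terms of the interaction model makes this dependence explicit. The algebra below only rearranges the model stated above and introduces no new symbols:

\[ \begin{aligned}\hat{\text{Limit}} &= \hat{\beta}_0 + \hat{\beta}_1 \text{Rating} + \left(\hat{\beta}_2 + \hat{\beta}_3\,\text{Rating}\right)\text{Income} \\ &= \hat{\beta}_0 + \hat{\beta}_2 \text{Income} + \left(\hat{\beta}_1 + \hat{\beta}_3\,\text{Income}\right)\text{Rating} \end{aligned} \]

In the first line the slope of Income is \(\hat{\beta}_2 + \hat{\beta}_3\,\text{Rating}\), so it changes with Rating. The second line shows the same is true of the slope of Rating, which changes with Income. When \(\hat{\beta}_3 = 0\), both slopes are constant and the model reduces to the one without an interaction.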

Visualizing Interaction Model Exaggerated

Visualizing Interaction Model

Application Exercise

Complete Activity

Model Fit

term             estimate  std.error  statistic  p.value
(Intercept)    -579.85561   37.48489  -15.46905  0.00000
Income            1.81240    0.86962    2.08414  0.03779
Rating           14.87125    0.11375  130.73507  0.00000
Income:Rating    -0.00221    0.00134   -1.65140  0.09945

\[ \begin{aligned}\hat{\text{Limit}} ~ = & -579.85561 + 14.87125~\text{Rating} + 1.81240~\text{Income}\\ & \qquad- 0.00221~\text{Rating}\times\text{Income} \end{aligned} \]
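The slide shows the fitted table but not the call that produced it. A minimal sketch of one call that yields terms named `Income`, `Rating`, and `Income:Rating` is given below. The formula `Limit ~ Income * Rating` and the object name `int_fit` are assumptions, inferred from the term names in the table.

```r
library(ISLR2)   # provides the Credit data

# In an R formula, Income * Rating expands to Income + Rating + Income:Rating
int_fit <- lm(Limit ~ Income * Rating, data = Credit)

# Coefficient table: estimate, std. error, t statistic, p-value
round(coef(summary(int_fit)), 5)
```

Writing `Limit ~ Income + Rating + Income:Rating` spells out the same model term by term.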

Interpreting the interaction term

  • For a fixed Rating, the slope of Income is \((1.81240 - 0.00221\times\text{Rating})\)
  • For a fixed Income, the slope of Rating is \((14.87125 - 0.00221\times\text{Income})\)
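To make the first bullet concrete, the sketch below evaluates the slope of Income at a few credit ratings. The coefficients are copied from the Model Fit table; the three ratings and the helper `income_slope()` are hypothetical, chosen only for illustration.

```r
# Coefficients copied from the interaction-model table
b_income      <- 1.81240
b_interaction <- -0.00221

# Slope of Income at a given Rating: b_income + b_interaction * Rating
income_slope <- function(rating) b_income + b_interaction * rating

ratings <- c(300, 500, 700)   # hypothetical ratings
data.frame(Rating = ratings, income_slope = income_slope(ratings))
# slopes: 1.1494, 0.7074, 0.2654
```

Under this fit, an extra 1 unit of Income ($1,000) goes with a smaller increase in predicted Limit at higher ratings. The interaction estimate has a p-value of about 0.099 in the table, so the evidence that the slope really varies with Rating is modest.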

What’s actually happening:

Credit_int <- Credit |> 
  mutate(Interaction = Income * Rating) 

Credit_int |> 
  select(Limit, Income, Rating, Interaction) |> 
  head() |> 
  kable()
Limit   Income  Rating  Interaction
 3606   14.891     283     4214.153
 6645  106.025     483    51210.075
 7075  104.593     514    53760.802
 9504  148.924     681   101417.244
 4897   55.882     357    19949.874
 8047   80.180     569    45622.420

What’s actually happening:

lm(Limit ~ Income + Rating + Interaction, data = Credit_int) |> 
  tidy() |> 
  kable(digits = 5)
term           estimate  std.error  statistic  p.value
(Intercept)  -579.85561   37.48489  -15.46905  0.00000
Income          1.81240    0.86962    2.08414  0.03779
Rating         14.87125    0.11375  130.73507  0.00000
Interaction    -0.00221    0.00134   -1.65140  0.09945
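The manual product column gives the same estimates because `lm()` builds that column itself. One way to check this, sketched below, is to inspect the design matrix that the interaction formula generates. The formula `Limit ~ Income * Rating` is assumed here, as the slides do not show the original call.

```r
library(ISLR2)   # provides the Credit data

# Design matrix that lm() would use for the interaction formula
X <- model.matrix(Limit ~ Income * Rating, data = Credit)
head(X)

# The Income:Rating column is the elementwise product of the two predictors
same_column <- all.equal(unname(X[, "Income:Rating"]),
                         Credit$Income * Credit$Rating)
same_column
```

Since both approaches regress on identical columns, the coefficient tables match apart from the term's name.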

Wrap up

Recap

  • Introduced multiple linear regression

  • Interpreted coefficients in the multiple linear regression model

  • Calculated predictions and associated intervals for multiple linear regression models

  • Used interaction terms