Interaction Terms
Oct 02, 2024
eval: false to eval: true at the tops of HWs 03, 04, 05.
The data is from the Credit data set in the ISLR2 R package. It is a simulated data set of 400 credit card customers.
Rows: 400
Columns: 11
$ Income <dbl> 14.891, 106.025, 104.593, 148.924, 55.882, 80.180, 20.996, 7…
$ Limit <dbl> 3606, 6645, 7075, 9504, 4897, 8047, 3388, 7114, 3300, 6819, …
$ Rating <dbl> 283, 483, 514, 681, 357, 569, 259, 512, 266, 491, 589, 138, …
$ Cards <dbl> 2, 3, 4, 3, 2, 4, 2, 2, 5, 3, 4, 3, 1, 1, 2, 3, 3, 3, 1, 2, …
$ Age <dbl> 34, 82, 71, 36, 68, 77, 37, 87, 66, 41, 30, 64, 57, 49, 75, …
$ Education <dbl> 11, 15, 11, 11, 16, 10, 12, 9, 13, 19, 14, 16, 7, 9, 13, 15,…
$ Own <fct> No, Yes, No, Yes, No, No, Yes, No, Yes, Yes, No, No, Yes, No…
$ Student <fct> No, Yes, No, No, No, No, No, No, No, Yes, No, No, No, No, No…
$ Married <fct> Yes, Yes, No, No, Yes, No, No, No, No, Yes, Yes, No, Yes, Ye…
$ Region <fct> South, West, West, West, South, South, East, West, South, Ea…
$ Balance <dbl> 333, 903, 580, 964, 331, 1151, 203, 872, 279, 1350, 1407, 0,…
Features (another name for predictors)
Income: Annual income (in 1000’s of US dollars)
Rating: Credit Rating
Outcome
Limit: Credit limit
The multiple linear regression model assumes
\[ Y|X_1, X_2, \ldots, X_p \sim N(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p, \sigma_{\epsilon}^2) \]
At any combination of the predictors, the mean value of the response \(Y\) is
\[ \mu_{Y|X_1, \ldots, X_p} = \beta_0 + \beta_1 X_{1} + \beta_2 X_2 + \dots + \beta_p X_p \]
Using multiple linear regression, we can estimate the mean response for any combination of predictors
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_{1} + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_p X_{p} \]
Based on our analysis goals, we will use a multiple linear regression model of the following form
\[ \begin{aligned}\hat{\text{Limit}} ~ = \hat{\beta}_0 & + \hat{\beta}_1 \text{Rating} + \hat{\beta}_2 \text{Income} \end{aligned} \]
\[ \begin{aligned}\hat{\text{Limit}} = -532.471 &+14.771 \times \text{Rating}\\ & -0.557 \times \text{Income} \end{aligned} \]
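As a quick check on how the fitted equation above is used, the sketch below plugs one customer's values into it. The coefficients are copied from the equation; the helper name `predict_limit` is hypothetical and introduced only for illustration.

```r
# Estimated coefficients copied from the fitted equation above.
b0       <- -532.471
b_rating <- 14.771
b_income <- -0.557

# Hypothetical helper: predicted credit limit for a given Rating and Income
# (Income is measured in thousands of US dollars).
predict_limit <- function(rating, income) {
  b0 + b_rating * rating + b_income * income
}

# First customer in the data: Rating = 283, Income = 14.891, observed Limit = 3606.
predict_limit(rating = 283, income = 14.891)

# Holding Income fixed, one more Rating point changes the prediction by b_rating.
predict_limit(284, 14.891) - predict_limit(283, 14.891)
```

The first call gives a predicted limit of about 3639, compared with the observed limit of 3606. The second call returns the Rating coefficient, 14.771, because in this model the slope of Rating does not depend on Income.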
\[ \begin{aligned}\hat{\text{Limit}} ~ = \hat{\beta}_0 & + \hat{\beta}_1 \text{Rating} + \hat{\beta}_2 \text{Income} + \hat{\beta}_3\text{Rating}\times\text{Income} \end{aligned} \]
Complete Activity
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -579.85561 | 37.48489 | -15.46905 | 0.00000 |
| Income | 1.81240 | 0.86962 | 2.08414 | 0.03779 |
| Rating | 14.87125 | 0.11375 | 130.73507 | 0.00000 |
| Income:Rating | -0.00221 | 0.00134 | -1.65140 | 0.09945 |
\[ \begin{aligned}\hat{\text{Limit}} ~ = & -579.85561 + 14.87125~\text{Rating} + 1.81240~\text{Income}\\ & \qquad- 0.00221~\text{Rating}\times\text{Income} \end{aligned} \]
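Grouping the terms of the fitted equation above by Income (or by Rating) shows that, with an interaction, the slope of one predictor depends on the value of the other. The sketch below evaluates those slopes at a few values taken from the data. The coefficients are copied from the table; the function names `slope_income` and `slope_rating` are hypothetical.

```r
# Estimates copied from the interaction model's coefficient table.
b_income <- 1.81240
b_rating <- 14.87125
b_inter  <- -0.00221

# Slope of Income at a given Rating, and slope of Rating at a given Income.
slope_income <- function(rating) b_income + b_inter * rating
slope_rating <- function(income) b_rating + b_inter * income

# Ratings and Incomes of the 1st and 4th customers in the data.
slope_income(c(283, 681))
slope_rating(c(14.891, 148.924))
```

With these estimates, the slope of Income falls from about 1.187 at Rating = 283 to about 0.307 at Rating = 681, while the slope of Rating barely moves (about 14.84 versus 14.54), since the interaction coefficient is small relative to 14.87125.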
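For a fixed value of Rating, the slope of Income is \((1.81240 - 0.00221\times\text{Rating})\)
For a fixed value of Income, the slope of Rating is \((14.87125 - 0.00221\times\text{Income})\)

# Add the product of Income and Rating as its own column
Credit_int <- Credit |>
  mutate(Interaction = Income * Rating)
# Show the first six rows of the columns used in the model
Credit_int |>
  select(Limit, Income, Rating, Interaction) |>
  head() |>
  kable()

| Limit | Income | Rating | Interaction |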
|---|---|---|---|
| 3606 | 14.891 | 283 | 4214.153 |
| 6645 | 106.025 | 483 | 51210.075 |
| 7075 | 104.593 | 514 | 53760.802 |
| 9504 | 148.924 | 681 | 101417.244 |
| 4897 | 55.882 | 357 | 19949.874 |
| 8047 | 80.180 | 569 | 45622.420 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -579.85561 | 37.48489 | -15.46905 | 0.00000 |
| Income | 1.81240 | 0.86962 | 2.08414 | 0.03779 |
| Rating | 14.87125 | 0.11375 | 130.73507 | 0.00000 |
| Interaction | -0.00221 | 0.00134 | -1.65140 | 0.09945 |
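The two coefficient tables above agree because an interaction term is just the product of the two predictors entered as an extra column. The sketch below checks this, assuming the ISLR2 package is installed. It uses base R's `lm()` and `transform()` rather than the tidyverse pipeline, and the object names `fit_formula` and `fit_manual` are hypothetical.

```r
# Assumes the ISLR2 package is installed; it provides the Credit data.
library(ISLR2)

# Fit 1: let the formula build the interaction.
# Income * Rating expands to Income + Rating + Income:Rating.
fit_formula <- lm(Limit ~ Income * Rating, data = Credit)

# Fit 2: add the product as an explicit column, then use it as a predictor.
Credit_int <- transform(Credit, Interaction = Income * Rating)
fit_manual <- lm(Limit ~ Income + Rating + Interaction, data = Credit_int)

# The estimates agree; only the name of the last term differs.
round(coef(fit_formula), 5)
round(coef(fit_manual), 5)
```

In practice the formula version is usually more convenient, since prediction on new data then builds the product automatically instead of requiring the extra column to be recreated.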
Introduced multiple linear regression
Interpreted coefficients in the multiple linear regression model
Calculated predictions and associated intervals for multiple linear regression models
Used interaction terms