Oct 07, 2024
Today’s data is a sample of 50 loans made through a peer-to-peer lending club. The data is in the loan50 data frame in the openintro R package.
# A tibble: 50 × 3
annual_income verified_income interest_rate
<dbl> <fct> <dbl>
1 59000 Not Verified 10.9
2 60000 Not Verified 9.92
3 75000 Verified 26.3
4 75000 Not Verified 9.92
5 254000 Not Verified 9.43
6 67000 Source Verified 9.92
7 28800 Source Verified 17.1
8 80000 Not Verified 6.08
9 34000 Not Verified 7.97
10 80000 Source Verified 12.6
# ℹ 40 more rows
Predictors:
annual_income: Annual income
verified_income: Whether borrower’s income source and amount have been verified (Not Verified, Source Verified, Verified)
Response:
interest_rate: Interest rate for the loan
interest_rate
| min | median | max | iqr |
|---|---|---|---|
| 5.31 | 9.93 | 26.3 | 5.755 |
Complete Exercises 1 and 2.
Suppose there is a categorical variable with \(K\) categories (levels)
We can make \(K\) indicator variables - one indicator for each category
An indicator variable takes values 1 or 0
verified_income
loan50 <- loan50 |>
  mutate(
    not_verified = if_else(verified_income == "Not Verified", 1, 0),
    source_verified = if_else(verified_income == "Source Verified", 1, 0),
    verified = if_else(verified_income == "Verified", 1, 0)
  )
loan50 |>
  select(verified_income, not_verified, source_verified, verified) |>
  slice(1, 3, 6)
# A tibble: 3 × 4
verified_income not_verified source_verified verified
<fct> <dbl> <dbl> <dbl>
1 Not Verified 1 0 0
2 Verified 0 0 1
3 Source Verified 0 1 0
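R builds these indicators automatically when a factor appears in a model formula. The sketch below uses base R's model.matrix() on only the three rows shown above (the small verified_income vector is rebuilt here for illustration; it is not the full loan50 data). It compares the full set of K = 3 indicators with the default coding used by lm(), which keeps an intercept and K - 1 = 2 indicators and treats the first level as the baseline.

```r
# The three rows shown above (rows 1, 3, and 6 of loan50),
# rebuilt by hand for illustration
verified_income <- factor(
  c("Not Verified", "Verified", "Source Verified"),
  levels = c("Not Verified", "Source Verified", "Verified")
)

# One indicator per level (K = 3 columns): remove the intercept
all_indicators <- model.matrix(~ verified_income - 1)

# Default coding used by lm(): intercept plus K - 1 = 2 indicators,
# with the first level ("Not Verified") as the baseline
baseline_coding <- model.matrix(~ verified_income)

print(all_indicators)
print(baseline_coding)
```

Each row of all_indicators has exactly one 1, so the three indicators always sum to the intercept column. This is why a model with an intercept uses only K - 1 of them.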
Complete Exercises 3 & 4
verified_income
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 9.541 | 1.006 | 9.487 | 0.000 |
| verified_incomeSource Verified | 2.224 | 1.440 | 1.544 | 0.129 |
| verified_incomeVerified | 6.312 | 1.836 | 3.437 | 0.001 |

Are verified_income and interest_rate correlated?
\[ \begin{aligned}\hat{\text{interest\_rate}} = 9.541 &+ 2.224 \times \text{source\_verified}\\ &+ 6.312 \times \text{verified} \end{aligned} \]
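Because each indicator is either 0 or 1, the fitted equation gives one predicted interest rate per verified_income level. A minimal sketch of that arithmetic, with the estimates copied from the table above (the names coefs and predicted are illustrative only):

```r
# Estimates copied from the model output above
coefs <- c(intercept = 9.541, source_verified = 2.224, verified = 6.312)

# Plug in the indicator values for each level:
# Not Verified has both indicators equal to 0 (baseline)
predicted <- c(
  "Not Verified"    = coefs[["intercept"]],
  "Source Verified" = coefs[["intercept"]] + coefs[["source_verified"]],
  "Verified"        = coefs[["intercept"]] + coefs[["verified"]]
)

print(predicted)
```

In a model whose only predictor is categorical, these predicted values are the mean interest rates within each level.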
Complete Exercises 5-6.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 11.388 | 1.352 | 8.423 | 0.000 |
| annual_income_k | -0.022 | 0.011 | -1.974 | 0.054 |
| verified_incomeSource Verified | 2.171 | 1.398 | 1.553 | 0.127 |
| verified_incomeVerified | 6.792 | 1.799 | 3.776 | 0.000 |
The slope of annual_income_k is -0.022 regardless of verified_income level
Each verified_income level corresponds to a shift in the intercept
The intercept for Not Verified is 11.388
Source Verified shifts the intercept up 2.171 from Not Verified
Verified shifts the intercept up 6.792 from Not Verified
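Setting the indicators to 0 or 1 in the main effects model gives one line per verified_income level. As a worked example using the estimates in the table above, the three lines share the same slope and differ only in their intercepts:

\[ \begin{aligned} \text{Not Verified: } \hat{interest\_rate} &= 11.388 - 0.022 \times annual\_income\_k \\ \text{Source Verified: } \hat{interest\_rate} &= (11.388 + 2.171) - 0.022 \times annual\_income\_k \\ &= 13.559 - 0.022 \times annual\_income\_k \\ \text{Verified: } \hat{interest\_rate} &= (11.388 + 6.792) - 0.022 \times annual\_income\_k \\ &= 18.180 - 0.022 \times annual\_income\_k \end{aligned} \]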
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 10.303 | 1.897 | 5.432 | 0.000 |
| annual_income_k | -0.009 | 0.019 | -0.471 | 0.640 |
| verified_incomeSource Verified | 3.423 | 2.534 | 1.351 | 0.184 |
| verified_incomeVerified | 9.788 | 3.652 | 2.680 | 0.010 |
| annual_income_k:verified_incomeSource Verified | -0.015 | 0.026 | -0.591 | 0.558 |
| annual_income_k:verified_incomeVerified | -0.031 | 0.033 | -0.961 | 0.342 |
The slope of annual_income_k depends on verified_income level
The intercept also depends on verified_income level
\[ \begin{aligned} \hat{interest\_rate} &= 10.303 - 0.009 \times annual\_income\_k \\ &+ 3.423 \times SourceVerified + 9.788 \times Verified \\ &- 0.015 \times annual\_income\_k \times SourceVerified\\ &- 0.031 \times annual\_income\_k \times Verified \end{aligned} \]
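With interaction terms, each verified_income level gets its own intercept and its own slope for annual_income_k. The sketch below works these out from the estimates in the interaction model table (the vector b and the data frame lines_by_level are illustrative names, not part of the original analysis):

```r
# Estimates copied from the interaction model table above
b <- c(
  intercept         = 10.303,
  income_k          = -0.009,
  source_verified   = 3.423,
  verified          = 9.788,
  income_k_source   = -0.015,
  income_k_verified = -0.031
)

# For each level, add the level's shift to the baseline intercept
# and the level's interaction estimate to the baseline slope
lines_by_level <- data.frame(
  verified_income = c("Not Verified", "Source Verified", "Verified"),
  intercept = c(
    b[["intercept"]],
    b[["intercept"]] + b[["source_verified"]],
    b[["intercept"]] + b[["verified"]]
  ),
  slope = c(
    b[["income_k"]],
    b[["income_k"]] + b[["income_k_source"]],
    b[["income_k"]] + b[["income_k_verified"]]
  )
)

print(lines_by_level)
```

Unlike the main effects model, the three lines here are not parallel: the estimated slope is steepest for the Verified level.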
Complete Exercise 7