Selection + Transformations
Dec 02, 2024
📋 AE 24 - Logistic Regression: Selection and Comparison
Investigating interaction terms in logistic regression models
Comparing logistic regression models
Choosing logistic regression models
Transformations in logistic regression
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
TenYearCHD: Whether the patient is at risk of coronary heart disease within ten years (1 = yes, 0 = no)
age: Age at exam time (in years)
education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College
male: 1 = male, 0 = female
diaBP: Diastolic blood pressure (mmHg)
heart_disease <- read_csv("data/framingham.csv") |>
select(TenYearCHD, age, education, male, diaBP) |>
drop_na() |>
mutate(education = as.factor(education),
male = as.factor(male))
heart_disease |> head() |> kable()

| TenYearCHD | age | education | male | diaBP |
|---|---|---|---|---|
| 0 | 39 | 4 | 1 | 70 |
| 0 | 46 | 2 | 0 | 81 |
| 0 | 48 | 1 | 1 | 80 |
| 1 | 61 | 3 | 0 | 95 |
| 0 | 46 | 3 | 0 | 84 |
| 0 | 43 | 2 | 0 | 110 |
Intuitively: There is an interaction between two variables when the influence of one variable on the log-odds depends on the value of the other variable.
How do we interpret this?
Complete Exercise 2.
Based on this plot, do you think there is an interaction between age and education?
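One way to produce a plot like this is to compute empirical log-odds of CHD within age groups, separately by education level, and look for lines with different slopes. This is a sketch, not the exact code used for the slide; note the empirical log-odds are undefined for groups where the observed proportion is 0 or 1.

```r
# Sketch: empirical log-odds of CHD by age group and education level
library(dplyr)
library(ggplot2)

heart_disease |>
  mutate(age_group = cut(age, breaks = 5)) |>
  group_by(age_group, education) |>
  summarize(p_chd = mean(TenYearCHD), .groups = "drop") |>
  ggplot(aes(x = age_group, y = log(p_chd / (1 - p_chd)),
             color = education, group = education)) +
  geom_line() +
  labs(x = "Age group", y = "Empirical log-odds of CHD")
```

If the lines for the education levels are roughly parallel, there is little evidence of an interaction; crossing or diverging lines suggest one.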
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -6.358 | 0.428 | -14.865 | 0.000 |
| age | 0.085 | 0.008 | 10.883 | 0.000 |
| male1 | 1.308 | 0.583 | 2.245 | 0.025 |
| age:male1 | -0.014 | 0.011 | -1.342 | 0.180 |
Two models in one!
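Plugging the rounded estimates from the table into the fitted log-odds equation and splitting by the value of `male1` makes the "two models" explicit:

\[
\begin{align}
\log\left(\frac{\hat{p}}{1-\hat{p}}\right) &= -6.358 + 0.085 \times age + 1.308 \times male1 - 0.014 \times (age \times male1)\\[5pt]
male1 = 0: \quad & -6.358 + 0.085 \times age\\[5pt]
male1 = 1: \quad & (-6.358 + 1.308) + (0.085 - 0.014) \times age = -5.050 + 0.071 \times age
\end{align}
\]

So the interaction term lets males and females have different slopes for age, not just different intercepts.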
Complete Exercises 3 and 4.
\[ \begin{align} &AIC = -2\log L + 2(p+1)\\[5pt] &BIC = -2\log L + \log(n)\times(p+1) \end{align} \]
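These formulas can be verified by hand against R's built-in values. A sketch, assuming `heart_disease` is already loaded as above:

```r
# Compute AIC and BIC by hand from the log-likelihood of a fitted glm
m  <- glm(TenYearCHD ~ age, data = heart_disease, family = "binomial")
ll <- as.numeric(logLik(m))    # log L
p  <- length(coef(m)) - 1      # number of predictors, excluding the intercept
n  <- nrow(heart_disease)

aic_hand <- -2 * ll + 2 * (p + 1)
bic_hand <- -2 * ll + log(n) * (p + 1)

c(aic_hand, AIC(m))  # the two values should agree
c(bic_hand, BIC(m))  # likewise
```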
We can use the glance() function to look at the AIC for the model we fit earlier.
Let’s compare a few models:
age_model <- glm(TenYearCHD ~ age, data = heart_disease, family = "binomial")
age_male_model <- glm(TenYearCHD ~ age + male, data = heart_disease, family = "binomial")
age_male_interaction_model <- glm(TenYearCHD ~ age * male, data = heart_disease, family = "binomial")
age_education_model <- glm(TenYearCHD ~ age + education, data = heart_disease, family = "binomial")
age_male_education_model <- glm(TenYearCHD ~ age + male + education, data = heart_disease, family = "binomial")
age_education_interaction_model <- glm(TenYearCHD ~ age * education, data = heart_disease, family = "binomial")
all_interaction_model <- glm(TenYearCHD ~ (age + education + male)^2, data = heart_disease, family = "binomial")
aics <- c(glance(age_model)$AIC,
glance(age_male_model)$AIC,
glance(age_male_interaction_model)$AIC,
glance(age_education_model)$AIC,
glance(age_male_education_model)$AIC,
glance(age_education_interaction_model)$AIC,
glance(all_interaction_model)$AIC)
aics
[1] 3310.680 3276.964 3277.159 3310.135 3279.342 3314.815 3285.196
Based on AIC, which model would you choose? Raise your hand if your AIC is better.
Let’s compare the models using BIC
bics <- c(glance(age_model)$BIC,
glance(age_male_model)$BIC,
glance(age_male_interaction_model)$BIC,
glance(age_education_model)$BIC,
glance(age_male_education_model)$BIC,
glance(age_education_interaction_model)$BIC,
glance(all_interaction_model)$BIC)
bics
[1] 3323.334 3295.945 3302.468 3341.772 3317.305 3365.433 3367.450
Based on BIC, which model would you choose? Raise your hand if your BIC is better.
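One way to read off the winners programmatically, assuming the `aics` and `bics` vectors above are in the workspace:

```r
# Index of the model with the lowest (best) criterion value
which.min(aics)  # 2: age_male_model has the lowest AIC (3276.964)
which.min(bics)  # 2: age_male_model also has the lowest BIC (3295.945)
```

Here both criteria agree, though BIC's heavier penalty on parameters means this will not always be the case.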
Many departures from linearity can be solved with power transformations (e.g. \(X^{power}\))
Look at empirical log-odds plot
Concave down pattern \(\Rightarrow\) transform down (i.e. \(power < 1\))
Concave up pattern \(\Rightarrow\) transform up (i.e. \(power > 1\))
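As a sketch of what these transformations look like in the model call (the specific powers here are illustrative, not chosen from the Framingham data):

```r
# Transform "down" (power < 1): e.g. log or square root of the predictor
m_down <- glm(TenYearCHD ~ log(age), data = heart_disease, family = "binomial")

# Transform "up" (power > 1): e.g. square the predictor
# (I() protects ^ from being interpreted as formula syntax)
m_up <- glm(TenYearCHD ~ I(age^2), data = heart_disease, family = "binomial")
```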
Concave up or down?
Interaction terms in logistic regression models
Comparing logistic regression models
Choosing logistic regression models
Transformations in logistic regression
Can only transform \(X\)
Concave [up/down] \(\Rightarrow\) power-transform [up/down]