Nov 13, 2024
📋 AE 21 - Inference for Logistic Regression Models
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.
TenYearCHD:
age: Age at exam time (in years)
What’s wrong with this code?
Using age:
Hypotheses: \(H_0: \beta_1 = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_1 \neq 0\)
Test Statistic: \[z = \frac{\hat{\beta}_1 - 0}{SE_{\hat{\beta}_1}}\]
\(z\) is sometimes called a Wald statistic and this test is sometimes called a Wald Hypothesis Test.
P-value: \(P(|Z| > |z|)\), where \(Z \sim N(0, 1)\), the Standard Normal distribution
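The test statistic and p-value above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code used to fit the model in these notes; the function name `wald_test` and the example inputs are hypothetical.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Return the Wald z statistic and its two-sided p-value."""
    z = (estimate - null_value) / std_error
    # For Z ~ N(0, 1), P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical inputs for illustration only (not from the Framingham data)
z, p = wald_test(estimate=0.5, std_error=0.25)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```

Using the complementary error function keeps very small p-values from rounding to exactly zero, which is useful when \(|z|\) is large.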
We can calculate the \(C\%\) confidence interval for \(\beta_1\) as follows:
\[ \Large{\hat{\beta}_1 \pm z^* SE_{\hat{\beta}_1}} \]
where \(z^*\) is the critical value from the \(N(0,1)\) distribution corresponding to the confidence level \(C\%\)
Note
This is an interval for the change in the log-odds for every one unit increase in \(x_1\).
To get an interval for the multiplicative change in odds for every one unit increase in \(x_1\), exponentiate the endpoints:
\[ \Large{\exp\{\hat{\beta}_1 \pm z^* SE_{\hat{\beta}_1}\}} \]
Interpretation: We are \(C\%\) confident that for every one unit increase in \(x_1\), the odds multiply by a factor of \(\exp\{\hat{\beta}_1 - z^* SE_{\hat{\beta}_1}\}\) to \(\exp\{\hat{\beta}_1 + z^* SE_{\hat{\beta}_1}\}\), holding all else constant.
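The two intervals above can be computed together. The following is a minimal sketch in Python with the standard library, assuming a Wald interval; the function name `odds_ratio_ci` and the example inputs are hypothetical, and model-fitting software may compute its reported intervals by a different method.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(estimate, std_error, level=0.95):
    """Wald interval for a coefficient on the log-odds and odds scales."""
    # z* leaves (1 - level) / 2 in each tail of N(0, 1)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = estimate - z_star * std_error
    upper = estimate + z_star * std_error
    # Exponentiating the endpoints moves from log-odds to odds
    return (lower, upper), (math.exp(lower), math.exp(upper))

# Hypothetical inputs for illustration only
log_odds_ci, odds_ci = odds_ratio_ci(estimate=0.5, std_error=0.25)
print("log-odds interval:", log_odds_ci)
print("odds interval:", odds_ci)
```

Because \(\exp\) is increasing, exponentiating the endpoints preserves the confidence level; the resulting interval is not symmetric around \(\exp\{\hat{\beta}_1\}\).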
Model output for age:
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | -5.561 | 0.284 | -19.599 | 0 | -6.124 | -5.011 |
| age | 0.075 | 0.005 | 14.178 | 0 | 0.064 | 0.085 |
Hypotheses:
\[ H_0: \beta_{age} = 0 \hspace{2mm} \text{ vs } \hspace{2mm} H_a: \beta_{age} \neq 0 \]
Test statistic:
\[z = \frac{0.0747 - 0}{0.00527} \approx 14.178\]
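Note: rounding errors! Dividing the rounded values 0.0747 and 0.00527 gives about 14.17; the reported 14.178 comes from the unrounded estimate and standard error.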
P-value:
\[ P(|Z| > |14.178|) \approx 0 \]
Conclusion:
The p-value is very small, so we reject \(H_0\). The data provide sufficient evidence that age is a statistically significant predictor of whether someone will develop heart disease in the next 10 years.
We are 95% confident that for each additional year of age, the change in the log-odds of someone developing heart disease in the next 10 years is between 0.064 and 0.085.
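As a rough check, the Wald formula with \(z^* \approx 1.96\) and the estimate and standard error from the test statistic calculation gives the same endpoints to three decimals. This assumes the reported interval is close to a Wald interval; software may compute conf.low and conf.high by a different method, so small discrepancies are possible.

\[ 0.0747 \pm 1.96 \times 0.00527 = 0.0747 \pm 0.0103 \approx (0.064,\ 0.085) \]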
We are 95% confident that for each additional year of age, the odds of someone developing heart disease in the next 10 years multiply by a factor of \(\exp(0.064) \approx 1.066\) to \(\exp(0.085)\approx 1.089\).
Complete Exercises 1-4.