library(tidyverse)
library(broom)
library(ggformula)
library(mosaic)
library(knitr)
heart_disease <- read_csv("data/framingham.csv") |>
  select(totChol, TenYearCHD, age, BMI, cigsPerDay, heartRate) |>
  drop_na()

AE 19: Logistic regression introduction
Open RStudio and create a subfolder in your AE folder called “AE-19”.
Go to Canvas and locate your AE-19 assignment to get started. Upload the ae-19.qmd and framingham.csv files into the folder you just created. The .qmd and PDF responses are due in Canvas. You can check the due date on the Canvas assignment.
Packages
Data: Framingham study
This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to predict if a randomly selected adult is high risk for heart disease in the next 10 years.
Response variable
TenYearCHD:
- 1: Patient developed heart disease within 10 years of exam
- 0: Patient did not develop heart disease within 10 years of exam
What’s my predictor variable?
Based on your group, use the following as your predictor variable.
- Group 1 - totChol: total cholesterol (mg/dL)
- Group 2 - BMI: patient’s body mass index
- Group 3 - cigsPerDay: number of cigarettes patient smokes per day
- Group 4 - heartRate: heart rate (beats per minute)
Exercise 0
Generate a plot of TenYearCHD vs. your group’s predictor variable. Based on this plot, what do you think the relationship between this variable and TenYearCHD is?
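One possible way to set up such a plot is sketched below. It is only an illustration: the data frame `toy` is simulated (it is not the Framingham data), `totChol` stands in for whichever predictor your group was assigned, and side-by-side boxplots are just one reasonable choice for a 0/1 response.

```r
library(ggplot2)

set.seed(19)
# Simulated stand-in data for illustration only; NOT the Framingham data
toy <- data.frame(
  totChol = round(rnorm(200, mean = 235, sd = 45)),
  TenYearCHD = rbinom(200, size = 1, prob = 0.15)
)

# Treat the 0/1 response as a category so each outcome gets its own box
p <- ggplot(toy, aes(x = factor(TenYearCHD), y = totChol)) +
  geom_boxplot() +
  labs(x = "TenYearCHD (0 = no, 1 = yes)", y = "Total cholesterol (mg/dL)")
p
```

With the real data, the same call would use heart_disease in place of `toy` and your group's predictor in place of totChol.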
Exercise 1
Fit a logistic regression model predicting the probability of developing heart disease within the next 10 years from your assigned predictor. Have your reporter write your model on the whiteboard in both the logistic (log-odds) form and the probability form. Interpret the coefficient of your predictor in context.
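As a reminder of what the two forms look like in general, here is a sketch in generic notation. The symbols are assumptions made for illustration: \hat{p} is the predicted probability that TenYearCHD = 1, x is your group's predictor, and \hat{\beta}_0 and \hat{\beta}_1 are the estimated intercept and slope from your fitted model.

```latex
\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = \hat{\beta}_0 + \hat{\beta}_1 x
\qquad\Longleftrightarrow\qquad
\hat{p} = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x}}
```

In the log-odds form, a one-unit increase in x adds \hat{\beta}_1 to the predicted log-odds, which is the same as multiplying the predicted odds by e^{\hat{\beta}_1}.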
Exercise 2
Look at the first row in heart_disease. What log-odds and probability would you predict for this observation? Find the value of your predictor variable and plug it into the model you just wrote down. Use only the exp function along with addition, subtraction, multiplication, and division to compute your estimates.
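The arithmetic can be sketched as follows. The helper `predict_by_hand` and the numbers passed to it are hypothetical placeholders, not estimates from the Framingham data; substitute the intercept and slope from your own fitted model and the predictor value from the first row.

```r
# Hypothetical helper: turn an intercept (b0), a slope (b1), and a
# predictor value (x) into a predicted log-odds and probability.
predict_by_hand <- function(b0, b1, x) {
  log_odds <- b0 + b1 * x                      # logistic (log-odds) form
  prob <- exp(log_odds) / (1 + exp(log_odds))  # probability form, using only exp and arithmetic
  c(log_odds = log_odds, prob = prob)
}

# Placeholder values, NOT coefficients from the real model
result <- predict_by_hand(b0 = -2, b1 = 0.01, x = 200)
result
```

With these placeholder values the log-odds is 0, so the predicted probability is 0.5.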
Exercise 3
Recompute the probability from Exercise 2, but ADD 1 to your explanatory variable. Do it again, but ADD an additional 1 (a total of 2) to your explanatory variable. How much did the probability change the first time? What about the second time? What does this tell us about how the interpretation of our coefficients relates to the predicted probabilities?
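A small sketch of this comparison is below. The coefficients and starting value are placeholders chosen for illustration, not results from the Framingham model, and `inv_logit` is a hypothetical helper name.

```r
# Hypothetical helper: convert log-odds to a probability
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Placeholder coefficients and predictor values, NOT from the real model
b0 <- -2
b1 <- 0.5
x <- c(1, 2, 3)   # starting value, then +1, then +2

log_odds <- b0 + b1 * x
prob <- inv_logit(log_odds)

diff(log_odds)  # change in log-odds for each +1 step
diff(prob)      # change in probability for each +1 step
```

In this sketch each step changes the log-odds by the same amount (the slope), while the two changes in probability differ from each other.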
Exercise 4
- Use predict to generate a vector of predicted probabilities for the whole data set.
- Use mutate to add this vector of predicted probabilities to the original data set.
- Plot the predicted probabilities against your explanatory variable.
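The three steps above might look roughly like the sketch below. It runs on simulated stand-in data (`toy`, not the Framingham data), uses totChol as an example predictor, and the names `fit` and `pred_prob` are assumptions made for illustration. Note that predict returns log-odds by default for a glm fit, so type = "response" is used to get probabilities.

```r
library(dplyr)
library(ggplot2)

set.seed(19)
# Simulated stand-in data for illustration only; NOT the Framingham data
toy <- data.frame(
  totChol = round(rnorm(200, mean = 235, sd = 45)),
  TenYearCHD = rbinom(200, size = 1, prob = 0.15)
)

# Logistic regression of the 0/1 response on the example predictor
fit <- glm(TenYearCHD ~ totChol, data = toy, family = binomial)

# Step 1 and 2: predicted probabilities, added as a new column
toy <- toy |>
  mutate(pred_prob = predict(fit, type = "response"))

# Step 3: predicted probability against the explanatory variable
ggplot(toy, aes(x = totChol, y = pred_prob)) +
  geom_point() +
  labs(x = "Total cholesterol (mg/dL)", y = "Predicted probability of CHD")
```

With the real data, heart_disease would take the place of `toy` and your group's predictor the place of totChol.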
Submission
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Upload your QMD and PDF files to the Canvas assignment.