AE 18: Logistic regression introduction
- Open RStudio and create a subfolder in your AE folder called “AE-18”.
- Go to Canvas and locate your AE-18 assignment to get started.
- Upload the ae-18.qmd and framingham.csv files into the folder you just created.
The .qmd and PDF responses are due in Canvas. You can check the due date on the Canvas assignment.
Packages

library(tidyverse)
library(broom)
library(ggformula)
library(mosaic)
library(knitr)

Data: Framingham study
This data set is from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. We want to use total cholesterol to predict whether a randomly selected adult is at high risk for heart disease in the next 10 years.
high_risk:
- 1: High risk of having heart disease in next 10 years
- 0: Not high risk of having heart disease in next 10 years
totChol: total cholesterol (mg/dL)

# Read the data, keep the two variables of interest, and drop rows with missing values
heart_disease <- read_csv("framingham.csv") |>
  select(totChol, TenYearCHD) |>
  drop_na() |>
  mutate(high_risk = TenYearCHD) |>
  select(totChol, high_risk)

Exercise 1
Generate a plot and a table of high_risk, and a second plot to visualize the relationship between high_risk and totChol. Hint: none of these should be scatterplots.
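One possible way to approach this, sketched on a small made-up data frame rather than the Framingham data. The names `toy` and `risk_label` and all of the values are hypothetical, and a bar chart with side-by-side boxplots is only one reasonable choice of non-scatterplot displays.

```r
library(ggformula)
library(mosaic)

# Hypothetical toy data (NOT the Framingham values); column names mirror heart_disease
toy <- data.frame(
  totChol   = c(180, 195, 210, 225, 240, 255, 270, 285),
  high_risk = c(0, 0, 0, 1, 0, 0, 1, 1)
)

# Hypothetical helper column: treat the 0/1 indicator as a category for plotting
toy$risk_label <- factor(toy$high_risk)

# Plot of high_risk on its own: bar chart of the two categories
p_bar <- gf_bar(~ risk_label, data = toy) |>
  gf_labs(x = "high_risk")

# Table of high_risk: number of observations in each category
counts <- tally(~ high_risk, data = toy)

# Relationship between high_risk and totChol: one boxplot per category
p_box <- gf_boxplot(totChol ~ risk_label, data = toy) |>
  gf_labs(x = "high_risk")
```

With the real data, `toy` would be replaced by `heart_disease`.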
Exercise 2
Generate a scatterplot of totChol vs. high_risk. Use gf_lm() to add a line to your plot. Do you think this is a good model? Why or why not?
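A minimal sketch of the plotting step, assuming a made-up data frame named `toy` (hypothetical values, not the Framingham data). It only shows how `gf_point()` and `gf_lm()` fit together; judging whether the line is a good model is left to the exercise.

```r
library(ggformula)

# Hypothetical toy data (NOT the Framingham values); column names mirror heart_disease
toy <- data.frame(
  totChol   = c(180, 195, 210, 225, 240, 255, 270, 285),
  high_risk = c(0, 0, 0, 1, 0, 0, 1, 1)
)

# Scatterplot with the 0/1 response on the y-axis and the predictor on the x-axis;
# gf_lm() then overlays the least-squares line on the same plot
p_lm <- gf_point(high_risk ~ totChol, data = toy) |>
  gf_lm()

p_lm
```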
Exercise 3
State whether a linear regression model or logistic regression model is more appropriate for each scenario.
- Use age and education to predict if a randomly selected person will vote in the next election.
- Use budget and run time (in minutes) to predict a movie’s total revenue.
- Use age and sex to calculate the probability a randomly selected adult will visit St. Lukes in the next year.
Exercise 4
Using the table you generated in Exercise 1:
- Based on our data, what is considered “success”?
- What is the probability a randomly selected person in the study is not high risk for heart disease in the next 10 years?
- What are the odds a randomly selected person in the study is not high risk for heart disease in the next 10 years?
- What are the log-odds a randomly selected person in the study is not high risk for heart disease in the next 10 years?
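As a reminder of how the three quantities in these questions relate, here is a short sketch. The symbol $p$ is notation introduced here for the probability of the event in question, and the logarithm is the natural log.

```latex
\[
\text{odds} = \frac{p}{1 - p},
\qquad
\text{log-odds} = \ln\!\left(\frac{p}{1 - p}\right)
\]
```

For instance, with a made-up value of $p = 0.8$ (not a value from this data), the odds are $0.8 / 0.2 = 4$ and the log-odds are $\ln 4 \approx 1.39$.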
Submission
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Upload your QMD and PDF files to the Canvas assignment.