AE 18: Logistic regression introduction

Important
  • Open RStudio and create a subfolder in your AE folder called “AE-18”.

  • Go to Canvas and locate your AE-18 assignment to get started.

  • Upload the ae-18.qmd and framingham.csv files into the folder you just created. Both the .qmd file and the rendered PDF are due in Canvas; check the Canvas assignment for the due date.

Packages

library(tidyverse)
library(broom)
library(ggformula)
library(mosaic)
library(knitr)

# Read the data, keep the two variables of interest, and drop incomplete rows
heart_disease <- read_csv("framingham.csv") |>
  select(totChol, TenYearCHD) |>
  drop_na() |>
  # Rename the 10-year CHD indicator to high_risk
  rename(high_risk = TenYearCHD)

Data: Framingham study

This data set comes from an ongoing cardiovascular study of residents of Framingham, Massachusetts. We want to use total cholesterol to predict whether a randomly selected adult is at high risk for heart disease in the next 10 years.

  • high_risk:
    • 1: High risk of having heart disease in the next 10 years
    • 0: Not high risk of having heart disease in the next 10 years
  • totChol: total cholesterol (mg/dL)

Exercise 1

Generate a plot and a table of high_risk, then a second plot to visualize the relationship between high_risk and totChol. Hint: none of these should be scatterplots.
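
As a starting point, the sketch below shows one possible way to build a bar chart, a table of counts, and side-by-side boxplots with ggformula and mosaic. The `toy` data frame and its values are made up purely to illustrate the syntax; they are not the Framingham data, and you would swap in `heart_disease` for your own work.

```r
library(ggformula)
library(mosaic)

# Hypothetical toy data (NOT the Framingham data), used only to show the syntax
toy <- data.frame(
  totChol   = c(180, 195, 210, 225, 240, 255, 270, 285),
  high_risk = c(0, 0, 0, 1, 0, 1, 1, 1)
)

# Bar chart of the binary response; factor() treats 0/1 as categories
bar_plot <- gf_bar(~ factor(high_risk), data = toy)

# Table of counts for each level of the response
risk_counts <- tally(~ high_risk, data = toy)

# Side-by-side boxplots of cholesterol for each risk group
box_plot <- gf_boxplot(totChol ~ factor(high_risk), data = toy)
```

Wrapping high_risk in factor() is a choice made here because the variable is stored as the numbers 0 and 1; without it, the plots would treat it as a numeric scale.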

Exercise 2

Generate a scatterplot of high_risk (y-axis) against totChol (x-axis). Use gf_lm() to add a fitted line to your plot. Do you think this line is a good model for the data? Why or why not?
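
A minimal sketch of the plotting syntax, assuming the response high_risk goes on the vertical axis and totChol on the horizontal axis. The `toy` data frame is hypothetical and stands in for `heart_disease`; the interpretation of the resulting plot is left to you.

```r
library(ggformula)

# Hypothetical toy data (NOT the Framingham data), used only to show the syntax
toy <- data.frame(
  totChol   = c(180, 195, 210, 225, 240, 255, 270, 285),
  high_risk = c(0, 0, 0, 1, 0, 1, 1, 1)
)

# Scatterplot of the 0/1 response against cholesterol,
# with a least-squares line layered on top by gf_lm()
scatter_with_line <- gf_point(high_risk ~ totChol, data = toy) |>
  gf_lm()
```

Printing `scatter_with_line` displays the plot.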

Exercise 3

State whether a linear regression model or logistic regression model is more appropriate for each scenario.

  1. Use age and education to predict if a randomly selected person will vote in the next election.

  2. Use budget and run time (in minutes) to predict a movie’s total revenue.

  3. Use age and sex to calculate the probability a randomly selected adult will visit St. Lukes in the next year.

Exercise 4

Using the table you generated in Exercise 1:

  1. Based on our data, what is considered “success”?

  2. What is the probability a randomly selected person in the study is not high risk for heart disease in the next 10 years?

  3. What are the odds a randomly selected person in the study is not high risk for heart disease in the next 10 years?

  4. What are the log-odds a randomly selected person in the study is not high risk for heart disease in the next 10 years?
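
The three quantities in parts 2 through 4 are linked: the odds are the probability divided by its complement, and the log-odds are the natural log of the odds. The sketch below works through that arithmetic with hypothetical counts (30 and 10, chosen only for illustration); use the counts from your own Exercise 1 table instead.

```r
# Hypothetical counts (NOT the Framingham data), used only to show the arithmetic
n_not_high_risk <- 30
n_high_risk     <- 10

# Probability: share of all observations that fall in the category of interest
p_not_high <- n_not_high_risk / (n_not_high_risk + n_high_risk)

# Odds: probability of the event divided by the probability of its complement
odds_not_high <- p_not_high / (1 - p_not_high)

# Log-odds: natural log of the odds
log_odds_not_high <- log(odds_not_high)
```

With these made-up counts the probability is 0.75 and the odds are 3, meaning "not high risk" is three times as likely as "high risk" in the toy example.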

Submission

Important

To submit the AE:

  • Render the document to produce the PDF with all of your work from today’s class.
  • Upload your QMD and PDF files to the Canvas assignment.