library(tidyverse)
library(ggformula)
library(yardstick)
library(Stat2Data)
library(mosaic)
library(broom)
library(knitr)
library(patchwork) # arrange plots in a grid
AE 08: Transformations
County Health
Data
The data set for this assignment is from the Stat2Data R package, which is the companion package for this course’s textbook. The data were originally generated by the American Medical Association and concern the availability of health care in counties in the United States. You can find information here by searching for the County Health Resources dataset.
data("CountyHealth") # Loads the data from the package
It is relatively easy to count the number of hospitals a county has, whereas counting the number of doctors is much more difficult. We’d like to build a linear model to predict the number of doctors, contained in the variable MDs, from the number of hospitals, Hospitals.
Exercise 1
Use a visualization to argue why a simple linear regression model would not be appropriate for the data if we do not make any transformations.
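One possible starting point, offered only as a sketch: plot the raw data next to the residuals of the untransformed fit. The object names `fit_raw`, `p_scatter`, and `p_resid` are made up for illustration, and the interpretation of the plots is left to you.

```r
library(ggformula)
library(Stat2Data)
library(broom)
library(patchwork)

data("CountyHealth")

# Untransformed simple linear regression (hypothetical object name)
fit_raw <- lm(MDs ~ Hospitals, data = CountyHealth)

# Scatterplot of the raw data
p_scatter <- gf_point(MDs ~ Hospitals, data = CountyHealth)

# Residuals vs. fitted values from the untransformed model
p_resid <- gf_point(.resid ~ .fitted, data = augment(fit_raw))

# patchwork places the two plots side by side
p_scatter + p_resid
```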
Exercise 2
Use visualizations to find the best model that involves at least one log transformation.
Exercise 3
Fit that model and write the equation that describes it.
Exercise 4
Use visualizations to find the best model that involves at least one square root transformation. Hint: you can create square root transformations in the same way you can make log transformations by replacing log with sqrt.
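The hint can be illustrated with a short sketch. Transforming only the response here is an arbitrary choice for demonstration, not a suggested answer, and the names `p_log` and `p_sqrt` are hypothetical.

```r
library(ggformula)
library(Stat2Data)
library(patchwork)

data("CountyHealth")

# A log transformation of the response, written inside the formula
p_log <- gf_point(log(MDs) ~ Hospitals, data = CountyHealth)

# The same plot with log swapped for sqrt
p_sqrt <- gf_point(sqrt(MDs) ~ Hospitals, data = CountyHealth)

# Compare the two side by side
p_log + p_sqrt
```

The same swap works on the predictor, or on both variables at once, so there are several combinations to compare.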
Exercise 5
Fit that model and write the equation that describes it.
Exercise 6 (Time Permitting)
Is the model you created in Exercise 3 or Exercise 5 a “better” model? Use residuals and evaluation metrics to make your argument. Hint: look at the residuals of the transformed models on the transformed scale, but compute the evaluation metrics after you transform your predictions back to the original scale.
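A minimal sketch of the workflow in the hint, assuming a square-root-response model purely as a placeholder (your models from Exercises 3 and 5 may differ). The names `fit_sqrt`, `preds`, `.fitted_sqrt`, and `.pred_MDs` are hypothetical, and RMSE is only one of the metrics you could use.

```r
library(Stat2Data)
library(ggformula)
library(broom)
library(dplyr)
library(yardstick)

data("CountyHealth")

# Placeholder model with a square-root-transformed response
fit_sqrt <- lm(sqrt(MDs) ~ Hospitals, data = CountyHealth)

# Residuals are examined on the transformed scale
p_resid <- gf_point(.resid ~ .fitted, data = augment(fit_sqrt))

# Predictions are on the sqrt scale, so square them to return to MDs
preds <- CountyHealth |>
  mutate(
    .fitted_sqrt = predict(fit_sqrt, newdata = CountyHealth),
    .pred_MDs = .fitted_sqrt^2
  )

# RMSE on the original scale (number of doctors)
rmse_sqrt <- rmse(preds, truth = MDs, estimate = .pred_MDs)
rmse_sqrt
```

For a log-transformed response, the back-transformation would be `exp()` instead of squaring. Computing the metric on the original scale for both models keeps the comparison in the same units.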
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Upload your QMD and PDF files to the Canvas assignment.