library(tidyverse)
library(ggformula)
library(yardstick)
library(Stat2Data)
library(mosaic)
library(broom)
library(knitr)
library(patchwork) # arrange plots in a grid
AE 09: Outliers
Baseball game times
Data
The data set for this assignment is from the Stat2Data R package, the companion package for this course’s textbook. It contains data from all MLB games played on August 11, 2017. On this day there were no extra-inning games or rain delays. You can find more information by searching the Stat2Data documentation for the Baseball Game Times of One Day in 2017 dataset.
data("BaseballTimes2017") # Loads the data from the package
We are interested in predicting Time, the time in minutes to play the game, from either Runs, the number of runs scored by the two teams combined, or Pitchers, the total number of pitchers used by the two teams.
Exercise 1
Argue that Runs is a better predictor of Time than Pitchers.
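One possible starting point is to fit a simple linear regression for each predictor and compare their summary measures. The sketch below uses only base R's `lm()` and `summary()`. The object names (`fit_runs`, `fit_pitchers`, `comparison`) are made up for illustration, and choosing R-squared and the residual standard error as the comparison is an assumption, not a required approach.

```r
library(Stat2Data)
data("BaseballTimes2017")

# Fit one simple linear regression per candidate predictor
fit_runs     <- lm(Time ~ Runs, data = BaseballTimes2017)
fit_pitchers <- lm(Time ~ Pitchers, data = BaseballTimes2017)

# Collect R-squared and residual standard error side by side
# (object and column names here are illustrative choices)
comparison <- data.frame(
  predictor = c("Runs", "Pitchers"),
  r_squared = c(summary(fit_runs)$r.squared, summary(fit_pitchers)$r.squared),
  sigma     = c(summary(fit_runs)$sigma, summary(fit_pitchers)$sigma)
)
comparison
```

A higher R-squared and a lower residual standard error both point toward the better-fitting model. A scatterplot of each relationship is still worth making before drawing a conclusion.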
Exercise 2
Argue whether you think the CIN-MIL game would be considered a high leverage and/or high influence point.
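Leverage and influence can be quantified with base R's `hatvalues()`, `rstandard()`, and `cooks.distance()`. The sketch below assumes the model of interest is `Time ~ Runs` and that the data set identifies games in a column named `Game`. The cutoffs used (leverage above 2p/n, Cook's distance above 0.5) are common rules of thumb, so check them against the ones used in class.

```r
library(Stat2Data)
data("BaseballTimes2017")

# Assumption: diagnostics are computed for the Runs model
fit_runs <- lm(Time ~ Runs, data = BaseballTimes2017)

n <- nrow(BaseballTimes2017)
p <- length(coef(fit_runs))  # number of coefficients, including the intercept

# One row of diagnostics per game (Game column name is assumed)
diagnostics <- data.frame(
  Game     = BaseballTimes2017$Game,
  leverage = hatvalues(fit_runs),
  std_res  = rstandard(fit_runs),
  cooks_d  = cooks.distance(fit_runs)
)

# Rule-of-thumb flags; these thresholds are conventions, not hard limits
diagnostics$high_leverage  <- diagnostics$leverage > 2 * p / n
diagnostics$high_influence <- diagnostics$cooks_d > 0.5

# Most influential games first
diagnostics[order(-diagnostics$cooks_d), ]
```

Leverage depends only on how unusual a game's predictor value is, while Cook's distance combines leverage with the size of the residual. A point can therefore have high leverage without being influential.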
Exercise 3
Remove the CIN-MIL game from the data set. Which model is better now?
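A minimal sketch of the refit, assuming the game is labeled `"CIN-MIL"` in a column named `Game`. Confirm the label by inspecting the data first. The name `no_cin_mil` is an illustrative choice.

```r
library(Stat2Data)
data("BaseballTimes2017")

# Assumption: the Game column labels this game "CIN-MIL"
no_cin_mil <- subset(BaseballTimes2017, Game != "CIN-MIL")

# Refit both candidate models on the reduced data
fit_runs     <- lm(Time ~ Runs, data = no_cin_mil)
fit_pitchers <- lm(Time ~ Pitchers, data = no_cin_mil)

# Compare R-squared for the two models without the removed game
c(Runs     = summary(fit_runs)$r.squared,
  Pitchers = summary(fit_pitchers)$r.squared)
```

Comparing these values with the full-data fits shows how much a single game drives each model.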
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Upload your QMD and PDF files to the Canvas assignment.