Sep 18, 2024
What are the advantages of using simulation-based inference methods? What are the advantages of using inference methods based on mathematical models?
Under what scenario(s) would you prefer to use simulation-based methods? Under what scenario(s) would you prefer to use methods based on mathematical models?
# load packages
library(tidyverse) # for data wrangling and visualization
library(ggformula) # for plotting using formulas
library(broom) # for formatting model output
library(scales) # for pretty axis labels
library(knitr) # for pretty tables
library(patchwork) # arrange plots
# HEB Dataset
heb <- read_csv("data/HEBIncome.csv") |>
  mutate(Avg_Income_K = Avg_Household_Income / 1000)
# set default theme and larger font size for ggplot2
ggplot2::theme_set(ggplot2::theme_bw(base_size = 20))
Let's think about variation:
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| Avg_Income_K | 1 | 17175.06 | 17175.0595 | 56.32026 | 0 |
| Residuals | 35 | 10673.37 | 304.9535 | NA | NA |
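As a rough worked example, the sums of squares in the table above can be combined into \(R^2\). This assumes the usual definition of \(R^2\) as the model sum of squares divided by the total sum of squares. The labels \(SS_{Model}\), \(SS_{Error}\), and \(SS_{Total}\) are notation introduced here for illustration, not taken from the slides.

\[ R^2 = \frac{SS_{Model}}{SS_{Total}} = \frac{SS_{Model}}{SS_{Model} + SS_{Error}} = \frac{17175.06}{17175.06 + 10673.37} = \frac{17175.06}{27848.43} \approx 0.617 \]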
\[ r = \frac{\sum(x_i - \bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2\sum(y_i-\bar{y})^2}} = \frac{\sum(x_i - \bar{x})(y_i-\bar{y})}{(n-1)s_xs_y} \]
What indicates a good model fit? Higher or lower \(R^2\)? Higher or lower RMSE?
Calculate with rsq() from yardstick package using the augmented data:
🗳️ Discussion
The \(R^2\) of the model for Number_Organic from Avg_Income_K is 61.7%. Which of the following is the correct interpretation of this value?
Avg_Income_K correctly predicts 61.7% of Number_Organic in San Antonio HEBs.
Number_Organic can be explained by Avg_Income_K.
Avg_Income_K can be explained by Number_Organic.
Number_Organic can be predicted by Avg_Income_K.
In groups, at the board, design a simulation-based procedure for producing a p-value for the following hypothesis test.
Ranges between 0 (perfect predictor) and infinity (terrible predictor)
Same units as the response variable
Calculate with rmse() from yardstick package using the augmented data:
# A tibble: 1 × 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 rmse    standard        17.0
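The rsq() and rmse() calculations described above might look roughly like the sketch below. It is a stand-in, not the course's own code: it fits a model to R's built-in mtcars data rather than the HEB data, and the object names (`fit`, `fit_aug`, `r2`, `err`) are made up for illustration. The by-hand lines assume that \(R^2\) is the squared correlation between observed and fitted values and that RMSE is the square root of the mean squared residual.

```r
library(broom)      # augment()
library(yardstick)  # rsq(), rmse()

# Stand-in model on the built-in mtcars data (NOT the HEB data)
fit <- lm(mpg ~ wt, data = mtcars)

# augment() returns the model data plus .fitted and .resid columns
fit_aug <- augment(fit)

# Each call returns a one-row tibble: .metric, .estimator, .estimate
r2  <- rsq(fit_aug, truth = mpg, estimate = .fitted)
err <- rmse(fit_aug, truth = mpg, estimate = .fitted)

# By-hand versions for comparison
r2_by_hand   <- cor(fit_aug$mpg, fit_aug$.fitted)^2
rmse_by_hand <- sqrt(mean(fit_aug$.resid^2))

r2
err
```

For the HEB model, the same pattern would presumably use Number_Organic as `truth` in place of `mpg`.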