Inference Summary Sheet
Choosing which statistical test/CI to use based on variable type
| Variable type | Parameter of interest | Null hypothesis | Test name | Function for test |
|---|---|---|---|---|
| One quantitative variable | \(\mu\) | \(\mu = \mu_0\) | One-sample t-test | t.test(~variable1, mu = mu0, data = data_name) |
| One binary variable | \(p\) or \(\pi\) | \(p = p_0\) | Z-test for one proportion | prop.test(~variable1, p = p0, data = data_name) |
| One quantitative and one binary variable with independent samples | \(\mu_1-\mu_2\) | \(\mu_1 = \mu_2\) | Two-sample t-test | t.test(variable1~variable2, data = data_name) |
| One quantitative and one binary variable with paired samples | \(\mu_d\) | \(\mu_d = 0\) | Paired t-test | t.test(variable1~variable2, paired = TRUE, data = data_name) |
| Two binary variables | \(p_1-p_2\) or \(\pi_1-\pi_2\) | \(p_1=p_2\) | Z-test for a difference in proportions | prop.test(variable1~variable2, data = data_name) |
| Two quantitative variables | correlation, \(\rho\) or slope \(\beta\) | \(\rho = 0\) or \(\beta = 0\) | Correlation or Regression | cor.test(variable1~variable2, data = data_name) or lm(variable1 ~ variable2, data = data_name) |
| Two categorical variables | NA | 2 variables are independent | Chi-squared test | chisq.test(variable1~variable2, data = data_name) |
For all tests, specify direction of \(H_a\) with the argument: alternative = "greater", "less", or "two.sided".
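To construct a confidence interval: remove the null-value argument (mu = mu0 or p = p0), leave alternative at its default of "two.sided", and specify conf.level = number_here (for example, conf.level = 0.95).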
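As a rough illustration of how the table and the two notes above fit together, the sketch below runs a one-sample t-test, the matching confidence interval, and a two-sample t-test. It assumes the mosaic package is installed and uses R's built-in mtcars data as a stand-in for data_name; the null value 20 and the 90% level are arbitrary choices made only for this example.

```r
library(mosaic)  # provides the formula interface for t.test()

# One-sample t-test: H0: mu = 20 vs. Ha: mu > 20
# (the null value 20 is an assumption chosen for illustration)
test_one <- t.test(~ mpg, mu = 20, alternative = "greater", data = mtcars)
test_one$p.value

# Confidence interval: drop mu and alternative, then set conf.level
ci_one <- t.test(~ mpg, conf.level = 0.90, data = mtcars)
ci_one$conf.int

# Two-sample t-test: quantitative mpg compared across the binary variable am
test_two <- t.test(mpg ~ am, data = mtcars)
test_two$p.value
```

Each call returns a test object, so pieces such as p.value and conf.int can be pulled out with $ rather than read off the printed output.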
Useful code for creating basic figures (requires ggformula)
- Histogram: gf_histogram( ~ variable, data = data_name)
- Boxplot: gf_boxplot( ~ variable, data = data_name)
- Dotplot: gf_dotplot( ~ variable, data = data_name)
- Barchart: gf_bar( ~ variable, data = data_name)
- Side-by-side boxplot: gf_boxplot(variable1 ~ variable2, data = data_name)
- Side-by-side barchart: gf_bar( ~ variable1, fill = ~ variable2, data = data_name, position = position_dodge())
- Scatterplot: gf_point(variable1 ~ variable2, data = data_name)
- Scatterplot with a third variable as color: gf_point(variable1 ~ variable2, color = ~ variable3, data = data_name)
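To see the plotting templates above with real column names filled in, here is a minimal sketch using R's built-in mtcars data as a stand-in for data_name. It assumes ggformula is installed; wrapping cyl in factor() is an illustrative choice so that a numeric code is treated as a categorical variable.

```r
library(ggformula)  # also attaches ggplot2

# One quantitative variable: histogram of mpg
p_hist <- gf_histogram(~ mpg, data = mtcars)

# Quantitative by categorical: side-by-side boxplots of mpg for each cyl group
p_box <- gf_boxplot(mpg ~ factor(cyl), data = mtcars)

# Two quantitative variables, with a third variable shown as color
p_scatter <- gf_point(mpg ~ wt, color = ~ factor(cyl), data = mtcars)

# Display one of the stored plots
print(p_scatter)
```

In each formula the variable on the left of ~ goes on the y-axis and the variable on the right goes on the x-axis; one-variable plots leave the left side empty.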
Useful code for calculating summary statistics (requires mosaic)
- Means: mean( ~ variable, data = data_name) or mean(variable1 ~ variable2, data = data_name)
- Counts/Tables: tally( ~ variable, data = data_name) or tally(variable1 ~ variable2, data = data_name)
- 5-number-summary etc: favstats( ~ variable, data = data_name) or favstats(variable1 ~ variable2, data = data_name)
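The summary-statistic templates above can be tried out as follows. This is a small sketch that assumes mosaic is installed and again uses the built-in mtcars data in place of data_name; the variables mpg, am, and cyl are chosen only for illustration.

```r
library(mosaic)

# Means: overall, then separately for each level of am
overall_mean <- mean(~ mpg, data = mtcars)
group_means  <- mean(mpg ~ am, data = mtcars)

# Counts/tables: one variable, then a two-way table of cyl by am
cyl_counts <- tally(~ cyl, data = mtcars)
cyl_by_am  <- tally(cyl ~ am, data = mtcars)

# Five-number summary plus mean, sd, n, and missing count
mpg_summary <- favstats(~ mpg, data = mtcars)
mpg_by_am   <- favstats(mpg ~ am, data = mtcars)

overall_mean
group_means
cyl_by_am
mpg_by_am
```

The one-sided formula ( ~ variable) summarizes a single variable, while the two-sided form (variable1 ~ variable2) computes the same summary within each group of variable2.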