library(tidyverse)
library(ggformula)
library(broom)
library(knitr)
library(patchwork) #arrange plots in a gridAE 06: Model Conditions
Songs on Spotify
Open RStudio and create a subfolder in your AE folder called “AE-06”
Go to the Canvas and locate your
AE 06assignment to get started.Upload the
ae-06.qmdandspotify-popular.csvfiles into the folder you just created. The.qmdand PDF responses are due in Canvas no later than Monday, September 16 at 11:59pm.
Introduction
This is a continuation of AE-05. The Data section below is the same as in that exercise. Feel free to skip it if you feel you remember everything about the data set or simply use it as a reference when needed.
Data
The data set for this assignment is a subset from the Spotify Songs Tidy Tuesday data set. The data were originally obtained from Spotify using the spotifyr R package.
It contains numerous characteristics for each song. You can see the full list of variables and definitions here. This analysis will focus specifically on the following variables:
| variable | class | description |
|---|---|---|
| track_id | character | Song unique ID |
| track_name | character | Song Name |
| track_artist | character | Song Artist |
| track_popularity | double | Song Popularity (0-100) where higher is better |
| energy | double | Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy. |
| valence | double | A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). |
spotify <- read_csv("data/spotify-popular.csv")Are high energy songs more positive? To answer this question, we’ll analyze data on some of the most popular songs on Spotify, i.e. those with track_popularity >= 80. We’ll use linear regression to fit a model to predict a song’s positiveness (valence) based on its energy level (energy).
Below are plots as part of the exploratory data analysis.
p1 <- gf_histogram(~valence, data = spotify) |>
gf_labs(title = "Distribution of Valence",
subtitle = " for Popular songs on Spotify",
x = "Valence")
p2 <- gf_histogram(~energy, data = spotify) |>
gf_labs(title = "Distribution of Energy",
subtitle = "for Popular songs on Spotify",
x = "Energy")
p1 + p2 # The patchwork package will arrange your plots for you
gf_point(valence ~ energy, data = spotify) |>
gf_labs(title = "Valence vs. Energy",
subtitle = "Popular songs on Spotify",
x = "Energy",
y = "Valence")
Exercise 0
Fit a model using the energy of a song to predict its valence.
## add codeExercise 1
Last time, we were too hasty in performing statistical inference. We need to make sure you data satifies the four conditions of inference. Let’s check those model conditions now. Fill in the code below to use the augment() function to create a new data frame containing the residuals and fitted values (among other information)/
Note: Remove #|eval: false from the code chunk after you have filled in the code.
spotify_aug <- augment(_____)Exercise 3
Make a plot of the residual vs. fitted values.
# add code hereExercise 4
Fill in the code to make a histogram of the residuals and a normal QQ-plot.
resid_hist <- gf_histogram(~____, data = ____)) |>
gf_labs(x = "_____",
y = "_____",
title = "____")
resid_qq <- gf_qq(~____, data = ____) |>
gf_qqline() |>
gf_labs(x = "_____",
y = "_____",
title = "____")
resid_hist + resid_qqExercise 5
Assess the four model conditions. Use the plots from the previous exercises to help make the assessment.
- Linearity
- Constant variance
- Normality
- Independence
To submit the AE:
- Render the document to produce the PDF with all of your work from today’s class.
- Upload your QMD and PDF files to the Canvas assignment.