CSCI 5622 - MACHINE LEARNING (SPRING 2023)
UNIVERSITY OF COLORADO BOULDER
​
BY: SOPHIA KALTSOUNI MEHDIZADEH
CONCLUSION
The overall aim of this project was to explore different ways of modelling musical preferences. Music is undeniably an important part of human culture, and our musical preferences are important components and reflections of our identities and sense of self. However, the main way we tend to characterize musical preferences (i.e., with genre labels) is extremely subjective and volatile. Scientific research in particular would benefit from quantitative, genre-free ways of operationalizing and describing individual musical preferences. The introduction of this project gives an overview of two alternative models of musical preference that attempt to do just that: the first describes musical preferences in terms of three dimensions (valence, arousal, and depth), and the second uses five dimensions (mellow, unpretentious, sophisticated, intense, and contemporary). This project draws on that prior literature and the works utilizing these models to explore how user musical preferences can be characterized from publicly available music listening data.
​
The data for this project were acquired from two main sources. The first was the Taste Profile subset of the Million Song Dataset (MSD), which contains over 48 million user-song-playcount observations. This was an interesting dataset for exploring musical preferences because it spans over 1 million unique users and almost 400,000 unique songs, and the play count variable can serve as a naturalistic analogue for a preference measure. This dataset was supplemented with song metadata pulled from the Spotify Web API. The Spotify Web API provided a large and descriptive variety of audio features for each track, including basic metadata such as song duration, year of release, and song and artist popularity scores, as well as quantitative variables describing a song's style, such as its valence, energy level, danceability, and how instrumental it sounds. These stylistic variables were one of the main reasons for supplementing the MSD data with the Spotify API: several of them closely resemble the dimensions of the alternative preference models mentioned above, which made them helpful for transforming the data out of the genre space.
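The exact collection code is not reproduced here; purely as an illustration, the sketch below shows how a single track's metadata and audio features might be pulled with the spotipy client library. The placeholder credentials, the example search query, and the particular fields kept are assumptions for illustration, not the project's actual pipeline.

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials; a real client ID/secret from the Spotify developer
# dashboard is required for this to run.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

# MSD rows carry song titles and artist names rather than Spotify IDs, so each
# track is first matched via a search query (example query shown here).
result = sp.search(q="track:Karma Police artist:Radiohead", type="track", limit=1)
track = result["tracks"]["items"][0]
artist = sp.artist(track["artists"][0]["id"])

# Combine basic metadata with the quantitative audio features for the track.
features = sp.audio_features([track["id"]])[0]
row = {
    "duration_ms": track["duration_ms"],
    "release_date": track["album"]["release_date"],
    "track_popularity": track["popularity"],
    "artist_popularity": artist["popularity"],
    "artist_genres": artist["genres"],
    "valence": features["valence"],
    "energy": features["energy"],
    "danceability": features["danceability"],
    "instrumentalness": features["instrumentalness"],
}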
​
One of the main challenges faced throughout this project was wrangling and cleaning the sheer quantity of data that was collected. To start, due to project time constraints and the rate limit of the Spotify API, Spotify data were not collected for the entire MSD subset; they were collected for about 10 million of the 48 million MSD observations. While having the whole dataset would have been nice, this was not necessarily a problem, because most of the machine learning methods attempted would not have run on a regular laptop with that amount of data anyway, and the data had to be subsampled substantially in order to produce results within a reasonable timeframe. Additionally, some of the data from the Spotify API were difficult to use. The genre information returned for a song was an entire list of applicable genres rather than just the one or two most relevant, and it was not taxonomized in any way, so it was often difficult to relate niche subgenres to their overarching styles for the purposes of comparison or finding broader patterns across the data. Despite these challenges, the goal of this project was to compare different ways of characterizing users' preferences using the various attributes gathered in the data. Specific research questions and answers can be found at the bottom of the introduction section.
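To make the scale and genre issues concrete, the following is a minimal sketch of the kind of wrangling involved, using a toy table in place of the real merged MSD/Spotify data. The column names, sample sizes, and the small genre lookup are all illustrative assumptions rather than the project's actual mapping.

import pandas as pd

# Toy stand-in for the merged MSD/Spotify table (column names are assumptions).
triplets = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u3", "u3", "u4"],
    "song_id":    ["s1", "s2", "s3", "s1", "s4", "s5"],
    "play_count": [5, 1, 2, 7, 1, 3],
    "genres":     [["album rock", "permanent wave"], ["trap latino"], ["indie folk"],
                   ["album rock"], [], ["uk garage"]],
})

# Subsample users so that models run in reasonable time on a regular laptop.
sampled_users = triplets["user_id"].drop_duplicates().sample(n=2, random_state=42)
subset = triplets[triplets["user_id"].isin(sampled_users)].copy()

# Spotify returns a flat list of niche genres with no taxonomy; a small handmade
# lookup (illustrative only) collapses them to broader style labels.
broad_map = {"album rock": "rock", "permanent wave": "rock",
             "trap latino": "hip hop", "indie folk": "folk"}

def broad_genre(genre_list):
    for g in genre_list:
        if g in broad_map:
            return broad_map[g]
    return genre_list[0] if genre_list else "unknown"

subset["genre"] = subset["genres"].apply(broad_genre)
print(subset[["user_id", "song_id", "play_count", "genre"]])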
​
The first half of the project used methods that investigate underlying patterns and associations in the data. The first main goal was to reveal structure in the data through the audio feature variables given by Spotify; the hypothesis was that these features might reveal an underlying structure or grouping that went beyond a categorization by genre. The results supported this: songs could be grouped together according to similarities and differences in these high-level descriptive attributes, suggesting that music can be effectively described by a set of more objective, high-level characteristics rather than the subjective genre labels that are frequently used. The second goal was to reveal networks in the data based on the different genres that users listened to; for instance, does listening to certain genres imply listening to certain others? Some frequent patterns of style preferences and music tastes were found to occur across users in the data. However, these results had limitations; for example, the patterns tended to be heavily skewed towards broader (and thus more frequently used) genre labels.
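As a rough sketch of the first goal, the snippet below groups songs by standardized audio features; a k-means example stands in here for whatever clustering approach was actually used in the project, and the toy data, feature columns, and cluster count are assumptions for illustration only.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy stand-in for the per-song audio-feature table (columns are assumptions).
rng = np.random.default_rng(1)
songs = pd.DataFrame(rng.random((500, 4)),
                     columns=["valence", "energy", "danceability", "instrumentalness"])

# Standardize the features, then group songs purely by these high-level
# audio characteristics; no genre labels are involved at any point.
X = StandardScaler().fit_transform(songs)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
songs["cluster"] = kmeans.labels_

# Inspecting each cluster's average feature profile helps interpret the groupings.
print(songs.groupby("cluster").mean().round(2))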
​
The second half of this project used methods to attempt to distinguish user-preferred from non-preferred songs. A song's play count was used to infer whether it was preferred by a given user: songs with higher play counts were considered preferred, and those with lower play counts were considered not preferred. Attributes from the data, such as a song's popularity, year of release, key, duration, and other information, were explored as ways to predict the preference label. Following the prior work discussed in the introduction, a "preference distance" metric was also calculated; this numeric variable described how similar a given song was to what that user listened to overall. Various machine learning methods were then explored to find the model with the best performance. The preference distance metric, along with the artist and track popularity scores from Spotify, were found to be the most useful for distinguishing preferred from non-preferred songs. Specifically, preference distance was a better predictor when it took more, rather than fewer, song attributes into account. Ultimately, a non-linear support vector machine classifier achieved the highest performance of all the models tested, predicting whether a song was preferred or not with 61% accuracy. While this surpasses chance (50%) performance, there is still significant room for improvement of these methods.
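For concreteness, the sketch below shows one plausible way to compute a preference distance (here, a song's Euclidean distance from a user's average audio-feature profile) and feed it, together with popularity scores, into an RBF-kernel SVM. The random toy data, feature choices, play-count threshold, and hyperparameters are illustrative assumptions, so its accuracy will hover around chance rather than the 61% reported above.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 1000

# Toy audio features for n songs and a single user's mean feature profile.
X_audio = rng.random((n, 4))       # e.g., valence, energy, danceability, instrumentalness
user_profile = rng.random((1, 4))  # the user's average audio-feature vector

# Preference distance: how far each song sits from the user's typical taste.
pref_distance = np.linalg.norm(X_audio - user_profile, axis=1)

# Artist and track popularity (scaled 0-1 here), combined with the distance.
popularity = rng.random((n, 2))
X = np.column_stack([pref_distance, popularity])

# Binary preference label inferred from play counts (toy threshold of 2 plays).
play_counts = rng.poisson(2, size=n)
y = (play_counts >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Non-linear (RBF-kernel) SVM, the family of model described as best-performing above.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")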