Pitchfork Between the Lines

Using Natural Language Processing to Analyze the Artists Most Frequently Mentioned in Pitchfork Reviews 

What is Pitchfork?

Pitchfork is an online music magazine launched in 1995. Originally focused on newly released independent music, Pitchfork has grown into one of the biggest and arguably most relevant voices in music journalism, covering all types of popular music. The site primarily features critical reviews of new and reissued albums. When written well, these reviews provide additional insight into the artist, such as the artist's influences, the historical context surrounding the music, and how the album fits into broader cultural trends. Review authors often reference other artists or albums, implying that the reference is influential or culturally significant. Using natural language processing, this project measures the artists and albums Pitchfork reviewers reference most in order to identify works of cultural significance.

Pitchfork Ratings Overview

I used this Pitchfork review dataset, which captures Pitchfork reviews from 1999 to 2017 along with additional information such as the author of the review, the genre classification of the album being reviewed, the label the album was released on, and the year the review was published. As part of the critical review, the author assigns each album a rating on a scale of 1–10.

I’ve prepared the overview below of the Pitchfork review dataset. Thorough analysis of these ratings can be found elsewhere, but for the purposes of analyzing references it’s important to note that Pitchfork is primarily focused on rock music.

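The rock skew can be checked directly from the dataset. Below is a minimal sketch using a handful of made-up rows; the column names (`artist`, `score`, `genre`) are assumptions and the real dataset's schema may differ:

```python
import pandas as pd

# Illustrative sample rows, not actual data from the review dataset.
reviews = pd.DataFrame({
    "artist": ["Radiohead", "Burial", "Eminem", "Wilco", "Interpol"],
    "score": [9.3, 9.0, 5.7, 8.7, 9.5],
    "genre": ["rock", "electronic", "rap", "rock", "rock"],
})

# Share of reviews per genre, sorted most-frequent first.
genre_share = reviews["genre"].value_counts(normalize=True)
print(genre_share.idxmax())  # the most-reviewed genre in this sample: rock
```

Run against the full dataset, the same two lines surface the genre distribution summarized in the overview above.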

Identifying Entities in Pitchfork Reviews

I used the Google Natural Language (NL) API to extract entities from the reviews. The API inspects the text for proper nouns, such as names of individuals or titles of specific pieces of art, as well as common nouns such as “music” or “violin”.

Using Wikipedia URLs to filter for Proper Nouns

Because the focus of this analysis is on proper nouns, I needed a method to remove common nouns, since most of the entities returned by the NL API were items like “guitar” or “album”. If a Wikipedia URL exists for an entity, the NL API returns it as part of its results. I found that entities with a Wikipedia URL tended to be the proper nouns I was looking for, so this analysis only includes entities whose NL API results contain a Wikipedia URL.
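The NL API attaches a `wikipedia_url` key to an entity's metadata when a Wikipedia page exists, so the filter reduces to a membership check. A minimal sketch, using illustrative records shaped like the API's entity output rather than actual API responses:

```python
# Each entity from the NL API carries a metadata dict; entities linked to a
# Wikipedia page include a "wikipedia_url" key. Records below are illustrative.
entities = [
    {"name": "guitar", "metadata": {}},
    {"name": "Thom Yorke", "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Thom_Yorke"}},
    {"name": "album", "metadata": {}},
    {"name": "OK Computer", "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/OK_Computer"}},
]

def proper_noun_entities(entities):
    """Keep only entities the NL API linked to a Wikipedia page."""
    return [e for e in entities if "wikipedia_url" in e["metadata"]]

kept = [e["name"] for e in proper_noun_entities(entities)]
print(kept)  # -> ['Thom Yorke', 'OK Computer']
```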

Note: Those familiar with the NL API will know the results also include a ‘type’ indicator meant to classify whether the entity is a person, location, consumer good, work of art, etc. I found the type indicator inconsistent in how it classified certain entities, so it was not used for the proper-noun filter.

Additional filters were added to remove other items not relevant to this analysis:

  • Artist entities identified in a review of that same artist’s album. For example, if Eminem was mentioned in a review of an Eminem album, that mention isn’t included in this analysis.
  • Location entities (those the NL API typed as locations). I had planned an additional view pairing genres with the locations frequently mentioned in their reviews, but it wasn’t straightforward due to the variety of location types mentioned: entire countries, cities, and individual spots such as venues. Popular cities included Chicago, New York, Brooklyn, and London, as well as Detroit for electronic music.
  • Genre entities, as they didn’t seem relevant to the analysis. Notable trends included:

↑ Increase in Contemporary R&B references across all genres from 2010 onward.

↓ Decrease in Intelligent Dance Music references in Electronic music reviews from 2007 onward.
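The three filters above can be sketched as a single pass over the extracted entities. The field names, type labels, and genre list here are assumptions for illustration, not the exact representation used in the analysis:

```python
def filter_entities(entities, reviewed_artist, genre_names):
    """Drop self-references, locations, and genre entities, as described above."""
    kept = []
    for e in entities:
        if e["name"].lower() == reviewed_artist.lower():
            continue  # artist mentioned in a review of their own album
        if e.get("type") == "LOCATION":
            continue  # countries, cities, venues
        if e["name"].lower() in genre_names:
            continue  # genre references excluded from the main analysis
        kept.append(e)
    return kept

# Illustrative entities from a hypothetical Eminem album review.
entities = [
    {"name": "Eminem", "type": "PERSON"},
    {"name": "Dr. Dre", "type": "PERSON"},
    {"name": "Detroit", "type": "LOCATION"},
    {"name": "Contemporary R&B", "type": "OTHER"},
]
kept = filter_entities(entities, "Eminem", {"contemporary r&b"})
print([e["name"] for e in kept])  # -> ['Dr. Dre']
```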

 

Comparing Artist Mentions to Rating Scores

Historically, Pitchfork rating scores have been used to measure the artists the site most favors. By comparing rating scores to artist references, we can determine whether there is a correlation between how frequently an artist is mentioned and that artist’s rating scores. I assigned a rank to each artist based on how the artist’s mentions and scores compared to the average measures across all artists referenced in Pitchfork reviews:

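The ranking can be sketched as a quadrant classification against the across-artist averages. The ‘significant’ and ‘disregarded / acclaimed’ labels come from the analysis; the other two label names and the threshold choice (simple comparison to the mean) are my assumptions:

```python
def classify(mentions, avg_album_score, mean_mentions, mean_score):
    """Rank an artist by comparing their mention count and average album
    score to the averages across all referenced artists (quadrant scheme)."""
    often_mentioned = mentions >= mean_mentions
    well_reviewed = avg_album_score >= mean_score
    if often_mentioned and well_reviewed:
        return "significant"               # frequently cited and well reviewed
    if often_mentioned:
        return "notorious"                 # assumed label: cited despite weaker scores
    if well_reviewed:
        return "disregarded / acclaimed"   # well reviewed but rarely cited
    return "overlooked"                    # assumed label: neither

# Illustrative numbers, not figures from the dataset.
print(classify(mentions=120, avg_album_score=8.1, mean_mentions=40, mean_score=7.0))
# -> significant
```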

Observations on the Artist Mentions to Ratings Comparison

Consistency

  • Approximately half of the artists are consistent: they rank similarly in how frequently they were mentioned and in their average album rating.
  • The majority of the inconsistencies fall into the ‘disregarded / acclaimed’ classification, where artists received positive reviews but don’t have a high number of mentions. I suspect the largest factor is that Pitchfork primarily reviews new artists who haven’t been around long enough to earn a reputation as a reference point. It’s perhaps worth looking further into artists with more than one or two releases to identify the true ‘disregarded’ population.
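That follow-up filter is straightforward to express; a sketch assuming a reviews table with `artist` and `album` columns (illustrative rows, not data from the dataset):

```python
import pandas as pd

# One row per review; repeated artists indicate multiple reviewed releases.
reviews = pd.DataFrame({
    "artist": ["Frog Eyes", "Frog Eyes", "Newcomer", "Vetiver", "Vetiver", "Vetiver"],
    "album": ["A", "B", "Debut", "C", "D", "E"],
})

# Keep only artists with more than two reviewed releases -- the population
# where a 'disregarded' label is more meaningful than for brand-new acts.
release_counts = reviews.groupby("artist")["album"].nunique()
established = release_counts[release_counts > 2].index.tolist()
print(established)  # -> ['Vetiver']
```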

The ‘significant’ designation is more exclusive than acclaimed

Unsurprisingly, ‘significant’ artists make up only approximately 26% of the set, indicating this is a more exclusive group than the ‘acclaimed’ one.

Pitchfork’s Most Referenced Artists Playlists

To explore further, I’ve created two playlists:

Pitchfork’s most referenced: features a sample track from each artist with an above-average number of mentions.

Pitchfork’s most unappreciated: features potentially underappreciated artists who were not referenced frequently in reviews but whose average album score is 7+.

Copy of the File Used for Analysis

Google Drive Download