Monday, June 06, 2016

A methodology for analysis of pop lyrics - problems

Song lyrics are typically divided into blocks of text, with the blocks corresponding to musical sections of the song: verses, chorus, bridge. These typically consist of more than one sentence, and thus are analogous in rank scale to paragraphs. The first, and possibly most obvious, problem in analysing song lyrics is repetition, which is one of their distinctive features. It has the effect of significantly increasing the word count without adding to the semantic content. It also has the potential to distort word frequencies, particularly in the case of the most repetitive songs. Previous researchers have adopted different approaches to the issue. Kreyer and Mukherjee chose to keep all repetitions; Petrie, Pennebaker and Sivertsen chose to eliminate the third and subsequent occurrences. (Incidentally, that last one is a really interesting paper - "A linguistic analysis of the Beatles".)

In my project, I downloaded lyrics from lyric databases, and the files generally included all repetitions, as they are written in such a way as to permit people to follow the song from beginning to end. However, I decided to produce a corpus in which I deleted blocks of text that were repeated unchanged in their entirety. Where a repeated block contained changes, both versions were kept. The effect of this was to reduce the average number of words per "song" from 343 to 220, a reduction of roughly 36%. The justification for this was that I was interested in exploring linguistic features: having noted the scale of the repetition, there was little need to explore it further.
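The block-level de-duplication described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used for the project: it assumes blocks are separated by blank lines and that "repeated unchanged" means an exact string match after trimming surrounding whitespace. The function names `drop_repeated_blocks` and `word_count` are hypothetical.

```python
import re


def drop_repeated_blocks(lyrics: str) -> str:
    """Keep the first occurrence of each block; drop exact repeats.

    Assumption: blocks (verse, chorus, bridge) are separated by one or
    more blank lines, and a repeat must match the earlier block exactly.
    A block that differs at all is treated as new and is kept.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", lyrics.strip()) if b.strip()]
    seen = set()
    kept = []
    for block in blocks:
        if block not in seen:
            seen.add(block)
            kept.append(block)
    return "\n\n".join(kept)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


if __name__ == "__main__":
    song = (
        "Verse one line\nsecond line\n\n"
        "Chorus line\nla la\n\n"
        "Verse two line\n\n"
        "Chorus line\nla la\n\n"       # exact repeat: dropped
        "Chorus line\nla la la"        # changed repeat: kept
    )
    reduced = drop_repeated_blocks(song)
    print(word_count(song), "->", word_count(reduced))
```

Comparing `word_count` before and after across a whole corpus gives the kind of per-song average quoted above. An exact-match test is deliberately strict: a chorus with one altered word survives, which matches the rule that changed blocks are both kept.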

A second issue is that the language used in song lyrics often diverges from "standard English", both in terms of word choice and grammar. This means that it's necessary to come to a decision about how to write it down. Should I write "ooh" or "oooh" or "oooooooooh"? Should I stick with the official version, and end up with a range of different spellings of a word that is functionally the same? I didn't come to an answer in my original work. I think the best approach is to standardise as far as possible, but to record the extent to which changes were made.
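One way to standardise spellings while keeping a record of what was changed is sketched below. It is only an illustration under stated assumptions: the rule (collapse any run of three or more identical letters to two, so "oooh" and "oooooooooh" both become "ooh") is my own simplification, the function name `standardise` is hypothetical, and matching is case-sensitive, so mixed-case runs such as "Oooooh" are not handled cleanly.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")
# Three or more of the same letter in a row (case-sensitive).
RUN = re.compile(r"([A-Za-z])\1{2,}")


def standardise(text: str):
    """Collapse elongated letter runs and log every change made.

    Returns the standardised text and a Counter mapping
    (original_word, standardised_word) to the number of occurrences,
    so the extent of the intervention can be reported.
    """
    changes = Counter()

    def fix(match):
        word = match.group(0)
        collapsed = RUN.sub(r"\1\1", word)
        if collapsed != word:
            changes[(word, collapsed)] += 1
        return collapsed

    return WORD.sub(fix, text), changes


if __name__ == "__main__":
    cleaned, log = standardise("ooh baby, oooh, oooooooooh yeah")
    print(cleaned)  # ooh baby, ooh, ooh yeah
    for (before, after), n in log.items():
        print(f"{before} -> {after} ({n}x)")
```

The change log is the important part: it lets the number and kind of edits be reported alongside the corpus, rather than silently replacing the source spellings.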

Another issue is that of "definitiveness". The definitive versions of lyrics are likely to be found either on a band's website or, if not available electronically, in an album's sleeve notes. It is much more common for bands to make their lyrics available in this way than it was thirty years ago. The downside of relying on these sources is that it takes substantially more research to get hold of the lyrics, compared with raiding a lyrics database. However, most such databases are "widely collaborative" enterprises, with people contributing lyrics as they see fit, and any errors may or may not be subsequently corrected by other people. For my project, I used a specific lyric database as a starting point. However, if the lyrics were unconvincing, I would then check them against other databases or the band's own website, if one was available.

