Monday, October 14, 2013

Language stuff - lexical density

Words in a text can be divided into content words and function words. Content words refer to some object, action or other non-linguistic meaning. Nouns, adjectives, adverbs and most verbs are content words. These are "open classes" of words - that is, it's possible to invent more of them. If I said "The abdef ghijk was mnoping stuvly over there", even though you'd never come across a sentence like that, you'd probably conclude that a "ghijk" was a thing (noun), of which you can get "abdef" ones (adjective), and that "to mnope" was a verb which it is possible to do "stuvly" (adverb). If someone feels like drawing a picture of this happening, I'll include it in this post!

Function words are words that have little meaning in themselves, but express grammatical relationships with other words in the sentence. In the sentence above, "the", "was" and "over there" are all function words. Function words include determiners, conjunctions, prepositions, modal and auxiliary verbs, and pronouns. These are all "closed classes" - that is, there isn't "space" in the English language to add new ones, Dr. Dan Streetmentioner notwithstanding. All the new words that come along are content words, not function words.

It is possible to work out the proportion of content words in a text: the number of content words divided by the total number of words. This is the lexical density. Different sorts of texts will have different lexical densities. On our OU course, we used the Longman Student Grammar of Spoken and Written English. This is a descriptive grammar (in other words, it describes how English is used, rather than saying how it ought to be used), and it analyses four different styles of discourse. Based on large corpora, it gives the lexical density of different sorts of discourse as:

  • Conversation - 35%
  • Fiction - 47%
  • Academic prose - 51%
  • News - 54%

The figure for conversation is low for a couple of reasons. The first is that, unlike written discourse, most conversations have a shared context. This means that it's possible to use pronouns in place of nouns to a greater extent, for example. The second is that conversation is improvised to a greater extent than written discourse. This means that there are likely to be dysfluencies - such as hesitators and repetition - which have the effect of decreasing the lexical density.

As part of the OU course, I did my own analysis of lyrics from pop records. Their lexical density turns out to be almost exactly the same as that of fiction.
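
As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes a small, hand-written set of function words (the `FUNCTION_WORDS` set and the `lexical_density` function are made up for this example, and are not the method used by the Longman grammar) and treats every other word as a content word.

```python
import re

# Illustrative and deliberately incomplete set of function words.
# This list is an assumption for the sketch, not a standard resource.
FUNCTION_WORDS = {
    "a", "an", "the",
    "and", "but", "or", "if", "because",
    "in", "on", "at", "of", "to", "over", "with", "by", "for", "from",
    "i", "you", "he", "she", "it", "we", "they",
    "me", "him", "her", "us", "them",
    "this", "that", "these", "those", "there",
    "is", "am", "are", "was", "were", "be", "been", "being",
    "do", "does", "did", "have", "has", "had",
    "can", "could", "will", "would", "shall", "should",
    "may", "might", "must", "not",
}


def lexical_density(text):
    """Return content words divided by total words (a value from 0 to 1)."""
    # Lower-case the text and pull out runs of letters,
    # allowing one internal apostrophe (e.g. "you'd").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        return 0.0
    # Anything not in the function word set is counted as a content word.
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return len(content_words) / len(words)


sentence = "The abdef ghijk was mnoping stuvly over there"
print(f"{lexical_density(sentence):.0%}")  # prints "50%"
```

For the nonsense sentence, four of the eight words ("abdef", "ghijk", "mnoping", "stuvly") count as content words, giving 50%. A word list like this is only an approximation: it can't tell auxiliary "have" from lexical "have", for instance, so a proper analysis would more likely tag each word with its part of speech first.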


Unknown said...

When you say there isn't space in English to add new functional words, is that a peculiar feature of English? Because it has function words that weren't in Latin ("the", for example). Is it functionally complete in a way that Latin wasn't?

And Greek had moods that seem to be evaporating from English (subjunctive, optative); was Greek functionally over-endowed?

Also, there's an actively perceived pronoun gap in English, for a third person singular pronoun which doesn't require you to assign gender.

Paul Fernandez said...

It's not so much that you can't conceive of extra words - it's more that there are a finite number of functional slots. The precise function words of a language are, I guess, unique - though I can't say I'm familiar enough with other languages to say what they are elsewhere, or if the same pattern holds good.

Conversely, for adverbs, lexical verbs, nouns and adjectives, you can keep on inventing indefinitely.