Monday, October 28, 2013

"Ender's Game"

We went to watch the film Ender's Game on Friday. I had been looking forward to the film for a long time.

Orson Scott Card published the book Ender's Game in 1985. To get things in historical perspective, this was the year that Windows 1.0 was released. It looked, according to Wikipedia, like this.

The internet was not commercialised until ten years later. In this context, Orson Scott Card was talking about something which behaved like the internet - not just from the point of view of technology, but as a medium of cultural exchange - and also virtual reality.

It would be easy to assume that large sections of the film were 2013 glosses on the 1985 book. In fact, Card's view of the future as portrayed in his books was close enough to where we are thirty years later that it is basically recognisable. If you read the book today, there's nothing terribly anachronistic about the world it portrays, and there's much SF that hasn't aged as well as that.

Other reasons that I was impressed by the book were:
  • Rather than a bland philosophical naturalist, or artificially pantheistic world in which everyone lacks any sort of cultural identity, in Card's world, people have national and religious identities, within the context of what seems to be something like a world government. This is reflected in the film.
  • Children are portrayed as morally complex. Probably a bit too morally complex - the idea of Ender being a small child (i.e. about 6!) doesn't make it into the film - I guess he's supposed to be closer to 12. And the language and imagery that shapes the children in the book is omitted as well - the alien race, the "buggers" become "formics", for example. But the book portrays a childhood world that is much closer to reality than what tends to be presented by children to adults. This in itself makes Ender's Game an interesting subject for children's literature.
  • Some of the ideas in the book are really life shaping. Look up Demosthenes' "hierarchy of foreignness." Again, think of how xenophobic and insular the world was in 1985 - Europe was still divided by the Iron Curtain. In this context, Card was making the case that, if other people shared our humanity, or even if they were not human but communication was possible (!), then there ought to be an alternative to war and destruction.
Orson Scott Card is a controversial figure, due to his opinions on homosexuality. How should you deal with a writer who has opinions about issues which you strongly disagree with? Obviously when people are saying things in public, it's reasonable to disagree with them in public - and they need to be prepared to defend their beliefs. However, ought we to disregard someone's teaching on childhood and education because he persuaded the mother of the multiple illegitimate children he fathered to put them into a foundling hospital? And yet, look at the influence Rousseau continues to have on education. Many of Card's perspectives are hugely positive. Is it really not possible to take and affirm the good, whilst opposing the things you consider to be bad?

So, what of the film? Technically, very good. It is a real shame that the complex political and social aspects of the relationship between Valentine and Peter aren't developed - but you have to sacrifice lots when you go from your average book to your average film. And I'm sure the people who read the book as children or teens would love to have seen more games in the battle room. However, what was brought out much more strongly in the film was the climax - not the final battle, but what happens after. I'm trying to avoid spoilers here - but suffice it to say that Card's message about the need for tolerance, and almost pacifism, is made clearer in the film than it was in the book.

Friday, October 25, 2013

Agent de-emphasis and naturalism

In my previous post, I talked about three different ways in which English could be used to draw attention away from the subject of a verb - the agent that is carrying out a particular process. These are:
  • short passive verbs;
  • nominalisation;
  • ergative verbs.
I guess my aim in highlighting this is that I'd like to think that an awareness of this would become part of more people's critical thinking repertoire - "It was said..." By whom? "Research has shown..." Who did the research?

Naturalism is, according to the Oxford English Dictionary Online, "the idea or belief that only natural (as opposed to supernatural or spiritual) laws and forces operate in the world." It says that everything in the universe is the outcome of time and chance - the universe itself has no designer; the contents of the universe (including animals and us) don't have a designer either. This is a little bit problematic, because lots of things in the universe look designed. Richard Dawkins coined the term "designoid" to refer to complex objects which are neither simple nor, he believes, designed - or rather not designed by an intelligent agent.

Another way of thinking about naturalism is to talk about telos - a word that comes from Greek, meaning "ultimate purpose or aim". The universe of the naturalist is atelic - it has no ultimate purpose or aim. Specifically, evolution, to a naturalist, is atelic. Any particular outcome of the evolutionary process - whether it's humans, multicellular life or antibiotic resistance - isn't designed, it just happens to arise.

This causes problems when it comes to language use in the context of evolutionary processes. The sort of processes that change stuff in the world are material processes. I listed the possible participants in material processes as being "actor, goal, scope, attribute, client, recipient" - with the key ones being the actor (the participant carrying out the process) and the goal (the participant affected by the process). So:
  • Sam (participant, actor) eats (process, material) some sushi (participant, goal).
But when we come to considering evolutionary processes, grammar really struggles. In Darwin's Doubt, Stephen Meyer gives examples of the way in which neo-Darwinist writers use a "word salad" to make up scientific-sounding phrases effectively as "just-so stories" to explain how evolution must have occurred. But it's worthwhile looking at these phrases from a grammar point of view as well.

Meyer gives examples of people talking about exons being "recruited" or "donated". These are short passives - remember above that the short passive is a form that allows agent de-emphasis. So, who or what is the actor associated with these processes? The same with "radical change in the structure" - here we have a nominalisation ("change") - again, the question that is begged is who or what has changed the structure? The actor can only really be "evolution":

  • exons (participant, goal) were recruited (process, material) (by evolution - participant, actor, de-emphasised)
But evolution is not allowed telos. In other contexts, people would squirm if we talked about evolution "doing" something - evolution just happens. And yet, through agent de-emphasis, we can slip in the concept of evolution as the actor in material processes.

The effect of this is that neo-Darwinism smuggles in the idea, and the categories, of purposeful, telic activity through agent de-emphasis. I would suggest that this is misleading - it is difficult to talk about evolutionary processes as being blind and purposeless: however, it's also wrong to use purposeful categories for something which has been defined as purposeless. If it is impossible to work on the basis that evolution is genuinely atelic, then maybe this belief was wrong in the first place.


Thursday, October 24, 2013

Language stuff - process types

Verbs are "doing words". However, as I suggested in my discussion of lexical density, not every verb is a doing word all the time - sometimes verbs behave as function words. When a verb is a lexical verb - that is, when it's describing something that is actually happening, it can also be referred to as a process. In fact, we can divide clauses up into processes, participants and circumstances - and, with a clause being effectively an indivisible quantum of meaning, it usually has exactly one process.

Blerk again. What does that mean? Let's take some of the clauses above and break them down.
  • Verbs (participant) are (process) "doing words". (participant)
  • However (circumstance) not every verb (participant) is (process) a doing word (participant) all the time (circumstance)
  • sometimes (circumstance) verbs (participant) behave (process) as function words. (circumstance)
What happened to my suggestion, you may be asking? We can see, since it has exactly one process, that it is a clause itself:
  • as (conjunction) I (participant) suggested (process) in my discussion of lexical density (circumstance)
However, it doesn't make any sense without the context of the second clause above which surrounded it - hence it is a subordinate clause.

Also, why is "as function words" a circumstance, not a participant? Effectively the preposition followed by the noun is behaving like an adverb - it is describing how the verbs behave, not what they are.

There are different types of processes. In the OU course, we divided processes into five sorts:
  • material - a participant acts upon the material world or is acted upon in some way ("I ate sushi");
  • mental - processes of consciousness and cognition ("We thought it didn't matter");
  • verbal - processes of communication ("I told him so.");
  • relational - being, having, consisting of, locating ("He has no father.");
  • existential - indicating the existence of an entity ("There is a problem").
In grammatical terms, we can talk about subjects, direct and indirect objects and so forth. However, these different types of processes have been assigned different types of participants - it seems to make the whole thing pretty complicated, but in actual fact, when we reflect on what is going on in a sentence, the types of participant associated with a process help to clarify the sort of process we are looking at in some cases. This summary comes from here:

  • Material - actor, goal, scope, attribute, client, recipient
  • Mental - senser, phenomenon
  • Verbal - sayer, receiver, verbiage
  • Relational - token, value
This provides us with a more comprehensive way of analysing processes.
  • Verbs (participant, token) are (process, relational) "doing words". (participant, value)
  • as (conjunction) I (participant, sayer) suggested (process, verbal) to you (participant, receiver) in my discussion of lexical density (circumstance)
As the page I just linked to makes clear, it's also possible to go into more detail about different types of circumstance - but that's quite enough for one blog post!

Wednesday, October 23, 2013

Language stuff - modality

Prior to studying E303, my experience of modal verbs had basically come from learning foreign languages - most specifically, the verbs "pouvoir", "devoir" and "vouloir" which we learnt in O-level French. I had never been given grammatical categories for the same things in English, although obviously I could see how "il peut" mapped onto "he can"; "voulez-vous" onto "do you want", and so on. They work as forms of auxiliary verbs - that is, they don't function as the main process in a sentence.

There are two categories of modal verbs - epistemic, which are modal verbs that relate to the likelihood of something being true, and deontic, which are modal verbs relating to possibility or necessity of action. They can be ranked according to their strength - O'Halloran, in the E303 textbooks, offers the following scale of epistemic modal verbs, from strongest to weakest:
  • will
  • would
  • must (in the sense of "he must be there" - "surely he's there")
  • may
  • might
  • could
  • can
and of deontic modal verbs:
  • has to
  • must (in the sense of "he must do it" - "if he doesn't do it, he's doomed")
  • had better
  • ought
  • should
  • needs to
  • is supposed to
Modal verbs are used differently in different forms of discourse. If we consider conversation, we tend to hedge - that is, we tend to make statements less assertively than if we were writing them down. Strong modality tends to come across as being forceful, and thus rude. There are other means of toning down the modality - for example, by personalising statements - "I don't think that's true" or even "I'm sure that's not true" both have less strong modality than "That's not true".

In song lyrics, the dominant epistemic modal verbs in the 33000 word corpus I constructed were:
  • will (also counting 'll, I'll, won't) (307 occurrences)
  • can (can't) (290 occurrences)
  • would (I'd, wouldn't) (101 occurrences)
  • could (couldn't) (59 occurrences)
The use of deontic modality is much less common, and the most common verbs were:
  • had to (have to, has to, got to) (51 occurrences)
  • should (33 occurrences)
  • need to (needed to, needs to) (18 occurrences)
  • must (15 occurrences)
The frequency of use of strong deontic modality was very similar to what is found in the fiction corpus. However, in fiction, the use of the verb "must" is much more common than it is in song lyrics, which lean much more on "need to" and "have to".
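
For anyone who wants to try this kind of count on their own texts, here is a rough sketch in Python. It is not the tool I used for the assignment: the grouping of forms under each headword, the name `count_modals` and the sample lyric are all made up for illustration, and the patterns assume straight apostrophes.

```python
import re
from collections import Counter

# Hypothetical grouping of surface forms under one headword each.
# The patterns assume straight apostrophes (') in the text.
MODAL_PATTERNS = {
    "will": r"\b(?:\w+'ll|won't|will)\b",
    "can": r"\b(?:cannot|can't|can)\b",
    "would": r"\b(?:\w+'d|wouldn't|would)\b",
    "could": r"\b(?:couldn't|could)\b",
    "have to": r"\b(?:have|has|had|got) to\b",
    "should": r"\b(?:shouldn't|should)\b",
    "need to": r"\bneed(?:s|ed)? to\b",
    "must": r"\b(?:mustn't|must)\b",
}

def count_modals(text):
    """Count occurrences of each group of modal forms in the text."""
    text = text.lower()
    counts = Counter()
    for headword, pattern in MODAL_PATTERNS.items():
        counts[headword] = len(re.findall(pattern, text))
    return counts

# Invented sample text, not taken from any real song.
lyric = ("I'll be there, you can't stop me, I will wait. "
         "We have to go, you must know I would stay.")
counts = count_modals(lyric)
for headword, n in counts.most_common():
    print(headword, n)
```

A pattern match like this cannot tell epistemic "must" from deontic "must", will count "I'd" even where it stands for "I had", and will count "got to" even in "got to the station" - so the raw numbers would still need checking by eye.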

Tuesday, October 22, 2013

Language stuff - agent de-emphasis

Normally when we think about describing an event, we think in terms of who or what is actually doing it - that is, the agent.
David broke the plate.
We may, for various reasons, not wish to draw attention to the agent. English language allows us several options for doing this. The most obvious one is to use a "short passive":
The plate was broken. 
The passive voice is used, and the person who actually does the breaking is not specified. It is possible to include the agent when using the passive voice:
The plate was broken by David.
but there may be reasons for de-emphasising the agent by omitting it - for example, when the parents come downstairs to discover the reason for a loud noise, a child might choose to draw attention to the fact that the plate has been broken, rather than admitting that it was him rather than the dog that did it.

Another option for agent de-emphasis is the use of nominalisation. This is converting a verb into a noun. I have to come up with a more complicated sentence now, as nominalisation of "to break" will leave it without a process (verb).
David broke the plate. We glued it back together.
We can de-emphasise David by nominalising the verb:
After the breakage of the plate, we glued it back together.
Yes, I know, it's a little artificial, but hopefully you get the idea.

There's a third option, and this is to use what is known as an ergative verb. This is quite subtle. An ergative verb is one that can be transitive or intransitive - that is, it can either be used with a subject and object, or just with a subject - but the object when it is used transitively becomes the subject when it is used intransitively. Blerk. The easiest way of explaining this is to give an example.
The government closed the mines.
Here, "the government" is the subject of the verb, and "the mines" is the object - or to use different grammatical categories, "the government" is the actor and "the mines" is the goal. It's possible to write this using a short passive, as explained above:
The mines were closed (omitting "by the government").
The agent/actor/subject can be omitted, which means we don't need to mention who actually closed the mines. Or we can use a nominalisation, and talk about the closure of the mines - again, the agent disappears. But we have a third option, because "close" here is an ergative verb - that is, the object of the sentence above (the mines) actually becomes the subject if we use the verb intransitively. If we want to use this verb intransitively, then the sentence becomes:
The mines closed.
(rather than "The government closed.") Once again, the agent/actor has disappeared.

There are various reasons why agent de-emphasis might be considered desirable. Those of us who have been using computers for more than ten years probably remember earlier versions of Word for Windows nagging us about using the passive voice. In my case, it was because I was often writing about science - and an aspect of science writing is use of the passive voice - to de-emphasise the person actually doing the work. Using nominalised verbs allows the writer/speaker to increase the lexical density - that is, to convey more information in less space. This is valuable in media where word count and space are at a premium - like journalism.

More significantly, as the example above suggests, there may be political reasons for de-emphasising the agent. And I'd like to return to this in a future post ....

Monday, October 21, 2013

Language stuff - field, tenor, mode

One of the aspects of studying language that interested me is how recently much of linguistic theory has been developed. When I was studying computer science in the late 80s, many of the theoretical foundations were pretty new - Dijkstra's algorithm, for example, that we were taught about in my degree (which is now part of the Further Maths A Level syllabus!) was published in 1959. However, Michael Halliday's seminal book on language An Introduction to Functional Grammar was not even first published until 1985. Both Halliday and Noam Chomsky, perhaps the most famous linguistic theorist, are still alive.

The features of a use of language may be considered by considering its field, tenor and mode. The field is also referred to as the experiential metafunction. It is how language is used to make meaning about the world - in other words, the actual content of what is being communicated.

One might naively think that this was all that language was - the communication of information - but it is more subtle than that. Every language event takes place between one or more participants, and in addition to communicating information, language events are used as part of the process of enacting interpersonal relations. This is tenor, also referred to as the interpersonal metafunction. Additionally, language events can take place in many different forms (conversation, email, a sermon ...), and these are themselves largely detached from both field and tenor. This textual metafunction is the mode.

So, what does this mean in practice? Let's take this blog post.

  • Field - I am attempting to explain, in fairly simple terms, information about the theoretical use of language.
  • Tenor - I'm addressing unknown readers (who are you? Say hello!), but I'm writing in a fairly informal style - I'm assuming that the average reader will just have happened across this, and wants to read something that's engaging, friendly and not too heavy. Frankly, that's how I like communicating anyway, and since this is my space, really written for my own amusement, I guess I do what I want.
  • Mode - a blog post. Written language can be more planned and deliberate than spoken language, for a start - there's a definite structure, and I've assumed in writing it that people will start at the beginning and read through hopefully to the end!
You can imagine changing each of those individually would change the way in which language was used. For example - suppose (field) I was writing about something else, maybe a film review? Or suppose (tenor) I knew that the people reading this were children aged 12? Or suppose (mode) I was delivering this as a talk? Each of the metafunctions, then, has a bearing on how language is used, and this linguistic structure is something that has only really been described in the last 30 years or so.

Monday, October 14, 2013

Language stuff - lexical density

Words in a text can be divided into content words and function words. Content words refer to some object, action or other non-linguistic meaning. The sort of words that are content words are nouns, adjectives, adverbs and most verbs. These are "open classes" of words - that is, it's possible to invent more content words. If I said "The abdef ghijk was mnoping stuvly over there", even though you'd never come across a sentence like that, you'd probably conclude that "ghijk" was a thing (noun), of which you can get "abdef" ones (adjective), and that "to mnope" was a verb which it is possible to do "stuvly" (adverb). If someone feels like drawing a picture of this happening, I'll include it in this post!

Function words are words that have little meaning in themselves, but express grammatical relationships with other words in the sentence. In the sentence above, "the", "was" and "over there" are all function words. They include conjunctions, prepositions, modal and auxiliary verbs, and pronouns. These are all "closed classes" - that is, there isn't "space" in the English language to add new ones, Dr. Dan Streetmentioner notwithstanding. All the new words that come along are content words, not function words.

It is possible to work out the proportion of content words compared to the total number of words. This is the lexical density. Different sorts of texts will have different lexical densities. On our OU course, we used the Longman Student Grammar of Spoken and Written English. This is a descriptive grammar (in other words, it described how English was used, rather than saying how it ought to be used), and analysed four different styles of discourse. Based on large corpora, it gave the lexical density of different sorts of discourse as:

  • Conversation - 35%
  • Fiction - 47%
  • Academic prose - 51%
  • News - 54%
Conversation is low for several reasons. The first is that, unlike written discourse, in most conversations, there is a shared context. This means that it's possible to use pronouns to a greater extent than nouns, for example. Also, conversation is improvised to a greater extent than written discourse. This means that there are likely to be dysfluencies - such as hesitators and repetition - which have the effect of decreasing the lexical density. As part of the OU course, I did my own analysis of lyrics from pop records. Their lexical density turns out to be almost exactly the same as that of fiction.
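
As a rough illustration of the calculation, here is a minimal Python sketch. The short `FUNCTION_WORDS` list is my own invention for the example; a real analysis would tag each word for part of speech rather than rely on a hand-written list, so treat the numbers it gives as approximate.

```python
import re

# A small, illustrative set of function words (closed classes). This list is an
# assumption for the example, not a complete inventory.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "but", "or", "if", "of", "in", "on", "at", "to",
    "over", "there", "is", "was", "are", "were", "be", "been",
    "will", "would", "can", "could", "may", "might", "must", "should",
    "i", "you", "he", "she", "it", "we", "they",
}

def lexical_density(text):
    """Return content words as a percentage of all words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return 100.0 * len(content) / len(tokens)

sentence = "The abdef ghijk was mnoping stuvly over there"
density = lexical_density(sentence)
print(density)
```

For the nonsense sentence, four of the eight words ("abdef", "ghijk", "mnoping", "stuvly") are counted as content words, which gives a lexical density of 50%.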


Tuesday, October 08, 2013

Language stuff - corpus (pl. corpora)

A new(ish) tool for the systematic examination of the English language is the language corpus. Corpora are collections of samples of English gathered into a "body", which can then be examined using computer programs.

According to Wikipedia, the first corpus used for language investigation was the Brown Corpus. It consisted of around a million words, gathered from about 500 samples of American English, and was used in the preparation of the benchmark work Computational Analysis of Present-Day American English by Kucera and Francis. This was as recently as 1967.

The size of corpora has increased with increasing computer power. The British National Corpus currently contains 100 million words. The Oxford English Corpus - used by the makers of Oxford Dictionaries amongst others - contains 2 billion words of English. The Cambridge English Corpus is a "multi-billion word" corpus. In addition to the texts that make up the corpora, the words they include can be tagged for parts of speech - for example, whether "love" as it appears in a text is being used as a noun ("His love was so great...") or a verb ("I love you"). A corpus can be examined with software called concordancing software. This will search for specific words, phrases or instances of grammar, and can do things like highlight words that are frequently collocated. This can be used to identify patterns in the language that might otherwise go unnoticed.
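
To give a flavour of what concordancing software does, here is a toy sketch in Python. It is not how any real concordancer is implemented: the function names `concordance` and `collocates`, the window width and the sample sentence are all assumptions made for the example.

```python
from collections import Counter

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, keyword, width=3):
    """Count the words that appear within `width` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            counts.update(w.lower() for w in window)
    return counts

# Invented sample text, split naively on whitespace.
tokens = "his love was so great that I love you and you love me".split()
lines = concordance(tokens, "love", width=2)
neighbours = collocates(tokens, "love", width=2)
for line in lines:
    print(line)
print(neighbours.most_common(2))
```

Real concordancers add a great deal on top of this (sorting, part-of-speech filters, statistical measures of collocation), but the basic idea of lining up every occurrence of a word with its neighbours is the same.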

Corpora can be produced from particular classes of text - for example, transcribed conversations, newspaper articles, academic journals, fiction. For E303, the Open University undergraduate course that introduces corpus linguistics, we were provided with a 4 million word corpus, with a million words drawn from each of these classes. It's also possible to produce your own corpus. I created a corpus of pop song lyrics - only 33000 words or so, but still enough to look for trends and patterns of language use. There is software available that can tag a text with parts of speech - for example, CLAWS4. And for analysing the corpus, the AntConc concordancing software is freely available.

Monday, October 07, 2013

Language stuff - type/token ratio

I just finished the Open University module E303 - English Grammar in Context - it sounds pretty deadly, but I loved it. Language is inherent to who we are as human beings - we all communicate. And yet, it's only relatively recently that the resources have been available to examine language in a systematic, large-scale way. A lot of the underlying theory is actually newer than the computer science theory that felt pretty new when I was doing my first degree.

I've promised blog series before, and they rarely amount to much, but I'd like to see whether I can write about some of the ideas we covered, and maybe get across some of the reason that I found the material so fascinating.

The first concept is type/token ratio. "Tokens" are the number of words in a piece of text - if I do a word count, then it tells me the number of tokens. But not all of them are unique. I have now used the most common word, "the", ten times so far in this text (don't hold me to that - it's likely to have been edited in a highly non-linear manner - but you get the idea). The unique words are the "types". You can get some insight into a text by dividing the number of unique words by the total number of words, and expressing it as a percentage. So for the text up to the start of this sentence, there were 229 words and 134 types - giving a type/token ratio of 59%.
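
The sum itself is simple enough to sketch in a few lines of Python. This is only an illustration: the way `tokenize` splits words (lower-casing everything and keeping letters and apostrophes) is my assumption, and a different tokeniser would give slightly different counts.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Return (number of tokens, number of types, ratio as a percentage)."""
    tokens = tokenize(text)
    if not tokens:
        return 0, 0, 0.0
    types = set(tokens)
    return len(tokens), len(types), 100.0 * len(types) / len(tokens)

sample = "the cat sat on the mat and the dog sat on the rug"
n_tokens, n_types, ratio = type_token_ratio(sample)
print(n_tokens, n_types, round(ratio, 1))
```

The sample has 13 tokens but only 8 types, because "the", "sat" and "on" are repeated, so its type/token ratio is about 61.5%.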

A couple of things about type/token ratio. The first is that as a piece of text gets longer, the type/token ratio is likely to fall. The number of words is clearly increasing, but the number of types is increasing more slowly - it's more likely that you will be using the same words again. What that means is that if you want to compare type/token ratio of two different texts, they need to be about the same size.
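
The effect of length can be seen even on a tiny made-up text. The sketch below (the name `running_ttr` and the sample sentence are invented for illustration) works out the type/token ratio for the first 5, 10, 15 and 20 words of the same text.

```python
def running_ttr(tokens, step):
    """Type/token ratio (as a percentage) for the first step, 2*step, ... tokens."""
    results = []
    for end in range(step, len(tokens) + 1, step):
        prefix = tokens[:end]
        results.append((end, 100.0 * len(set(prefix)) / end))
    return results

# Invented sample text, split naively on whitespace.
text = "the cat sat on the mat and the dog sat on the rug and the cat sat on the dog"
results = running_ttr(text.split(), step=5)
for length, ratio in results:
    print(length, round(ratio, 1))
```

In this toy text the ratio drops from 80% for the first five words to 40% for all twenty, simply because the same few words keep coming round again. On real texts the fall is less tidy, but the tendency is the same.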

The next thing is that different sorts of text will have different type/token ratios, as they are a measure of the diversity of the vocabulary being used. For my final assignment, I looked at pop song lyrics. I had a database of around 34000 words, and this had a type/token ratio of just under 10%. I compared this with a slightly larger database of words from a work of fiction, and this had a higher type/token ratio - just over 12%. A slightly smaller database of words from transcribed conversations had a lower type/token ratio - about 6.5%.

One might assume that the language used in pop music was pretty narrow in its range. But it turns out that it is quite diverse - almost as diverse as fictional writing, and much more so than the sort of language that's used in everyday conversation.