Wednesday, October 27, 2010

Boggle(TM), Scramble(TM) and statistics

I've been fiddling around with Boggle letters and trying to learn more about the statistics associated with them. This is partly because of a phenomenon I noticed when playing Scramble, which is a Zynga version of a Boggle-like game that can be played on Facebook.

The phenomenon is that it seemed quite common for a word to crop up on successive boards - e.g. the word "IRON" might appear on one board and then also on the next board. This isn't something I'd particularly noticed playing Boggle - probably because Boggle games are substantially slower. Did this mean that the algorithm for generating letters was "cheating", or was it actually to be expected?

The short answer is that it is probably to be expected, though there is a lot more analysis that can be done. Here is a page where the most likely words to appear in a Boggle game are listed. I am assuming this is reliable; it's based on a sample size of 50,000 boards. Notice that the most common 4 letter words will each appear on roughly 5% of boards; the most common 5 letter words will appear on roughly 2% of boards.

Furthermore, from the lower graph, the most common (modal) number of 5 letter words on a board is around 8, and the modal number of 4 letter words about 28.

The probability of at least one common five letter word repeating, then, is roughly 1 - (1 - 0.02)^8 - that is, around 15% of the time. But the probability of at least one common four letter word repeating is 1 - (1 - 0.05)^28 - that is, around 76% of the time. (This treats every word on the board as if it were one of the most common ones, and each as turning up independently, so these are rough upper estimates.) The probability of a specific word repeating in successive games is much lower - even for a common four letter word, only 5% of the time. But it doesn't have to be a specific word - you just have to notice any word being repeated to think, "That's odd, I had that in the last game."
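For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name prob_any_repeat is just something I've made up for illustration, and it bakes in the same simplifying assumptions as above: every word on the previous board reappears with the same probability, independently of the others.

```python
def prob_any_repeat(p_word: float, n_words: int) -> float:
    """Chance that at least one of n_words words from the last board
    turns up again, assuming each reappears independently with
    probability p_word (a simplifying assumption, not a measured fact)."""
    return 1 - (1 - p_word) ** n_words


# Figures quoted in the post: ~2% per common five letter word, ~8 such
# words per board; ~5% per common four letter word, ~28 per board.
five = prob_any_repeat(0.02, 8)
four = prob_any_repeat(0.05, 28)

print(f"five letter words: {five:.0%}")
print(f"four letter words: {four:.0%}")
```

Changing the 0.05 to something smaller gives a feel for how much the answer depends on treating every four letter word as a "common" one.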

More geeky numbery stuff to come another time, I expect ... and H/T Sofia Knutson for the link.
