Friday, July 25, 2014

"The Book Thief", Gaza, Ukraine, Iraq

I hardly need to say how good this book is. It was a fascinating and beautiful book to read, particularly as I had just finished a literature course. Both "contemplation" and "enjoyment" (as C. S. Lewis describes them) were present - that is, I found myself able to accept the book's odd narrative perspective, and also to wonder at the structure, and the structures within the structure, that Zusak has created.

It seemed a particularly relevant book to read at the moment. Set in Germany during the Second World War, it portrays the impact of the war from the perspective of the "losing side", both those who believed in Nazism and those who didn't, and in almost all cases it shows their humanity and the price of the war.

I have no desire to multiply words about the fighting in Gaza, on the border between Ukraine and Russia, or in Iraq. Enough has been said: what is needed is for people to recognise the fellow humanity of other people. I originally included the word "simply" in that clause - but of course, there is nothing simple about it. Unfortunately, hope and history don't rhyme.
How long?

Friday, July 18, 2014

Scan/OCR/Proofread

It's quite feasible to convert the text of a physical book into an electronic book. However, it's a multi-stage process.

The first stage is scanning the physical text. Here's a scanned image from a book called "A Memoir of Adolph Saphir D.D.".


Next, Optical Character Recognition (OCR) software is used to convert the image into text. This is fairly demanding in terms of computing power. I use ABBYY FineReader 10 Professional Edition. Here's a sample of its output (though much of the formatting has been lost in copying from a Word document into Blogger):
PREFACE.
TT has been impossible to publish sooner the Memoir of the lamented Dr. Adolph Saphir. On account of his sudden death, which followed so closely that of his wife, there was a delay in the settlement of his affairs; and, consequently, no access could be had to documents of any kind till about the middle of last year—a year after his death. When I was then asked to write the Memoir, much time and labour were required to collect letters and documents from friends and correspondents of Dr. Saphir. But though there has consequently been delay, the Memoir will, I believe and hope, be not less valued by devoted friends, of whom he had very many, nor less interesting to the general public.
A good-quality scan makes a difference: by comparing the image with the text, you can see how good a job the software has done of "reading" the image. In the sample above, for instance, the opening word "It" has come out as "TT".
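If you want a figure rather than an impression, one rough approach is to compare the raw OCR output with the same passage after it has been proofread. This is a minimal sketch using Python's standard difflib module, not anything FineReader itself provides, and the function names ocr_similarity and list_corrections are hypothetical. SequenceMatcher.ratio() returns 2*M/T, where M is the number of matching characters and T is the combined length of both strings, so 1.0 means the OCR needed no correction. It assumes a corrected copy already exists, so it tells you after the fact how much work the OCR left for the proofreader.

```python
import difflib


def ocr_similarity(ocr_text: str, corrected_text: str) -> float:
    """Character-level similarity (0.0 to 1.0) between raw OCR and a proofread copy."""
    # autojunk=False stops difflib treating very common characters (such as
    # spaces) as "junk" on longer pages, which could distort the score.
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    return matcher.ratio()


def list_corrections(ocr_text: str, corrected_text: str) -> list[tuple[str, str]]:
    """Return (ocr, corrected) pairs for every run of words proofreading changed."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, fixed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


raw = "TT has been impossible to publish sooner the Memoir"
fixed = "It has been impossible to publish sooner the Memoir"
print(round(ocr_similarity(raw, fixed), 2))  # 0.96
print(list_corrections(raw, fixed))          # [('TT', 'It')]
```

On this sentence the character score is about 0.96 even though one word in nine is wrong, so the word-level list of corrections is often the more useful of the two.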

However, the most labour-intensive stage is still to come: proofreading the text that has been produced. FineReader highlights places where it was unsure about the conversion from image to text, so the text can be checked and corrected directly in the software. Alternatively, the output can be saved as a rough word processor file and corrected there, with reference to the original document. Either way, while the scanning and OCR stages are mostly a matter of getting round to them and then letting the computer run, proofreading is a project in its own right.
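FineReader does its own highlighting, and the sketch below makes no claim about how that works internally. It is only a rough illustration, assuming a plain-text export of the OCR output and a word list of your own, of how likely trouble spots might be flagged for a proofreader. The function flag_suspects is hypothetical, and it uses two simple heuristics: words not found in the list, and tokens that mix letters with digits (such as the invented misreading "impossib1e", with "1" for "l"). The tiny vocabulary stands in for a real dictionary file.

```python
import re


def flag_suspects(text: str, known_words: set[str]) -> list[tuple[int, str, str]]:
    """Return (line number, token, reason) for tokens a proofreader should check.

    known_words must be lower-case; words are compared case-insensitively.
    Only ASCII letters and digits are considered, so accented text would need
    a broader pattern.
    """
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for raw in re.findall(r"[A-Za-z0-9']+", line):
            token = raw.strip("'")  # drop quotation marks at either end
            if not token:
                continue
            has_digit = any(c.isdigit() for c in token)
            has_alpha = any(c.isalpha() for c in token)
            if has_digit and has_alpha:
                # OCR often confuses "1" with "l" and "0" with "O"
                suspects.append((line_no, token, "mixes letters and digits"))
            elif has_alpha and token.lower() not in known_words:
                suspects.append((line_no, token, "not in word list"))
    return suspects


# A made-up two-line page with two deliberate misreadings.
page = "TT has been impossib1e to publish sooner\nthe Memoir of Dr. Saphir."
# A stand-in vocabulary; a real run would load a full dictionary file.
vocab = {"it", "has", "been", "impossible", "to", "publish", "sooner",
         "the", "memoir", "of", "dr", "saphir"}
for hit in flag_suspects(page, vocab):
    print(hit)
# (1, 'TT', 'not in word list')
# (1, 'impossib1e', 'mixes letters and digits')
```

The flags only point the proofreader at likely problems: a misreading that happens to produce a real word will slip straight through, which is one reason proofreading against the original stays a job for a person.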