Since the previous post we’ve succeeded in using Tesseract, and we now have a nice plain text version of the Encyclopaedia Britannica (EB) entry on Shakespeare:
http://knowledgeforge.net/shakespeare/svn/trunk/shksprdata/ancillary/britannica-11th.txt
What we now need to do is ‘proof’ this to correct the OCR errors. This kind of thing is perfect for distributed volunteers, so if you’d like to help out just step up and start correcting one of the sections. To make it especially easy for people to make edits, the text has been put in a temporary location on the Open Knowledge Foundation wiki (only the first five pages for the time being):
http://wiki.okfn.org/p/Open_Shakespeare/Britannica
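For anyone proofing a section, a small script can point out tokens that look like OCR misreads before reading the text closely. The following is only a rough sketch and not part of the project's tooling: the function name `flag_suspicious` and the two heuristics (a digit fused to a letter, or a stray symbol inside a word) are assumptions chosen for illustration, and the sample text is made up. It will produce false positives (ordinals such as "11th" get flagged) and it misses errors that happen to form real words, so it complements human proofing rather than replacing it.

```python
import re

# Heuristic patterns that often indicate OCR misreads.
# These are illustrative assumptions, not a definitive rule set.
SUSPICIOUS = re.compile(
    r"[A-Za-z]\d|\d[A-Za-z]"        # letter and digit fused, e.g. "Shake5peare"
    r"|[A-Za-z][|\\/~^][A-Za-z]"    # stray symbol inside a word, e.g. "Strat|ford"
)


def flag_suspicious(text):
    """Return (line_number, token) pairs worth a proofreader's attention."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if SUSPICIOUS.search(token):
                hits.append((lineno, token))
    return hits


# Made-up sample with two deliberate OCR-style errors.
sample = "William Shake5peare was born\nat Strat|ford-on-Avon in 1564."
for lineno, token in flag_suspicious(sample):
    print(lineno, token)
```

To try it on the real text, read the downloaded britannica-11th.txt file into a string and pass that to `flag_suspicious` instead of `sample`.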

Could Amazon’s Mechanical Turk be used for this? http://www.mturk.com
Could Project Gutenberg’s Distributed Proofreaders system be useful? http://www.pgdp.net/c/
jean: thanks for the suggestions. We haven’t considered the Mechanical Turk so far because of the need to pay money (we’re pro bono publico and volunteer-based).
We’ve definitely been considering pgdp.net (see discussions on the mailing list). However, for the time being, given that the whole piece is only 30 pages, we thought it better just to ‘put it in a wiki’ and do it on a volunteer basis rather than go through the pgdp.net process. That said, we may still submit it there.