The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation. It is cross-posted from jonathangray.org.
Since finally blogging about OpenPhilosophy.org last month, I’ve been thinking about how one could make a generic open source platform that could be used to power it, and other things like it. Enter ‘TEXTUS’:
TEXTUS is an open source platform for working with collections of texts and metadata. It enables users to transcribe, translate, and annotate texts, and to manage associated bibliographic data.
Here’s the rationale:
The combination of freely available digital copies of public domain works, open bibliographic data and open source tools has the potential to revolutionise research in the humanities. However, there are currently numerous obstacles which mean that these resources are often under-utilised by scholars and students in teaching and research:
- From classic literary and cultural works, to letters, drafts, notes, and other historical documents, there is a huge amount of freely available public domain material that is highly relevant to scholars and students engaged in research in the humanities. But these works can be difficult to find and difficult to work with, and works by a given author may be scattered across a variety of locations. Search results may be confusing or unclear. Automated Optical Character Recognition of texts may be inaccurate or incomplete. The metadata for the work may be unclear, and the provenance and rights status for a given digital edition may be unknown. It is not always clear how to cite passages from digital editions of public domain works.
- Over the past few years, libraries and other cultural heritage organisations have been releasing open data about works they hold. This has the potential to be a rich resource for scholars interested in building scholarly bibliographies and working with large collections of texts. While there are a growing number of tools and services for working with bibliographic data, many researchers may not know how to use these, and online bibliographies may not link through to digital copies of public domain works which are available online.
- There are a growing number of open source tools for transcribing, translating and annotating texts. However, many of these are one-off projects, and it may not be clear how to deploy the tools in relation to a given text or collection of texts.
Here’s what it would do:
The TEXTUS platform will enable users to:
- Transcribe texts from images, PDFs or other non-machine readable sources.
- View texts and translations side by side – and create new translations of texts for use in teaching or research.
- Annotate texts, and share annotations with groups of users, or with the public.
- Curate, share and export collections of bibliographic metadata (scholarly references), including metadata associated with texts published on the platform.
Here’s a peek under the hood:
TEXTUS builds on existing best-of-breed open source components and software packages such as:
- Annotator – an open-source JavaScript tool to enable annotations to be added to any webpage
- Bibserver – which includes numerous tools, services and standards for working with bibliographic metadata
- Open Literature – which powers OpenShakespeare, OpenMilton and other sites
- Public Domain Works – a nascent directory of works which have entered the public domain in different countries around the world
- Scripto – an open source tool that enables users to contribute transcriptions to online documentary projects
- WordPress – due to its popularity, ease of use, and extensive plugin system, TEXTUS will use WordPress as its main CMS
If you’re interested, you can join discussion on the Open Knowledge Foundation’s open-humanities mailing list.
Dr. Jonathan Gray is Lecturer in Critical Infrastructure Studies at the Department of Digital Humanities, King’s College London, where he is currently writing a book on data worlds. He is also Cofounder of the Public Data Lab; and Research Associate at the Digital Methods Initiative (University of Amsterdam) and the médialab (Sciences Po, Paris). More about his work can be found at jonathangray.org and he tweets at @jwyg.
Very interesting.
I’m an active user of wikisource.org, the Wikipedia subproject which covers transcriptions of books. I have a lot of questions about this project, which seems a great one; is there any current TEXTUS/Wikisource cooperation? Some common standard and some interfacing would be great.
Hi, Alex.
We could try to organize your questions, and the answers that follow, here:
http://wiki.okfn.org/Projects/Textus
You are also a wiki guy, so you know it’ll be good to organize some ideas there. Funny that I was planning something involving book digitization (I had thought about Wikisource!) and Rufus Pollock told me about Textus.
http://okfnpad.org/scratchpad
See you!
Tom