Open dictionaries are excellent examples of open knowledge projects. Whether monolingual or bilingual, and whether they deal with definitions, etymology, translation or pronunciation, they are often large, collaborative undertakings.
Dictionary databases have a wide variety of potential applications – from education and research to machine translation and integration with software applications and services.
We’ve listed several open dictionary projects and packages on CKAN:
- Currently offers 69 bilingual dictionaries released under the GPL.
- Currently includes over 308 dictionary files in various languages, published in XML format. All material is under the GPL.
- Offers a variety of dictionaries with over 20 different language pairs. Material is under the GPL, the GFDL and the Creative Commons Attribution-Sharealike license.
- The Wikimedia Foundation’s dictionary project – currently including over 5 million entries in over 170 languages.
- A project to build a basic public domain dictionary for children.
- Scans of the first several volumes of the Oxford English Dictionary (the portion which has fallen into the public domain). It would be great to have a machine-readable version of this!
- A German-English dictionary with over 216,000 entries. Under the GPL.
- A Welsh-English, English-Welsh dictionary with over 13,000 entries. Under the GPL.
- A Japanese-Multilingual dictionary available under a Creative Commons Attribution-Sharealike license.
- A set of thesauri in 8 different languages under the GPL.
We’d like to start using tags that correspond to the ISO 639-2 language codes, such as:
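For illustration, here is a minimal sketch of how such tags might be derived for a few of the languages mentioned above. The three-letter values are the ISO 639-2 terminology ("T") codes; the `lang-` prefix and the `language_tags` helper are assumptions made up for this example, not an existing CKAN convention.

```python
# ISO 639-2 terminology ("T") codes for some languages mentioned in this post.
# German and Welsh also have bibliographic ("B") codes: "ger" and "wel".
ISO_639_2 = {
    "English": "eng",
    "German": "deu",
    "Welsh": "cym",
    "Japanese": "jpn",
}


def language_tags(*languages, prefix="lang-"):
    """Build one tag per language name.

    The "lang-" prefix is a hypothetical tag format chosen for this sketch.
    Raises KeyError for a language that is not in the lookup table.
    """
    return [f"{prefix}{ISO_639_2[name]}" for name in languages]


# Tags for a German-English dictionary package.
print(language_tags("German", "English"))
```

A package for a Welsh-English dictionary would be tagged the same way, by calling `language_tags("Welsh", "English")`.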
If you know of any other open dictionary projects, we’d love to hear about them! You can either drop us a line on the okfn-discuss list, or add packages directly to CKAN:
Dr. Jonathan Gray is Lecturer in Critical Infrastructure Studies at the Department of Digital Humanities, King’s College London, where he is currently writing a book on data worlds. He is also Cofounder of the Public Data Lab, and Research Associate at the Digital Methods Initiative (University of Amsterdam) and the médialab (Sciences Po, Paris). More about his work can be found at jonathangray.org and he tweets at @jwyg.