DBpedia recently released a new version of its dataset. The project aims to extract structured information from Wikipedia so that it can be queried like a database. On their blog they say:
The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.
As well as improving the quality of the data, the new release includes coordinates for geographical locations and a new classification schema based on WordNet synonym sets. It is also extensively linked with many other open datasets, including: “Geonames, Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP Bibliography and Project Gutenberg datasets”.
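For readers unfamiliar with RDF, here is a minimal sketch of what "triples" and "queried like a database" mean in practice. Each triple is a (subject, predicate, object) statement, and a query is a pattern with some positions left open. The identifiers below (`ex:Berlin`, `ex:country` and so on) and the `match` helper are made up for illustration; they are not taken from the DBpedia dataset, and a real query would normally be written in SPARQL against an RDF store rather than in plain Python.

```python
# Illustrative triples only; these are NOT copied from the DBpedia dataset.
# Each entry is a (subject, predicate, object) statement.
TRIPLES = [
    ("ex:Berlin", "rdf:type", "ex:Place"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Abbey_Road", "rdf:type", "ex:MusicAlbum"),
    ("ex:Abbey_Road", "ex:artist", "ex:The_Beatles"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Which things are places?" expressed as a triple pattern.
places = [s for (s, _, _) in match(TRIPLES, predicate="rdf:type", obj="ex:Place")]
print(places)
```

The same pattern-matching idea is what lets separately published datasets be linked: if two datasets use the same identifier for a subject, their triples about it can be queried together.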
This is probably one of the largest open data projects currently out there, and it looks like they have done an excellent job of integrating structured data from Wikipedia with data from other sources. (For more on this, see the W3C SWEO Linking Open Data project, which exists precisely in order to link more or less open datasets together.)
