Wikidata: a new open data repository for the world
This month Wikidata, a new project of Wikimedia Germany, finally started. The ambitious goal of the project is to create an open data repository for the world’s knowledge that can be accessed and edited by everyone, humans and machines alike. Wikidata will be a place where Wikipedia’s editors and others will be able to collect statements about the world we live in, and references for them. Wikidata will become an enormous open collection of knowledge.
Thousands of Wikipedia editors have been collecting open knowledge for more than 10 years now. There are currently Wikipedias in more than 280 languages. Imagine what these editors will be able to achieve if given the opportunity to collect and use structured data. Imagine what third parties could do with all the collected data.
Wikidata will contain information like the birthdate of a famous person, the length of a large river or the year a book was written. But it does not end there. Wikidata will not just collect facts; it will also be able to represent the ambiguity of the world. It will be possible to have several sources for one item, each saying something different about it. For example, Wikidata will collect data like the length of the Amazon, for which different sources might give varying numbers. Wikidata will be able to provide the length in metres for those who prefer the metric system and in miles for those who prefer that instead.
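To make the idea of sourced, possibly conflicting statements more concrete, here is a minimal sketch in Python. It is not Wikidata's actual data model: the `Statement` and `Item` classes, the property name `"length"`, the source names and the figures are all assumptions made up for illustration. The sketch stores each value once in a base unit and converts on request.

```python
from dataclasses import dataclass, field

METRES_PER_MILE = 1609.344  # international mile, exact by definition


@dataclass(frozen=True)
class Statement:
    """One sourced claim about an item (hypothetical structure)."""
    prop: str
    value_metres: float  # stored once, in a single base unit
    source: str


@dataclass
class Item:
    """A thing in the world with any number of sourced statements."""
    label: str
    statements: list = field(default_factory=list)

    def add(self, prop, value_metres, source):
        self.statements.append(Statement(prop, value_metres, source))

    def values(self, prop, unit="m"):
        """Return (value, source) pairs for a property in the requested unit."""
        if unit not in ("m", "mi"):
            raise ValueError(f"unsupported unit: {unit}")
        factor = 1.0 if unit == "m" else 1.0 / METRES_PER_MILE
        return [(s.value_metres * factor, s.source)
                for s in self.statements if s.prop == prop]


river = Item("Amazon")
# Placeholder figures and source names, not real measurements.
river.add("length", 6_400_000, "Source A")
river.add("length", 7_000_000, "Source B")

# Both claims are kept side by side; the reader picks the unit.
for value, source in river.values("length", unit="mi"):
    print(f"{value:.0f} mi according to {source}")
```

Keeping every statement together with its source, rather than choosing a single "true" value, is what lets the two conflicting lengths coexist in this sketch.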
Our goals for Wikidata are twofold:
1. We want to provide Wikipedia editors with a central place to collect and maintain data. This way the data will no longer have to be maintained in the article texts of each of the more than 280 Wikipedias but in only one place. This should significantly reduce the maintenance burden for each Wikipedia. It will also help smaller Wikipedia communities that have limited resources, since they can then rely on the work of larger Wikipedia communities for bootstrapping articles and keeping them up to date. This will bring Wikipedia’s knowledge to many more people in their native language.
2. We want to help build a significant part of the open data ecosystem. Data in Wikidata will be licensed under a free license. APIs and exports into RDF and JSON will be available so everyone can access and use it. The software that will be running Wikidata is Free Software, just like MediaWiki, the software Wikipedia is running on. It will be possible to set up your own Wikidata-like instance for more specific use-cases or topics that Wikipedia’s community is just not interested in covering. We hope this can become a cornerstone of the open data ecosystem.
The initial development is going to be done in three phases. The aim of the first phase will be to improve language links. These are the links in the sidebar of each Wikipedia article leading to an article on the same topic in a different language. Right now they are stored in the source of each article in each language. After the completion of the first phase they will be stored just once, in Wikidata. The second phase will be about infoboxes. After its completion, editors will be able to enrich infoboxes with data from Wikidata. Lists are the focus of the third and last phase. The goal is to allow automatic list creation based on data in Wikidata, as opposed to creating and maintaining lists by hand. It will then be possible to have the “list of the 10 largest cities in the United States with a female mayor” created automatically. You can read more about each of the three phases in the technical proposal.
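As a rough illustration of what automatic list creation could look like, the following Python sketch filters and sorts a handful of structured records. It is only a sketch under stated assumptions: the record fields (`country`, `population`, `mayor_gender`), the function `largest_cities` and all city data are hypothetical placeholders, not Wikidata's actual query interface or real figures.

```python
def largest_cities(cities, country, mayor_gender, limit=10):
    """Return up to `limit` matching cities, largest population first."""
    matching = [
        c for c in cities
        if c["country"] == country and c["mayor_gender"] == mayor_gender
    ]
    return sorted(matching, key=lambda c: c["population"], reverse=True)[:limit]


# Placeholder records standing in for structured data about cities.
cities = [
    {"name": "City A", "country": "United States", "population": 900_000, "mayor_gender": "female"},
    {"name": "City B", "country": "United States", "population": 2_500_000, "mayor_gender": "male"},
    {"name": "City C", "country": "United States", "population": 1_200_000, "mayor_gender": "female"},
    {"name": "City D", "country": "Germany", "population": 3_000_000, "mayor_gender": "female"},
]

for city in largest_cities(cities, "United States", "female"):
    print(city["name"], city["population"])
```

Because the list is computed from the data rather than written by hand, it would change on its own whenever a population figure or a mayor is updated in the underlying records.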
The project is funded by donations from the Allen Institute for Artificial Intelligence, Google, Inc. and the Gordon and Betty Moore Foundation. Twelve people are employed to make Wikidata a reality over the next year, and we hope you will join us and the Wikimedia community on the journey to bring structured open data to Wikipedia and beyond.