We’re pleased to announce that a new report on access to information and open government data is open for consultation! From the announcement:

Access Info Europe and the Open Knowledge Foundation, in collaboration with the Open Society Institute Information Program, are holding a public consultation on open government data and the right of access to information.

This consultation is based on a new report “Beyond Access: Open Government Data and the ‘Right to Reuse’” produced as a result of research into the open government data and access to information movements. The report identifies the practical, technical and legal challenges facing these movements. The report is based on discussions with activists about the main issues to be addressed in the next couple of years, questions such as whether a right of access is linked to a “right to reuse” the data received.

You can download the full report here:

Consultation: we would like to hear your comments on the “Beyond Access” report.

  • Did we miss any important initiatives?
  • Are there issues we should include?
  • Are you doing something you’d like us to capture in the report?
  • Do you agree with our findings and recommendations?

There are three ways to make comments:

  1. Fill in our questionnaire on the report by clicking here.
  2. Make comments on the individual paragraphs at WritetoReply.org/beyondaccess
  3. Write to us at beyond@access-info.org

Consultation closes on Monday 11 October 2010

Last week I attended the Data-driven journalism event in Amsterdam (which we blogged about here), run by the European Journalism Centre (who interviewed me here).

My slides from the event are now up here:

Below are some lovely lo-fi graphical notes from Anna Lena Schiller:

It was a very well organised event and there were lots of interesting presentations and discussions. While many there were sold on the value of public bodies opening up datasets for others to use, there were more reservations about news organisations sharing datasets with each other and with the public. To address this, I’d like to start a brief document called:

  • Why should journalists and media organisations consider opening up their data?

The document would refer to existing success stories (such as the Guardian Datablog datasets, NYT Linked Data, …), compelling reasons, evidence, etc. and would appeal to enlightened self-interest. I’ve started some very preliminary notes at:

I hope this is something we will be able to discuss and add to at the data journalism event in Berlin later this week!

We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows:

  • When? 1st September 2010
  • Where? Fjord Office, Friedrichstrasse 210, Berlin
  • Register? You can register here!

Speakers will include:

  • Martin Belam, The Guardian
  • Jonathan Gray, The Open Knowledge Foundation
  • Christian Heise, ZEIT Online
  • Gerd Kamp, Deutsche Presse Agentur
  • Georgi Kobilarov, Uberblic Labs
  • John O’Donovan, BBC News
  • Tom Scott, BBC Earth
  • Ole Wintermann, Bertelsmann Foundation

From the blurb:

Data Journalism and the new and exciting possibilities that the Web of Data opens up for creators and consumers of news and media online will be the topic of this first meetup.

We have a brilliant lineup of speakers from media organisations like the BBC, The Guardian, the Deutsche Presse Agentur, the Bertelsmann Foundation coming to Berlin and talking about data journalism and the latest developments and projects in this field, and our friends from ZEIT Online will join the discussion.

The event takes place at the office of our friends at Fjord in the heart of Berlin. Starting at 2pm, you’ll hear talks followed by a panel discussion and an open space for working groups, and when the official programme ends at 7pm we’ll of course have drinks with all of you.

Language of all talks at the event will be English, but don’t be surprised to hear a bit of German here and there in conversations.

The Open Knowledge Foundation is organising an international workshop on open government data, which will take place in London this autumn:

You can register at:

From the announcement:

Open Government Data Camp 2010

What is it?

Basic details are as follows:

  • What? A two day workshop for people interested in open government data.
  • When? 18-19th November 2010
  • Where? University of London Union, London, UK
  • How much? Tickets cost £10 to help cover costs. You can sign up here!
  • Hashtag? #ogdcamp2010

Tell me more…

It's been a big year for open government data. Around the world governments and public bodies have been opening up official datasets for the public to reuse. There has been an explosion of new applications, competitions, hackdays and other initiatives from local authorities, central government departments, international bodies and others. This event will bring together movers and shakers from the world of open government data — including government representatives, policymakers, lawyers, technologists, academics, advocates, citizens, journalists and reusers.

What will happen?

There will be two days of discussions, drafting, planning and hacking. Crucially we hope to:

  • Build consensus around key legal, technical and policy issues related to opening up government information.
  • Strengthen the community of people working on different aspects of opening up official data around the world — from both inside and outside government. (Many people working on this area will not have met in person!)
  • Encourage the exchange of experiences, expertise and ideas between those involved in leading open government data initiatives in different countries.
  • Make things! We hope there will be plenty of space for developers to hack on things — from refining core bits and pieces of technology to rapid prototyping of new ideas.

What will the format be?

Presentations will be kept to a minimum. Each day will begin with a sprinkling of short talks followed by plenty of time to talk, plan and work on things.

Can I submit a presentation?

We are going to put out a call for short presentations (around thirty 10-minute slots) shortly. Details/links will be posted on the open-government discussion list.

Can I propose a session?

Yes please! Again, we’re going to brainstorm, plan and schedule sessions on the open-government discussion list — so head there if you have any cunning ideas!

What kinds of topics will be covered?

Possible sessions include:

  • How can we encourage other countries to open up official information?
  • Open government data in law and policy: obstacles and opportunities
  • Promoting reuse: competitions, community engagement, the role of the media
  • Finding open government data: catalogues, registries and metadata
  • Raw Data Now! Technical aspects of opening up government data
  • The role and value of linked data
  • Open government data and data journalism

What kinds of outputs will there be?

Projected outputs include things like:

  • First draft of an international ‘open data manual’ (organised as a ‘Book Sprint’)
  • A set of key open government data principles
  • A timeline of key developments for open government data around the world
  • A fairly comprehensive list of official initiatives — including data catalogues and competitions
  • A list of key examples of the reuse of open government data
  • Launch of RawDataNow.com, illustrating what we mean by ‘raw data’ for those who publish official information
  • Brainstorming about projects which would make it easier for citizens to find, analyse and visually represent the data they are looking for

Who’s behind the event?

Open Government Data Camp was conceived and is being primarily organised by the Open Knowledge Foundation. The event is also supported by:

  • Cabinet Office, UK
  • EU LAPSI project, Turin, Italy
  • EU LOD2 project, Leipzig, Germany
  • Guardian, UK
  • Sunlight Foundation, USA

Who is coming?

You can find a list of participants at:

If you add your name to the list, please don’t forget to register! (And vice versa: if you’ve registered, please also add your name to the pad page above…)

Can I sponsor the event?

Yes please! We are still actively seeking sponsorship for lunches, coffee, travel and accommodation for international participants and so on. If you think you might be interested, please contact jonathan dot gray at okfn dot org.

What countries will be represented?

We are currently expecting representation from:

  • Argentina
  • Australia
  • Austria
  • Belgium
  • Brazil
  • Canada
  • Denmark
  • Finland
  • France
  • Germany
  • Hungary
  • Iceland
  • India
  • Ireland
  • Italy
  • Luxembourg
  • Netherlands
  • New Zealand
  • Norway
  • Russia
  • Spain
  • Sweden
  • Taiwan
  • United Kingdom
  • United States

Why do I have to pay?

The £10 ticket price is to help cover costs. If the ticket price is a problem, don’t hesitate to let us know. We won’t turn anyone away because they can’t afford to come!

A few weeks back we blogged about Russ Nelson’s proposals for the Open Source Initiative (OSI) to adopt the Open Knowledge Definition, our standard for openness in relation to content and data.

Russ has written back to us with some notes and questions from a session on this at OSCON:

Okay, so, as promised, here is my report on the “Open Data Definition” BOF held on Wednesday, July 21, at 7PM. There were about ten people present, which is a reasonable attendance, particularly when set against the Google Android Hands-on session at which they gave out free Nexus One phones.

It didn’t seem wise to me to start from scratch, especially given the good work done by the Open Knowledge Foundation on their Open Knowledge Definition: http://www.opendefinition.org/okd/. So we read through it section by section, by way of review. Here are the questions we arrived at (thanks to Skud aka Kirrily Robert for taking notes):

  1. What happens with data that’s not copyrightable?
     1a. What about data that consists of facts about the world, so that even a collection of it cannot be copyrighted, but whose exact file format can be copyrighted? Many sub-federal-level governments in the US have to publish facts on demand but claim a copyright on the formatting.
  2. What about data that’s not accessible as a whole, but only through an API?
  3. We’re thinking that OKD #9 should read “execution of an additional agreement” rather than “additional license”.
  4. Does OKD #4 apply to works distributed in a particular file format? Is a movie not open data if it’s encoded in a patent-encumbered codec? Does it become open data if it’s re-encoded?
  5. What constitutes onerous attribution in OKD #5? If you get open data from somebody, and they have an attribution page, is it sufficient for you to comply with the attribution requirement if you point to the attribution page?

This serves as an invitation to discuss these issues on the new list open-data@opensource.org. Send subscription requests to open-data-subscribe@opensource.org. Unsubscribe by sending a request to open-data-unsubscribe@opensource.org.

If these issues are successfully resolved, then this committee will recommend to the OSI board that the OKD should be adopted as OSI approved. If they can’t be resolved by, say, the end of 2010, then we will give up on trying. Either way, the intent is to wind down the list by the end of this year unless the participants desire otherwise.

So if you’d like to join the conversation, please join the list! We’ve also created an Etherpad to gather responses to some of these issues:

The Open Knowledge Foundation Working Group on EU Open Data is organising a session on linked data and open data at the ICT2010 event in Brussels later this year.

  • Where? T 003, Brussels Expo
  • When? 11:00-12:30 CET, 28th September 2010

From the blurb:

This networking session will discuss how public access to government data – crucial for an open and transparent society – can be improved.

This session has been proposed by IT professionals, scientists and government representatives organised – under the auspices of the Open Knowledge Foundation – as the Working Group on EU Open Data. It aims to establish a forum for networking and exchanging ideas with regard to publishing and linking governmental data, identifying technological developments and showcasing successful cases of linked governmental data. Developments in linked data could help further integrate information published by regional, national and European public administrations. The session is thematically relevant to a number of pillars within the Framework Programme as well as the Competitiveness and Innovation Programme.

Coordinator: Sören Auer (Universität Leipzig, AKSW, Institute for Computer Science, Germany)

Peter Murray-Rust — Cambridge University chemist, Open Knowledge Foundation Advisory Board member and tireless advocate for open data in chemistry — has recently started a series of blog posts about open data, focusing on issues related to the Panton Principles for open data in science.

The first is called Open Data: why I need the Open Knowledge Foundation, and in it he introduces some of the issues he wishes to discuss and gives his vision for the role he hopes the OKF community will play in relation to open data. He writes:

After a period of silence on this blog (but not on the Open Knowledge Foundation lists) I hope to publish a flurry of ideas on Open Data. There is no doubt that “Open Data” has arrived and there is enormous interest. (By contrast when I started to investigate it 5 years ago there was nothing). It’s desperately important, more complex than I ever imagined, and it’s critical to address it immediately, responsibly, dispassionately and inclusively. If we manage to set out the concerns now, we may manage to avoid the worst problems that were encountered by the Open Source and later Open Access movements. [They have made enormous progress and without their footsteps Open Data would fall into many of the same pitfalls. But Open Data is Difficult – a phrase I shall repeat frequently.]

I am putting my faith and energy into the Open Knowledge Foundation – its people and its infrastructure. This is because it’s an organisation which is wide-ranging (it deals with open content of all sorts, open metadata, services, etc.). It has great expertise in legal problems and solutions (where these are necessary) and also how to find alternative approaches. It’s neutral (apart from urging Openness and developing the infrastructure). It’s very professional, and realises that ideas without implementation have less weight. So there is an impressive range of software and information skills. I am reminded of my favourite motto (from the IETF) – “rough consensus and running code”, one of the greatest productive mantras of our time.

The enthusiasm is palpable. [Today I had a breakfast Skype session with Jonathan Gray (coordinator of OKF) and it's all about how we can make things happen fast and responsibly.] The OKF works through Working Groups and discussion lists, and so when I had a concern about Open Data I brought it to the OKF and – after a great deal of work – we emerged with the Panton Principles which have now been translated into several languages by OKF members.

Simply, the OKF amplifies the visions of individuals from the almost-impossible to the attainable.

So I am putting some ideas into the OKF melting pot to see what emerges.

In the next post, titled Open Data: The concept of Panton Papers, he lays out his ideas for the Panton Papers:

The current theme is “Panton Papers”. The idea is that part of the value of the Panton Principles is that the whole document is short and the key points are simply made. But the “Principles” can therefore only address the motivation and the procedures for Open data in a general manner, and many of the problems are in the details. I believe that many of the problems in Open Access (which is simpler than Open Data) arose because not enough communal effort was given to the practice of Open Access and I want to avoid as many OD problems as possible before they occur.

Over the last 2 years (since Open Data started to become important and to be discussed) I have seen several potentially difficult areas. I’ll simply list the ones I have thought of here and then outline the idea of the Panton Papers. This discussion is mirrored in part by the OKF open-science discussion list and you may wish to subscribe. There’s also a regular working group on open-science. (Almost everything in OKF is Open, but it may take a little while to find out where you want to be!). The issues that I currently have are:

  • What is data? Images? Graphs? Tables? Equations? Accounts of experiments? This is a major problem and almost completely unexplored. Without solving this we are held back 10 years or more in our ability to re-use the primary scientific literature (e.g. by closed-access publishers who claim that factual graphs belong to them).
  • Why should data be open? (and when should it not be?). I’ve put forward ideas here and here. They range from moral, to legal/quasi-legal, to utilitarian.
  • Who owns data? This is one of the trickiest areas – there is legal and contractual ownership and there is moral ownership. Generally there is far far too much “ownership” of data.
  • When should data be released? This is a key question (see here for an example). Some communities have solved it – most haven’t addressed it and will have to go through the rigour of working out release protocols.
  • How and where should data be exposed? I am strongly of the opinion that we need domain-specific repositories (which could be national or international) and that Institutional Repositories are almost never the best place to expose data (I expect and welcome alternative opinions). The “how” depends on understanding what the data and metadata are and is increasingly dependent on specialist software and information standards. “Archival” is often the wrong word to use.
  • Datamining and textmining. Most authors, publishers, repository owners are unaware of the enormous power of automated analysis of the literature. Some closed access publishers expressly forbid these activities. We have to liberate the right of the scientific community to do this enthusiastically and efficiently.
  • Reproducibility. Science is based on reproducibility – we expect to be able to replicate the “materials and methods” of an experiment and to try to falsify its claims. Physical materials are beyond the immediate discussion (though this may change) but much science is now based on computing. It should be possible to replicate simulations, data cleaning, data analysis, model fitting etc. This is a tricky area. It is difficult (though with virtualization and the cloud is becoming easier) to reproduce the computing environment. Large or complex data sets are a major problem but must be addressed. This is not without monetary cost.

I may add more.

The idea is that each of these is a “Panton Paper”. It may or may not be crafted in Pantonia (the hectare of the Chemistry Department, the OKF headquarters, and the Panton Arms in Cambridge UK). Everything I now write is mutable.

Each paper will have a top level document of similar form to the Panton Principles, i.e. 3-8 ideas, with short explanatory paragraph(s). This document will be crafted by the OKF in public view on a wiki or Ether/Piratepad. Anyone can take part. We shall welcome contributions from a wide range of disciplines (in fact this is essential). At some stage version 1.0 of the paper will be frozen and will be formally published. We have an offer from a major publisher to do this and I am hoping we can announce this at Open Science Summit.

The Paper should carry a wider range of links to other essays in Open Data and should carry examples from different disciplines. For example, there is a well-tried and accepted process in many areas of bioscience and astronomy as to what, when and how data get published.

Peter has started drafting ideas for the first two of these at:

If you’d like to get stuck in, please head on over to the open-science list and say hi! :-)

The following guest post is from Ivan Begtin, who is a member of the Open Knowledge Foundation’s Working Group on Open Government Data.

I would like to announce a new open data project on Russian government spending…

Background

Russian Federal Law 94-FZ of 21.07.2005 required the Russian Federal Treasury and Russian regional procurement agencies to publish limited but valuable information about government contracts online. So since 1 January 2006 all government procurement systems have been rebuilt to work online and to publish contract registries. These registries are just tables, DOC, XLS and PDF files and so on. Nothing is visualized and there is no analytics at all, but there is lots of unstructured raw material.

So there is quite a bit of material available for anyone interested. But so far nobody seems to have converted this data into a public service!

The Project

This project named RosGosZatraty (Russian Government Spending) can be found at:

The project is dedicated to all Russian government spending through government contracts and presidential grants. It’s completely public: it contains all the raw data and provides details of every contract and grant, and it includes lots of reports and other analytics, as well as a quick and simple search function.

It was initiated and launched by the Institute of Contemporary Development, a non-profit fund on whose board Russian President Dmitry Medvedev sits.

For now we have information for 2007-2009 that includes:

  • 137 high level government agencies (including dissolved)
  • 26 654 government bodies
  • 266 032 government suppliers
  • 1 390 704 individual contracts
  • 1306 grants

And lots of reports:

What next?

Sure, not everything is yet complete. We currently don’t have:

  • API
  • Better visualization
  • Data export
  • … and so on.

We will keep working on and improving the site.

For now it’s just a non-profit mashup project based on existing open data. But I hope that later we will be able to make it more semantic web ready.

I personally represent the small software development company behind this project, and as an e-Gov expert and public spending specialist I am its project manager. So if you have any questions, feel free to get in touch.

P.S. We don’t yet have any English pages on the site, so if you don’t know Russian then Google Translate can help you to find out more.

The following guest post is from Christiane Fellbaum at Princeton University who is working on a statistical picture of how words are related to each other as part of the WordNet project.

Information retrieval, document summarization and machine translation are among the many applications that require automatic processing of natural language. Human language is amazingly complex and making it “understandable” to computers is a significant challenge. While automatic systems can segment text into words (or tokens), strip them of their inflectional endings, identify their part of speech and analyze their syntactic (grammatical) function fairly accurately, they cannot determine the context-appropriate meaning of a polysemous word. Somewhat perversely, the words we use most often also have the greatest number of different senses (try to think of the many meanings of “check” or “case” for example).

WordNet organizes over 150 000 different English words into a huge, multi-dimensional semantic network. A word is linked to many other words to which it is meaningfully related. Thus, one sense of “check” is related to “chess,” another to “bank cheque” and a third to “houndstooth”. Based on the assumption that words in a context are similar in meaning to one another, a system can simply navigate along the arcs connecting WordNet’s words and measure how close or distant a given word is from another one in a text. Thus, if “check” occurs in the context of “draft,” WordNet will suggest that the appropriate sense of “check” here is “bank cheque”, as there are only a few arcs connecting that sense of “check” with “draft”, while there are more (or none at all) connecting “draft” with the chess or textile pattern senses of “check.”
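To make the arc-counting idea concrete, here is a minimal sketch in Python. It does not use the real WordNet data: the tiny graph, its edges and node names such as `check#cheque` are invented for illustration, and the helper functions `build_graph`, `path_length` and `disambiguate` are hypothetical. Arcs are counted with a breadth-first search, and the sense with the shortest path to the context word wins; senses with no path at all are ignored.

```python
from collections import defaultdict, deque

# Hypothetical toy semantic network: these edges are invented for
# illustration and are NOT taken from WordNet.
EDGES = [
    ("check#chess", "chess"),
    ("chess", "board_game"),
    ("chess", "strategy"),
    ("strategy", "plan"),
    ("plan", "draft"),
    ("check#cheque", "cheque"),
    ("cheque", "draft"),
    ("cheque", "bank"),
    ("check#pattern", "houndstooth"),
    ("houndstooth", "textile"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def path_length(graph, start, goal):
    """Count arcs on the shortest path (breadth-first search); None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def disambiguate(graph, senses, context_word):
    """Pick the sense with the fewest arcs to the context word."""
    best, best_dist = None, None
    for sense in senses:
        d = path_length(graph, sense, context_word)
        if d is not None and (best_dist is None or d < best_dist):
            best, best_dist = sense, d
    return best


if __name__ == "__main__":
    g = build_graph(EDGES)
    senses = ["check#chess", "check#cheque", "check#pattern"]
    print(disambiguate(g, senses, "draft"))
```

In this toy graph the cheque sense reaches "draft" in 2 arcs, the chess sense needs 4, and the pattern sense is not connected at all, so the sketch picks `check#cheque`. Similarity measures built on the real WordNet weigh paths in more elaborate ways; this example only counts arcs.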


WordNet is a major tool for word sense disambiguation in many Natural Language Processing applications. A number of terminological databases build on WordNet as a general lexicon to which domain-specific terminology can be added. WordNet is furthermore used for research in linguistics and psycholinguistics, for language pedagogy (English as a First and Second Language) and it has been integrated into many on-line dictionaries, including Google’s “define” function.

Being freely and publicly available, WordNet is queried tens of thousands of times daily and the database is downloaded some 6 000 times every month from the Princeton website.

Wordnet image from W3

Work on WordNet continues with support from the U.S. National Science Foundation. We are currently annotating selected words in the American National Corpus, a freely available text collection of modern American English, with WordNet senses. The annotated corpus will illustrate the use of specific word meanings for study and applications by both human users and computers, who can “learn” from examples to better identify context-appropriate word meanings. Another goal is to increase the internal connectivity of the semantic network by collecting human ratings of semantic similarity among words. The similarity ratings, once integrated into WordNet, will create many more connections among words and senses and improve automatic sense discrimination and identification.

We are delighted to announce that the authors of the Panton Principles have been awarded the SPARC Innovator prize!

The principles are currently maintained by the Open Knowledge Foundation’s Working Group on Open Data in Science.

From the announcement:

Science is based on building on, reusing, and openly criticizing the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavors, it is crucial that science data be made open.

That’s the belief of four leaders who have put forth a groundbreaking set of recommendations for scientists to more easily share their data – The Panton Principles – and who have been named the latest SPARC Innovators for their work. [...]

The authors advocate making data freely available on the Internet for anyone to download, copy, analyze, reprocess, pass to software or use for any purpose without financial, legal or technical barriers. Through the Principles, the group aimed to develop clear language that explicitly defines how a scientist’s rights to his own data could be structured so others can freely reuse or build on it. The goal was to craft language simple enough that a scientist could easily follow it, and then focus on doing science rather than law.

The Panton Principles were publicly launched in February of 2010, with a Web site at www.pantonprinciples.org to spread the word and an invitation to endorse. About 100 individuals and organizations have endorsed the Principles so far.

“This is the first time we’re seeing diverse viewpoints crystallize around the pragmatic idea that we have to start somewhere, agree on the basics, and set the tone,” says Heather Joseph, Executive Director of SPARC (the Scholarly Publishing and Academic Resources Coalition). “The authors are all leading thinkers in this area – as well as producers and consumers of data. They each approached the idea of open data from different directions, yet with the same drive to open up science, and ended up on common ground.”

According to Pollock, “It’s commonplace that we advance by building on the work of colleagues and predecessors – standing on the shoulders of giants. In a digital age, to build on the work of others we need something very concrete: access to the data of others and the freedom to use and reuse it. That’s what the Panton Principles are about.”

Further details, including more background information and comments from other leading voices in the open science community, are available at:

Alma Swan comments, quite rightly, that:

Coming up with the Principles is not going to cut the mustard by itself. They will need to be advocated and promoted so that scientists are interested in debating them.

If you’re interested in helping to promote the principles to scientists in different domains, to research funding bodies, or to the general public, please introduce yourself on our open-science mailing list.