Last week I attended the Data-driven journalism round table in Amsterdam (which we blogged about here), run by the European Journalism Centre (who interviewed me here).

My slides from the event are now up here:

Below are some lovely lo-fi graphical notes from Anna Lena Schiller:

It was a very well organised event and there were lots of interesting presentations and discussions. While many there were sold on the value of public bodies opening up datasets for others to use, there were more reservations about news organisations sharing datasets with each other and with the public. To address this, I’d like to start a brief document called:

  • Why should journalists and media organisations consider opening up their data?

The document would refer to existing success stories (such as the Guardian Datablog datasets, NYT Linked Data, …), compelling reasons, evidence, etc. and would appeal to enlightened self-interest. I’ve started some very preliminary notes at:

I hope this is something we will be able to discuss and add to at the data journalism event in Berlin later this week!

We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows:

  • When? 1st September 2010
  • Where? Fjord Office, Friedrichstrasse 210, Berlin
  • Register? You can register here!

Speakers will include:

  • Martin Belam, The Guardian
  • Jonathan Gray, The Open Knowledge Foundation
  • Christian Heise, ZEIT Online
  • Gerd Kamp, Deutsche Presse Agentur
  • Georgi Kobilarov, Uberblic Labs
  • John O’Donovan, BBC News
  • Tom Scott, BBC Earth
  • Ole Wintermann, Bertelsmann Foundation

From the blurb:

Data Journalism and the new and exciting possibilities that the Web of Data opens up for creators and consumers of news and media online will be the topic of this first meetup.

We have a brilliant lineup of speakers from organisations like the BBC, The Guardian, the Deutsche Presse Agentur and the Bertelsmann Foundation coming to Berlin to talk about data journalism and the latest developments and projects in this field, and our friends from ZEIT Online will join the discussion.

The event takes place at the office of our friends at Fjord in the heart of Berlin. Starting at 2pm, you’ll hear talks followed by a panel discussion and an open space for working groups, and when the official programme ends at 7pm we’ll of course have drinks with all of you.

The language of all talks at the event will be English, but don’t be surprised to hear a bit of German here and there in conversations.

Announcement below — voting ends 27 August

Raw Data Now: Building an Open Data Ecosystem

Rufus Pollock and Jordan Hatcher of the Open Knowledge Foundation have submitted a proposal for a workshop highlighting the great work of the Open Knowledge Foundation, including Where Does My Money Go?, Open Shakespeare, CKAN, the Open Definition, and Open Data Commons (among many more great projects!). The panel will cover:
  • What legal rights apply to databases?
  • What tools are available to developers and data publishers involved in public sector data?
  • How do I encourage public sector institutions to release data?
  • If I’m in the public sector, what’s the best way for me to release my data?
  • Why is open data different from open source or open content?
Voting is a key part of the SXSW selection process, so please vote for our panel.

===

Also a plug for The Itinerant Poetry Librarian’s panel, which will very likely be of interest to OKFN folks into open bibliographic data and all things librarian: “They stopped coming?”: Librarians Don’t Cry They Re-View

We are pleased to announce a one-day workshop on Open Bibliographic Data and the Public Domain. Details are as follows:

Here’s the blurb:

This one-day workshop will focus on open bibliographic data and the public domain. In particular it will address questions like:

  • What is the role of freely reusable metadata about works in calculating which works are in the public domain in different jurisdictions?
  • How can we use existing sources of open data to automate the calculation of which works are in the public domain?
  • What data sharing policies in libraries and cultural heritage institutions would support automated calculation of copyright status?
  • How can we connect databases of information about public domain works with digital copies of public domain works from different sources (Wikipedia, Europeana, Project Gutenberg, …)?
  • How can we map existing sources of public domain works in different countries/languages more effectively?
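To make the idea of automated public domain calculation from open metadata concrete, here is a minimal sketch. It assumes a simplified "life plus 70 years" rule of the kind used in the EU, where protection runs to the end of the 70th calendar year after the author's death. It deliberately ignores co-authored, anonymous and posthumous works, wartime extensions and other jurisdiction-specific rules, and the `Work` record and function name are hypothetical, not taken from any existing calculator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    title: str
    author: str
    author_death_year: Optional[int]  # None when the open metadata lacks a death year


# Assumption: simplified EU-style "life plus 70 years" term.
TERM_YEARS = 70


def public_domain_status(work: Work, current_year: int) -> str:
    """Return 'public domain', 'in copyright' or 'unknown' for one work."""
    if work.author_death_year is None:
        # Without a death year in the metadata, no calculation is possible.
        return "unknown"
    # Protection lasts until the end of the 70th calendar year after death.
    last_protected_year = work.author_death_year + TERM_YEARS
    return "public domain" if current_year > last_protected_year else "in copyright"
```

For example, under this rule a work by Virginia Woolf (died 1941) is still "in copyright" in 2010 and becomes "public domain" in 2012. The "unknown" result shows why freely reusable metadata matters: missing death dates block the calculation entirely.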

The day will be very much focused on productive discussion and ‘getting things done’, rather than presentations. Sessions will include policy discussions about public domain calculation under the auspices of Communia (a European thematic network on the digital public domain), as well as hands-on coding sessions run by the Open Knowledge Foundation. The workshop is a satellite event to the 3rd Free Culture Research Conference on 8-9 October.

If you would like to participate, you can register at:

If you have ideas for things you’d like to discuss, please add them at:

To take part in discussion on these topics before and after this event, please join:

The Open Knowledge Foundation is organising an international workshop on open government data, which will take place in London this autumn:

You can register at:

From the announcement:

Open Government Data Camp 2010

What is it?

Basic details are as follows:

  • What? A two day workshop for people interested in open government data.
  • When? 18-19th November 2010
  • Where? University of London Union, London, UK
  • How much? Tickets cost £10 to help cover costs. You can sign up here!
  • Hashtag? #ogdcamp2010

Tell me more…

It’s been a big year for open government data. Around the world governments and public bodies have been opening up official datasets for the public to reuse. There has been an explosion of new applications, competitions, hackdays and other initiatives from local authorities, central government departments, international bodies and others. This event will bring together movers and shakers from the world of open government data, including government representatives, policymakers, lawyers, technologists, academics, advocates, citizens, journalists and reusers.

What will happen?

There will be two days of discussions, drafting, planning and hacking. Crucially we hope to:

  • Build consensus around key legal, technical and policy issues related to opening up government information.
  • Strengthen the community of people working on different aspects of opening up official data around the world — from both inside and outside government. (Many people working on this area will not have met in person!)
  • Encourage the exchange of experiences, expertise and ideas between those involved in leading open government data initiatives in different countries.
  • Make things! We hope there will be plenty of space for developers to hack on things — from refining core bits and pieces of technology to rapid prototyping of new ideas.

What will the format be?

Presentations will be kept to a minimum. Each day will begin with a sprinkling of short talks followed by plenty of time to talk, plan and work on things.

Can I submit a presentation?

We are going to put out a call for short presentations (around 30 ten-minute slots) shortly. Details/links will be posted on the open-government discussion list.

Can I propose a session?

Yes please! Again, we’re going to brainstorm, plan and schedule sessions on the open-government discussion list — so head there if you have any cunning ideas!

What kinds of topics will be covered?

Possible sessions include:

  • How can we encourage other countries to open up official information?
  • Open government data in law and policy: obstacles and opportunities
  • Promoting reuse: competitions, community engagement, the role of the media
  • Finding open government data: catalogues, registries and metadata
  • Raw Data Now! Technical aspects of opening up government data
  • The role and value of linked data
  • Open government data and data journalism

What kinds of outputs will there be?

Projected outputs include things like:

  • First draft of an international ‘open data manual’ (organised as a ‘Book Sprint’)
  • A set of key open government data principles
  • A timeline of key developments for open government data around the world
  • A fairly comprehensive list of official initiatives — including data catalogues and competitions
  • A list of key examples of the reuse of open government data
  • Launch of RawDataNow.com, a site aimed at those who publish official information, illustrating what we mean by ‘raw data’
  • Brainstorming about projects which would make it easier for citizens to find, analyse and visually represent the data they are looking for

Who’s behind the event?

Open Government Data Camp was conceived and is being primarily organised by the Open Knowledge Foundation. The event is also supported by:

  • Cabinet Office, UK
  • EU LAPSI project, Turin, Italy
  • EU LOD2 project, Leipzig, Germany
  • Guardian, UK
  • Sunlight Foundation, USA

Who is coming?

You can find a list of participants at:

If you add your name to the list, please don’t forget to register! (And vice versa: if you’ve registered, please also add your name to the pad page above…)

Can I sponsor the event?

Yes please! We are still actively seeking sponsorship for lunches, coffee, travel and accommodation for international participants and so on. If you think you might be interested, please contact jonathan dot gray at okfn dot org.

What countries will be represented?

We are currently expecting representation from:

  • Argentina
  • Australia
  • Austria
  • Belgium
  • Brazil
  • Canada
  • Denmark
  • Finland
  • France
  • Germany
  • Hungary
  • Iceland
  • India
  • Ireland
  • Italy
  • Luxembourg
  • Netherlands
  • New Zealand
  • Norway
  • Russia
  • Spain
  • Sweden
  • Taiwan
  • United Kingdom
  • United States

Why do I have to pay?

The £10 ticket price is to help cover costs. If the ticket price is a problem, don’t hesitate to let us know. We won’t turn anyone away because they can’t afford to come!

The Open Knowledge Foundation Working Group on EU Open Data is organising a session on linked data and open data at the ICT2010 event in Brussels later this year.

  • Where? T 003, Brussels Expo
  • When? 11:00-12:30 CET, 28th September 2010

From the blurb:

This networking session will discuss how public access to government data – crucial for an open and transparent society – can be improved.

This session has been proposed by IT professionals, scientists and government representatives organised – under the auspices of the Open Knowledge Foundation – as the Working Group on EU Open Data. It aims to establish a forum for networking and exchanging ideas with regard to publishing and linking governmental data, identifying technological developments and showcasing successful cases of linked governmental data. Developments in linked data could help further integrate information published by regional, national and European public administrations. The session is thematically relevant to a number of pillars within the Framework Programme as well as the Competitiveness and Innovation Programme.

Coordinator: Sören AUER (Universität Leipzig, AKSW, Institute for Computer Science, Germany)

Yesterday I went to ScotGovCamp in Edinburgh and had a lovely time. I spent more of it chatting in the hallway than participating in the sessions, but I have detailed notes from the Open Data session led by Chris Taggart of Openly Local, and scatterings from elsewhere.

Open Data

Chris cites his membership of OKF’s Open Government Data Working Group, the London Datastore advisory body, and the Westminster Local Public Data Panel. Good, we now know we are dealing with a pretty serious guy.

His focus has been on the “English Experience” and he’s come to make contacts in Scotland. He cited the Ordnance Survey Open Data release and the disclosure of Westminster MPs’ expenses as recent developments whose impact is yet to be fully felt. He is looking for “drivers and levers” that will surface as yet unseen issues in local government.

It’s much less clear (at least here in the UK) how local, as opposed to central, communications and decision-making networks actually work. Local authorities are in an unclear legal situation - European PSI law should oblige local government to publish more data, but the knowledge of the law is often just not there (people are too busy).

OpenlyLocal has been going for a mere 15 months. It was inspired by a Manchester version of They Work For You and by the ScraperWiki project. OpenlyLocal collects information about local government data sources and, critically, about the people involved: the social networks involved in decision making at council level. The site now has data (scraped from websites and republished as Linked Data) for 158 councils in England and Wales, but for only 4 in Scotland. One ultimate aim is to encourage local authorities to re-adopt the data, and the practices, being created by Chris and the contributors to OpenlyLocal. Other reasons for publishing local administration information as pure data:

  • Accessibility concerns. Publication of data, as opposed to pictures of data (like PDFs), avoids accessibility concerns. Creation of interfaces to data is expensive and incurs a maintenance burden…
  • Possible to tie in to other hyperlocal resources - a good example in Edinburgh is Greener Leith
  • Creation of an index, or directory, to existing council resources, that is easier to explore than a conventional website

Chris outlined 4 key reasons why open local data is important (though the reasons seem to alter with every re-telling).

  1. Transparency - we can see for ourselves, and draw our own conclusions.
  2. Engagement - citing Planning Alerts - casual engagement is possible, you don’t need to be obsessive
  3. Equality - “open data is about equality of access, because all this data is currently available for a price, and that’s not right”
  4. Relevance - to local temporal reality of affairs - less decoupled synthesis of prepared or reported data - just data.

“Quality of data is important and opening that helps (and is used as a blocker) but not as important as other points”

Can we make interfaces that work for our grandparents?

“There’s a much bigger step between creating nothing, and creating something, than between creating something stupid, and creating something great… just make a start, somewhere, anywhere.”

To local administrations - “it should cost nothing to release open data. If it doesn’t cost nothing, you’ve got a really bad outsourcing deal”.

To everyone else - “Fundamentally, it’s our data.”

Questions about quality

Recently, I’ve been thinking a lot about data quality within the geo ghetto, so it surprised me to hear several audience questions from local administrative workers, directly asking about data quality. How imperfect/unreliable/uncertain is the data? Given inevitable uncertainty, how is this doubt stopping us (or the decision makers for whom we are responsible) from opening data?

Data quality problems can have severe cost and social effects. One case cited was a database recording details of children in which 5% of dates of birth were wrong, so up to 5% of people could be treated administratively as children when they are not, or as adults when they are not (at least according to the administrative definitions, processes etc).

It’s quite possible to measure quality, to test it and to describe it. One approach is data package tests, analogous to software package tests, which extract what’s useful from the formal standards thinking on quality. But this is too much of a digression; some of it is here, and some of it is on the way.
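As a rough illustration of what a data package test could look like, here is a minimal sketch in the spirit of the children's database example above. The record layout (`id`, `date_of_birth`, `is_child`), the sample rows and the adulthood age of 18 are all hypothetical assumptions for illustration, not details of the database that was cited.

```python
from datetime import date

# Hypothetical records: an id, a date of birth, and the administrative
# "is_child" flag as stored in the database.
RECORDS = [
    {"id": 1, "date_of_birth": date(2004, 3, 1), "is_child": True},
    {"id": 2, "date_of_birth": date(1980, 7, 15), "is_child": False},
    {"id": 3, "date_of_birth": date(1975, 1, 2), "is_child": True},  # flag contradicts age
    {"id": 4, "date_of_birth": date(2031, 5, 5), "is_child": True},  # impossible future date
]

ADULT_AGE = 18  # assumption: administrative definition of adulthood


def age_on(dob: date, today: date) -> int:
    """Whole years elapsed between dob and today."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached this year
    return years


def check_records(records, today):
    """Return (record id, problem) pairs; an empty list means the package passes."""
    problems = []
    for rec in records:
        dob = rec["date_of_birth"]
        if dob > today:
            problems.append((rec["id"], "date of birth in the future"))
            continue
        is_child_by_age = age_on(dob, today) < ADULT_AGE
        if is_child_by_age != rec["is_child"]:
            problems.append((rec["id"], "child flag disagrees with date of birth"))
    return problems


def error_rate(records, today):
    """Share of records with at least one problem, as a simple quality measure."""
    bad = {rid for rid, _ in check_records(records, today)}
    return len(bad) / len(records) if records else 0.0
```

Like a software test suite, checks of this kind can run every time the data is republished. A test can only catch internally inconsistent or impossible values, though; a date of birth that is plausible but wrong will pass, so this measures a lower bound on the error rate rather than the true one.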

Law and Computers

An interesting session which I only caught the end of and which is more fully described on the ScotGovCamp blog by my EDINA colleague, Nicola Osborne. My notes say this:

German reform in the early 19C. | Biblical census. 
Legislation | Standards | Influence
e-Care records, ATOS Origin
distributed versioning in citizen data - propagation,
provenance, merging.
Robot Queen? Automaton?
Target specification - e.g. music education,
department of education directive.
specifications, models, records management 
overspecification in law, cost, fear.
Westminster Information Act
(ontology-like)

Cuts

Dropped into the session on cuts, which wasn’t all gloom and doom, but more vendor optimism about shared services. I asked vendors whether they made free software, or could find a place where business benefit to themselves and organisational benefit to their (public administration) clients could be created by freeing their software (in parallel to building shared hosted services). Not sure there was an answer.

Wondering about open demographic data, social credit data, and what’s the non-proprietary answer to Experian.

Good comments from Chris Taggart in this session too - “specialising in one thing, as a service provider. Low barriers to entry - low barriers to exit equally important”. Wondering about a JISC-like body for stewardship of shared services for local authorities. Would probably become a beast.

Fragments of insight

The big consultancies that form consortia to do government work, work by mimicry - by mirroring the hierarchical administrative structures that they are serving. But then internally, they actually do iterative micro-procurements - as in EU consortia the bulk of the actual work is done by very small providers. Many large and small companies work across local authorities, and it would be fascinating to see the map of who and where they are, which Chris is beginning to derive from spending data.

Shadow networks, shadow systems form, inevitably, in organisations at scale. But a paradox - the more superficial openness there is (coming from cultural change, or coming from legal or quasi-legal mandates, or meeting in the middle) the less is actually recorded. Data implies audit, audit invokes fear of loss. So organisation becomes about emotional concerns - perhaps it would be helpful to recognise this more?

Note: I corrected a bit of this (Equality rather than Quality, with which I must be temporarily obsessed). Thanks to Chris for notes, and to Tim Howgego for insights.

I’m very much looking forward to an event on Data Driven Journalism in Amsterdam in late August, which will bring together representatives from various media organisations (e.g. The New York Times, The Financial Times, The Times, …) and other stakeholders for a day of talks and discussions on the role of new digital technologies in the future of journalism:

The European Journalism Centre in collaboration with the University of Amsterdam organises the first round table on data-driven journalism on 24 August in Amsterdam. The one day event brings together specialists in fields which intersect with data-driven journalism: data mining, data visualisation and multimedia storytelling to discuss the possibilities of this emerging field, examine and understand the needed tools and workflows, and spread the know-how for data-driven journalism. What can we learn from the existing projects? How can we integrate the existing tools in the journalistic workflows? What skills are needed to enter this field? These are just a few of the issues which will be addressed in this event.

In particular I’m keen to talk with the other participants about open data, and the role journalists can play in helping to open up official information and to help present it to the public in new ways. They asked me for a quote to use for the event:

Opening up content and data produced by public bodies will enable new forms of reportage as well as a new generation of services enabling the public to participate in the news making process. New tools to analyse, represent, deliver and give context to public data are beginning to revolutionise the way we understand large and complex issues, from Hans Rosling’s analysis of flu statistics, to the Guardian MP expenses crowdsourcing tool, and to the Afghanistan Election Data project. An ecosystem of open data that anyone can reuse or contribute to will be critical for a new generation of data driven journalism to flourish.

You can find out more at:

If you’re going to be in Amsterdam, participation is free and you can register here.

The good folks at Scraperwiki are organising an event for developers and journalists later this month in Birmingham, UK. Great to see them helping to connect the dots between people who build things with datasets and those who can help to put the data into context!

You can sign up at:

From the announcement:

What? Scraperwiki, the award-winning new screen scraper and data mining tool funded by 4iP, is putting on a one-day practical hack day* in Birmingham at which web developers and designers will pair up with journalists and bloggers to produce a number of projects and stories based on public data. We would like to thank our sponsors Birmingham Science Park Aston and Digital Birmingham for making the event possible.

Who’s it for? We hope to attract hacks and hackers from all different types of backgrounds: people from big media organisations, as well as individual online publishers and freelancers.

What will I get out of it? The aim is to show journalists how to use programming and design techniques to create online news stories and features; and vice versa, to show programmers how to find, develop, and polish stories and features.

How much? NOTHING! It’s free, thanks to our sponsors.

What should I bring? We would encourage people to come along with ideas for local ‘datasets’ that are of interest. In addition we will create a list of suggested data sets at the introduction on the morning of the event but flexibility is key for this event.

So what exactly will happen on the day? Armed with their laptops and Wi-Fi, journalists and developers will be put into teams of around four to develop their ideas, with the aim of finishing final projects that can be published and shared publicly. Each team will then present their project to the whole group. Overall winners will receive a prize at the end of the day.

Not sure what a hack day is? Let’s go with the Wikipedia definition: it is “an event where developers, designers and people with ideas gather to build ‘cool stuff’”…

Open Knowledge Foundation Director Rufus Pollock has been interviewed by the Guardian in the run up to its Activate Summit 2010 which will take place on Thursday.

From the interview:

How, in your experience, have web technologies been employed to make the world a better place?

The internet and new digital technologies have had and will continue to have a huge impact on the way that knowledge is disseminated in society. Sharing knowledge more effectively has the potential to improve the world in all kinds of ways — from closing the loop between citizens and public bodies, allowing for greater accountability and improved service provision, to improving large-scale collaboration in science, e.g. on the development of life-saving drugs and treatments. Better knowledge sharing enables us to understand some of the world’s biggest problems — from our changing climate to our troubled economies — and to respond to them more effectively. In addition to these extrinsic merits, digital content can also be intrinsically valuable — such as in the case of classic literary or musical works which have entered the public domain or recordings of lecture courses which anyone can freely listen to and share.

And where for you are the real problem areas that remain that you think the internet and its associated technologies can help to tackle?

While we have started to see the positive benefits of opening up different kinds of content and data, there is still a long way to go! Our copyright laws mean that in many cases we are not permitted to republish or combine different sources of information available online. Publication workflows in government still revolve around polished documents for people to read in print rather than datasets which can be manipulated, analysed, and represented by computers. Scientists often do not publish the raw data underlying their research publications — meaning that potentially valuable experimental data or analysis can sit gathering dust. In many countries public bodies are often protective of their information assets, hoping to sell them to private companies rather than opening them up for reuse by the public. Across the board we still have vast silos of data that is not shared.

In some cases we have overcome some of the various obstacles to sharing knowledge. We have licenses and legal tools which can be used to give the green light to those wishing to reuse documents or datasets, akin to those used for open source software. We have technologies such as wikis and versioned databases to enable widespread collaboration in knowledge development. We have policy documents and good examples to point to which indicate the benefits of opening up data for others to reuse. But these are the exception rather than the rule. Many institutions and communities are now facing decisions which will help to shape the future of how knowledge is shared, and which will help to determine whether we will have a plethora of poorly connected walled gardens, or a shared ecosystem that everyone can benefit from.

So what projects are you currently engaged in on a day to day basis and how does the internet fit into this?

At the Open Knowledge Foundation we are involved in a broad range of projects that aim to promote or demonstrate the value of open material — from sonnets to statistics, genes to geodata. Many of these projects are driven by a dispersed community of contributors, who collaborate using a whole range of digital technologies – including things which have been around for a while now like blogs, wikis, mailing lists, as well as newer things like Etherpad, versioned databases and so on.

Our projects include:

  • Where Does My Money Go? - a project which aggregates, cleans up and republishes information about UK public finances in a form which makes it easy to reuse, and provides a dashboard and other tools and services for analysing, visualising and exploring the material.
  • CKAN - an open source registry of open data, currently used to power data.gov.uk. We are currently working with open government data advocates around the world to set up over a dozen new instances so that citizens around the world can easily find and reuse official information.
  • Open Data Commons - a set of easy to use legal tools which can be used to open up datasets and databases. Open Street Map is currently looking to use one of the licenses from this project.

Who do you admire in this space? Who’s inspiring you? Who’s pushing the boundaries and how?

There are so many people who are doing fantastic things in the world of open data at the moment! Open data enables unprecedented large scale collaboration on efforts to clean up, link together, deliver and represent all different kinds of information. The recent work undertaken by the Open Street Map community to provide mapping assistance to humanitarian organisations is a really tangible example of what is possible. Hans Rosling’s Gapminder project and his accompanying video lectures are an excellent example of telling stories with open data to help improve the public understanding of complex issues. Countless public servants in both central and local government bodies in the UK, US, Australia, New Zealand and elsewhere have been hard at work behind the scenes to help open up official information for the public to reuse. Similarly there is growing support for open data in the library world, and in scientific fields like chemistry and bioinformatics.