Read today a Google Books PR piece on the Guardian website. Of out-of-print or hard-to-get books, it says, “Although copies may be available in libraries, they are effectively dead to the wider world.” Also heard today that Google Street View is proposing inside views, museum interiors.

Last week, I and some OKF people heard a Google Books lawyer, Antoine Aubert, speak at the 7th COMMUNIA workshop on the public domain.

Google digitise the holdings of libraries free of cost, returning the library a copy, retaining some exclusivity over further re-use for Google. For example, a library is asked not to allow other search engines to index the digitised full text of the works.

Rufus, speaking about the cross-European Public Domain Calculator project, commented that “A library who will remain nameless would not provide us with their catalogue metadata because of an exclusive arrangement with Google in rights to re-use the catalogue. Were they mistaken?” Antoine was not able to give a definite answer to this and other questions.

A library’s raison d’être is to provide physical access to books. With high-quality digitisations online for free, physical traffic will definitely fall. Space used for storage in prime central locations is inefficient; why not just scan the books and keep them in an air-conditioned warehouse in Swindon?

Meanwhile a library’s purchasing power is partly determined by the number of people borrowing books. New books will be indexed and stored by Google directly from publishers. There won’t be much reason to visit a library.

The library will become a museum of books. The museum will become a mausoleum of things.

To survive as institutions, museums, libraries and archives need a sustainability model, one which cannot depend on state funding alone.

One path to explore is commercial services for special purposes - re-use of very large high-resolution scans, printing of images and facsimiles, new or custom images, new interfaces and search functions.

If Google now has the right to restrict the use of the works online, those libraries accepting the “free” digitisation offer are not free to build and maintain the services that, as memory institutions in a digital age, they really should be providing.

Well, there’s always Wikipedia, and particularly the Britain Loves Wikipedia events going on through February 2010, focused on photographing heritage objects.

Mathias Schindler spoke at the same COMMUNIA meeting about a German Wikipedia effort to fix and link metadata from authority files by the German National Library - some background slides. His message went, “Give us your metadata. Really, just give us your metadata right now.”

Nat Torkington recently wrote the following piece on O’Reilly Radar. He kindly gave us permission to republish it on the Open Knowledge Foundation blog…

In the last year I’ve been involved in two open data projects, Open New Zealand and data.govt.nz. I believe in learning from experience and I’ve seen some signs recently that other projects might benefit from my experience, so this post is a recap of what I’ve learned. It’s the byproduct of a summer reflection on my last nine months working in open data.

Technologists like to focus on technology, and I’m as guilty of that as the next person. When Open New Zealand started, we rushed straight to the “catalogue”. I was part of a smart group of top-notch web hackers–we know what a catalogue is, it’s a web-based database and let’s figure out the UI flow and which fields do we want and hey I can hack one up in Wordpress and I’ll work on the hosting and so on. We spent more time worrying about CSS than we did worrying about the users.

This is the exact analogue of an open source software failure mode: often companies think they can get all the benefits of open source simply by releasing their source code. The best dinner parties are about the other people. Similarly, the best open source projects have great people, attract great people, and the source is simply what they’re working on: necessary but not sufficient. You can build it but they won’t come. All successful open source projects build communities of supportive engaged developers who identify with the project and keep it productive and useful.

Data catalogues around the world have launched and then realised that they now have to build a community of data users. There’s value locked up in government data, but you only realise that value when the datasets are used. Once you finish the catalogue, you have to market it so that people know it exists. Not just random Internet developers, but everyone who can unlock that value. This category, “people who can use open data in their jobs” includes researchers, startups, established businesses, other government departments, and (yes) random Internet hackers, but the category doesn’t have a name and it doesn’t have a Facebook group, newsletter, AGM, or any other way for you to reach them easily.

This matters because it costs money to make existing data open. That sounds like an excuse, and it’s often used as one, but underneath is a very real problem: existing procedures and datasets aren’t created, managed, or distributed in an open fashion. This means that the data’s probably incomplete, the documentation’s not great, the systems it lives on are built for internal use only, and there’s no formal process around managing and distributing updates. It costs money and time to figure out the new processes, build or buy the new systems, and train the staff.

In particular, government and science are often funded as projects. When the project ends, the funding stops. Ongoing maintenance and distribution of the data hasn’t been budgeted for almost all the data sets we have today. This attitude has to change, and new projects give us the chance to get it right, but most existing datasets are unfunded for maintenance and release.

So while opening all data might be The Right Thing To Do from a philosophical perspective, it’s going to cost money. Governments would rather identify the high-value datasets, where great public policy comment, intra-government optimisation, citizen information, or commercial value can be unlocked. Even if you don’t buy into the cost argument, there’s definitely an order problem: which datasets should we open first? It should be the ones that will give society the greatest benefit soonest. But without a community of users to poll, a well-known place for would-be data consumers to come to and demand access to the data they need, the policy-making parts of governments are largely blind to what data they have and what people want.

That’s not to say that data catalogues aren’t useful. We were scratching an itch–we wanted easier access to government data, so we built the tool that would provide it. The community of data users can be built around the tool. As Krishna was told by Arjuna, “a man must go forth from where he stands. He cannot jump to the Absolute, he must evolve toward it”. I’m just noting that, as with all creative endeavours, we learned about the problem by starting to fix it.

Which brings me to the second big lesson: which problem are we trying to solve? There’s an Open Data movement emerging around governments releasing data. However, there are at least five different types of Open Data groupie: low-polling governments who want to see a PR win from opening their data, transparency advocates who want a more efficient and honest government, citizen advocates who want services and information to make their lives better, open advocates who believe that governments act for the people therefore government data should be available for free to the people, and wonks who are hoping that releasing datasets of public toilets will deliver the same economic benefits to the country as did opening the TIGER geo/census dataset.

The one thing these groups don’t share is an outcome. I can imagine an honest government where the costs of transparency outweigh the costs of corruption (think of the cost of removing every dirt particle from your house). I can imagine PR wins that don’t come from delivering real benefits to citizens; in fact I see this in a recent tweet by Sunlight Labs’s Ellen Miller:

“Most of the raw data released by the OGD most likely isn’t for you to use.”

She’s grumbling, as does this Washington Post piece, about the results so far from the Open Government Directive, which has prompted datasets of questionable value to be added to data.gov. If this is the future, where’s my flying car? If this is open data, where’s my damn transparency?

There are some promising signs. The UK government data catalogue had a long beta period where developers were working with the data. The UK team built a community as well as a catalogue. That’s not to say that the UK effort is all gold–I saw plenty of frustration with RDF while I was observing the developers–but it stands out simply for the acknowledgement of users. Similarly, the UK’s MySociety defined what success is to them: they’re all about building useful apps for citizens, and open data is a means not an end to them.

So, after nearly a year in the Open Data trenches, I have some advice for those starting or involved in open data projects. First, figure out what you want the world to look like and why. It might be a lack of corruption, it might be a better society for citizens, it might be economic gain. Whatever your goal, you’ll be better able to decide what to work on and learn from your experiences if you know what you’re trying to accomplish. Second, build your project around users. In my time working with the politicians and civil servants, I’ve realised that success breeds success: the best way to convince them to open data is to show an open data project that’s useful to real people. Not a catalogue or similar tool aimed at insiders, but something that’s making citizens, voters, constituents happy. Then they’ll get it.

My next project with Open New Zealand is to build a community of data users. I want to build a tight feedback loop between those who want data and those who can provide it, to create an environment where the data users can support each other, and to make it easier to assess the value created by government-released open data. Henry Kissinger said, “each success only buys admission to a more difficult problem”. I look forward to learning what the next problem is.

Communia workshop

We recently attended a workshop in Luxembourg as part of Communia, the EU policy network on the digital public domain. There was a focus on bringing together themes from previous events to make a series of policy recommendations to the European Commission (watch this space!).

Below are a few notes highlighting some of the talks and discussions that we thought might be of particular interest to readers here:

  • We had a meeting to review where we are up to with the Public Domain Calculators. So far it looks like we have 10 EU countries covered, 8 maybe covered and 6 that we are still looking for help with (namely: Cyprus, Denmark, Lithuania, Luxembourg, Slovakia, Slovenia). If you’d like to help out - please drop us a line!
  • Jill Cousins from the European Digital Library Foundation spoke about the latest state of play with respect to licensing the content of Europeana, a collection of over 6 million images, texts, sound recordings and videos. In particular she spoke about the possibility of libraries and cultural heritage organisations releasing digital content into the public domain or under an open license. There has been some opposition - but we very much hope that institutions contributing to Europeana have the foresight to give this serious consideration!
  • Paul Keller and Lucie Guibault presented their work on the recently released public domain manifesto - discussing the rationale behind it, its genesis and various versions, and an overview of its main principles and recommendations. At the time of writing it has been signed by over 50 organisations and 1800 individuals.
  • Francesco Fusaro of the European Commission DG Research spoke about the EU initiatives to support open access to scientific publications and data - from background research in this area to piloting open access to approximately 20% of FP7 funded projects.
  • Patrick Peiffer gave an excellent presentation on licensing options for bibliographic metadata. In particular he suggested that non-commercial restrictions could cause substantial transaction costs and technical complications. On the other hand, using an ‘attribution, sharealike’ type license that allowed commercial reuse would cause no transaction costs, create a level playing field, allow interoperability with projects like Wikimedia and Wikimedia Commons, avoid exclusive deals and open up new channels of discovery. It would be a big step if Europeana libraries and institutions followed the lead of CERN Library, who last week announced that they were opening up their metadata!
  • Mathias Schindler spoke about tools developed by the Wikipedians using open bibliographic metadata. He also described what the Wikipedia community had done to add value to collections of cultural works - such as improving the quality of metadata, adding descriptions to images and so on.
  • Rufus Pollock spoke about his work at the University of Cambridge to estimate the size and value of the public domain in Europe.

As regular readers of the Open Knowledge Foundation blog will know, bibliographic metadata is a subject close to our heart (see e.g., here, here and here). Hence we were delighted to see today’s announcement that CERN Library are releasing their bibliographic metadata under an open license!

From the announcement:

Librarians are in general very favourable to the principles of Open Access, but surprisingly few libraries have so far set free the data they produce themselves. As one of the first scientific libraries in the world, the CERN Library offers now the bibliographic book records, held in its library catalog, to be freely downloaded by any third party. The records are provided under the Public Domain Data License, a license that permits colleagues around the world to reuse and upgrade the data for any purpose.

Jens Vigen, Head of the CERN Library, says: “Books should only be catalogued once. Currently the public purse pays for having the same book catalogued over and over again. Librarians should act as they preach: data sets created through public funding should be made freely available to anyone interested. Open Access is natural for us, here at CERN we believe in openness and reuse. There is a tremendous potential. By getting academic libraries worldwide involved in this movement, it will lead to a natural atmosphere of sharing and reusing bibliographic data in a rich landscape of so-called mash-up services, where most of the actors who will be involved, both among the users and the providers, will not even be library users or librarians. Our action is made in the spirit of the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities; bibliographic data belongs to the cultural heritage. All other signatories should align their policy accordingly.”

The data of CERN Library will be used by the Open Library Project to provide a webpage for every book and allow users to add content like table of contents, classifications and summaries.

For massive reuse of data, the data will be provided soon by an open Z39.50, SRU and OAI interface via biblios.net, a repository of open bibliographic data.

This is fantastic news - and we hope that other libraries and archives consider following suit and opening up their bibliographic metadata!

We’ve created a new CKAN package for the data at:

Clear Climate Code, and Data

January 28th, 2010

The following guest post is by David Jones who is, among other things, a curator of the climate data group on CKAN (the OKF’s open source registry of open data) and co-founder of Clear Climate Code (which we blogged about back in 2008).

Clear Climate Code have been working on ccc-gistemp, a project to reimplement NASA’s GISTEMP in clear Python. GISTEMP is a global historical temperature analysis. It produces, amongst other things, graphs like this, that tell you whether the Earth is getting warmer or cooler:

Official GISTEMP global anomaly.

Because this graph is important for studying the world’s climate (and determining the signature of global warming), there is a lot of public discussion about where this data comes from. The raw data underlying the graph is surface weather station temperature records. The raw data is processed to produce the data for the graph:

[Diagram: raw station records, then a box labelled “GISTEMP”, then the global temperature anomaly graph]

The box in the middle, labelled “GISTEMP”, is a process that converts the raw station records into the data for the graph on the right, which is the global temperature anomaly. There are descriptions of this process available, for example Hansen and Lebedeff, 1987. A description is one thing, but it might not tell you everything you need to know. Perhaps the description is sufficiently clear and accurate for you to reproduce the process, perhaps not. The ultimate authority on the process is the source code that implements it, because it’s the source code that is executed in order to produce the processed data. So if you want to know exactly what the process involves, you have to get hold of the source code.

In effect it is the source code that adds value to the raw data to produce processed data. So in a sense, the value of the processed data is embodied in the source code. That’s what makes the source code important.
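To make the idea of "raw data in, processed data out" concrete, here is a deliberately tiny sketch of what a temperature anomaly is: each reading minus the average over a chosen baseline period. This is not the GISTEMP algorithm, which does a great deal more. The station values, the baseline years and the function name are all invented for illustration.

```python
from statistics import mean

# Made-up yearly mean temperatures (degrees C) for one imaginary station.
station_record = {
    1951: 9.8, 1952: 10.1, 1953: 10.0, 1954: 10.1,
    2008: 10.6, 2009: 10.5,
}

# Assumption: an arbitrary baseline period chosen for this toy example.
BASELINE_YEARS = range(1951, 1955)


def anomalies(record, baseline_years):
    """Return each year's temperature minus the baseline-period average."""
    baseline = mean(record[y] for y in baseline_years if y in record)
    return {year: round(temp - baseline, 2) for year, temp in record.items()}


station_anomalies = anomalies(station_record, BASELINE_YEARS)
for year, value in sorted(station_anomalies.items()):
    print(year, f"{value:+.2f}")
```

Even in a toy like this, details such as which years form the baseline and how missing years are handled are only visible in the code, which is the point made above.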

The source code for GISTEMP is written mostly in Fortran by scientists at NASA, and is available from them. This source code is the working code used by the NASA scientists; it is not necessarily the best source code for explaining how the process works (to an interested and competent member of the general public). There is the question of whether NASA, a publicly funded body, should be paying someone to write code that makes a better tool for communicating with the public (for example by writing better documentation, or writing it in a more exemplary style). I am not going to address that question. The source code NASA use is the source code we have right now.

Our goal at Clear Climate Code is to take this code and produce a new version that is clearer, but does the same thing. We have taken great steps forward towards this goal: We have recently released a version which is all in Python and which reproduces NASA’s results exactly. We think much of this code is already a great deal clearer than the starting material, but we continue to make it clearer. Of course we would welcome your support. If you want to help, please join our mailing list, or you can follow our progress at our blog and on twitter.

The reasons Clear Climate Code chose Python as the implementation language for ccc-gistemp are: accessibility, clarity, and familiarity. By accessible I mean that there is a large community of Python programmers, but also there are several tutorials and other materials for learning Python should you be motivated. Python is used to teach undergraduates programming. Python is relatively clear; it’s deliberately designed to be free of the clutter that imperils other programming languages. It’s certainly possible for people who are not professional programmers to create small programs in Python, and examine and modify existing Python programs. And lastly, it’s familiar; Nick Barnes and I already knew Python when we started the project. This seems like a trivial consideration, but in fact Clear Climate Code is an unpaid project and it’s pretty easy to come up with reasons to do something else instead, so the fact that we already knew Python was important.

Hopefully Clear Climate Code illustrates how both code and data are central to the public understanding of science. For an issue like global warming it is absolutely crucial that the public are involved. CKAN’s climate data group is a place where non-specialists can access scientists’ data more easily, and hopefully use it to innovate, do their own hobby science, or create visualisations to better communicate with the public. I’m hoping to add more data sources to the climate data group in the near future. If you’re interested in adding more data to this group, please get in touch.

Public Domain Manifesto

January 27th, 2010

On Monday the Public Domain Manifesto went live:

From the introductory paragraph:

The public domain, as we understand it, is the wealth of information that is free from the barriers to access or reuse usually associated with copyright protection, either because it is free from any copyright protection or because the right holders have decided to remove these barriers. It is the basis of our self-understanding as expressed by our shared knowledge and culture. It is the raw material from which new knowledge is derived and new cultural works are created. The Public Domain acts as a protective mechanism that ensures that this raw material is available at its cost of reproduction - close to zero - and that all members of society can build upon it. Having a healthy and thriving Public Domain is essential to the social and economic well-being of our societies. The Public Domain plays a capital role in the fields of education, science, cultural heritage and public sector information. A healthy and thriving Public Domain is one of the prerequisites for ensuring that the principles of Article 27 (1) of the Universal Declaration of Human Rights (’Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits.’) can be enjoyed by everyone around the world.

The manifesto gives a series of principles and recommendations for promoting and protecting the digital public domain. For example, a point that we have mentioned in the past:

What is in the Public Domain must remain in the Public Domain. Exclusive control over Public Domain works must not be re-established by claiming exclusive rights in technical reproductions of the works, or using technical protection measures to limit access to technical reproductions of such works.

The manifesto was drafted under the auspices of Communia, the EU thematic network for the digital public domain. In particular it was created by Working Group 6, of which I am co-lead. If you support the manifesto, please consider signing it!

Sources of data on data.gov.uk

January 26th, 2010

When data.gov.uk was launched, I had a quick browse around the data, to get a feel for what was in it. Most data sets that I randomly looked at were from statistics.gov.uk (from the Office for National Statistics).

Today, I decided to investigate, and work out some basic statistics about the source of the data. Hopefully this will help find what the interesting new data sets are.

I secretly hoped that I’d have to screenscrape data.gov.uk to work this out. Irony. Luckily, a comment on this blog revealed that there is a handy data dump of all the CKAN data behind data.gov.uk in CSV and JSON formats.

I downloaded the JSON file (21st January 2010 dump) and used basic Unix text processing commands such as grep, sort and uniq to do some calculations.

How many data sets are there, and what protocol do their downloads use?

First I did some basic counts, to check how many data sets had a download link, and what protocol the link used.

Normal HTTP (http://) - 2623 data sets
Secure HTTP (https://) - 178 data sets
No download URL (download_url in the .json dump) - 78 data sets
Total - 2879 data sets
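For anyone who would rather not use grep, sort and uniq, the same kind of count can be sketched in Python. This is an illustration only: it assumes the dump is a JSON list of records that each carry a download_url field, and it runs on a small made-up sample rather than the real file. The sample records and the function name are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlparse

# Hypothetical stand-in for the dump; the real file's layout is an assumption.
SAMPLE_DUMP = json.dumps([
    {"name": "a", "download_url": "http://www.statistics.gov.uk/a.csv"},
    {"name": "b", "download_url": "https://www.hesonline.nhs.uk/b"},
    {"name": "c", "download_url": ""},
    {"name": "d", "download_url": "http://statistics.gov.uk/d.xls"},
])


def count_protocols(packages):
    """Tally records by the URL scheme of their download link."""
    counts = Counter()
    for package in packages:
        url = (package.get("download_url") or "").strip()
        scheme = urlparse(url).scheme if url else ""
        # Records with no URL (or a URL without a scheme) are counted as "none".
        counts[scheme or "none"] += 1
    return counts


protocol_counts = count_protocols(json.loads(SAMPLE_DUMP))
for scheme, n in protocol_counts.most_common():
    print(f"{scheme}: {n} data sets")
print("Total:", sum(protocol_counts.values()))
```

To try it on a real dump, replace SAMPLE_DUMP with the contents of the downloaded file, adjusting the field name if the dump is laid out differently.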

What are the top level domain names of the data sets?

The data sets which have a download URL are distributed across the following top level domains.

.gov.uk - 2009 data sets
.nhs.uk - 412 data sets
.co.uk - 114 data sets
.org.uk - 79 data sets
.org - 78 data sets
.mod.uk - 34 data sets
.net - 25 data sets
.ac.uk - 14 data sets
.com - 9 data sets
.police.uk - 5 data sets
other (IP, not fully qualified domain) - 21 data sets
Total - 2801 data sets
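A grouping like the one above could be reproduced roughly as follows. This is a sketch, not the commands actually used: the suffix list is copied from the table, while the sample URLs and the function name are made up for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Suffixes taken from the table above; anything else is counted as "other".
KNOWN_SUFFIXES = [
    ".gov.uk", ".nhs.uk", ".co.uk", ".org.uk", ".mod.uk",
    ".ac.uk", ".police.uk", ".org", ".net", ".com",
]

# Hypothetical sample of download URLs.
SAMPLE_URLS = [
    "http://www.statistics.gov.uk/a.csv",
    "https://www.hesonline.nhs.uk/b",
    "http://www.nomisweb.co.uk/c",
    "http://192.0.2.10/d.csv",
]


def suffix_of(url):
    """Return the known domain suffix of a URL's host, or "other"."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in KNOWN_SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"  # bare IP addresses and unrecognised hosts land here


suffix_counts = Counter(suffix_of(url) for url in SAMPLE_URLS)
for suffix, n in suffix_counts.most_common():
    print(f"{suffix} - {n} data sets")
```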

Top ten sites the data sets are from

Here are the top domains that download links on data.gov.uk go to. I removed any www from them before analysis, to make sure URLs with and without www were counted together.

257 statistics.gov.uk
245 neighbourhood.statistics.gov.uk
231 hesonline.nhs.uk
176 fti.communities.gov.uk
173 communities.gov.uk
150 wales.gov.uk
125 dcsf.gov.uk
110 scotland.gov.uk
106 nomisweb.co.uk
95 hmrc.gov.uk
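The "strip the www, then count" step can be sketched like this. Again it is only an illustration of the approach described above: the sample URLs and function names are invented, and the real analysis was done with Unix text tools.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample of download URLs, with and without a leading "www.".
SAMPLE_URLS = [
    "http://www.statistics.gov.uk/a.csv",
    "http://statistics.gov.uk/b.csv",
    "https://www.hesonline.nhs.uk/c",
    "http://neighbourhood.statistics.gov.uk/d",
    "",
]


def site_of(url):
    """Return the URL's host in lower case, without any leading "www."."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def top_sites(urls, n=10):
    """Count hosts across the non-empty URLs and return the n most common."""
    counts = Counter(site_of(url) for url in urls if url)
    return counts.most_common(n)


for site, n in top_sites(SAMPLE_URLS):
    print(n, site)
```

Note that only a leading "www." is removed, so subdomains such as neighbourhood.statistics.gov.uk are still counted separately, as in the table above.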

The first thing to notice is that even including its neighbourhood section, statistics.gov.uk still only accounts for about 18% of the total number of data sets. So there is lots else to find in there!
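The 18% figure can be checked from the numbers already given. The short calculation below uses the two statistics.gov.uk counts from the top ten list and, as an assumption about which total was meant, shows the share against both the 2801 data sets with a download URL and all 2879 data sets.

```python
# Counts copied from the lists above.
statistics_gov_uk = 257
neighbourhood_statistics = 245
with_download_url = 2801
all_data_sets = 2879

ons_total = statistics_gov_uk + neighbourhood_statistics

# Share of data sets that have a download URL.
share_of_linked = round(100 * ons_total / with_download_url, 1)
# Share of all data sets, including those with no download URL.
share_of_all = round(100 * ons_total / all_data_sets, 1)

print(f"{ons_total} data sets: {share_of_linked}% of linked, {share_of_all}% of all")
```

Either way the share rounds to somewhere between 17% and 18%.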

The full table is available here as a file: domain-counts.txt. There are 114 different domains.

What license do the data sets have?

Update: in fact data.gov.uk has its own set of terms and conditions which cover all the datasets on the site. These terms are OKD-compliant as they allow anyone to freely use, reuse and redistribute the data. It would be nice for the license field to reflect this though.

Most are marked as being in a straightforward “crown copyright” section. I’d like to see some work on the licensing, to use more standard licenses, or a new OKD-compliant license, where possible.

Non-OKD Compliant::Crown Copyright - 2871 data sets
OKD Compliant::UK Click Use PSI - 8 data sets

And a question for you

What interesting data sets have you spotted while browsing about data.gov.uk? Has anything sparked an idea for an application? Have you used any of the new data sets?

Please post in the comments!

Data.gov.uk goes public today, and we’re very proud that it is using CKAN, our open source registry of open data, to list official UK government datasets (as we announced in October):

We’ve been working closely with the Cabinet Office team to get this out the door, and over 2500 datasets have been released via the site!

In the Cabinet Office press release, Sir Tim Berners-Lee says:

Making public data available for re-use is about increasing accountability and transparency and letting people create new, innovative ways of using it. Government data should be a public resource. By releasing it, we can unlock new ideas for delivering public services, help communities and society work better, and let talented entrepreneurs and engineers create new businesses and services.

The new launch has received lots of press coverage - even making the front page of the BBC news website! Below is a selection:

Data.gov.uk

The following guest post is from Regards Citoyens, a French association of citizens with a shared interest in opening up information about the functioning of democratic institutions in France.

France is lagging behind…

There is no doubt about it: compared to other countries, France is definitely late in opening up its data. For a country so proud of its human rights and democratic revolution, it took a while before it finally joined the open data movement! The first “Open Data Camp” organized in Paris last December is a good example of this new momentum.

While the US and the UK have taken enormous steps in the past two years with the release of data.gov and data.gov.uk, France and many other southern European countries are still being very conservative about making public data public. To catch up in the world of open data will require more than just a few political measures. French institutions need a drastic change to their approach to the production and dissemination of official data. But nothing will be possible without support, demand and engagement from groups of citizens.

Interesting — and often relatively little known — projects already lead the way. For example, the HAL Archives opens up access to scientific journal articles and IREP offers access to data about pollutants. But this is just a very small fraction of material that is out there. The vast majority of official documents, datasets and publicly funded research remains inaccessible to citizens. Indeed, it can be very difficult for an individual to gain access to specific public documents. In 1978, a committee called CADA was created to provide advice on such demands, but such public services often won’t process the requests easily.

For historical reasons, it is especially difficult to change French officials’ approach to data release. For a very long time, most public data sharing has been done by public administrations classified as EPIC (Etablissement Public à caractère Industriel et Commercial or Public Administration for Industrial and Commercial purposes). These administrations have a prior commercial purpose even though their data are considered public. Examples include key providers of meteorological data and geospatial data. Having both public and commercial purposes, such administrations tend to be interested in making profit from the data by selling it to corporate businesses. Therefore, it can be a real challenge for citizens to get free access to these data and reuse them for civil society projects to strengthen democracy, to increase citizen engagement or to improve the delivery of public services.

The former DJO (Direction des Journaux Officiels or Directorate of Official Publications), now called DILA (Direction de l’Information Légale et Administrative, Directorate of Legal and Administrative Information), is another good example of this situation. This administration is in charge of all legal data including laws issued by the parliament and official government decisions. Before 2002, online access to French legislation was restricted through a régime de concession à titre onéreux. This means only those able and willing to pay for a license, mainly companies like Reuters or Lamy, were allowed to utilise the documents. The situation changed in 2002 and now any individual has access to these key legal documents thanks to LégiFrance. Extra features like access to the rich XML feed of any legislation modification could be of great help to improve legislative monitoring projects like Regards Citoyens’ Simplify the law. Unfortunately these features are still restricted to users able to pay the fee.

Government initiatives: limited access but not openness

Despite all of this, the global movement for openness has recently taken a radical turn thanks to the data.gov projects, the 2007 EU INSPIRE Directive (planned to be transposed in France in June 2010) and Sweden’s initiative to promote eGovernment projects during its presidency of the EU. All of these seem to have triggered a shift in the French government’s view of public data, and some things have started to change.

A new administration, the DILA, was recently created to replace the DJO and to drive improved production and dissemination of public data. In this context, a new agency called APIE (Agence du Patrimoine Immatériel de l’Etat, the State’s Intangible Heritage Agency) was set up to lead the thinking on, coordinate, assess and organize a common data effort between the different administrations. The objective is to propose by the middle of 2010 a platform that will promote all the different sources of data and describe their respective licenses.

Unfortunately, the French government’s historical lack of openness left the field open to the private sector. Some companies benefit greatly from this situation: they profit from the data by becoming intermediaries between the administration and data users. A good example of this is the GFII (the Groupement Français de l’Industrie de l’Information, or French Association of Electronic Information Industry). Frustrated by its difficult contact with the government, this active lobbying group started to take care of civil servants’ training on its own, and progressively became the official investor in and organizer of training programmes instead of the government. This entry of the private sector into matters of public administration certainly contributed to the APIE’s information licensing decisions: there is an obvious inclination to sell the data to companies without considering the benefits of allowing reuse by citizen driven projects using open licences. This situation is good neither for innovation nor for the production of common knowledge.

Citizen-driven open data initiatives in France

As in many countries, the first steps into open data came from the research and the Free and Open Source Software (F/OSS) communities. Wikimedia France and OpenStreetMap.fr are probably the most popular open knowledge projects in France. Early websites like Mon-Depute.fr (a vote monitoring project created by an archivist) and droit.org (a very active project from l’Ecole des Mines on legal publication) did much to make democratic data available. Our work at Regards Citoyens on parliamentary activity with NosDéputés.fr and on electoral data is a new step for French open data for democracy and civil society.

OpenStreetMap.fr is a very good example of a citizen-driven open data project. The Public Land Registry (Cadastre) has a website intended to publish its maps, which provides interesting information but not openly. Some OpenStreetMap contributors therefore worked out how to access the raw data technically. But this was still not enough to open up the data for everyone. So the OSM community studied the legal situation and contacted the French Ministry of Finance, which is in charge of this service. They finally got an answer in January 2009: a bulk export of the whole database is not allowed, but a partial one is. So hundreds of volunteers began a crowdsourcing effort, and OpenStreetMap.fr is now able to free more and more data from the Land Registry.

All of these examples show that open data is not only about technology: it often also depends on the efforts of a community to legally secure the data and to encourage others to allow it to be reused for any purpose. That is why we helped organise the first French Open Data Camp in Paris, where more than 120 people came to learn and share their skills. We learned a lot about information visualisation techniques from existing projects and from interesting theoretical ones! We also had a good conversation with activists, ‘hacktivists’, and others about the political, economic and administrative benefits of open data.

The success of this event seems like a pretty good demonstration that France is ready and has already taken its first steps into the global world of open data. Regards Citoyens will follow these changes and will try to contribute modestly to the global open data movement by working together with international organisations such as the Open Knowledge Foundation. With our fellow “campers”, we are convinced that making public data accessible and reusable will bring great benefits to commercial innovation, to democratic organisations, and to civil society.

There has recently been a flurry of activity in the OpenStreetMap community to improve maps of Haiti, in order to assist humanitarian aid organisations responding to the recent earthquake.

In particular, mappers and developers are scouring satellite images to identify collapsed and damaged buildings and bridges, spontaneous refugee camps, landslides, blocked roads and other damaged infrastructure, to help NGOs and international organisations respond more effectively to the crisis.

They have issued a call for assistance:

On January 12 2010, a 7.0 earthquake struck Port-au-Prince. The OpenStreetMap community can help the response by tracing Yahoo imagery and other data sources, and collecting existing data sets below. If you have connections with expat Haitian communities, consider getting in touch to work with them to enter place names, etc.

On Wednesday Mikel Maron wrote to the OSM talk list asking for help. Yesterday several companies authorised the OSM community to use their images.

There have been specific requests for up-to-date mapping information from humanitarian organisations on the ground. For example, on Wednesday, Nicolas Chavent of the Humanitarian OpenStreetMap Team wrote to the OSM talk list:

I am relaying a mapping requirement grounded in Haiti from GIS practitioners mapping there at the United Nations Office of Coordination of Humanitarian Affairs (UNOCHA): “NEED to map any spontaneous camps appearing in the imagery with size in area”

Recently generated data from OpenStreetMap has been used in maps by ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action) and the World Food Programme.

Yesterday evening Mikel Maron reported there had been over 400 edits since the earthquake. At the time of writing it looks like this has now more than doubled to over 800 edits since 12th January.

The following two images - before and after the earthquake - give you an impression of how much the OSM community have been doing!

haiti.osm.pre-event

haiti.osm.20090114180900

For more see: