Why open geodata in an open source software foundation?
August 1st, 2006
I was lucky enough to be able to attend the pre-OSCON meeting of FLOSS Foundations - a group of people too-intimately involved in the management of free and open source software foundations - representing OSGeo. I gave a short talk on the subject of why a free and open source software foundation finds itself engaging in open access to data efforts, through the nascent Open Geodata Committee within OSGeo, and why it matters so much to us. Here’s the writeup of my notes:
OSGeo is an Apache-inspired software foundation for free and open source geospatial tools - Geographic Information Systems, web mapping, tools for spatial applications. It was started when Autodesk decided for strategic reasons that the future of their web mapping software was in Open Source; they approached the Open Source GIS community, which had been thinking about starting a foundation for a while, and offered the use of their formidable marketing machine. OSGeo has 8 projects in incubation, and about another 8 circling around. As well as supporting software development projects, there are a couple of non-software committees (activity groups) within OSGeo - education, and geodata.
Why is a software foundation supporting open access to geodata activities? Partly because the domain demands data in order to do work - if you’re working on apache, you can just start writing HTML pages; if you’re working on openoffice, you can start creating a document - but it’s impossible to develop or to usably distribute Open Source GIS software without real world data to test against.
There’s a gaping disparity between countries in how much geodata is in the public domain. Most projects test and distribute with US-published data, because comparable data is not available to developers where they physically are. Even if developers are hooked into local government data sharing agreements, they still have no freedom to redistribute those data sets with their packages.
As an independent free software developer with a bug in my head about open access to geodata, I had two potential strategies. I could run around talking to people with data holdings, and people working to gain access to those data holdings, figure out what legislation was holding them back, and try to raise awareness of the issues involved - thus OKFN Open Geodata efforts and PublicGeodata.org. Or I could run around with a GPS unit, share my tracks online and spend days painstakingly tracing and annotating copyright-free models of the world around me, contributing to OpenStreetmap.org.
Getting involved with OSGeo and the geodata group there has provided a set of “middle ways” which are much more pragmatic, and (I hope) have much more chance of a positive impact on the policy process through the medium of free software. Now it’s becoming possible to start talking with agencies who are actively looking to release more data on an open licensed basis (such as Canada’s GeoConnections) and with groups who are actively working towards open distribution platforms (such as Harlan Onsrud’s Geodata Commons group at the University of Maine). The software foundation can provide a ‘safe space’ to bring a lot of projects together, as it’s already providing for software.
It’s also becoming possible to work on geodata distribution, archiving and reuse in a much more pragmatic, standards-driven context - working with people who’ve had a hand in the industry standards process - and with people who are converging on simple standards for redistribution that aren’t really being thought about yet within proprietary software, but for which open source software keeps running into needs. A bootstrap project like OSGeo’s Geodata Repository at telascience takes full advantage of the software in the foundation; it should both provide a demo showcase for the software and enable the building of more interesting and timely data packages for it. This effort faces some fun challenges that I hope will apply to more than geodata, and that open data generally will start running into and need “spike solutions” for:
- Syndicated distribution
We’re dealing with massive quantities of data needing a lot of bandwidth. Often people are intensely interested in data that’s spatially near them. Distributed caching, tiling and BitTorrent-like streaming schemes are on the wishlist/todolist. Industry standards don’t tend to look at this space, where a lot of people have a small amount of resource to share, rather than a lot of resources being published in one place.
- Pragmatic metadata
People working on distribution need to know what they’ve got, for data verifiability and certainty reasons, and potentially also for legal ones. The better the metadata coverage, the better the prospects for easy re-use, and for being able to find more things like the things you’ve already got. Standards tend to overfocus on production of metadata and underfocus on consumption and distribution of it.
- Easier discovery
Discovery is the better part of access; open data, to be really useful, needs to be easily findable.
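As a rough illustration of why tiling suits the "spatially nearby" interest described under syndicated distribution, here is a minimal sketch that maps a longitude/latitude point to a tile key. It assumes the widely used Web Mercator "XYZ" tile numbering, which is not something the groups above are stated to have chosen; the function name `tile_key` is hypothetical.

```python
import math

# Assumption: Web Mercator XYZ tile numbering (origin at the top-left corner).
MAX_MERCATOR_LAT = 85.05112878  # latitude limit of the square Web Mercator world


def tile_key(lon, lat, zoom):
    """Return (zoom, x, y) for the tile that contains the given point."""
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge still fall inside the grid.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Two nearby points resolve to the same key, so a cache or a peer
# can serve both requests from a single stored tile.
same_tile = tile_key(-0.1278, 51.5074, 10) == tile_key(-0.12, 51.51, 10)
```

Because the key depends only on position and zoom level, independent caches can agree on what to call a tile without coordinating with each other, which is the property a syndicated scheme would need.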
Open source, open standards and open data are in this worldview a kind of triad, mutually reinforcing; without really planning it, an open-source-like development process is growing around people’s needs for data standards, in particular. A couple of times in the last few weeks several different groups of people from different projects have appeared at our IRC meetings with similar needs, looking for a lowest common denominator implementation - in web map tile caching, in simple web-addressable geodata discovery services for the web. OSGeo’s geodata group is providing a “safe space” for developers to cooperate on needs-driven - not standards exactly, but common behaviours with ongoing agreement. As common usage patterns get picked up in more places, they can be usefully formalised, even if just on a wiki page, and the wider the usage, the better the stability. I’m thinking of test specifications rather than standards - if your code+data pass a set of well documented tests, then you’re guaranteed common behaviour.
Finally, a few words about open data licensing issues - different groups producing open geodata are using a smorgasbord of licenses - many use Creative Commons - Attribution - Sharealike, though it’s arguably not really appropriate for geodata, or indeed for any work which one can excerpt pieces of without having a “missing whole”. The Open Knowledge Definition is an effort to cut through licensing discussions: it can establish commonality for data licenses in the same way that stipulating ‘OSI approved’ or ‘complies with Free Software Definition principles’ does for software licenses. (Freedom Defined is another nascent open data definition effort.) One important element of an “OKD-compliant” data license is that it must guarantee the potential for commercial re-use of the data - “open” data that comes with a “noncommercial” caveat is not truly open…
That was the braindump; there’s probably more I could add to it; but I should be writing more code and less words…
The Barrington Atlas of the Greek and Roman World
June 26th, 2006
It took 12 years to produce (1988-2000) and cost 4.5 million dollars (according to its editor Richard Talbert). It has a whole page dedicated to listing donors and supporters of the project. It recruited seventy-three compilers, ten regional editors, ninety-five reviewers and twenty-two cartographers. It is 148 pp. long and with companion gazetteer comes in at $350.00 (if you take the gazetteer on paper, at 1,383 pp., it comes down to $150.00).
This implies a unit price to fixed cost ratio of roughly 1 to 10,000, which is likely to be the rough number of copies they expect to sell. Given the knowledge stored up in this work, and that it seems to have been funded up-front to a large extent, this seems a very inefficient way to disseminate it. I can’t help wondering what would happen if they made a digital version of this work open, free for anyone to use and reuse.
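The ratio can be checked with a back-of-envelope calculation. The sketch below uses only the two figures quoted in this post, and assumes (unrealistically) that the whole cover price goes towards recovering the fixed cost, ignoring printing, distribution and retail margins; the function name is just for illustration.

```python
# Figures quoted above; everything else is an assumption for illustration.
FIXED_COST_USD = 4_500_000       # reported production cost
PRICE_WITH_GAZETTEER_USD = 350   # atlas plus companion gazetteer


def break_even_copies(fixed_cost, unit_price):
    """Copies needed to recover the fixed cost if marginal costs were zero."""
    return fixed_cost / unit_price


copies = break_even_copies(FIXED_COST_USD, PRICE_WITH_GAZETTEER_USD)
print(round(copies))
```

That lands in the low tens of thousands, the same order of magnitude as the 1 to 10,000 figure.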
In Brussels for Committee Vote on the INSPIRE Directive
March 24th, 2006
By fortuitous coincidence I was in Brussels earlier this week in the run-up to the ENVI committee vote on the INSPIRE directive. The OKF has been actively supporting the Public Geodata campaign, and finding myself with some time spare, this seemed to be a perfect opportunity to do some last minute contacting of MEPs as well as to attend the actual vote.
Thankfully, as had been hoped given the rapporteur’s line, the vote went well and we now need to focus on demonstrating to national governments the commercial and social benefits of freer access to state-collected geodata.
Public Geospatial Data and the OSGeo Foundation
March 20th, 2006
I admit that I vacillated for a while over being nominated to the board of the Open Source Geospatial Foundation. The idea clicked for me when I realised that I would want to put at least as much time into the Public Geospatial Data Project there as into a board membership role; and the PGDP’s mission looks very like the Open Geodata one I’ve been on with OKFN.
It looks a potentially huge remit, but much groundwork has been laid in all these directions, and I think there’s a lot of forward energy behind it already. There are a lot of really committed, really aware people in the open source geospatial community, particularly the denizens of the OpenSDI list, with a lot of experience in building public information systems between them. There is a deep awareness of the importance of open access to data and the rights to reuse and redistribute it. So many of the tools in the domain of Geographic Information Systems are oriented to sharing information and recombining it in interesting ways; and the contrast between the US and European approaches to redistributing state-collected geographic information really brings the issue into sharp relief. In the US, free software developers can build so much more with public domain information, and the prospects for extracting economic value from open source and open data activity become much clearer. Outside the US, people are driven by the knowledge of how much information they lack access to, to build participatory mapping projects like OpenStreetmap that can usefully inform how bodies of public information can be maintained collectively by the public.
I believe we can create a framework for the “top-down” and “ground-up” approaches to work in a complementary way, to promote open public maintenance of public information. Perhaps the PGDP can find a state agency that’s already hoping to move into an open-source, open-access stance to work with on building a prototype implementation which can be documented, analysed and held up as a leading example. (This would be perhaps most likely to happen in Canada, where providing more open access to public geodata is part of the policy direction already.) It’s my hope that being in a foundation can provide the weight to make something like this happen.
Open Letter from Public Geodata
March 17th, 2006
An Open Letter regarding the INSPIRE Directive to Members of the ENVI Committee in the European Parliament was published by Public Geodata yesterday. OKFN has been providing support resources to Public Geodata as part of the Open Geodata awareness raising effort, and Rufus has been doing a fantastic job of sanitising our rhetoric for official consumption.
Excited coverage on boingboing.net got a lot more people coming to sign the petition, which is really great. I talk a bit more on the Mapping Hacks blog about recent coverage of the issue and related awareness efforts.
