Why open geodata in an open source software foundation?
I was lucky enough to attend, on behalf of OSGeo, the pre-OSCON meeting of FLOSS Foundations – a group of people too-intimately involved in the management of free and open source software foundations. I gave a short talk on why a free and open source software foundation finds itself engaging in open access to data efforts, through the nascent Open Geodata Committee within OSGeo, and why it matters so much to us. Here’s the writeup of my notes:
OSGeo is an Apache-inspired software foundation for free and open source geospatial tools – Geographic Information Systems, web mapping, tools for spatial applications. It was started when Autodesk decided for strategic reasons that the future of their web mapping software was in Open Source; they approached the Open Source GIS community, which had been thinking about starting a foundation for a while, and offered the use of their formidable marketing machine. OSGeo has 8 projects in incubation, and about another 8 circling around. As well as supporting software development projects, OSGeo has a couple of non-software committees (activity groups) – education and geodata.
Why is a software foundation supporting open access to geodata activities? Partly because the domain demands data in order to do work – if you’re working on Apache, you can just start writing HTML pages; if you’re working on OpenOffice, you can start creating a document – but it’s impossible to develop or usefully distribute Open Source GIS software without real-world data to test against.
There’s a gaping disparity between countries in how much geodata is in the public domain. Most projects test and distribute with US-published data, because equivalent public domain data isn’t available where the developers physically are. Even if developers are hooked into local government data sharing agreements, they still have no freedom to redistribute those data sets with their packages.
As an independent free software developer with a bug in my head about open access to geodata, I had two potential strategies. I could run around talking to people with data holdings, and people working to gain access to those data holdings, figure out what legislation was holding them back, and try to raise awareness of the issues involved – thus OKFN Open Geodata efforts and PublicGeodata.org. Or I could run around with a GPS unit, share my tracks online and spend days painstakingly tracing and annotating copyright-free models of the world around me, contributing to OpenStreetMap.org.
Getting involved with OSGeo and the geodata group there has provided a set of “middle ways” which are much more pragmatic, and (I hope) have much more chance of a positive impact on the policy process through the medium of free software. Now it’s becoming possible to start talking with agencies who are actively looking to release more data on an open licensed basis (such as Canada’s GeoConnections) and with groups who are actively working towards open distribution platforms (such as Harlan Onsrud’s Geodata Commons group at the University of Maine). The software foundation can provide a ‘safe space’ to bring a lot of projects together, as it’s already providing for software.
It’s also becoming possible to work on geodata distribution, archiving and reuse in a much more pragmatic, standards-driven context – working with people who’ve had a hand in the industry standards process – and with people who are converging on simple standards for redistribution that aren’t really being thought about yet within proprietary software, but that open source software has a growing need for. A bootstrap project like OSGeo’s Geodata Repository at telascience takes full advantage of the software in the foundation; it should both provide a demo showcase for the software and enable the building of more interesting and timely data packages for it. This effort faces some fun challenges which I hope apply to more than geodata, and which open data generally will start running into and need “spike solutions” for:
- Syndicated distribution
We’re dealing with massive quantities of data needing a lot of bandwidth. Often people are intensely interested in data that’s spatially near them. Distributed caching, tiling, and BitTorrent-like streaming schemes are on the wishlist/todolist. Industry standards don’t tend to look at this space, where a lot of people each have a small amount of resources to share, rather than a lot of resources being published in one place.
- Pragmatic metadata
People working on distribution need to know what they’ve got, for reasons of data verifiability and certainty, and potentially also for legal ones. The better the metadata coverage, the better the prospects for easy re-use, and for finding more things like the things you’ve already got. Standards tend to overfocus on production of metadata and underfocus on consumption and distribution of it.
- Easier discovery
Discovery is the better part of access; to be really useful, open data needs to be easily findable.
Open source, open standards and open data are in this worldview a kind of triad, mutually reinforcing; without really planning it, an open-source-like development process is growing around people’s needs for data standards, in particular. A couple of times in the last few weeks, several different groups of people from different projects have appeared at our IRC meetings with similar needs, looking for a lowest common denominator implementation – in web map tile caching, and in simple web-addressable geodata discovery services. OSGeo’s geodata group is providing a “safe space” for developers to cooperate on something needs-driven – not standards exactly, but common behaviours with ongoing agreement. As common usage patterns get picked up in more places, they can be usefully formalised, even if just on a wiki page, and the wider the usage, the better the stability. I’m thinking of test specifications rather than standards – if your code+data pass a set of well documented tests, then you’re guaranteed common behaviour.
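As a rough illustration of the "test specification" idea, here is a minimal Python sketch for web map tile addressing. It assumes the XYZ Web Mercator tiling convention, which this post does not name; the function `lonlat_to_tile`, the `SPEC_CASES` table and `check_conformance` are all hypothetical names made up for the example, not anything agreed within OSGeo. The point is only that the shared artefact is the table of cases, and any implementation that reproduces them behaves the same way.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """One candidate implementation: the (x, y) tile containing a point.

    Assumes the XYZ Web Mercator scheme (an assumption of this sketch).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still land on a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

# The hypothetical "test specification": ((lon, lat, zoom), expected tile).
# In practice this table could live on a wiki page.
SPEC_CASES = [
    ((0.0, 0.0, 0), (0, 0)),
    ((0.0, 0.0, 1), (1, 1)),
    ((-180.0, 85.0, 2), (0, 0)),
    ((179.9, -85.0, 2), (3, 3)),
]

def check_conformance(impl):
    """Run every spec case against impl and return a list of failures."""
    failures = []
    for args, expected in SPEC_CASES:
        got = tuple(impl(*args))
        if got != expected:
            failures.append((args, expected, got))
    return failures

if __name__ == "__main__":
    problems = check_conformance(lonlat_to_tile)
    print("conformant" if not problems else problems)
```

Two tile caches written by different projects could each run `check_conformance` against their own addressing function; passing the same cases is what would stand in for a formal standard here.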
Finally, a few words about open data licensing issues. Different groups producing open geodata are using a smorgasbord of licenses. Many use Creative Commons Attribution-ShareAlike, though it’s arguably not really appropriate for geodata, or for any work which one can excerpt pieces of without leaving a “missing whole”. The Open Knowledge Definition is an effort to cut through licensing discussions: it can establish commonality for data licenses in the same way that stipulating ‘OSI approved’ or ‘complies with Free Software Definition principles’ does for software licenses. (Freedom Defined is another nascent open data definition effort.) One important element of an “OKD-compliant” data license is that it must guarantee the potential for commercial re-use of the data – “open” data that comes with a “noncommercial” caveat is not truly open…
That was the braindump; there’s probably more I could add to it, but I should be writing more code and fewer words…