Open Geoprocessing Standards and Open Geospatial Data
June 21st, 2010
The following guest post is from Lance McKee, who is Senior Staff Writer at the Open Geospatial Consortium (OGC) and a member of the Open Knowledge Foundation’s Working Group on Open Geospatial Data.

As the founding outreach director for the Open Geospatial Consortium (OGC) and now as senior staff writer for the OGC, I have been promoting the OGC consensus process and consensus-derived geoprocessing interoperability standards for sixteen years.
From the time I first learned about geographic information systems in the mid-1980s, I have been fascinated by the vision of an ever-deepening accumulation of onion-like spatial data layers covering the Earth.
For those unfamiliar with geographic information systems (GIS): a “spatial data layer” is a digital map that can be processed with other maps of the same geographic area. With an elevation map and a road map, for example, you can derive a road slope map. Today, geospatial information has escaped the confines of the GIS to become a ubiquitous element of the world’s information infrastructure. This is largely a result of standards: Communication means transmitting or exchanging through a common system of symbols, signs, or behavior. Standardization means agreeing on a common system. OGC runs an open standardization process, and OGC standards enable communication between GISs, Earth imaging systems, navigation systems, map browsers, geolocated sensors, databases with address fields etc.
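As a rough illustration of the road slope example above, here is a minimal Python sketch. The elevation grid, the cell size, and the road path are all invented for illustration; a real GIS would work from georeferenced raster and vector layers rather than small in-memory lists.

```python
import math

# Hypothetical elevation layer: metres above sea level on a regular grid.
ELEVATION = [
    [100.0, 102.0, 105.0],
    [101.0, 104.0, 108.0],
    [103.0, 107.0, 112.0],
]

# Assumed ground distance, in metres, between neighbouring cell centres.
CELL_SIZE_M = 10.0

# Hypothetical road layer: the road as an ordered list of (row, col) cells.
ROAD = [(0, 0), (0, 1), (1, 1), (2, 2)]


def road_slopes(elevation, road, cell_size):
    """Return the percent grade of each segment between consecutive road cells."""
    slopes = []
    for (r0, c0), (r1, c1) in zip(road, road[1:]):
        rise = elevation[r1][c1] - elevation[r0][c0]
        # Horizontal distance: grid distance scaled by the cell size.
        run = math.hypot(r1 - r0, c1 - c0) * cell_size
        slopes.append(100.0 * rise / run)
    return slopes


if __name__ == "__main__":
    for segment, grade in enumerate(road_slopes(ELEVATION, ROAD, CELL_SIZE_M), 1):
        print(f"segment {segment}: {grade:.1f}% grade")
```

The point is only that two layers covering the same area (elevation and roads) combine into a third, derived layer (slope along the road).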
I was disappointed when I discovered that, in practice, despite extraordinary advances in technical capabilities for data sharing, much of the geospatial data created by scientists, perhaps most of it (other than data from civil agencies’ satellite-borne imaging systems), never becomes available to their colleagues. This lack of open access to geospatial data seems to me to be more tragic than the lack of open access to other kinds of scientific data, not only because humanity faces critical environmental challenges, but also because all geospatial data refer to the same Earth, and thus every new data layer is rich with possibilities for exploration of relationships to other data layers. I am, therefore, very glad that the Panton Principles have been published and a geospatial open access working group has been established.
In preparation for eventually writing an article on the subject of open access to geospatial data, working with a few OGC member representatives (special thanks to Simon Cox of CSIRO) and OGC staff, I collected a list of 17 reasons why scientists’ geospatial data ought to be published online, with metadata registered in a catalog, using OGC interoperability standards. (The 17 reasons are appended to this blog entry.)
In January I put these reasons into slides that I used in a talk at the Marsh Institute at Clark University in Worcester, Massachusetts. After briefly stating each reason, I explained how OGC standards and the progress of information technology make open access feasible. I provided evidence that the geosciences are rapidly moving in the direction of open access, and I offered ideas on how academics might contribute to and benefit from this progress.
I’m quite sure the Panton Principles are consistent with the goals of the geoscientists in the OGC. But I hasten to add that I am not speaking for them, and most of the 390+ OGC members are not geoscience organizations; most are technology providers, data providers and technology users with other roles in the geospatial technology ecosystem. But this diversity makes the OGC, I think, a particularly valuable “idea space” for academics who have an interest in open access to geospatial data and services. (Services are the future. A land use change model, for example, is a service when it is made available online “in the cloud” for others to use without downloading.)
One domain in the OGC that has value for open science is the work of the OGC Geo Rights Management Working Group (GeoRM WG). The Panton Principles discourage the use of licenses that limit commercial re-use or limit the production of derivative works, because the authors recognize the value of integrating and re-purposing datasets and enabling commercial activities that could be used to support data preservation. That’s important with respect to geospatial data, both because they are so often integrated and repurposed and because geospatial data sets are often complex and voluminous and thus potentially more expensive to curate than other kinds of data. The GeoRM WG has written a remarkable document, the GeoDRM Reference Model, for use in developing standards for management of digital rights in the complex area of geospatial data and services. I think this will be a key resource as open access to geospatial data unfolds. The GeoDRM Reference Model provides a technical foundation necessary for implementing the Panton Principles.
Another valuable domain within the larger OGC idea space is the OGC Sensor Web Enablement (SWE) activity. Most geospatial data are collected by means of sensors, and thus it is important in the geosciences to have rigorous standard ways to describe sensors and sensor data in human-readable and machine-readable form. It is also important to have standard ways to schedule sensor tasks and aggregate sensor readings into data layers. Use of SWE standards is becoming important in some scientific areas such as ocean observation, hydrology and meteorology.
Both Web-resident sensors and data collections can be published and discovered by means of catalogs that implement the OGC Catalog Services - Web Interface Standard. This standard will likely become an integral infrastructure element for open access to geospatial data. It is designed to work with the ISO geospatial metadata standards, but those who begin implementing in this area discover that some work remains to make those standards more generally useful.
There are, in fact, many technical and institutional obstacles to overcome before science becomes as empowered by information technology as other estates such as business and entertainment. Technical interoperability obstacles are being overcome in the OGC by groups working in technology domains such as geosemantics, workflow, grid computing, data quality and oblique imagery; and in application domains such as hydrology, meteorology and Earth system science. Overcoming technical obstacles often precedes the obsolescence of institutional policies that stand as obstacles to progress.
I recently read Richard Ogle’s “Smart World,” a book about the new science of networks. In network terms, the OGC is a “hub” in an “open dynamic network”. What were once weak links between the OGC and other hubs such as the World Meteorological Organization and the International Environmental Modeling & Software Society (iEMSs) have been strengthened, and these stronger links make both the OGC and its partner hubs more likely to form new connections with other hubs. Hubs that directly contribute to digital connectivity, as the OGC does, have a special “pizzazz,” I would say. (I haven’t yet mastered the network science vocabulary.) It seems to me the Open Knowledge Foundation and the Science Commons are hubs or idea spaces with a bright future of rich connections, and I look forward to seeing what connections they form with the OGC.
17 Reasons why scientific geospatial data should be published online using OGC standard interfaces and ISO standard metadata
Reason 1: Data transparency
Science demands transparency regarding data collection methods, data semantics, and processing methods. Rigor, documented!
Reason 2: Verifiability
Science demands verifiability. Any competent person should be able to examine a researcher’s data to see if those data support the researcher’s conclusions.
Reason 3: Useful unification of observations
Being able to characterize, in a standardized human-readable and machine-readable way, the parameters of sensors, sensor systems and sensor-integrated processing chains (including human interventions) enables useful unification of many kinds of observations, including those that yield a term rather than a number.
(From Simon Cox, JRC Europe and CSIRO Australia, editor of ISO 19156 (Observations and Measurements), coordinator of One-Geology geoinformatics, a designer of GeoSciML, and chair of the OGC Naming Authority.)
Reason 4: Data Sharing & Cross-Disciplinary Studies
Diverse data sets with well documented data models can be shared among diverse information communities*. Cross-disciplinary data sharing provides improved opportunities for cross-disciplinary studies.
* OGC defines an information community as a group of people (such as a discipline or profession) who share a common geospatial feature data dictionary, including definitions of feature relationships, and a common metadata schema.
Reason 5: Longitudinal studies
Archiving, publishing and preserving well-documented data yields improved opportunities for longitudinal studies. As data formats, data structures, and data models evolve, scientists will need to access historical data and understand the assumptions so that meaningful scientific comparisons can be conducted. Community standards will help ensure long-term consistency of data representation.
Reason 6: Re-use
Open data enables scientists to re-use or repurpose data for new investigations, reducing redundant data collection and enabling more science to be done.
Reason 7: Planning
Open data policies enable collaborative planning of data collection and publishing efforts to serve multiple defined and yet-to-be-defined uses.
Reason 8: Return on investment
With open data policies, institutions and society overall will see greater return on their investment in research.
Reason 9: Due diligence
Open data policies will help research funding institutions perform due diligence and policy development.
Reason 10: Maximizing value
The value of data increases with the number of potential users*. This benefits science in a general way. It also creates opportunities for businesses that will collect, curate (document, archive, host, catalog, publish), and add value to data.
* Similar to Metcalfe’s law: “The value of a telecommunications network is proportional to the square of the number of connected users of the system.”
Reason 11: Data Discoverability
Open data is discoverable data. Data are not efficiently discovered through literature searches. Searches of data registered using ISO-standard XML-encoded metadata can be efficient and fine-grained.
Reason 12: Data Exploration
Robust data descriptions and quick access to data will enable more frequent and rapid exploration of data – [“natural experiments”](http://en.wikipedia.org/wiki/Natural_experiment) – to explore hypothetical spatial relationships and to discover unexpected spatial relationships.
Reason 13: Data Fusion
Open data improves the ability to “fuse” in-situ measurements with data from scanning sensors. This bridges the divide between communities using unmediated raw spatial-temporal data and communities using spatial-temporal data that is the result of a complex processing chain.
(From Simon Cox)
Reason 14: Service chaining
Open data (and open online processing services) will improve scientists’ ability to “chain” Web services for data reduction, analysis and modeling.
Reason 15: Pace of science
Open data enables an accelerated pace of scientific discovery, as automation and improved institutional arrangements give researchers more time for field work, study and communication.
“Changes to the Earth that used to take 10,000 years now take three, one reason we need real-time science. … Governances must be able to see and act upon key intervention points.” Brian Walker, Program Director Resilience Alliance and a scientist with the CSIRO, Australia
Reason 16: Citizen science & PR
Open science will help Science win the hearts and minds of the non-scientific public, because it will make science more believable and it will help engage amateur scientists – citizen scientists – who contribute to science and help promote science. It will also increase the quality and quantity of amateur scientists’ contributions.
Reason 17: Forward compatibility
Open Science improves the ability to adopt and utilize new/better data storage, format, discovery, and transmission technologies as they become available.
(Offered to OGC’s David Arctur for this list on 6 January 2010 by Sharon LeDuc, Chief of Staff, NOAA’s National Climatic Data Center, Asheville, North Carolina, USA.)
(Another reason – cross-checking for sensor accuracy – occurred to me while writing this post.)
New report on sharing aid information is now open for comments
September 21st, 2009
We’re pleased to announce the publication of a new report, Unlocking the potential of aid information. The report, by the Open Knowledge Foundation and Aidinfo, looks at how to make information related to international development (i) legally open, (ii) technically open and (iii) easy to find.
The report and relevant background information can be found at:
It aims to inform the development of a new platform for publishing and sharing aid information:
The International Aid Transparency Initiative (IATI) aims to improve the availability and accessibility of aid information by designing common standards for publication of information about aid. It is not about creating another database on aid activities, but about creating a platform that will enable existing databases – and potential new services – to access this aid information and create compelling applications providing more detailed, timely, and accessible information about aid.
The idea of openness is crucial to creating this platform and achieving transparency. Information must be openly available with as few restrictions as possible on how it is accessed and used. To this end, we need to design a technical architecture that enables information to be published and accessed in an open way.
There are three main recommendations in the report, which are as follows:
- Recommendation 1 - Aid information should be legally open. The standard should require a core set of standard licenses under which aid information is published. It should require that either:
  - (i) information is published under one of a small number of recommended options:
    - Licenses for content: Creative Commons Attribution or Attribution Sharealike license
    - Legal tools for data: Open Data Commons Public Domain Dedication and License (PDDL), Open Data Commons Open Database License (ODbL) or Creative Commons CC0
  - or that (ii) information is published using a license/legal tool that is compliant with a standard such as the Open Knowledge Definition.
- Recommendation 2 - Aid information should be technically open. The standard should require that raw data is made available in bulk (not just via an API or web interface) with any relevant schema information and either:
  - (i) in one of a small number of recommended formats:
    - Text: HTML, ODF, TXT, XML
    - Data: CSV, XML, RDF/XML
  - or (ii) in a format:
    - (a) which is machine readable and
    - (b) for which the specification is publicly and freely available and usable
- Recommendation 3 - Aid information should be easily findable. The standard should require that aid organisations add their knowledge assets to a registry with some basic metadata describing the information.
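To make the three recommendations concrete, the following is a minimal Python sketch of how a publisher might check a dataset against them. It is not part of the report: the `Publication` class, the `check_openness` function, and the short license identifiers (such as `CC-BY` and `ODbL`) are hypothetical names chosen for illustration, and the two boolean flags stand in for the report's option (ii) alternatives, which in practice need human judgement.

```python
from dataclasses import dataclass

# Short identifiers for the licenses and legal tools named in Recommendation 1.
# The identifiers themselves are assumptions made for this sketch.
RECOMMENDED_LICENSES = {"CC-BY", "CC-BY-SA", "PDDL", "ODbL", "CC0"}

# The text and data formats named in Recommendation 2.
RECOMMENDED_FORMATS = {"HTML", "ODF", "TXT", "XML", "CSV", "RDF/XML"}


@dataclass
class Publication:
    """Hypothetical description of one published aid dataset."""
    license_id: str
    file_format: str
    bulk_download: bool
    in_registry: bool
    # Option (ii) of Recommendation 1: license complies with an open definition.
    license_meets_open_definition: bool = False
    # Option (ii) of Recommendation 2: machine readable with an open specification.
    format_is_open_and_machine_readable: bool = False


def check_openness(pub):
    """Return the list of recommendations the publication fails to meet."""
    failed = []
    license_ok = (pub.license_id in RECOMMENDED_LICENSES
                  or pub.license_meets_open_definition)
    if not license_ok:
        failed.append("legally open")
    format_ok = (pub.file_format in RECOMMENDED_FORMATS
                 or pub.format_is_open_and_machine_readable)
    if not (pub.bulk_download and format_ok):
        failed.append("technically open")
    if not pub.in_registry:
        failed.append("easily findable")
    return failed


if __name__ == "__main__":
    dataset = Publication("ODbL", "CSV", bulk_download=True, in_registry=False)
    print(check_openness(dataset))
```

A dataset under a recommended license and format can still fail the third recommendation if it has not been added to a registry, which is why the checks are kept independent.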
We are now welcoming comments on the report until Sunday 1st November 2009. To submit comments you can:
- Directly annotate the documents with your comments:
- Submit your comments for discussion on the open development mailing list.
- Email your comments to info at okfn dot org.
Interview with Rufus Pollock on NetSquared
January 27th, 2009
Jed Sundwall of Netsquared just published an interview with Rufus Pollock, co-founder of the Open Knowledge Foundation.
The interview includes discussion about the distinction between price and value, about the Open Knowledge Definition, about CKAN, about decentralised approaches to working with large quantities of data, about packaging for knowledge and about ‘Shiny Front End Syndrome’. It ends with 3 suggestions for people publishing collections of content or data.
Here’s an excerpt:
Well, one day soon we’re going to have a lot of material that is open and what’s really exciting about open stuff is that it can easily be shared and recombined. That means we can break very complicated problems down into small bits, which people can manage. But then, we can put it back together again. So, let’s say you were interested in U.S. unemployment, a hot topic, and you’re interested in understanding how it changes. Maybe there’s a data site out there just on unemployment itself. But maybe there’s another one on house repossessions or the housing market, and then, there’s another one on manufacturing. There are a whole bunch of different data sites.
Now, maybe one person could just maintain them all but that might become too big a job. You may need expertise in the housing market to maintain the housing data site, but you really want to bring these together often when you want to do analysis, or compute things, or make pretty pictures, or whatever it is you want to do. This is very similar to building a large building, let’s say, or developing an operating system plus all the applications to use. Maybe one person could build them all and make sure they all work together but that would be quite a big task. Even the world’s greatest monopolist struggles to do this effectively.
So, the typical way we go about doing this is by exploiting divide and conquer. But when you divide stuff up, there was this question about how you bring it back together. So then, we say we’re moving toward a world where you can start getting lots of these data sets and then start putting them out there in the world. They can just start taking this unemployment data or this housing data. But, how do you find that and how do you get a hold of it? So often in software, there’s been this tradition of building some kind of registry where you can find things, and then you start to impose some structure on that material, you start packaging. So rather than just saying: here’s my website, here’s my Wiki, look, there’s lots of data on it, you are going to start packaging that data in a slightly more structured form.
The point of CKAN is to start saying, look, there’s a better way than just having our stuff in wikis or in some random form on a website. We can start registering this material, and packaging it up a bit. That way other people, when they want them, can come and get hold of them easily and the wheel of reuse can start to turn.
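The registry-and-packaging idea in the excerpt can be sketched in a few lines of Python. This is only an illustration of the concept, not CKAN's actual data model or API: the `Package` and `Registry` classes, the tag names, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """A dataset wrapped with just enough structure to be found and fetched."""
    name: str
    url: str
    tags: frozenset


class Registry:
    """A toy in-memory registry of data packages (not CKAN's real interface)."""

    def __init__(self):
        self._packages = {}

    def register(self, name, url, tags):
        """Record where a dataset lives and what it is about."""
        self._packages[name] = Package(name, url, frozenset(tags))

    def find(self, tag):
        """Return the names of all packages carrying the given tag, sorted."""
        return sorted(p.name for p in self._packages.values() if tag in p.tags)

    def get(self, name):
        """Look up a single package by name."""
        return self._packages[name]


if __name__ == "__main__":
    registry = Registry()
    # Each data site is maintained separately but registered in one place.
    registry.register("us-unemployment", "https://example.org/unemployment.csv",
                      ["us-economy", "labour"])
    registry.register("us-housing", "https://example.org/housing.csv",
                      ["us-economy", "housing"])
    registry.register("us-manufacturing", "https://example.org/manufacturing.csv",
                      ["us-economy", "industry"])
    for name in registry.find("us-economy"):
        print(name, registry.get(name).url)
```

Each maintainer looks after one dataset, while anyone doing analysis can discover and combine them through the shared registry.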
What Obama can do to promote openness
January 20th, 2009
With the inauguration of US President-Elect Barack Obama later today, we thought we’d prepare a brief list of things he can do to promote openness in his new role.
- Open government data. Make core government data open (as in opendefinition.org) - so that it can be re-used in mashups, visually represented, used in semantic web applications and so on! This idea is currently in 5th place on the Obama CTO site with over 5,800 votes.
- Open access to publicly funded research. As suggested by Open Knowledge Foundation Advisory Board member, Peter Suber: “Require open access to the results of non-classified research funded by taxpayers. Extend the exemplary policy now in place at the NIH to all federal agencies.” Currently in 12th place on ObamaCTO with over 1,600 votes.
- Publish public information in a way that makes it easy to re-use. For example, publish in XML or Text/CSV, not in PDF files from which data must be extracted. Allow direct, bulk downloading, rather than access through an API or piecemeal access via a web service. (For more on this see our post Give Us the Data Raw, and Give it to Us Now.) The Data Catalogue of Vivek Kundra’s Office in the District of Columbia is a great example of this.
- Legal and licensing clarity. Be clear about what can and can’t be done with public content and data - with explicit legal and licensing statements, terms of use, and so on. Be clear what is in the public domain and what is free for re-use as long as attribution is given. Be clear about what is not available for use - including material where copyright is held by third parties. Fine grained permissions - with clear terms for each document and dataset - are better than blanket statements, which require each case to be investigated individually!
- Make it open by default. Make public content and data - whether it’s government data, or publicly funded digitisation of cultural heritage artefacts - open by default. Though this is not appropriate for everything, consider allowing as much as possible to be re-used. Think of the ‘Principle of Many Minds’ - there are lots of interesting things that can be done with a given document or dataset that you may not have thought of!
Dispatches from Digistan
May 14th, 2008
Chris Puttick of OpenArchaeology sends news of the Digital Standards Organisation:
A new group is being formed to promote open digital standards, starting with a declaration regarding the importance of digital standards being truly open.
Part of Digistan’s effort to promote understanding, development, and adoption of open digital standards is a clear definition of what “open” means in standards terms. Accompanied by a list of conformant open standards, this has the potential to be used as an equivalent of the opendefinition.org Open Knowledge Definition or the freedomdefined.org Free Cultural Works definition.
However, the current approach looks different and consists of “metrics” to assess relative “openness”. It’s early days and not immediately clear how this will work - can standards score negative points for unclear status on patent grants, or RAND terms? Surely positive criteria for openness in a metric would, taken as a whole, constitute an open definition? The creators hope that, by transcending debate about a single definition of an open standard, the project will promote informed discussion about the value of standards in a way that encourages users to participate.
It’s also not clear to what extent Digistan’s interest will be focused on open formats for data and digital media, and how far that will reach out to “standards” in general - which might help simplify the debate over “one definition”. As open standards are the cornerstone of a viable free software approach to open data, an effort to produce a clear open definition that different interest groups can agree on and rally around would be welcome.
Among the founders of Digistan are some FFII representatives and, interestingly, Andrew Updegrove, the standards consortium lawyer and blogger whose writings were a deep mine of useful information about the OOXML controversy. Collectively they are asking people to sign up to their Hague Declaration in support of the following (less the preamble):
We call on all governments to:
- Procure only information technology that implements free and open standards;
- Deliver e-government services based exclusively on free and open standards;
- Use only free and open digital standards in their own activities.
