Open Geoprocessing Standards and Open Geospatial Data

The following guest post is from Lance McKee, who is Senior Staff Writer at the Open Geospatial Consortium (OGC) and a member of the Open Knowledge Foundation’s Working Group on Open Geospatial Data.

As the founding outreach director for the Open Geospatial Consortium (OGC) and now as senior staff writer for the OGC, I have been promoting the OGC consensus process and consensus-derived geoprocessing interoperability standards for sixteen years.

From the time I first learned about geographic information systems in the mid-1980s, I have been fascinated by the vision of an ever-deepening accumulation of onion-like spatial data layers covering the Earth.

For those unfamiliar with geographic information systems (GIS): a “spatial data layer” is a digital map that can be processed with other maps of the same geographic area. With an elevation map and a road map, for example, you can derive a road slope map. Today, geospatial information has escaped the confines of the GIS to become a ubiquitous element of the world’s information infrastructure. This is largely a result of standards. Communication means transmitting or exchanging through a common system of symbols, signs, or behavior. Standardization means agreeing on a common system. OGC runs an open standardization process, and OGC standards enable communication between GISs, Earth imaging systems, navigation systems, map browsers, geolocated sensors, databases with address fields, and so on.
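To make the road slope example concrete, here is a minimal sketch in Python. It assumes both layers share one regular grid; the elevation values, the cell size, and the road cells are invented for illustration, and `road_slopes` is a hypothetical helper, not part of any GIS package.

```python
import math

# Hypothetical elevation layer: metres above sea level on a regular grid.
ELEVATION = [
    [100.0, 102.0, 105.0],
    [101.0, 104.0, 108.0],
    [103.0, 107.0, 112.0],
]
CELL_SIZE_M = 30.0  # assumed ground distance between adjacent cell centres

# Hypothetical road layer: the centreline as ordered (row, col) cells
# on the same grid as the elevation layer.
ROAD = [(0, 0), (1, 1), (1, 2), (2, 2)]


def road_slopes(elevation, road, cell_size):
    """Return the percent slope of each segment between consecutive road cells."""
    slopes = []
    for (r0, c0), (r1, c1) in zip(road, road[1:]):
        # Horizontal distance between the two cell centres.
        run = math.hypot(r1 - r0, c1 - c0) * cell_size
        # Elevation difference read from the other layer at the same cells.
        rise = elevation[r1][c1] - elevation[r0][c0]
        slopes.append(100.0 * rise / run)
    return slopes


if __name__ == "__main__":
    for segment, slope in enumerate(road_slopes(ELEVATION, ROAD, CELL_SIZE_M)):
        print(f"segment {segment}: {slope:.1f}% slope")
```

The point of the sketch is only that the derived layer exists because the two inputs describe the same ground: each road cell can look up its elevation directly.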

I was disappointed when I discovered that, in practice, despite extraordinary advances in technical capabilities for data sharing, much of the geospatial data created by scientists, perhaps most of it (other than data from civil agencies’ satellite-borne imaging systems), never becomes available to their colleagues. This lack of open access to geospatial data seems to me to be more tragic than the lack of open access to other kinds of scientific data, not only because humanity faces critical environmental challenges, but also because all geospatial data refer to the same Earth, and thus every new data layer is rich with possibilities for exploration of relationships to other data layers. I am, therefore, very glad that the Panton Principles have been published and a geospatial open access working group has been established.

In preparation for eventually writing an article on the subject of open access to geospatial data, working with a few OGC member representatives (special thanks to Simon Cox of CSIRO) and OGC staff, I collected a list of 17 reasons why scientists’ geospatial data ought to be published online, with metadata registered in a catalog, using OGC interoperability standards. (The 17 reasons are appended to this blog entry.)

In January I put these reasons into slides that I used in a talk at the Marsh Institute at Clark University in Worcester, Massachusetts. After briefly stating each reason, I explained how OGC standards and the progress of information technology make open access feasible. I provided evidence that the geosciences are rapidly moving in the direction of open access, and I offered ideas on how academics might contribute to and benefit from this progress.

I’m quite sure the Panton Principles are consistent with the goals of the geoscientists in the OGC. I hasten to add, however, that I am not speaking for them, and most of the 390+ OGC members are not geoscience organizations; most are technology providers, data providers and technology users with other roles in the geospatial technology ecosystem. Even so, this diversity makes the OGC, I think, a particularly valuable “idea space” for academics who have an interest in open access to geospatial data and services. (Services are the future. A land use change model, for example, is a service when it is made available online “in the cloud” for others to use without downloading.)

One domain in the OGC that has value for open science is the work of the OGC Geo Rights Management Working Group (GeoRM WG). The Panton Principles discourage the use of licenses that limit commercial re-use or limit the production of derivative works, because the authors recognize the value of integrating and re-purposing datasets and enabling commercial activities that could be used to support data preservation. That’s important with respect to geospatial data, both because they are so often integrated and repurposed and because geospatial data sets are often complex and voluminous and thus potentially more expensive to curate than other kinds of data. The GeoRM WG has written a remarkable document, the GeoDRM Reference Model, for use in developing standards for management of digital rights in the complex area of geospatial data and services. I think this will be a key resource as open access to geospatial data unfolds. The GeoDRM Reference Model provides a technical foundation necessary for implementing the Panton Principles.

Another valuable domain within the larger OGC idea space is the OGC Sensor Web Enablement (SWE) activity. Most geospatial data are collected by means of sensors, and thus it is important in the geosciences to have rigorous standard ways to describe sensors and sensor data in human-readable and machine-readable form. It is also important to have standard ways to schedule sensor tasks and aggregate sensor readings into data layers. Use of SWE standards is becoming important in some scientific areas such as ocean observation, hydrology and meteorology.

Both Web-resident sensors and data collections can be published and discovered by means of catalogs that implement the OGC Catalog Services – Web Interface Standard. This standard will likely become an integral infrastructure element for open access to geospatial data. It is designed to work with the ISO geospatial metadata standards, but those who begin implementing in this area discover that some work remains to make those standards more generally useful.
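As a rough illustration of catalog-based discovery, the sketch below builds a key-value `GetRecords` request of the kind a catalog service accepts. Everything specific here is an assumption: the endpoint `https://example.org/csw` is a placeholder, version 2.0.2 of the standard is assumed, and the `AnyText LIKE` filter is one common way to express a free-text search that a particular server may or may not support. `build_getrecords_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_getrecords_url(endpoint, keyword, max_records=10):
    """Build a catalog GetRecords URL for a free-text metadata search.

    The endpoint and keyword are supplied by the caller; the parameter
    values assume a CSW 2.0.2 server that accepts CQL text filters.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",             # assumed version of the standard
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",    # ask for summary metadata records
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint; substitute a real catalog URL to try it.
    print(build_getrecords_url("https://example.org/csw", "hydrology"))
```

A client would fetch this URL and parse the XML response; the request itself is just a structured query against registered metadata, which is why it can be much finer-grained than a literature search.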

There are, in fact, many technical and institutional obstacles to overcome before science becomes as empowered by information technology as other estates such as business and entertainment. Technical interoperability obstacles are being overcome in the OGC by groups working in technology domains such as geosemantics, workflow, grid computing, data quality and oblique imagery; and in application domains such as hydrology, meteorology and Earth system science. Overcoming technical obstacles often precedes the obsolescence of institutional policies that stand as obstacles to progress.

I recently read Richard Ogle’s “Smart World,” a book about the new science of networks. In network terms, the OGC is a “hub” in an “open dynamic network”. What were once weak links between the OGC and other hubs such as the World Meteorological Organization and the International Environmental Modeling & Software Society (iEMSs) have been strengthened, and these stronger links make both the OGC and its partner hubs more likely to form new connections with other hubs. Hubs that directly contribute to digital connectivity, as the OGC does, have a special “pizzazz,” I would say. (I haven’t yet mastered the network science vocabulary.) It seems to me the Open Knowledge Foundation and the Science Commons are hubs or idea spaces with a bright future of rich connections, and I look forward to seeing what connections they form with the OGC.

17 Reasons why scientific geospatial data should be published online using OGC standard interfaces and ISO standard metadata

Reason 1: Data transparency

Science demands transparency regarding data collection methods, data semantics, and processing methods. Rigor, documented!

Reason 2: Verifiability

Science demands verifiability. Any competent person should be able to examine a researcher’s data to see if those data support the researcher’s conclusions.

Reason 3: Useful unification of observations

Being able to characterize, in a standardized human-readable and machine-readable way, the parameters of sensors, sensor systems and sensor-integrated processing chains (including human interventions) enables useful unification of many kinds of observations, including those that yield a term rather than a number.

(From Simon Cox, JRC Europe and CSIRO Australia, editor of ISO 19156 (Observations and Measurements), coordinator of OneGeology geoinformatics, a designer of GeoSciML, and chair of the OGC Naming Authority.)

Reason 4: Data Sharing & Cross-Disciplinary Studies

Diverse data sets with well documented data models can be shared among diverse information communities*. Cross-disciplinary data sharing provides improved opportunities for cross-disciplinary studies.

* OGC defines an information community as a group of people (such as a discipline or profession) who share a common geospatial feature data dictionary, including definitions of feature relationships, and a common metadata schema.

Reason 5: Longitudinal studies

Archiving, publishing and preserving well-documented data yields improved opportunities for longitudinal studies. As data formats, data structures, and data models evolve, scientists will need to access historical data and understand the assumptions so that meaningful scientific comparisons can be conducted. Community standards will help ensure long-term consistency of data representation.

Reason 6: Re-use

Open data enables scientists to re-use or repurpose data for new investigations, reducing redundant data collection and enabling more science to be done.

Reason 7: Planning

Open data policies enable collaborative planning of data collection and publishing efforts to serve multiple defined and yet-to-be-defined uses.

Reason 8: Return on investment

With open data policies, institutions and society overall will see greater return on their investment in research.

Reason 9: Due diligence

Open data policies will help research funding institutions perform due diligence and policy development.

Reason 10: Maximizing value

The value of data increases with the number of potential users*. This benefits science in a general way. It also creates opportunities for businesses that will collect, curate (document, archive, host, catalog, publish), and add value to data.

* Similar to Metcalfe’s law: “The value of a telecommunications network is proportional to the square of the number of connected users of the system.”
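The scaling in that law can be checked with a few lines of Python. This is only an arithmetic illustration: `metcalfe_value` and its constant `k` are hypothetical, and nothing here claims that the value of a dataset follows the law exactly.

```python
def metcalfe_value(users, k=1.0):
    """Value proportional to the square of the number of users (k is arbitrary)."""
    return k * users ** 2


if __name__ == "__main__":
    # Doubling the number of users quadruples the modelled value.
    ratio = metcalfe_value(20) / metcalfe_value(10)
    print(f"value ratio after doubling users: {ratio}")
```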

Reason 11: Data Discoverability

Open data is discoverable data. Data are not efficiently discovered through literature searches. Searches of data registered using ISO-standard XML-encoded metadata can be efficient and fine-grained.

Reason 12: Data Exploration

Robust data descriptions and quick access to data will enable more frequent and rapid exploration of data – “natural experiments” – to explore hypothetical spatial relationships and to discover unexpected spatial relationships.

Reason 13: Data Fusion

Open data improves the ability to “fuse” in-situ measurements with data from scanning sensors. This bridges the divide between communities using unmediated raw spatial-temporal data and communities using spatial-temporal data that is the result of a complex processing chain.

(From Simon Cox)

Reason 14: Service chaining

Open data (and open online processing services) will improve scientists’ ability to “chain” Web services for data reduction, analysis and modeling.

Reason 15: Pace of science

Open data enables an accelerated pace of scientific discovery, as automation and improved institutional arrangements give researchers more time for field work, study and communication.

“Changes to the Earth that used to take 10,000 years now take three, one reason we need real-time science. … Governances must be able to see and act upon key intervention points.” Brian Walker, Program Director, Resilience Alliance, and a scientist with the CSIRO, Australia

Reason 16: Citizen science & PR

Open science will help science win the hearts and minds of the non-scientific public, because it will make science more believable and it will help engage amateur scientists – citizen scientists – who contribute to science and help promote science. It will also increase the quality and quantity of amateur scientists’ contributions.

Reason 17: Forward compatibility

Open Science improves the ability to adopt and utilize new/better data storage, format, discovery, and transmission technologies as they become available.

(Offered to OGC’s David Arctur for this list on 6 January 2010 by Sharon LeDuc, Chief of Staff, NOAA’s National Climatic Data Center, Asheville, North Carolina, USA.)

(Another reason – cross-checking for sensor accuracy – occurred to me while writing this post.)