The following guest post is by David Jones who is, among other things, a curator of the climate data group on CKAN (the OKF’s open source registry of open data) and co-founder of Clear Climate Code (which was previously featured on our blog here and here).

Take a tour of some of the additions we’ve made to the climate data group at OKF’s CKAN.

The Mauna Loa observatory, Hawaii, has the longest period of continual recording of the amount of CO2 (carbon dioxide) in the air, the airborne fraction. The data are available in CKAN, and here they are in chart form:

CO2 is a relatively well mixed gas in the atmosphere, but even so, it would be unwise to rely on a single location for measurements. The Carbon Dioxide Analysis Center maintain a global network of stations collecting CO2. In fact Mauna Loa is not used for the global average because its height gives it a CO2 fraction that is lower than the surface average by 1 to 2 ppm.

What about reconstructing historical CO2 levels? One source is ice cores. On the Antarctic ice sheet snow falls every year and never melts. New snow falls on the snow from the previous year, building up in layers. Eventually the snow builds up to a thickness where it compresses the snow beneath it into solid ice, ice that is impermeable to gas. At that point, air at the surface becomes trapped in little bubbles in the ice.


By drilling down through the ice we can reach older and older ice. Vostok Station sits on the Antarctic ice sheet, above Lake Vostok. Researchers have drilled down through the ice to a depth of 3623 m and reached ice that is about 400,000 years old. Drilling was stopped, tantalisingly close to the lake surface, because the Scientific Committee on Antarctic Research (SCAR) raised concerns that life in Lake Vostok, potentially forming a unique biome, might be contaminated.

By measuring the CO2 content of the gas trapped in the ice core, we can reconstruct the historical levels. Of course, the data are in CKAN. Here’s a chart:

(Other data from the Vostok ice core are also available)

Vostok is well known for being the coldest place on Earth. Vostok Station was established in 1957, and since that time weather records have been kept by researchers working there. The temperature record for Vostok Station is just one of the many thousands of records made available in the Global Historical Climate Network (GHCN). Here’s Vostok’s temperature record for the last three decades (more data are available, but three decades fits nicely):


The different colours arise because, for a particular station, the whole series can be made up of individual records that each cover only part of the range (due to different equipment, different reporting procedures, and so on); each record gets a different colour (unfortunately the records often overlap, confusing the colours).

The station records in GHCN, sometimes augmented by other similar datasets such as SCAR’s READER, are used to reconstruct global temperature anomalies, like the Japan Meteorological Agency Global Surface Temperature Anomaly, HadCRUT3 from the UK’s Met Office and the University of East Anglia’s Climate Research Unit, and, perhaps most famously, GISTEMP from NASA:

The seasonal cycle is evident in the Mauna Loa CO2 (it’s caused by photosynthesis of plants, mostly in the Northern Hemisphere, drawing down more CO2 from the atmosphere), and also in the Vostok temperature record. In some cases the seasonal signal and the long term trend are easily visible; in others it takes effort to recover the long term trend. The GISTEMP graph can be thought of as recovering the long term trend from many thousands of individual station records.
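
As a rough illustration of separating a seasonal cycle from a long term trend, here is a minimal sketch that averages a monthly series over a 12-month sliding window. It is not how GISTEMP works, and the series below is synthetic (a straight line plus a sine wave), not real CO2 or temperature data; the function name `running_mean` is made up for this example.

```python
import math


def running_mean(values, window=12):
    """Return the mean of each full run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        means.append(sum(chunk) / window)
    return means


# Synthetic monthly series (an assumption for illustration only):
# a linear trend of 0.1 per month plus a seasonal cycle of amplitude 3.
series = [300.0 + 0.1 * m + 3.0 * math.sin(2 * math.pi * m / 12)
          for m in range(120)]

# Averaging over exactly one year cancels the 12-month cycle,
# leaving only the underlying trend.
smoothed = running_mean(series, window=12)
```

Because the window spans exactly one period of the cycle, the seasonal term sums to zero in every window and `smoothed` rises steadily by 0.1 per step. Real records are noisier and have gaps, so they need more care than this.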

Another well known climate data series with both seasonal and long term trends is the National Snow and Ice Data Centre’s Arctic Sea Ice Extent:


The seasonal cycle in the Arctic sea ice is of course due to summer melt and winter freeze.

Another data set available as a CKAN package is the Colorado Sea Level data. This is a measurement of global mean sea level obtained by a series of satellites: TOPEX, then JASON-1 and JASON-2. The next satellite in this series is JASON-3, and it has just secured funding from a European consortium. There is a seasonal cycle in this data too:


Now I think the seasonal cycle in mean sea level is due to the thermal expansion of the oceans. In the summer the ocean warms and expands; both the northern hemisphere and the southern hemisphere are affected but the effects don’t balance so there is a seasonal cycle. Please contact your tour guide (leave a comment!) if you can find a reliable explanation (I looked and was unable to find a good source).

The satellite era has been tremendously useful for earth observation and climate science, but of course the records from satellites are short. For example, the satellite data for sea level only goes back to 1993. Since climate is often a matter of looking at events on long timescales we often have to find longer series from other measurements.

The UK’s Natural Environment Research Council maintains the Permanent Service for Mean Sea Level at the Proudman Oceanographic Laboratory. Using a global network of about 2000 tide gauges they can reconstruct a global mean sea level record that documents sea level rise since 1880. Here’s a chart of the data available from the CKAN package:


The tour is coming to an end now. The data that I’ve shown here are just a selection of the data available, both generally and in CKAN. Often there is much more detailed data (and more detailed science) behind each of these datasets, but one of the reasons I’ve selected many of these datasets is that they are key indicators. They are the headline figures that show rising atmospheric CO2, rising sea levels, decreasing Arctic sea ice. These are the data that a curious member of the public will want to engage with, and that reason makes it important that the data are accessible, and freely so.

If you’d like to contribute to the climate data group then please drop us an e-mail. If you’d like to continue the tour on your own you might want to try the Red Sea Sea Level records and the Paleo Tree Ring records which are just around the corner in the Open Archeology wing.

If you’re interested in promoting open data in climate science, you may wish to endorse the Panton Principles, which were launched last week.

15 thoughts on “A tour of climate data at CKAN”

  1. Great post. Thank you.
    A few thoughts:
    Would it be possible to open up and link to the google spreadsheets you used to make the google charts here?
    Would it be possible to use google interactive charts next time?
    Is there any good open dataset for Antarctic sea ice extent?

  2. @Hector: Thanks!

    There are no spreadsheets, just the raw data files which are mostly linked to via the CKAN packages I mention in the blog article and the Python code I wrote to process those into Google charts.

    The Python code is here, but you should note that it’s really just a collection of throwaway scripts rather than publication quality code. Still, I have no objection to you rooting through my laundry, as long as you realise that’s exactly what it is. Each of the Python files is a script you can run that will download the data it uses to the input/ directory and produce a graphic in the result/ directory. Subsequent re-runs will not download the data again.
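
The download-once behaviour described above can be sketched roughly as follows. This is a Python 3 illustration, not the code from the scripts themselves; the function name `fetch_cached` and its `opener` parameter are assumptions made for the example.

```python
import os
import urllib.request


def fetch_cached(url, directory="input", opener=urllib.request.urlopen):
    """Download `url` into `directory` unless a copy is already there.

    Returns the local path. `opener` is a hypothetical hook so that the
    network call can be swapped out (for example in tests).
    """
    os.makedirs(directory, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    name = url.rstrip("/").rsplit("/", 1)[-1]
    path = os.path.join(directory, name)
    if not os.path.exists(path):
        # Only touch the network when there is no local copy yet.
        with opener(url) as response, open(path, "wb") as out:
            out.write(response.read())
    return path
```

A second call with the same URL finds the file already in place and returns straight away. The obvious drawback is that a dataset which is updated upstream is never refreshed unless the local copy is deleted.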

    The Antarctic Sea Ice Extent is available from the same place as the Arctic Sea Ice Extent. The organisation is a little unusual: there is a directory for each month of the year and, within each of those, a file containing the ice extents for a series of years; one for the Northern Hemisphere, one for the Southern. See the Python program, in particular the get_seaice() function, for code to download the Northern Hemisphere data. If you’re feeling bold, you should be able to change the ‘N’ for an ‘S’ in the “datafiles =” line near the beginning of the file.
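
Since the data arrive as one file per month, each covering many years, they have to be interleaved to get a single chronological series. A minimal sketch of that step, assuming each monthly file has already been parsed into a `{year: value}` dictionary (the function name `merge_monthly` and the numbers are made up for illustration; this is not the get_seaice() code):

```python
def merge_monthly(per_month):
    """Flatten {month: {year: value}} into a list of (year, month, value)
    tuples sorted chronologically."""
    rows = []
    for month, by_year in per_month.items():
        if not 1 <= month <= 12:
            raise ValueError("month out of range: %r" % (month,))
        for year, value in by_year.items():
            rows.append((year, month, value))
    # Tuples sort by year first, then month, which is chronological order.
    rows.sort()
    return rows


# Placeholder values only, standing in for two parsed monthly files.
example = {
    9: {1979: 10.0, 1980: 11.0},
    3: {1979: 20.0, 1980: 21.0},
}
series = merge_monthly(example)
```

Here `series` alternates between March and September entries for each year in turn, which is the order a chart of the whole record needs.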

    Google interactive charts do look fun. To be honest I’d only come across them as I was reviewing the documentation when writing this article; the last time I read the Google Chart API docs, interactive charts didn’t exist. So I don’t have any interactive chart expertise yet. Also, they require JavaScript, so would be unsuitable for this blogging platform.

  3. David: really fantastic and informative post — I now know a lot more than I did about the data sources in this area. Thanks for putting it together. Some quick thoughts:

    • Many of the data files are in “txt” format with an overview/introduction at the start (before you get the data). When using the data I assume you have to parse the actual data out. Would it be worth also posting those cleaned files online with additional links on CKAN?

    • It would be interesting to see a post on the datasets we should have access to but don’t :)

  4. @Rufus: thanks!

    When using the data I didn’t create stripped-down files first; I parsed the data out as I was reading it. For example, I simply reject any line that does not start with 4 digits (a year):

            for line in f:
                if not line[0:4].isdigit():
                    continue  # not a data line: skip header and commentary

    It’s all a bit yucky, but mostly pragmatic. I agree it would be nice to provide a sort of “cleaned” dataset (the Arctic sea ice series, for example, requires processing 12 separate files, one for each month of the year). Where would such a processed file go? Does CKAN provide a place for it? Also, how would a file be kept up to date? Some of the datasets are continuously maintained, others are more stable records.

    I was thinking that a ScraperWiki-like thing might be good. Perhaps ScraperWiki itself.
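
The line filter described above (reject anything that does not start with a 4-digit year) can be wrapped into a small generator. This is a sketch under assumptions: the function name `data_rows` is made up, the fields are assumed to be whitespace separated, and the sample lines are placeholders rather than real measurements.

```python
def data_rows(lines):
    """Yield the whitespace-separated fields of each line that starts
    with four digits (taken to be a year); skip everything else."""
    for line in lines:
        if not line[0:4].isdigit():
            continue  # header text, comments and blank lines
        yield line.split()


# Placeholder input imitating a text file with a preamble before the data.
sample = [
    "Example header text\n",
    "# more commentary\n",
    "1958 03 1.0\n",
    "\n",
    "1958 04 2.0\n",
]
rows = list(data_rows(sample))
```

Only the two lines beginning with `1958` survive. The test is deliberately crude: a line of prose that happens to start with four digits would slip through, which is part of why it counts as pragmatic rather than robust.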

Comments are closed.