Clear Climate Code, and Data
January 28, 2010 in Exemplars, External, Open Data, Open Science
The following guest post is by David Jones who is, among other things, a curator of the climate data group on CKAN (the OKF’s open source registry of open data) and co-founder of Clear Climate Code (which we blogged about back in 2008).
Clear Climate Code have been working on ccc-gistemp, a project to reimplement NASA's GISTEMP in clear Python. GISTEMP is a global historical temperature analysis. Among other things, it produces graphs like this one, which tell you whether the Earth is getting warmer or cooler:

Because this graph is important for studying the world's climate (and for determining the signature of global warming), there is a lot of public discussion about where this data comes from. The raw data underlying the graph consists of temperature records from surface weather stations. This raw data is processed to produce the data for the graph:

The box in the middle, labelled "GISTEMP", is a process that converts the raw station records into the data for the graph on the right, which is the global temperature anomaly. There are descriptions of this process available, for example Hansen and Lebedeff, 1987. A description is one thing, but it might not tell you everything you need to know. Perhaps the description is clear and accurate enough for you to reproduce the process, perhaps not. The ultimate authority on the process is the source code that implements it, because it's the source code that is executed to produce the processed data. So if you want to know exactly what the process involves, you have to get hold of the source code.
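To make "temperature anomaly" a little more concrete, here is a minimal Python sketch of the basic idea: express each station's record as a departure from its own average over a base period, then average those departures across stations. This is an illustration under stated assumptions, not GISTEMP's algorithm and not ccc-gistemp code; the real analysis does considerably more. The function names are hypothetical, the stations are weighted equally (an assumption), and the default base period of 1951 to 1980 is only a parameter here.

```python
from statistics import mean


def anomalies(annual_means, base_start=1951, base_end=1980):
    """Hypothetical helper: turn one station's record into anomalies.

    annual_means maps year -> annual mean temperature (deg C); missing
    years are simply absent. Each value becomes its departure from the
    station's own mean over the base period.
    """
    base = [t for y, t in annual_means.items() if base_start <= y <= base_end]
    if not base:
        raise ValueError("station has no data in the base period")
    baseline = mean(base)
    return {y: t - baseline for y, t in annual_means.items()}


def combined_anomaly(stations, **base_period):
    """Hypothetical helper: average station anomalies year by year.

    Every station gets equal weight (a simplifying assumption); a year is
    averaged over whichever stations report it.
    """
    per_station = [anomalies(s, **base_period) for s in stations]
    years = sorted(set().union(*per_station))
    return {y: mean(a[y] for a in per_station if y in a) for y in years}


# A warm station and a cold station, with a short base period for brevity.
warm = {2000: 15.0, 2001: 16.0}
cold = {2000: 5.0, 2001: 7.0}
print(combined_anomaly([warm, cold], base_start=2000, base_end=2001))
```

Working in anomalies rather than absolute temperatures is what lets a warm station and a cold station be combined: averaging their raw readings would mostly reflect where the stations happen to be, whereas their departures from their own baselines show the change they have in common.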
In effect it is the source code that adds value to the raw data to produce processed data. So in a sense, the value of the processed data is embodied in the source code. That’s what makes the source code important.
The source code for GISTEMP is written mostly in Fortran by scientists at NASA, and is available from them. This source code is the working code used by the NASA scientists; it is not necessarily the best source code for explaining how the process works (to an interested and competent member of the general public). There is the question of whether NASA, a publicly funded body, should be paying someone to write code that makes a better tool for communicating with the public (for example by writing better documentation, or writing it in a more exemplary style). I am not going to address that question. The source code NASA use is the source code we have right now.
Our goal at Clear Climate Code is to take this code and produce a new version that is clearer but does the same thing. We have taken great steps towards this goal: we have recently released a version which is written entirely in Python and which reproduces NASA's results exactly. We think much of this code is already a great deal clearer than the starting material, and we continue to make it clearer. Of course, we would welcome your support. If you want to help, please join our mailing list, or follow our progress on our blog and on Twitter.
Clear Climate Code chose Python as the implementation language for ccc-gistemp for three reasons: accessibility, clarity, and familiarity. By accessible I mean that there is a large community of Python programmers, and also that there are plenty of tutorials and other materials for learning Python should you be motivated. Python is used to teach undergraduates programming. Python is relatively clear; it's deliberately designed to be free of the clutter that imperils other programming languages. It's certainly possible for people who are not professional programmers to create small programs in Python, and to examine and modify existing Python programs. And lastly, it's familiar: Nick Barnes and I already knew Python when we started the project. This seems like a trivial consideration, but Clear Climate Code is an unpaid project and it's pretty easy to come up with reasons to do something else instead, so the fact that we already knew Python was important.
Hopefully Clear Climate Code illustrates how both code and data are central to the public understanding of science. For an issue like global warming it is absolutely crucial that the public is involved. CKAN's climate data group is a place where non-specialists can access scientists' data more easily, and hopefully use it to innovate, do their own hobby science, or create visualisations to communicate better with the public. I'm hoping to add more data sources to the climate data group in the near future. If you're interested in adding more data to this group, please get in touch.