GISTEMP is a crucial open data set because it contains the historical global temperature record. That may not seem very important right now, but in the medium term it is absolutely vital to the continued functioning of our society, given the likelihood of adverse climate change.
Stations that measure temperature naturally do so at specific points in space, and the historical record is additionally contaminated by changes in hardware, urbanisation and other issues. Because of this, GISTEMP is produced by software that estimates a single global temperature from the measurements, using a basic scheme invented by James Hansen in the 1970s.
What is interesting from an open knowledge point of view is that, without this software, the GISTEMP data itself is fairly meaningless. The software defines clearly what the data is. There have been arguments about the derivation, and to address these the original Fortran software was released into the public domain by NASA in September 2007.
Of course, the software is no use if people can’t read and understand it. Because of this, Nick Barnes (from a company called Ravenbrook) has started a project to rewrite the GISTEMP software in Python, ensuring it produces the same output as the original Fortran.
This is called the Clear Climate Code project. They intend eventually to produce clear climate modelling code; they are just starting with the global temperature record.
This open approach to the scientific code and data has already found some rewards. The August 11, 2008 GISTEMP update describes a bug in the original Fortran code which the Python rewrite unearthed:
Nick Barnes and staff at Ravenbrook Limited have generously offered to reprogram the GISTEMP analysis using python only, to make it clearer to a general audience. In the process, they have discovered in the routine that converts USHCN data from hundredths of °F to tenths of °C an unintended dropping of the hundredths of °F before the conversion and rounding to the nearest tenth of °C. This did not significantly change any results since the final rounding dominated the unintended truncation. The corrected code has been used for the current update and is now part of the publicly available source.
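To make the kind of bug described in that update concrete, here is a minimal Python sketch. It is not the GISTEMP or Clear Climate Code source; the function names are made up for illustration, and I am assuming that "dropping the hundredths" means truncating the reading to tenths of °F before converting.

```python
def f_hundredths_to_c_tenths(hundredths_f: int) -> int:
    """Convert hundredths of °F to tenths of °C, rounding once at the end."""
    celsius = (hundredths_f / 100 - 32) * 5 / 9
    return round(celsius * 10)


def f_hundredths_to_c_tenths_truncating(hundredths_f: int) -> int:
    """Hypothetical buggy variant: drops the hundredths digit of °F first."""
    tenths_f = int(hundredths_f / 10)  # assumption: truncation toward zero
    celsius = (tenths_f / 10 - 32) * 5 / 9
    return round(celsius * 10)


# 50.28 °F is about 10.156 °C, but truncating to 50.2 °F gives about 10.111 °C.
print(f_hundredths_to_c_tenths(5028))             # 102
print(f_hundredths_to_c_tenths_truncating(5028))  # 101
```

In this sketch the two versions never differ by more than one tenth of a degree Celsius for readings at or above 0 °F, which fits the update's remark that the final rounding dominated the truncation.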
So two lessons – 1) Free that scientific code and data. The proper peer review might save more than you think, one day. 2) Good software engineering is worth it in the case of critically important, academic software.
(Some details and references in this Wikipedia article)