15 Comments

  • @Rufus: thanks!

    When using the data I didn’t create stripped-down files first; I parsed the data out as I read it. For example, in arcticseaice.py I simply reject any line that does not start with 4 digits (a year):

            for line in f:
                if not line[0:4].isdigit():
                    continue
    

    It’s all a bit yucky, but mostly pragmatic. I agree it would be nice to provide a sort of “cleaned” dataset (the Arctic sea ice series, for example, requires processing 12 separate files, one for each month of the year). Where would such a processed file go? Does CKAN provide a place for it? Also, how would a file be kept up to date? Some of the datasets are continuously maintained; others are more stable records.
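
    As a rough illustration of how that line filter could double as a cleaning step, here is a minimal sketch. The function name `extract_data_lines` and the sample rows are made up for the example; they are not taken from arcticseaice.py or from the real data files.

```python
def extract_data_lines(lines):
    """Yield only the lines whose first four characters are digits (a year)."""
    for line in lines:
        if not line[0:4].isdigit():
            continue  # header text, blank lines, column titles
        yield line.rstrip("\n")


# Made-up sample: an introductory header followed by data rows.
sample = [
    "Sea ice extent, illustrative header\n",
    "year mo extent\n",
    "\n",
    "1979 09 7.2\n",
    "1980 09 7.8\n",
]
cleaned = list(extract_data_lines(sample))
```

    Writing `cleaned` out to a file would give a stripped-down version of the input. Note that the check only looks at the first four characters, so a data line with leading spaces would also be rejected.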

    I was thinking that a ScraperWiki-like thing might be good. Perhaps ScraperWiki itself.

  • David: really fantastic and informative post — I now know a lot more than I did about the data sources in this area. Thanks for putting it together. Some quick thoughts:

    • Many of the data files are in “txt” format with an overview/introduction at the start (before you get the data). When using the data I assume you have to parse the actual data out. Would it be worth also posting those cleaned files online with additional links on CKAN?

    • It would be interesting to see a post on the datasets we should have access to but don’t 🙂

  • @Hector: Thanks!

    There are no spreadsheets, just the raw data files (mostly linked from the CKAN packages I mention in the blog article) and the Python code I wrote to process them into Google charts.

    The Python code is here, but you should note that it’s really just a collection of throwaway scripts rather than publication-quality code. Still, I have no objection to you rooting through my laundry, as long as you realise that’s exactly what it is. Each of the Python files is a script you can run; it will download the data it uses to the input/ directory and produce a graphic in the result/ directory. Subsequent re-runs will not download the data again.
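
    The download-once behaviour described above could look something like the following. This is only a sketch under assumptions: the helper name `fetch_once` and its `fetch` parameter are hypothetical and are not how the scripts themselves are necessarily written.

```python
import os
import urllib.request


def fetch_once(url, directory="input", fetch=urllib.request.urlretrieve):
    """Download url into directory unless a file of that name is already there.

    `fetch` is any callable taking (url, local_path); it defaults to
    urllib.request.urlretrieve and can be swapped out for testing.
    """
    os.makedirs(directory, exist_ok=True)
    local = os.path.join(directory, url.rsplit("/", 1)[-1])
    if not os.path.exists(local):
        fetch(url, local)  # only reached on the first run
    return local
```

    A script would call `fetch_once(url)` for each data file and then read the returned path. Deleting a file from input/ forces a fresh download on the next run.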

    The Antarctic Sea Ice Extent is available from the same place as the Arctic Sea Ice Extent: ftp://sidads.colorado.edu/DATASETS/NOAA/G02135/. The organisation is a little unusual: there is a directory for each month of the year and, within each of those, files containing the ice extents for a series of years, one file for the Northern Hemisphere and one for the Southern. See the arcticseaice.py Python program, in particular the get_seaice() function, for code to download the Northern Hemisphere data. If you’re feeling bold, you should be able to change the ‘N’ for an ‘S’ in the “datafiles =” line near the beginning of the file.

    Google interactive charts do look fun. To be honest, I only came across them while reviewing the documentation for this article; the last time I read the Google Chart API docs, interactive charts didn’t exist. So I don’t have any interactive chart expertise yet. Also, they require JavaScript, so they would be unsuitable for this blogging platform.

  • Great post. Thank you.
    A few thoughts:
    Would it be possible to open up and link to the Google spreadsheets you used to make the Google charts here?
    Would it be possible to use Google interactive charts next time?
    Is there any good open dataset for Antarctic sea ice extent?
