Last week I was at the XTech conference with Jo Walsh to present in the Open Data track. We built on our recent discussion to argue for the fundamental importance of componentization in developing the Open Data/Knowledge ecosystem; you can find the slides of our talk (entitled Open Data and Componentization) here.

Being here for the week has been a great experience. With one of the four streams dedicated to Open Data, the conference has been a chance to see and chat with a whole bunch of other projects and people, some of which I knew about before, but many of which I did not (or had not met in person).

Coming out of this was a really good sense of convergence in understanding as to what we need to do: add licenses to data, get a consensus on what ‘openness’ is, and find ways to add knowledge APIs so we can plug different corpora together. It is also very heartening to see the growing maturity of many of the tools and resources, e.g. PubMed, the World Wide Molecular Matrix, time visualization tools and gene databases. That said, we still find it very hard to plug different resources together. Where that has been achieved, it is usually thanks to a high degree of agreement in terminology and standards, combined with a significant commitment to add the associated structures into the data.

Random Notes

Open Data BOF Tuesday

  • value of unique identifiers
  • css(3)
  • zip archive including supplementary data
  • rel=offline-resource
  • able to extract data


  • project prospect (Royal Society Chemistry)
  • scopus (proprietary)
  • sitebite (annotations)

Gavin Bell

  • Mining personal connections out of semi-structured web data
  • Focus on microformats

Jon Trowbridge: The 21st Century Sneakernet

  • Organize the world’s information and make it universally available and useful.
  • Take large scholarly datasets
    • Must be open/free
  • NASA Hubble Archive: 120TB
  • Archimedes Palimpsest: 1TB
  • PMM: 10TB
  • What’s good about a sneakernet
    • commodity technologies
    • high throughput
    • trivially scalable
    • $1700 for 3TB
    • Rapidly getting cheaper
  • The device
    • Sonet: Enclosure ($400)
    • 7TB with RAID
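
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. The dataset sizes and the $1700 for 3TB figure come from my notes; the link speed and courier time are purely my own illustrative assumptions and were not given in the talk, as is the use of decimal terabytes (10^12 bytes).

```python
# Figures taken from the notes above.
DATASETS_TB = {
    "NASA Hubble Archive": 120,
    "Archimedes Palimpsest": 1,
    "PMM": 10,
}
DEVICE_COST_USD = 1700
DEVICE_CAPACITY_TB = 3

# ASSUMPTIONS (not from the talk): an illustrative network link speed
# and an illustrative door-to-door courier time.
ASSUMED_LINK_MBIT_PER_S = 100
ASSUMED_SHIPPING_HOURS = 48

BITS_PER_TB = 1e12 * 8  # assumes decimal terabytes


def cost_per_tb(cost_usd, capacity_tb):
    """Storage cost in dollars per terabyte."""
    return cost_usd / capacity_tb


def network_transfer_hours(size_tb, link_mbit_per_s):
    """Hours needed to push size_tb over a link of the given speed."""
    seconds = size_tb * BITS_PER_TB / (link_mbit_per_s * 1e6)
    return seconds / 3600


def sneakernet_mbit_per_s(size_tb, shipping_hours):
    """Effective throughput of shipping size_tb on disk in shipping_hours."""
    return size_tb * BITS_PER_TB / (shipping_hours * 3600) / 1e6


if __name__ == "__main__":
    print(f"Cost per TB: ${cost_per_tb(DEVICE_COST_USD, DEVICE_CAPACITY_TB):.2f}")
    for name, size_tb in DATASETS_TB.items():
        hours = network_transfer_hours(size_tb, ASSUMED_LINK_MBIT_PER_S)
        mbit = sneakernet_mbit_per_s(size_tb, ASSUMED_SHIPPING_HOURS)
        print(f"{name}: {hours:.0f} h over the assumed link, "
              f"{mbit:.0f} Mbit/s effective if shipped on disk")
```

The point of the sketch is only that the effective throughput of a shipped disk grows with the amount of data in the box, while the link speed stays fixed. Swap in your own link speed and shipping time to see where the crossover lies for you.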

Talis: Open Data Licensing

  • Excellent talk laying out the legal issues particularly in relation to DB right
  • Database rights are a good thing because they allow open licensing
  • Great to hear much of the stuff we have been saying coming from ‘Industry’

Talis: Value of Open Data

  • Open data up and then make money from complementary services
  • Use licenses as a tool
  • Database rights are useful


  • Interesting point at end where they discussed use of google data
  • Apparently getting a particular tile back for a particular place for you to use in something else is almost impossible
  • What they are doing is probably not allowed by the Google License and they are looking to move to another data source

Stamen (Tom Carden, Michal Migurski)

Analyzing Time

  • Simile TimeLine project
  • 1k project
  • Mike’s Back Channel interface:
  • Flickr organizr
  • measuremap: Flash date slider
  • Google finance data
  • Yahoo stock data: url meaningful
  • New York Times: Casualties of War
  • Folding: morit
  • Ableton Live
  • Stamen demos: Emergency call visualization
    • How many calls go to each center
  • Stamen demos: Open Crime demos
    • e.g. prostitution arrests in Oakland
    • …
  • Stamen demos: property analysis
    • year a property was built
    • costs of a property
    • animate this information by time

Rufus Pollock is Founder and President of Open Knowledge.