Interview with Rufus Pollock on NetSquared

Jed Sundwall of NetSquared just published an interview with Rufus Pollock, co-founder of the Open Knowledge Foundation.

The interview covers the distinction between price and value, the Open Knowledge Definition, CKAN, decentralised approaches to working with large quantities of data, packaging for knowledge, and ‘Shiny Front End Syndrome’. It ends with three suggestions for people publishing collections of content or data.

Here’s an excerpt:

Well, one day soon we’re going to have lots of material that is open and what’s really exciting about open stuff is that it can easily be shared and recombined. That means we can break very complicated problems down into small bits, which people can manage. But then, we can put it back together again. So, let’s say you were interested in U.S. unemployment, a hot topic, and you’re interested in understanding how it changes. Maybe there’s a data site out there just on unemployment itself. But maybe there’s another one on house repossessions or the housing market, and then, there’s another one on manufacturing. There are a whole bunch of different data sites.

Now, maybe one person could just maintain them all but that might become too big a job. You may need expertise in the housing market to maintain the housing data site, but you really want to bring these together often when you want to do analysis, or compute things, or make pretty pictures, or whatever it is you want to do. This is very similar to building a large building, let’s say, or developing an operating system plus all the applications to use. Maybe one person could build them all and make sure they all work together but that would be quite a big task. Even the world’s greatest monopolist struggles to do this effectively.

So, the typical way we go about doing this is by exploiting divide and conquer. But when you divide stuff up, there is this question about how you bring it back together. So then, we say we’re moving toward a world where you can start getting lots of these data sets and then start putting them out there in the world. People can just start taking this unemployment data or this housing data. But, how do you find that and how do you get a hold of it? So often in software, there’s been this tradition of building some kind of registry where you can find things, and then you start to impose some structure on that material, you start packaging. So rather than just saying: here’s my website, here’s my Wiki, look, there’s lots of data on it, you are going to start packaging that data in a slightly more structured form.

The point of CKAN is to start saying, look, there’s a better way than just having our stuff in wikis or in some random form on a website. We can start registering this material, and packaging it up a bit. That way other people, when they want them, can come and get hold of them easily and the wheel of reuse can start to turn.
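
To make the registry-and-packaging idea in the excerpt a little more concrete, here is a minimal sketch in Python. It is not CKAN's actual data model or API: the `Package` and `Registry` classes, their fields, and the example.org URLs are all hypothetical, chosen only to illustrate how separately maintained data sets could be registered with a little structure and then found again.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Minimal metadata for one data set (hypothetical fields, not CKAN's schema)."""
    name: str
    title: str
    url: str
    tags: set[str] = field(default_factory=set)


class Registry:
    """Toy in-memory registry: register packages, then look them up by tag."""

    def __init__(self) -> None:
        self._packages: dict[str, Package] = {}

    def register(self, package: Package) -> None:
        # Names act as unique identifiers, so refuse duplicates.
        if package.name in self._packages:
            raise ValueError(f"package already registered: {package.name}")
        self._packages[package.name] = package

    def find_by_tag(self, tag: str) -> list[Package]:
        # Return matching packages sorted by name for a stable order.
        matches = (p for p in self._packages.values() if tag in p.tags)
        return sorted(matches, key=lambda p: p.name)


# Each data set could be maintained by a different person or group
# (placeholder URLs).
registry = Registry()
registry.register(Package(
    "us-unemployment", "U.S. unemployment",
    "https://example.org/unemployment.csv", {"economics", "labour"},
))
registry.register(Package(
    "us-housing", "U.S. housing market",
    "https://example.org/housing.csv", {"economics", "housing"},
))
registry.register(Package(
    "us-manufacturing", "U.S. manufacturing",
    "https://example.org/manufacturing.csv", {"economics", "industry"},
))

# Someone doing analysis can now discover related data sets in one place.
economics_names = [p.name for p in registry.find_by_tag("economics")]
print(economics_names)
```

The design choice being illustrated is that the registry stores only a small amount of structured metadata about each package, while the data itself stays wherever its maintainer publishes it.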