

Storytelling with

Heather Leson - October 21, 2014 in Events, Interviews


As we well know, data is only data until you use it for storytelling and insights. Some people are super talented and can use D3 or other amazing visual tools; just see this great list of resources on Visualising Advocacy. In this 1 hour Community Session, Nika Aleksejeva shares some easy ways that you can get started with simple data visualizations. Her talk also includes tips for telling a great story and some thoughtful comments on when to use various data viz techniques.

We’d love you to join us and do a skillshare on tools and techniques. Really, we are tool agnostic and simply want to share with the community. Please do get in touch and learn more about Community Sessions.


Heather Leson - July 9, 2014 in Events, Ideas and musings, Interviews, Network, OKFest, OKFestival, Open Knowledge Foundation

Everyone is a storyteller! Just one week away from the big Open Brain Party of OKFestival. We need all the storytelling help you can muster. Trust us, from photos to videos to art to blogs to tweets – share away.

The Storytelling team is a community-driven project. We will work with all participants to decide which tasks are possible and which stories they want to cover. We remix together.

We’ve written up this summary of how to Storytell, some story ideas and suggested formats.

There are a few ways to join:

  • At the Event: We will host an in-person meetup on Tuesday, July 15th to plan at the Science Fair. Watch #okstory for details. Look for the folks with blue ribbons.
  • Digital Participants: Join in and add all your content with the #okfest14 @heatherleson #OKStory tags.
  • Share: Use the #okstory hashtag. Drop a line to heather.leson AT okfn dot org to get connected.

We highlighted some ways to storytell in this brief 20 minute chat:

Community Session: Open Data Hong Kong

Heather Leson - May 7, 2014 in Events, Interviews, OK Hong Kong, Open Knowledge international Local Groups

Open Data Hong Kong is an open, participative, and volunteer-run group of Hong Kong citizens who support Open Data. Join Mart van de Ven, Open Knowledge Ambassador for Hong Kong, and Bastien Douglas of ODHK for a discussion about their work.


How to Participate

This Community Session will be hosted via G+. We will record it.

  • Date: Wednesday, May 14, 2014
  • Time: Wednesday 21:00 – 22:00 EDT / Thursday 09:00 – 10:00 HKT / Thursday 01:00 – 02:00 UTC
  • See to convert times.
  • Duration: 1 hour
  • Register for the event here
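
If you would like to work out the start time for your own city, here is a minimal Python sketch. It assumes Python 3.9 or later with time zone data available on your system, and it uses "America/Toronto" purely as a stand-in for a place that observes EDT; that zone name and the `local_start` helper are illustrative choices, not part of the announcement.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Session start as announced: Wednesday, May 14, 2014, 21:00 EDT.
# "America/Toronto" is an assumed stand-in for a location on EDT.
START = datetime(2014, 5, 14, 21, 0, tzinfo=ZoneInfo("America/Toronto"))


def local_start(zone_name: str) -> datetime:
    """Return the session start expressed in the given IANA time zone."""
    return START.astimezone(ZoneInfo(zone_name))


# Print the start time for a few example zones.
for zone in ("Asia/Hong_Kong", "UTC", "Europe/London"):
    print(zone, local_start(zone).strftime("%A %H:%M"))
```

Swap in any IANA zone name (for example "Europe/Berlin") to see the start time where you are.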

About our Community Session Guests

Mart van de Ven

Mart van de Ven co-founded Open Data Hong Kong to inspire and nurture a techno-social revolution in Hong Kong. He believes Open Data is a chance for citizens to be better served by government. Not only because it enables greater transparency and accountability, but because when governments open up their data it allows them to concentrate on their irreducible core – enabling us as citizens. He is also Open Knowledge’s ambassador to Hong Kong, a data-driven developer and technology educator for General Assembly.

Bastien Douglas

Bastien’s role with ODHK is to create a structure for the community to develop sustainability, form partnerships with other organisations and operationalize projects to achieve the goals of the organisation. Bastien’s background combines public sector experience, research analysis and citizen engagement. For over 4 years as a public servant in the federal government of Canada in Ottawa, he analysed policy at the front lines of policy development and researched public management issues at the centre of the bureaucracy. In 2009, Bastien formed a community of innovative public servants to work across silos using collaborative tools and social media; the community pushed forward Open Data projects to raise capacity to share knowledge and better support the public. Bastien then worked in the NGO sector building knowledge capacity for the immigrant-serving sector, while supporting advocacy for improved services, information-sharing, access to resources and sharing of practices for service delivery.

Bastien Douglas on Twitter

More Details

See the full Community Session Schedule

Vice Italy interview with the editor of the Public Domain Review

Theodora Middleton - January 28, 2013 in Interviews, Public Domain, Public Domain Review

The editor of The Public Domain Review, Adam Green, recently gave a feature-length interview to Vice magazine Italy. You can find the original in Italian here, and an English version below!

While there is a wealth of copyright-free material available online, The Public Domain Review is carving out a niche as a strongly curated website with a strong editorial line. How did the PDR begin?

Myself and The Public Domain Review’s other co-founder, Jonathan Gray, have long been into digging around in these huge online archives of digitised material – places like the Internet Archive and Wikimedia Commons – mostly to find things with which to make collages. We started a little blog called Pingere to share some of the more unusual and compelling things that we stumbled across. Jonathan suggested that we turn this into a bigger project aiming to celebrate and showcase the wonderfulness of this public domain material that was out there. We took the idea to the Open Knowledge Foundation, a non-profit which promotes open access to knowledge in a variety of fields, and they helped us to secure some initial seed funding for the project. And so the Public Domain Review was born.

What was the first article you posted?

We initially focused on things which were just coming into the public domain that year. In many countries works enter the public domain 70 years after the death of the author or artist – although there are lots of weird rules and exceptions (often unnecessarily complicated!). Anyway, 2011 saw the works of Nathanael West enter the public domain, including his most famous book The Day of the Locust. The first article was about that, and West’s relationship with Hollywood, written by Marion Meade who’d recently published a book on the subject.
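
As a rough illustration of the "70 years after death" rule mentioned in this answer, here is a small Python sketch. It assumes the simplest version of the rule, in which protection runs to the end of the 70th calendar year after the author's death and the works become free on 1 January of the following year. The function name and that end-of-year assumption are ours, and real rules differ by country and by type of work.

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """First calendar year in which an author's works are free to use.

    Assumes a plain "life plus term_years" rule where protection lasts
    until the end of the calendar year, so works become free on
    1 January of the year after the term ends. National rules vary.
    """
    return death_year + term_years + 1


# West died in 1940, which under this simple rule gives 2011.
print(public_domain_year(1940))  # 2011
```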

What criteria do you use to choose stuff for the Review?

As the name suggests, all our content is in the ‘public domain’, so that is the first criterion. We try to focus on works that are in the public domain in most countries, which isn’t as easy as it sounds as every country has different rules. Generally it means stuff created by people who passed away before the early 1940s. The second criterion is that there are no restrictions on the reuse of the digital copies of the public domain material.

What kind of restrictions?

Well, some countries say that in order to qualify for copyright digital reproductions have to demonstrate some minimal degree of originality, and others say that there just needs to be demonstrable investment in the digitisation (the so-called “sweat of the brow” doctrine). Many big players in the world of digitisation – like Google, Microsoft, the Bridgeman Art Library, and national institutions – argue that they own rights in their digital reproductions of works that have entered the public domain, perhaps so they can sell or restrict access to them later down the line. We showcase material from institutions who have already decided to openly license their digitisations. We are also working behind the scenes to encourage more institutions to do the same and see free and open access to their holdings as part of their public mission.

But you have a strong aesthetic line as well, don’t you?

Yes of course, the material has to be interesting! We tend to go for stuff which is less well known, so rather than put up all the works of Charles Dickens (as great as they are) we’ll go instead for something toward the more unorthodox end of the cultural spectrum, e.g. a personal oracle book belonging to Napoleon, or a 19th century attempt to mathematically model human consciousness through geometric forms. I guess a sort of alternative history to the mainstream narrative, an attempt to showcase just some of the excellence and strangeness of human ideas and activity which exist ‘inbetween’ these bigger events and works about which the narrative of history is normally woven.

Is there anything you wouldn’t publish?

I guess there is some material which is perhaps a little too controversial for the virtuous pages of the PDR – such as the racier work of Thomas Rowlandson or some of the less family friendly works of the 16th century Italian printmaker Agostino Carracci. Our most risque thing to date is probably a collection of some of Eadweard Muybridge’s ‘animal locomotion’ portfolio, which included a spot of naked tennis.

It seems that authors are becoming less and less important, publishers are facing extinction, and yet the potential for users of content is ever-expanding. What do you think about the future of publishing?

It is certainly true that things are radically changing in the publishing world. Before the advent of digital technologies, publishers were essentially gatekeepers of what words were seen in the public sphere. You saw words in books and newspapers and – for many people – that was pretty much it. What you saw was the result of decisions made by a handful of people. But now this has changed. People don’t need publishing contracts to get their words seen. Words, pictures and audiovisual material can be shared and spread at virtually no cost with just a few clicks. But people still do want to read words in books. And they turn to publishers – through bookshops, the media, etc – to find new things to read. While there is DIY print-on-demand publishing, it is hard to compete with the PR and promotion of professional publishers. I don’t think publishers will become extinct. No doubt they will adapt to new markets in search of profits.

Is the internet causing works to become more detached from their authors? Is there a way in which this could be a good thing?

With the rise of digital technologies it is, no doubt, much easier for this detachment to happen. Words leave the confines of books and articles, get copied and pasted into blogs, websites and social media, are shared through illegal downloads, etc, perhaps losing proper attribution along the way. But in a way none of this is new. It is just a more accelerated version of what has happened for hundreds of years. If anything it is probably better for authors now than it was in the past – as the internet also enables people to try to check where things come from, their pedigree and provenance. In the 17th century, before there was a proper copyright law, it was common for whole books to be “stolen”, given a new title and cover, and be sold under a new author’s name.

Could this be a good thing? Well, one could argue that reuse and reworking are an essential part of the creative process. We can find brilliant examples of literary pastiche and collaging techniques in the works of writers like W.G. Sebald, where you are not sure whether he’s speaking with his own words or those of another writer (whose work he is discussing). In Sebald’s case it gives the whole piece a fluency and unity, a sense that it’s one voice, of humanity or history speaking. But of course Sebald’s work is protected by copyright held by his publishers or his literary estate. One wonders whether one could use his works in the same way and get away with it.

So is copyright a big negative?

No not at all – from the perspective of artists/writers copyrighting their work, in general it makes complete sense to me. This is not just about money but also about artistic control over how a work is delivered. Looking back to the past before copyright – it wasn’t just about royalties but also about reputation, about preventing or discouraging mischievous or sloppy reuse. While copyright is far from perfect – and often pretty flawed – it still offers creators a basic level of protection for the things that they have created. As an author or artist if you want something more flexible than your standard copyright license then you can combine it with things like Creative Commons licenses to say how you want others to be able to use your works.

The question of how long (or whether!) works should be copyrighted after the death of creators is an entirely different question. I think copyright laws and international agreements are currently massively skewed in favour of big publishers and record companies (often supported by well heeled lobbyist groups purporting to serve the neglected interests of famous authors and aging rock stars), and do not take sufficient account of the public domain as a positive social good: a cultural commons, free for everyone.

Have you ever had problems with a copyright claim from an author?

Well almost all of the public domain material we feature is by people who are long dead, so we haven’t (thank god!) had any direct complaints from them. We did get one takedown notice on Gurdjieff’s Harmonium Recordings. The law can get very complex, particularly around films and sound recordings. I am not sure they were right, but we took it down all the same.

What are your plans for the future?

As well as expansion of the site with exciting new features we are also planning to break out from the internet into the real world of objects! We’re planning to produce some beautiful printed volumes with collections of images and texts curated around certain themes. We’ve wanted to do this for a while, and hopefully we’ll have time (and funds!) to finally do this next year.

You can sign up to The Public Domain Review’s wonderful newsletter here

“Carbon dioxide data is not on the world’s dashboard” says Hans Rosling

Jonathan Gray - January 21, 2013 in Featured, Interviews, OKFest, Open Data, Open Government Data, Open/Closed, WG Sustainability, Working Groups

Professor Hans Rosling, co-founder and chairman of the Gapminder Foundation and Advisory Board Member at the Open Knowledge Foundation, received a standing ovation for his keynote at OKFestival in Helsinki in September in which he urged open data advocates to demand CO2 data from governments around the world.

Following on from this, the Open Knowledge Foundation’s Jonathan Gray interviewed Professor Rosling about CO2 data and his ideas about how better data-driven advocacy and reportage might help to mobilise citizens and pressure governments to act to avert catastrophic changes in the world’s climate.

Hello Professor Rosling!


Thank you for taking the time to talk to us. Is it okay if we jump straight into it?

Yes! I’m just going to get myself a banana and some ginger cake.

Good idea.

Just so you know: if I sound strange, it’s because I’ve got this ginger cake.

A very sensible idea. So in your talk in Helsinki you said you’d like to see more CO2 data opened up. Can you say a bit more about this?

In order to get access to public statistics, first the microdata must be collected, then it must be compiled into useful indicators, and then these indicators must be published. The amount of coal one factory burnt during one year is microdata. The emission of carbon dioxide per year per person in one country is an indicator. Microdata and indicators are very very different numbers. CO2 emissions data is often compiled with great delays. The collection is based on already existing microdata from several sources, which civil servants compile and convert into carbon dioxide emissions.

Let’s compare this with calculating GDP per capita, which also requires an amazing amount of collection of microdata, which has to be compiled and converted and so on. That is done every quarter for each country. And it is swiftly published. It guides economic policy. It is like a speedometer. You know when you drive your car you have to check your speed all the time. The speed is shown on the dashboard.

Carbon dioxide is not on the dashboard at all. It’s like something you get with several years delay, when you are back from the trip. It seems that governments don’t want to get it swiftly. And when they publish it finally, they publish it as total emissions per country. They don’t want to show emission per person, because then the rich countries stand out as worse polluters than China and India. So it is not just an issue about open data. We must push for change in the whole way in which emissions data is handled and compiled.
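
To make the difference between total emissions and emissions per person concrete, here is a small Python sketch. The two "countries" and all of their figures are invented placeholders chosen only to show how the ranking can flip; they are not real statistics, and the helper names are ours.

```python
# Illustrative placeholder figures only, NOT real statistics:
# (name, total emissions in million tonnes of CO2, population in millions)
COUNTRIES = [
    ("Large country", 8000.0, 1300.0),
    ("Small rich country", 50.0, 5.0),
]


def per_capita(total_mt: float, population_m: float) -> float:
    """Tonnes of CO2 per person per year (Mt divided by millions of people)."""
    return total_mt / population_m


def rank_by_total(rows):
    """Names ordered from highest to lowest total emissions."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [name for name, _, _ in ordered]


def rank_by_per_capita(rows):
    """Names ordered from highest to lowest emissions per person."""
    ordered = sorted(rows, key=lambda r: per_capita(r[1], r[2]), reverse=True)
    return [name for name, _, _ in ordered]


print(rank_by_total(COUNTRIES))       # large country first
print(rank_by_per_capita(COUNTRIES))  # small rich country first
```

With these made-up numbers the large country leads on total emissions, while the small rich country leads once the same totals are divided by population.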

You also said that you’d like to see more data-driven advocacy and reportage. Can you tell us what kind of thing you are thinking of?

Basically everyone admits that the basic vision of the green movement is correct. Everyone agrees on that. By continuing to exploit natural resources for short term benefits you will cause a lot of harm. You have to understand the long-term impact. Businesses have to be regulated. Everyone agrees.

Now, how much should we regulate? Which risks are worse, climate or nuclear? How should we judge the bad effects of having nuclear electricity? The bad effects of coal production? These are difficult political judgments. I don’t want to interfere with these political judgments, but people should know the orders of magnitude involved, the changes, what is needed to avoid certain consequences. But that data is not even compiled fast enough, and the activists do not protest, because it seems they do not need data?

Let’s take one example. In Sweden we have data from the energy authority. They say: “energy produced from nuclear”. Then they include two outputs. One is the electricity that goes out into the lines and that lights the house that I’m sitting in. The other is the warm waste water that goes back into the sea. That is also energy they say. It is actually like a fraud to pretend that that is energy production. Nobody gets any benefit from it. On the contrary, they are changing the ecology of the sea. But they get away with it as the designation is “energy produced”.

We need to be able to see the energy supply for human activity from each source and how it changes over time. The people who are now involved in producing solar and wind produce very nice reports on how production increases each year. Many get the impression that we have 10, 20, 30% of our energy from solar and wind. But even with fast growth from almost zero, solar and wind are nothing yet. The news reports mostly neglect to explain the difference between the percentage growth of solar and wind energy and their percentage of total energy supply.
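
The gap between percentage growth and percentage of total supply is easy to show with a few lines of Python. The numbers below are invented placeholders in arbitrary energy units, picked only to illustrate the arithmetic; they are not real energy statistics, and the function names are ours.

```python
def growth_rate(previous: float, current: float) -> float:
    """Year-on-year growth as a fraction of the previous year's output."""
    return (current - previous) / previous


def share_of_total(source_output: float, total_output: float) -> float:
    """Fraction of total energy supply that comes from one source."""
    return source_output / total_output


# Illustrative placeholder figures in arbitrary units, NOT real data.
total_supply = 10_000.0
solar_last_year, solar_this_year = 10.0, 13.0

print(f"growth: {growth_rate(solar_last_year, solar_this_year):.0%}")
print(f"share of total: {share_of_total(solar_this_year, total_supply):.2%}")
```

In this made-up case a source can grow by 30% in a year and still supply well under 1% of the total, which is the distinction the reports tend to skip.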

People who are too much into data and into handling data may not understand how the main misconceptions come about. Most people are so surprised when I show them total energy production in the world on one graph. They can’t yet see solar because it hasn’t reached one pixel yet.

So this isn’t of course just about having more data, but about having more data literate discussion and debate – ultimately about improving public understanding?

It’s like that basic rule in nutrition: Food that is not eaten has no nutritional value. Data which is not understood has no value.

It is interesting that you use the term data literacy. Actually I think it is presentation skills we are talking about. Because if you don’t adapt your way of presenting to the way that people understand it, then you won’t get it through. You must prepare the food in a way that makes people want to eat it. The dream that you will train the entire population to about one semester of statistics in university: that’s wrong. Statisticians often think that they will teach the public to understand data the way they do, but instead they should turn data into Donald Duck animations and make the story interesting. Otherwise you will never ever make it. Remember, you are fighting with Britney Spears and tabloid newspapers. My biggest success in life was December 2010 on the YouTube entertainment category in the United Kingdom. I had most views that month. And I beat Lady Gaga with statistics.


Just the fact that the guy in the BBC in charge of uploading the trailer put me under ‘entertainment’ was a success. No-one thought of putting a trailer for a statistics documentary under entertainment.

That’s what we do at Gapminder. We try to present data in a way that makes people want to consume it. It’s a bit like being a chef in a restaurant. I don’t grow the crop. The statisticians are like the farmers that produce the food. Open data provide free access to potatoes, tomatoes and eggs and whatever it is. We are preparing it and making a delicious food. If you really want people to read it, you have to make data as easy to consume as fish and chips. Do not expect people to become statistically literate! Turn data into understandable animations.

My impression is that some of the best applications of open data that we find are when we get access to data in a specific area, which is highly organized. One of my favorite applications in Sweden is a train timetable app. I can check all the commuter train departures from Stockholm to Uppsala, including the last change of platform and whether there is a delay. I can choose how to transfer quickly from the underground to the train to get home fastest. The government owns the rails and every train reports their arrival and departure continuously. This data is publicly available as open data. Then a designer made an app and made the data very easy for me to understand and use.

But to create an app which shows the determinants of unemployment in the different counties of Sweden? No-one can do that because that is a great analytical research task. You have to take data from very many different sources and make predictions. I saw a presentation about this yesterday at the Institute for Future Studies. The PowerPoint graphics were ugly, but the analysis was beautiful. In this case the researchers need a designer to make their findings understandable to the broad public, and together they could build an app that would predict unemployment month by month.

The CDIAC publish CO2 data for the atmosphere and the ocean, and they publish national and global emissions data. The UNFCCC publish national greenhouse gas inventories. What are the key datasets that you’d like to get hold of that are currently hard to get, and who currently holds these?

I have no coherent CO2 dataset for the world beyond 2008 at present. I want to have this data up to last year, at least. I would also welcome half-year data but I understand this can be difficult because carbon dioxide emissions vary for transport, heating or cooling of houses over the seasons of the year. So just give me the past year’s data in March. And in April/May for all countries in the world. Then we can hold governments accountable for what happens year by year.

Let me tell you a bit about what happens in Sweden. The National Natural Protection Agency gets the data from the Energy Department and from other public sources. Then they give these datasets to consultants at the University of Agriculture and the Meteorological Authority. Then the consultants work on these datasets for half a year. They compile them, the administrators look through them and they publish them in mid-December, when Swedes start to get obsessed about Christmas. So that means that there was a delay of eleven and a half months.

So I started to criticize that. My cutting line was when I was with the Minister of Environment and she was going to Durban. And I said “But you are going to Durban with eleven and a half month constipation. What if all of this shit comes out on stage? That would be embarrassing wouldn’t it?”. Because I knew that she had in 2010 an increase in carbon dioxide emission and it increased by 10%. But she only published that coming back from Durban. So that became a political issue on TV. And then the government promised to make it earlier. So 2012 we got CO2 data by mid-October, and 2013 we’re going to get it in April.


But actually ridiculing is the only way that worked. That’s how we liberated the World Bank’s data. I ridiculed the President of the World Bank at an international meeting. People were laughing. That became too much.

The governments in the rich countries don’t want the world to see emissions per capita. They want to publish emissions per country. This is very convenient for Germany, UK, not to mention Denmark and Norway. Then they can say the big emission countries are China and India. It is so stupid to look at total emissions per country. This allows small countries to emit as much as they want because they are just not big enough to matter. Norway hasn’t reduced their emissions for the last forty years. Instead they spend their aid money to help Brazil to replant rainforest. At the same time Brazil lends 200 times more money to the United States of America to help them consume more and emit more carbon dioxide into the atmosphere. Just to put these numbers up makes a very strong case. But I need to have timely carbon dioxide emission data. But not even climate activists ask for this. Perhaps it is because they are not really governing countries. The right wing politicians need data on economic growth, the left wing need data on unemployment but the greens don’t yet seem to need data in the same way.

As well as issues getting hold of data at a national level, are there international agencies that hold data that you can’t get hold of?

It is like a reflection. If you can’t get data from the countries for eleven and a half months, why the heck should the UN or the World Bank compile it faster? Think of your household. There are things you do daily, that you need swiftly. Breakfast for your kids. Then, you know, repainting the house. I didn’t do it last year, so why should I do it this year? The whole system just becomes slow. If politicians are not in a hurry to get data for their own country, they are not in a hurry to compare their data to other countries. They just do not want this data to be seen during their election period.

So really what you’re saying is that you’d recommend stronger political pressure through ridicule on different national agencies?

Yes. Or sit outside and protest. Do a Greenpeace action on them.

Can you think of datasets about carbon dioxide emissions which aren’t currently being collected, but which you think should be collected?

Yes. In a very cunning way China, South Africa and Russia like to be placed in the developing world and they don’t publish CO2 data very rapidly because they know it will be turned against them in international negotiations. They are not in a hurry. The Kyoto Protocol at least made it compulsory for the richest countries to report their data because they had committed to decrease. But every country should do this. All should be able to know how much coal each country consumed, how much oil they consumed, etc and from that data have a calculation made on how much CO2 each country emitted last year.

It is strange that the best country to do this – and it is painful for a Swede to accept this – is the United States. CDIAC. Federal Agencies in the US are very good on data and they take on the whole world. CDIAC make estimates for the rest of the world. Another US agency I really like is the National Snow and Ice Data Centre in Denver, Colorado. They give us 24-hour updates on the polar sea ice area. That’s really useful. They are also highly professional. In the US the data producers are far away from political manipulation. When you see the use of fossil fuels in the world there is only one distinct dip. That dip could be attributed to the best environmental politician ever. The dip in CO2 emissions took place in 2008. George W. Bush, Greenspan and the Lehman Brothers decreased CO2 emissions by inducing a financial crisis. It was the most significant reduction in the use of fossil fuels in modern history.

I say this to put things into proportion. So far it is only financial downturns that have had an effect on the emission of greenhouse gases. The whole of environmental policy hasn’t yet had any such dramatic effect. I checked this with Al Gore personally. I asked him “Can I make this joke? That Bush was better for the climate than you were?”. “Do that!”, he said, “You’re correct.” Once we show this data people can see that the economic downturn so far was the most forceful effect on CO2 emission.

If you could have all of the CO2 and climate data in the world, what would you do with it?

We’re going to make teaching materials for high schools and colleges. We will cover the main aspects of global change so that we produce a coherent data-driven worldview, which starts with population, and then covers money, energy, living standards, food, education, health, security, and a few other major aspects of human life. And for each dimension we will pick a few indicators. Instead of doing Gapminder World with the bubbles that can display hundreds of indicators we plan a few small apps where you get a selected few indicators but can drill down. Start with world, world regions, countries, subnational level, sometimes you split male and female, sometimes counties, sometimes you split income groups. And we’re trying to make this in a coherent graphic and color scheme, so that we really can convey an upgraded world view.

Very very simple and beautiful but with very few jokes. Just straightforward understanding. And for climate impact we will relate to the economy. To relate to the number of people at different economic levels, how much energy they use and then drill down into the type of energy they use and how that energy source mix affects the carbon dioxide emissions. And make trends forward. We will rely on the official and most credible trend forecast for population, one, two or more for energy and economic trends etc. But we will not go into what needs to be done. Or how should it be achieved. We will stay away from politics. We will stay away from all data which is under debate. Just use data with good consensus, so that we create a basic worldview. Users can then benefit from an upgraded world view when thinking and debating about the future. That’s our idea. If we provide the very basic worldview, others will create more precise data in each area, and break it down into details.

A group of people inspired by your talk in Helsinki are currently starting a working group dedicated to opening up and reusing CO2 data. What advice would you give them and what would you suggest that they focus on?

Put me in contact with them! We can just go for one indicator: carbon dioxide emission per person per year. Swift reporting. Just that.

Thank you very much Professor Rosling.

Thank you.

If you want help to liberate, analyse or communicate carbon emissions data in your country, you can join the OKFN’s Open Sustainability Working Group.

Video: Julia Kloiber on Open Data

Rufus Pollock - October 3, 2012 in Ideas and musings, Interviews, OK Germany, OKFest, Our Work

Here’s Julia Kloiber from OKFN-DE’s Stadt-Land-Code project, talking at the OKFest about the need for more citizen apps in Germany, the need for greater openness, and how to persuade companies to open up.

Building the Ecology of Libraries – An Interview with Brewster Kahle

Adrian Pohl - March 23, 2012 in Featured, Interviews, OKCon, Open GLAM

This interview is cross-posted here and on the Open GLAM blog.

Kai Eckert (left) and Adrian Pohl interviewing Brewster Kahle at OKCon 2011

At OKCon 2011, we had the opportunity to interview Brewster Kahle, who is a computer engineer, internet entrepreneur, activist, and digital librarian. He is the founder and director of the Internet Archive, a non-profit digital library with the stated mission of “universal access to all knowledge”. Besides the widely known “Wayback Machine”, where archived copies of most webpages can be accessed, the Internet Archive is also very active in the digitization of books and, with the “Open Library”, provides a free catalog that aims to describe “every book ever published”. Kahle and his wife, Mary Austin, created the Kahle/Austin Foundation that supports the Internet Archive and other non-profit organizations.

As open data enthusiasts from the library world, we were especially interested in how the activities of the Internet Archive relate to libraries. We wanted to know how its general approach and service could be useful for libraries in Europe.

Brewster Kahle, what is the Internet Archive and what is your vision for its future?

The idea is to build the library of Alexandria version 2. The idea of all books, all music, all video, all lectures, well: kind of everything, available to anybody, anywhere that would want to have access. Of course, it’s not gonna be done by one organisation, but we hope to play a role by helping move forward libraries, ourselves and making as much technology as required to be able to fulfil this goal.

What are the obstacles preventing this from happening at the moment?

We see the world splitting in two parts: There are the hyper-property interests and then there are the hyper-open interests, and I’d say actually the hyper-open is much more successful, but it’s not slowing down those that want to clamp down, shut down, control.

What are the challenges faced by the Internet Archive regarding the digitization of books?

There are two big problems: there is going and building a digital collection, either by digitizing materials or buying electronic books. And the other is: how do you make this available, especially the in-copyright works? For digitizing books, it costs about 10 cents a page to do a beautiful rendition of a book. So, for approximately 30 dollars a book for 300 pages you can do a gorgeous job. Google does it much more quickly and it costs only about 5 dollars for each book. So it really is much less expensive at lower quality, but they are able to do things at scale. We digitize about 1000 books every day in 23 scanning centers in six countries. We will set up scanning centers anywhere, or, if there are people that would like to staff the scanners themselves, we provide the scanners and all of the backend processing for free, until we run out of scanners and we’ve got a bunch of them. So we’re looking either for people that want to scan their own collections by providing their own labour or they can employ us to do it and all told it is 10 cents a page to complete.
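The figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function and constant names are invented here, and the 300-page average used for the daily estimate is an assumption carried over from the per-book example, not a number given for the whole collection.

```python
# Figures taken from the interview: 10 cents per page, a 300-page book,
# about 1000 books per day, and roughly 5 dollars per book for Google.
CENTS_PER_PAGE = 10
ASSUMED_PAGES_PER_BOOK = 300  # assumption: applies the example book size to every book


def cost_in_dollars(pages, cents_per_page=CENTS_PER_PAGE):
    """Return the digitization cost in dollars for a given page count."""
    return pages * cents_per_page / 100


per_book = cost_in_dollars(ASSUMED_PAGES_PER_BOOK)   # cost of one 300-page book
daily_estimate = 1000 * per_book                     # rough daily spend at 1000 books a day
ratio_to_google = per_book / 5                       # how many times Google's quoted cost

print(per_book, daily_estimate, ratio_to_google)
```

Under these assumptions a single book comes out at the 30 dollars mentioned in the answer, about six times the per-book figure quoted for Google.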

Also, part of building a collection is buying electronic books. But when I say buying, I really mean buying books, not licensing books, books that are yours forever. There are a growing number of publishers that will sell ebooks like music publishers now sell MP3s. That does not mean that we can post them on the internet for free for everybody, it means we can lend them, one person at a time. So if we have a hundred copies, then we can lend them out, it’s very much like normal book collections. This has a nice characteristic that it does not build monopolies. So instead of going in licensing collections that will go away if you stop licensing them, or they are built into one centralized collection like Elsevier, JSTOR, or Hathi Trust, the idea is to buy and own these materials.

Then there is the rights issue on how to make them available. Out of copyright materials are easy, those should be available in bulk – they often aren’t, and for instance Google books are not, they are not allowed to be distributed that way. But open books and libraries and museums that care should not pursue those closed paths.

For the in-copyright materials, what we have done is to work with libraries. We are lending them, so that the visitors to our libraries can now access over 100.000 books that have been contributed by libraries that are participating.

There are now over 1000 libraries in 6 countries that are putting books into this collection that is then available in all other libraries. And these are books from all through the 20th century. So, if there are any libraries that would like to join, all they need to do is contribute a post-1923 book from their collection to be digitized and made available, and the IP addresses for their library. Then we will go and turn those on, so that all of those users inside that library or those that are dialing in (for instance via VPN) can borrow any of the 100.000 books. Any patron can borrow five books at one time and the loan period is for two weeks. But there is only one copy of each book circulating at any one time. So it is very good for the long tail. We think that this lending library approach and especially the in-library lending library has been doing very well. We have universities all over the world, we have public libraries all over the world now participating in building a library by libraries for libraries.
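The lending rules described above (five books per patron, a two-week loan period, one circulating copy per title) can be modelled in a small sketch. This is not the Internet Archive's actual system; the class and method names are hypothetical and only illustrate how the three rules interact.

```python
from datetime import date, timedelta

MAX_LOANS_PER_PATRON = 5            # "five books at one time"
LOAN_PERIOD = timedelta(weeks=2)    # "the loan period is for two weeks"


class LendingLibrary:
    """Toy model: each title has exactly one circulating copy."""

    def __init__(self, titles):
        self.titles = set(titles)
        self.loans = {}  # title -> (patron, due_date)

    def _expire(self, today):
        # Loans whose due date has been reached return automatically.
        self.loans = {
            title: (patron, due)
            for title, (patron, due) in self.loans.items()
            if due > today
        }

    def borrow(self, patron, title, today):
        """Return the due date if the loan succeeds, otherwise None."""
        self._expire(today)
        if title not in self.titles:
            raise KeyError(f"unknown title: {title}")
        if title in self.loans:
            return None  # the single copy is already checked out
        active = sum(1 for p, _ in self.loans.values() if p == patron)
        if active >= MAX_LOANS_PER_PATRON:
            return None  # patron already holds five books
        due = today + LOAN_PERIOD
        self.loans[title] = (patron, due)
        return due
```

For example, if one patron borrows a title on 1 March, a second patron asking for the same title is refused until the loan lapses on 15 March. This is why the model favours the long tail: rarely requested titles are almost always free.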

You already talked about cooperating with traditional libraries with in-copyright books. How is the cooperation between the Internet Archive – which itself seems to be a library – and traditional libraries in other fields, with respect to the digitization of public domain books for instance?

The way the Internet Archive works with libraries all over the world is by helping them digitize their books, music, video, microfilm, microfiche very cost-effectively. So we just try to cover the costs – meaning we make no profit, we are a non-profit library – and give the full digital copy back to the libraries and also keep a copy to make available on the Internet for free public access. Another way that we work with libraries is through this lending program where libraries just have to donate at least one book – hopefully many more to come – to go and build this collection of in-copyright materials that can then be lent to patrons again that are in the participating libraries. So those are the two main ways that we work with libraries on books. We also work with libraries on building web collections, we work with a dozen or so of the national libraries and also archive it. We have our subscription based services for helping build web collections, but never do things in such a way that if they stop paying, they lose access to data. The idea is to only pay once and have forever.

Are there already cooperations between European libraries and the Internet Archive, and are scanners already located in Europe which could be used for digitization projects in university or national libraries?

Yes, we have scanners now in London and in Edinburgh – the Natural History Museum and the national library – where we are digitizing now. We would like to have more scanners in more places so that anybody that would be willing to staff one, keep it busy for 40 hours each week, we will provide all of the technology for free or we can go and cooperate and we can hire the people and operate these scanning centers. We find that scanning centers – i.e. 10 scanners – can scan 30.000 books in a year and it costs about 10 cents a page. It is very efficient and very high quality. This is including fold outs and difficult materials as well. And it is working right within the libraries, so the librarians have real access to how the whole process is going and what is digitized. It is really up to the libraries. The idea is to get the libraries online as quickly as possible and without the restrictions that come with corporations.

A very important topic is also the digitization of very old materials, valuable materials, old prints, handwritten manuscripts and so on. How do you also deal with these materials?

We have been dealing now with printed material that goes back 500 years with our scanners, sometimes under very special conditions. The 10 cents a page is basically for normal materials. When you are dealing with handwritten materials or large-format materials, it costs a little bit more, just because it takes more time. All it really is, we are dealing with the labour. We are efficient, but it still does cost to hire people to do a good job of this. We are digitizing archives, microfilm, microfiche, printed materials that are bound, unbound materials, moving images, 16 mm as well as 8 mm films, audio recordings, LPs. We can also deal with materials that have already been digitized from other projects to make unified collections.

How do you integrate all these digitized objects and how do you deal with the specific formats that are used to represent and consume digital materials?

We use the metadata that comes from the libraries themselves, so we attach MARC records that come from the libraries to make sure that there’s good metadata. As these books move through time from organization to organization, the metadata and the data stays together. We then take the books after we photograph them, run them through optical character recognition so they can be searched and move them into many different formats from PDF, DjVu, DAISY files for the blind and the dyslexic, mobi files for Kindle users, we can also make it available in some of the digital rights management services that the publishers are currently using for their in-print books. So all of these technologies are streamlined because we have digitized well over one million books. These are all available and easily plugged together. In total there are now over two million books that are available on the Internet Archive website to end users to use through the website, where people can go and see, borrow and download all of these books. But if libraries want to go and add these to their own collections they are welcome to. So if there are 100.000 books that are in your particular language or your subject area that you would like to complete your collections with, let’s go and get each of these collections to be world class collections. We are not trying to build an empire, we are not trying to build one database, we want to build a library system that lives, thrives and also supports publishers and authors going forward.

So there’s the Internet Archive and the Open Library. Can you make the distinction any clearer for those that don’t currently understand it?

The Internet Archive is the website for all of our books, music, video and web pages. The Open Library is a website really for books. The idea is to build an open catalog system: One webpage for every book ever published. And if it’s available online, then we point to it. If it’s available for sale in bookstores, we point to those. If it’s available in libraries for physical taking out, we point to that. The Open Library is now used by over 100,000 people a day to go and access books. Also, people have integrated it in their catalogs. That means, when people come to the library catalogs that we’re partnered with, they search their library catalog and pull down a list and either they’ve added the MARC records from the set into their catalog or, better yet, they just go and call an API such that there’s a little graphic that says: “You can get this online.” or: “This is available to be borrowed for free.” or: “It’s available for borrowing but it’s currently checked out.” So, those little lights turn on so that people can get to the book electronically right there from the catalog. We find that integration very helpful because we don’t want everyone coming to Open Library. They know how to use libraries, they use your library and your library catalog. Let’s just get them the world’s books. Right there and then.
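The catalog integration described above could look roughly like the sketch below on the library's side. The interview does not specify the API, so the status values, the record layout and the `lookup_status` callback are all hypothetical stand-ins. Only the three messages come from the answer itself.

```python
# Hypothetical availability states mapped to the three messages quoted
# in the interview. The real API's field names and values are not given.
BADGES = {
    "online": "You can get this online.",
    "borrowable": "This is available to be borrowed for free.",
    "checked_out": "It's available for borrowing but it's currently checked out.",
}


def badge_for(status):
    """Return the message for a status, or None if there is nothing to show."""
    return BADGES.get(status)


def annotate_catalog(records, lookup_status):
    """Attach a badge to each catalog record.

    `lookup_status` stands in for the availability API call: it takes a
    record identifier and returns one of the hypothetical states above.
    """
    annotated = []
    for record in records:
        status = lookup_status(record["id"])
        annotated.append({**record, "badge": badge_for(status)})
    return annotated
```

Passing the lookup in as a function keeps the sketch independent of any particular service, so a local catalog could swap in whatever availability call it actually uses.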

You mentioned Google Books. There are agreements between libraries and Google for digitizing materials. What are the benefits for libraries of choosing the Internet Archive over Google Books?

Google offers to cover the costs of the labor to do the digitization, but the libraries that participated ended up paying a very large amount of money just trying to prepare the books and get them to Google. Often they spent more working with Google than they would have with the Internet Archive, and in the latter case they do not have restrictions on their materials. So Google digitizes even public domain materials and then puts restrictions on their re-use. Everybody that says that it is open has got to mean something bizarre by ‘open’. You can not go and take hundreds of these and move them to another server, it is against the law and Google will stop libraries that try to make these available to people and move the books from place to place, so this is quite unfortunate.

Is Google reusing Internet Archive books in Google Books?

They are not, but Hathi Trust, the University of Michigan, is taking many of our books. Google is indexing them so that they are in their search engine.

People at OKCon naturally are supporters of Open Content, Open Knowledge but many libraries don’t like their digitized material to be open. Even public domain books which are digitized are normally kept on the libraries’ websites and by contracts or even by copyfraud they say: “You can not do whatever you want with this.” What would you say to libraries to really open up their digitized content?

There’s been a big change over the last four or five years on this. None of the libraries we work with – 200 libraries in a major way and now we have received books from over 500 libraries – have put any restrictions beyond the copyright of the original material. If there’s copyrighted material, then of course it has restrictions. But neither the libraries nor the Internet Archive are imposing new restrictions. You are right that there are some libraries that may not want to participate in this but this is what most libraries are doing – except for the Google libraries which are locking them up.

Do all the libraries provide high resolution scans or do some choose to only provide PDFs?

All the high resolution, the original images, plus the cropped and deskewed ones, all of these are publicly downloadable for free so that all analysis can be done. There’s now over one petabyte of book data that is available from the Internet Archive for free and active download. About 10 million books each month are being downloaded from the Internet Archive. We think this is quite good. We’d like to see even more use by building complete collections that go all the way to the current day. I’d say we are in pretty good shape on the public domain in the English language but other languages are still quite far behind. So we need to go and fill in better public domain collections. But I’d say a real rush right now is getting the newer books to our users and our patrons that are really turning to the internet for the information for their education.

To be more concrete. For libraries that are thinking about digitizing their own collections: what exactly do you offer?

Either write to or myself: We will offer digitization services at 10 cents a page. And if there’s enough to do, we’ll do it inside your library. If the library wants to staff their own scanner and start with one, then we can provide that scanner as long as it doesn’t cost us anything. Somebody has to go and cover the transportation and the setup; these costs will be borne by the library. But then all of the backend processing or the OCR is provided for free. In the lending system it’s at least one book, a set of IP addresses, contact information and you’re done. No contracts, nothing.

Ok. So that means you offer the scanner technology for free, you offer the knowledge about how to use it for free. Only these additional costs for transportation have to be taken by the libraries. With your experience in digitization projects, every library should – and can – contact you and you explain the process to the people, you say what you’re doing, you give your opinion on how you would do it and then, of course, the library can decide?

Absolutely. We’ll provide all the help we can for free to help people through the process. We find that many people are confused and they’ve heard contradictory things.

Have you ever tried a kind of crowdsourcing approach for library users to digitize books themselves, placing a scanner in the library and letting the users do it? Or does it take too much education for handling the scanners?

We find that it actually is quite difficult to go and digitize a book well, unfortunately. Though we have just hired Dan Reetz, who is the head of a Do-it-yourself bookscanner group. And we’re now starting to make Do-it-Yourself bookscanners that people can make themselves and the software automatically uploads to the Internet Archive. So we hope that there’s a great deal of uptake from smaller organizations or from individuals. In Japan, for instance, many people scan books and we receive those. People upload maybe one or two hundred books to us a day. So, people are uploading things often from the Arab world. They are digitizing on their own and we want those as well. So, we can work with people if they have PDFs of scanned books or just sets of images from either past projects or current projects or if they want to get involved. There are many different ways we would love to help.

Does the Internet Archive collaborate with Europeana in some way, for example for making material from the Internet Archive available in Europeana?

We’ve met with some of the people from Europeana and I believe they have downloaded all of our metadata. All of our metadata is available for free. I’m encouraged by some of what I’ve seen from Europeana towards being a search engine. To the extent that they may grow into being the library for Europe I think this is not a good idea. I like to see many libraries, many publishers, many booksellers, many, many authors and everyone being a reader. What we want are many winners, we don’t want just one. So, Europeana to the extent that it’s just a metadata service, I think is a good project.

You just mentioned the metadata. So everything that you have, not only the digitized versions of the books but also the enrichments, the metadata about it, the OCR result, for example, everything is free and open. So, if I would like to, I could take the whole stuff and put it on my own server, re-publish it in the way that I want?

Yes, absolutely. All the metadata, the OCR files, the image files are all available. There are a lot of challenges in maintaining the files over time and we are committed to do this but we don’t want to be the only one. So the University of Toronto has taken back all of the 300,000 books that were digitized from their collections to put them on their servers and they’re now starting to look at other collections from other libraries to add those. As we move to digital libraries we don’t necessarily just need digital versions of the physical books we own, we want digital books that are of interest to our patrons. Yes, it is all available and it’s forming a new standard for openness.

The MARC records you mentioned, they are of course also available. So it makes sense for a library to include not only their own books but every book in the Internet Archive in their own catalog. Because, in fact, it is available to all the patrons. So, you could think of it as a possession of every library in the world. Is that right?

Yes, we see this as building the ecology of libraries. The really valuable thing about libraries, – yes: there are great collections – but the real value are the librarians, the experts, the people that know about the materials and can bring those to people in new and different ways. That’s the real value to our library system. So, let’s make sure, as we go through this digitization wave we don’t end up with just one library that kills off all of the others, which is a danger.

Thank you for the interview.


Kai Eckert is a computer scientist and vice head of the IT department of the Mannheim University Library. He coordinates the linked open data activities and developed the linked data service of the library. He has given various presentations, both national and international, about linked data and open data.

Adrian Pohl has been working at the Cologne-based North Rhine-Westphalian Library Service Center (hbz) since 2008. His main focuses are Open Data, Linked Data and its conceptual, theoretical and legal implications. Since June 2010 Adrian has been coordinating the Open Knowledge Foundation’s Working Group on Open Bibliographic Data.

Acknowledgements: The introductory questions were taken from a former interview on the same day, conducted by Lucy Chambers.

Rufus Pollock on Open Science

Theodora Middleton - July 20, 2011 in External, Interviews, Open Economics, Open Science, WG Open Data in Science

The following guest post is by Maria Neicu, who’s studying at the University of Amsterdam. She’s a member of the OKF’s Working Group on Open Data in Science.

Rufus Pollock of the Open Knowledge Foundation recently gave a video interview on the topic of open science. Here are the videos, and summaries of what he had to say!

Firstly, in his introduction to Openness, Rufus explains the concepts of Open Science and Open Economics, describing the role of the Open Knowledge Foundation in promoting open publishing strategies for scientific data.

For a researcher, being open is an attitude, as well as a life philosophy, requiring the internalization of an ethic of collaborating, sharing and giving back to the community. Therefore, we should aim for a “distributed, collaborative, de-centralized model” of research culture. Rufus thus addresses policy makers who might invest in participative science, which involves the wider public and different expertise in open knowledge production using the potential of digital technologies.

Opening content implies a sustainable use and re-use of information, data filtering, but also a “commitment to greater documentation” and status validation within the scientific community. In imagining the advantages of living in a world in which everyone has access to all knowledge, the second part of the video entitled “Benefits of open science” tackles the current publishing paradigm. For example, Open Data in Science would avoid duplication efforts and thus be more sustainable. Even if there is a “default” mechanism of sharing knowledge already practiced by scientific researchers, this system needs to be changed and made functional in a world more defined by being a “shared enterprise”.

Thirdly, explaining “Why some disciplines are more open than others”, ‘Big Science’ such as physics, mathematics and genomics is depicted in a comparison between different scientific validation systems – from bureaucratic quota systems to informal actors. Looking at how publishing in monopolist elitist journals assigns status reveals the need for open science to set-up a reward system, to motivate researchers and enhance their reputation for opening-up access to their work.

As for the “Barriers, perceived risks, constraints for open science”, one of the proposed solutions is to positively frame “collaboration” itself, even in a competitive environment like academia. Lastly, elaborating on “What we need to make open science happen”, the interview includes insights for online participative collaboration, and online tools for equipping funding bodies, like data-management systems.

To learn more about this important and complex topic, visit the OKF Open Data in Science Working Group homepage and get involved in further discussions surrounding open science and open data in science.

Interview with OKF Co-Founder Rufus Pollock on Open Spending

Jonathan Gray - June 7, 2011 in Interviews, OKI Projects, Open Knowledge Foundation, Open Spending, Where Does My Money Go

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

OKF Co-Founder Rufus Pollock was recently interviewed at Open Tech 2011 about and

You can watch the video on or YouTube, or you can download it by right clicking here.
