Dreams of a Unified Text

The following is a blog post by Rufus Pollock, co-founder of the Open Knowledge Foundation.

I have a dream, one which I’ve had for a while.

In this dream I’m able to explore, seamlessly, online, every text ever written. With the click of a button I can go from Pynchon to Proust, from Musil to Machiavelli, from Homer to Hugo.

And in this dream not only can I read, but I myself am able to contribute, to write upon these texts — to annotate, to anthologize, to interlink, to translate, to borrow — and to share what I do with others.

I can see what others have shared, what notes they have added, what selections they have made. I can see the interweaving of these texts created by borrowing, by inspiration, by reference, all made concrete by the insight and efforts of myself and others and their ability to layer their insights freely upon those original texts — just as those writers built upon the works that had gone before them.

And while each text can still stand alone, in all its greatness or mediocrity, we have something new: a single unified corpus woven together out of this multitude of separate texts, e pluribus unum.

A whole that is a concrete instantiation in an immaterial realm of the cultural achievement of mankind as expressed in the written word.

Dream Meets Reality

Why is this dream not yet a reality? After all, don’t we have the tools and technology?

One answer is legal, one answer is technological, and one answer is social. The legal issue is copyright, at least in its current exclusive-rights form.1 Copyright means this vision is only really possible for works in the public domain, which in most countries means works that are a hundred years or more old. This isn’t necessarily that big a problem, at least for texts: the public domain, though old, is already incredibly rich, so we already have more than enough material to be getting on with.

On the technology front we have the cost of digitization, processing and storage. Digitization costs are significant. This has meant that digitization activities have either been limited or the material created has not been released openly (for example, the material produced by Google’s efforts with its Books project, which is probably the largest effort to date, is not open). That said, efforts like Project Gutenberg and the Internet Archive have already made available tens of thousands of texts, and there are now several digitization projects underway that will result in even larger amounts of material being freely and openly available.

Then, third, we have the social issue, or rather the question of how technology can support the social activities required for this dream of a unified text to become real. Specifically, to realize our dream we need to bring material (texts and the writing upon them) together in a single coherent experience. Yet the centralization (and ownership) that implies may be a significant obstacle to mass participation.2 Similarly, we need it to be possible for anyone with ‘net access to contribute to the weaving of the unified inter-text but, at the same time, for each of us to be able to select which contributions we want to see (if we are not to be overwhelmed by an avalanche of material, much of it possibly of dubious quality).


We have, then, within our grasp the realization of the dream of a unified text. By combining text and technology we can create something truly extraordinary.

Interested in making this happen? Come join us at the Textus Project.

  1. Let me be clear, I’m not saying that copyright is per se bad or that everything should be ‘free’. Time, energy and capital are required to create books, music and films, and that expenditure often needs to be recompensed. However, the current system of copyright is by no means the best way to achieve this. This is not something I wish to explore in detail here. More can be found on my personal website and in papers such as Forever Minus a Day: Theory and Empirics of Optimal Copyright.
  2. This tension between distributed collaboration and the centralizing tendencies of coordination and scale is a common theme in many ‘net projects.