Dead knowledge: why being explicit about openness matters

Whenever I think about how much knowledge is ‘dead’ because nobody was explicit about its ‘openness’, I am surprised by the number of examples. Consider the following two:

Example 1: Everything2 and h2g2

Years ago, back when I was at university, I remember stumbling across <http://www.everything2.com/>. Shortly thereafter I remember being shown <http://www.h2g2.com/> by a friend who’d just posted a write-up of Arrow’s impossibility theorem. Long before Wikipedia, these sites were demonstrating the ability of decentralized, uncoordinated users to generate a huge amount of interesting and valuable (though fairly unstructured) information.

Thinking about these two sites recently I asked myself: ‘what license did they use?’ and, relatedly, ‘am I allowed to download/redistribute/incorporate their data in another project?’. The answer was perhaps unsurprising: neither site seemed to have thought about it (at least not originally) and, as a consequence, their copyright policy was the default: everyone retains copyright in what they contribute. (As is typical of anything involving copyright, things are a little more complex: h2g2, after its takeover by the BBC, adopted a policy whereby contributors retain copyright in their articles but grant the BBC a non-exclusive licence to use them as it sees fit. To further complicate matters, the BBC claims to retain copyright in ‘Edited Entries’ because a BBC editor has checked and/or altered the article.)

Hence, with respect to my second question, ‘am I allowed to redistribute/reuse their material?’, the simple answer was: No. I would have to identify, and then gain permission from, each contributor, an endeavour that would clearly be prohibitively time-consuming. And this is despite the fact that, from their very participation, it is clear that the vast majority of individuals who contributed to these sites wanted others to be able to freely access their work (and, in all likelihood, freely reuse it as well).

While anything put on the web is implicitly there to be freely accessed, when it comes to (re)using and redistributing (hosting) that material, explicitness really matters. Once you start building any kind of ‘commons’ in which multiple contributors are the norm[1], this becomes especially important, since relying purely on tacit agreements and implicit consent becomes a major obstacle, and a serious threat, to the long-term future and value of that information.

In a world in which information rots away, in the form of disappearing links and disappearing pages, far faster than that inscribed on the physical paper of books, the ability to copy and then to redistribute is the only way for most works to have any permanent existence, albeit one which is fragmented and partial. Only then can a work be ‘mirrored’, archived, made available in myriad ways: in short, kept alive.

Because no effort was made to have an explicit licensing policy, these ‘knowledge-bases’ have, in effect, become partially ‘closed’. While they are open for access (at least as long as their parent organizations continue to exist), the opportunities for reuse and redistribution have been drastically curtailed. With the advent of Wikipedia, which adopted a ‘share-alike’ type license from the very start, these sites have, in many respects, been superseded, and it is particularly telling that there are dedicated Wikipedia pages with instructions for ‘node’ owners on everything2 and h2g2 on how to move their content to Wikipedia[2].

Example 2: Crystallographic data structures

Recently I was chatting with Peter Murray-Rust, head of the Unilever informatics lab at Cambridge University and one of the pioneers of open knowledge (he’s also the man behind SAX, chemical mime and the world-wide-molecular-matrix, and his latest collaboration in open chemistry is <http://www.blueobelisk.org/>).

He was telling me about how crystallographers get asked to do analyses. Roughly, each analysis costs between 300 and 600 pounds. Now, what happens to the data (‘structures’) produced by these analyses? Sometimes they get published (in Acta Crystallographica), but often they just sit in a basement drawer gathering dust. Peter said that he had colleagues in Australia who had close on 1000 such unpublished ‘structures’. At 300 to 600 pounds apiece, that’s between 300,000 and 600,000 pounds’ worth of data gathering dust.

So why does this happen? Peter suggested two reasons. They both relate to the circumstances in which the analysis occurs, so let me explain those first.

These analyses are often commissioned by someone else (either in industry or academia) in relation to work they are doing. Often the crystallographic analysis is just a check and will only end up being mentioned in a footnote, if mentioned at all (something like: ‘Our hypothesis as to the structure of this molecule was confirmed by crystallographic analysis ….’).

As a result, first, the crystallographer can’t publish immediately, since this might preempt the associated paper or disclose sensitive information about what a company is working on. Second, it is unclear who ‘owns’ the rights in the data: is it the crystallographer or the entity which commissioned the analysis? Together these uncertainties place a dead hand upon publication, except in circumstances where the crystallographer did the analysis on their own account.

With more explicitness about the legal status (particularly if the default were that the data were open) and efforts to address the social issues (perhaps a delay of three years, after which publication is allowed), access could be greatly improved.

Conclusion

To stitch together the knowledge commons it’s not good enough for information to be implicitly open; it has to be explicitly open. To be explicitly open it must have an open knowledge license clearly attached. Without this, the knowledge produced immediately becomes ‘locked’: doing anything other than leaving the information on the original server requires a rights-clearance effort of such daunting proportions as to be completely infeasible.

Furthermore, when engaging in any kind of collaborative effort (the norm on the web), the adoption of an explicitly open approach can be considered as providing a form of social contract among the participants, one which is clearer than the informal tacit arrangements which would otherwise operate.


  1. Here we need not be thinking only of massively collaborative endeavours involving hundreds or thousands of contributors, but also of a popular weblog where, apart from the original author, you may have dozens of different individuals commenting on posts.
  2. <http://en.wikipedia.org/wiki/Wikipedia:Guide_for_Everything2_Noders> and <http://en.wikipedia.org/wiki/Wikipedia:Guide_for_h2g2_Researchers>