When I think of the amount of knowledge that is ‘dead’ because of a lack of explicitness about its ‘openness’ I am always surprised by the number of examples. Consider the following two:

Example 1: Everything2 and h2g2

Years ago, back when I was at university, I remember stumbling across http://www.everything2.com/. Shortly thereafter I remember being shown http://www.h2g2.com/ by a friend who’d just posted a write-up of Arrow’s impossibility theorem. Long before Wikipedia, these sites were demonstrating the ability of decentralized, uncoordinated users to generate a huge amount of interesting and valuable (though fairly unstructured) information.

Thinking about these two sites recently, I asked myself ‘what license did they use?’ and, relatedly, ‘am I allowed to download, redistribute or incorporate their data in another project?’. The answer was perhaps unsurprising: neither site seemed to have thought about it, at least not originally, and, as a consequence, their copyright policy was the default: every contributor retains copyright in what they contribute. (As is typical of anything involving copyright, things are a little more complex: after its takeover by the BBC, h2g2 adopted a policy whereby contributors retain copyright in their articles but grant the BBC a non-exclusive licence to use them as it sees fit. To further complicate matters, the BBC claims to retain copyright in ‘Edited Entries’ because a BBC editor has checked and/or altered the article.)

Hence, with respect to my second question, ‘am I allowed to redistribute/reuse their material?’, the simple answer was: No. I would have to go out and identify each contributor and then gain their permission, an endeavour that would clearly be prohibitively time consuming. And this is despite the fact that, from their very participation, it is clear that the vast majority of individuals who made contributions to these sites wanted others to be able to freely access their work (and, in all likelihood, to freely reuse it as well).

While anything put on the web is implicitly there to be freely accessed, when it comes to (re)using and redistributing (hosting) that material, explicitness really matters. Once you start building any kind of ‘commons’ in which multiple contributors are the norm[1], this becomes especially important, since relying purely on tacit agreements and implicit consent becomes a major obstacle, and a serious threat, to the long-term future and value of that information.

In a world in which information rots away, in the form of disappearing links and disappearing pages, far faster than that inscribed on the physical paper of books, the ability to copy, and then to redistribute, is the only way for most works to have any permanent existence, be it one which is fragmented and partial, for only then can a work be ‘mirrored’, archived, made available in myriad ways: in short, kept alive.

Because no effort was made to have an explicit licensing policy, these ‘knowledge-bases’ have, in effect, become partially ‘closed’. While open for access, at least as long as their parent organizations continue to exist, the opportunities for reuse and redistribution have been drastically curtailed. With the advent of Wikipedia, which adopted a ‘share-alike’ type license from the very start, these sites have, in many respects, been superseded, and it is particularly telling that there are dedicated Wikipedia pages with instructions for ‘node’ owners on everything2 and h2g2 on how to move their content to Wikipedia[2].

Example 2: Crystallographic data structures

Recently I was chatting with Peter Murray-Rust, head of the Unilever informatics lab at Cambridge University and one of the pioneers of open knowledge (he’s also the man behind SAX, chemical mime and the world-wide-molecular-matrix, and his latest collaboration in open chemistry is http://www.blueobelisk.org/).

He was telling me about how crystallographers get asked to do analyses. Each analysis costs roughly between 300 and 600 pounds. Now, what happens to the data (‘structures’) produced by these analyses? Sometimes they get published (in Acta Crystallographica), but often they just sit in a basement drawer gathering dust. Peter said that he had colleagues in Australia who had close on 1000 such unpublished ‘structures’. That’s between 300k and 600k pounds’ worth of data gathering dust.

So why does this happen? Peter suggested two reasons. They both relate to the circumstances in which the analysis occurs so let me explain that first.

These analyses are often commissioned by someone else (either in industry or academia) in relation to work they are doing. Often the crystallographic analysis is just a check and will only end up being mentioned in a footnote, if mentioned at all (something like: ‘Our hypothesis as to the structure of this molecule was confirmed by crystallographic analysis ….’).

As a result, first, the crystallographer can’t publish immediately, since this might preempt the associated paper or disclose sensitive information about what a company is working on. Second, it is unclear who ‘owns’ the rights in the data: is it the crystallographer or the entity which commissioned the analysis? Together these uncertainties combine to place a dead hand upon publication, except in circumstances where the crystallographer did the analysis on their own account.

With more explicitness about the legal status (particularly if the default were that the data were open) and efforts to address the social issues (perhaps a delay of three years after which publication is allowed) access could be greatly improved.

Conclusion

To stitch together the knowledge commons, it’s not good enough for information to be implicitly open; it has to be explicitly open. To be explicitly open, it must have an open knowledge license clearly attached. Without this, the knowledge produced immediately becomes ‘locked’: doing anything other than having the information sit there on the original server requires a rights-clearance effort of such daunting proportions as to be completely infeasible.

Furthermore, when engaging in any kind of collaborative effort (the norm on the web), the adoption of an explicitly open approach can be considered as providing a form of social contract among the participants, one which is clearer than the informal tacit arrangements which would otherwise operate.


  1. Here we need not be thinking only of massively collaborative endeavours involving hundreds or thousands of contributors, but also of a popular weblog where, apart from the original author, you may have dozens of different individuals commenting on posts.

  2. http://en.wikipedia.org/wiki/Wikipedia:Guide_for_Everything2_Noders and http://en.wikipedia.org/wiki/Wikipedia:Guide_for_h2g2_Researchers 

5 Responses to “Dead knowledge: why being explicit about openness matters”

  1. Open Knowledge Foundation Weblog » Blog Archive » The re:transmission of video data Says:

    [...] I attended the meeting to argue, amongst other things, that the group should consider specifying data license as a required field and not as an optional one in their metadata model. In stating the case for this, I am thinking particularly of Rufus’ essay on the importance of being explicit about openness. He cites examples of collections of contributed works which have “died” because the right to re-use the material is unclear; the rights technically reside in each individual contributor, who has to be consulted before the work can be legally re-used and redistributed. “[to] my second question: ‘am i allowed to redistribute/reuse their material’ the simple answer was: No, I would have to go out and identify, and then gain permission, from each contributor; an endeavour that would clearly be prohibitively time consuming.” From the point of view of an individual “client” or “consumer”, licensing clarity may not be much of a consideration; but for the operation of an aggregator, collecting and providing scheduled or edited collections of feeds from lots of different media publishers, explicit openness becomes much more crucial. At the meeting I heard expressed some resistance to the imposition of a licensing stance, on the grounds that choosing one kind or another of more or less open license is an “ideological” decision and not a “technological” one. I’m not sure the boundaries can be so clearly drawn. There is some resistance to “enforcing” “compliance” in the standard by requiring a statement about licensing - even if all that says is “Public Domain”. The alternative - encouraging compliance in specifications for standards-based publishing software - is still a kind of technological enforcement. There are cultural reasons for participating in an open knowledge network like the one embodied by Transmission.
    “…when engaging in any kind of collaborative effort (the norm on the web) the adoption of an explicitly open approach can be considered as providing a form of social contract among the participants which is clearer than the informal tacit arrangements which would otherwise operate.” [...]

  2. Open Knowledge Foundation Weblog » Blog Archive » Is Citizendium Not Open? Says:

    [...] Wikipedia’s (explicit) openness was a significant factor in its continuing success [...]

  3. Open Text Book » Some open maths textbooks Says:

    [...] It is worth chasing up textbook authors to ask them to clarify whether or not their work is open, and to suggest using an explicitly open license if it is. (See Dead knowledge: why being explicit about openness matters for more on this.) [...]

  4. Tuxiano » Blog Archive » OpenTextBook Says:

    [...] That is why I welcome, with moderate enthusiasm and a great deal of hope, the OpenTextBook project, whose aim is precisely to channel energy into the creation of expressly free scientific texts. Good old Stallman would certainly have preferred the word Free to Open, but the initiative seems extremely clear and rigorous about what it is pursuing. In particular, I recommend reading this page, which explains very clearly why being clear about freedom is necessary to avoid the “death” of knowledge. [...]

  5. Open Knowledge Foundation Weblog » Blog Archive » On Getting Raw Data for Cancer Research Says:

    [...] This is an excellent particular case of a more general line we take at the OKF (e.g. see Give Us the Data Raw, and Give it to Us Now and Dead Knowledge: why being explicit about openness matters). Surely much is lost if data that could prove useful to cancer researchers sits collecting dust. Much could be gained if more trials data was open. [...]
