To be successful the web sacrificed some of the features of hypertext systems. Things like backwards linking and link integrity, etc. One of the great things about the web is that its possible to rebuild some of those features, but in a distributed way. Different communities can then address their own requirements.
Link integrity is one of those aspects. In many cases link integrity is not an issue. Some web resources are ephemeral (e.g. pastebin snippets), but others — particularly those used and consumed by scholarly communities — need to be longer lived. CrossRef and other members of the DOI Foundation have been successfully building linking services that attempt to provide persistent links to material references in scholarly research, for many years.
Yesterday Geoff Bilder published a great piece that describes what CrossRef and others are doing in this area, highlighting the different communities being served and the different features that the services offer. Just because something has a DOI doesn’t necessarily make it reliable, give any guarantees about its quality, or even imply what kind of resource it is; but it may have some guarantees around persistence.
Geoff’s piece highlights some similar concerns that I’ve had recently. I’m particularly concerned that there seems to be some notion that for something to be citeable it must have a DOI. That’s not true. For something to be citeable it just needs to be online, so people can point at it.
There may be other qualities we want the resource to have, e.g. persistence, but if your goal is to share some data, then get it online first, then address the persistence issue. Data and content sharing platforms and services can help there but we need to assess them against different criteria, e..g whether they are good publishing platforms, and separately whether they can make good claims about persistence and longevity.
Assessing persistence means more than just assessing technical issues, it means understanding the legal and business context of the service. What are its terms of service? Does the service have any kind of long term business plan that means it can make viable claims about longevity of the links it produces, etc.
I recently came across a service called perma.cc that aims to bring some stability to legal citations. There’s a New York Times article that highlights some of the issues and the goals of the service.
The perma.cc service allows users to create stable links to content. The content that the links refer to is then archived so if the original link doesn’t resolve then users can still get to the archived content.
This isn’t a new idea: bookmarking services often archive bookmarked content to build personal archives; other citation and linking services have offered similar features that handle content going offline.
It’s also not that hard to implement. Creating link aliases is easy. Archiving content is less easy but is easily achievable for well-known formats and common cases: it gets harder if you have to deal with dynamic resources/content, or want to preserve a range of formats for the long term.
It’s less easy to build stable commercial entities. It’s also tricky dealing with rights issues. Archival organisations often ensure that they have rights to preserve content, e.g. by having agreements with data publishers.
Personally I’m not convinced that perma.cc have nailed that aspect yet. If you look at their terms of service (PDF, 23rd Sept 2013), I think there are some problems:
You may use the service “only for non-commercial scholarly and research purposes that do not infringe or violate anyone’s copyright or other rights“. Defining “non-commercial” use is very tricky, it’s an issue with many open content and data licenses. One might argue that a publisher creating perma.cc links is using it for commercial purposes.
But I find Section 5 “User Submitted Content and Licensing” confusing. For example it seems to suggest that I either have to own the content that I am creating a perma.cc link for, or that I’ve done all the rights clearance on behalf of perma.cc.
I don’t see how that can possibly work in the general case. Particularly as you must also grant perma.cc a license to use the content however they wish. If you’re trying to build perma.cc links to 3rd party content, e.g. many of the scenarios described in the New York Times article, then you don’t have any rights to grant them. Even if its published under an open content license you may not have all the rights they require.
They also reserve the right to remove any content, and presumably links, that they’re required to remove. From a legal perspective this makes some sense, but I’d be interested to know how that works in practice. For example will the perma.cc link just disappear or will there be any history available?
Perhaps I’m misunderstanding the terms (entirely possible) or the intended users of the service, I’d be interested in hearing any clarifications.
My general point here is not to be overly critical of perma.cc — I’m largely just confused by their terms. My pointis that bringing permanence to (parts of) the web isn’t necessarily a technical issue to solve, its one that has important legal, social and economic aspects.
Signing up to a service to create links is easy. Longevity is harder to achieve.
Image: Links by RubyGold, CC-BY-NC