Dr. Paolo D’Iorio recently invited me to attend the first meeting of an EU-funded Working Group “devoted to analyzing the current debate on the legal, economic and social conditions for setting-up open scholarly communities on the web”. The meeting was part of COST:

COST – European Cooperation in the field of Scientific and Technical Research – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. COST is also the first and widest European intergovernmental network for coordination of nationally funded research activities.

Action 32, of which Dr. D’Iorio is Chair, is called “Open Scholarly Communities on the Web” and has two aims:

  • to create a digital infrastructure for collaborative humanities research on the Web; and
  • to establish and foster the growth of Scholarly Communities that will provide feedback to the IT developers regarding the needs and expectations of humanities researchers and will serve as a core group of early adopters.

Talks included:

  • Paolo D’Iorio (CNRS-ITEM, Paris), How to build a Scholarly Community on the Web
  • Maria Chiara Pievatolo (University of Pisa), Copyright in Europe. History and perspectives
  • Thomas Margoni (University of Trento), How to access primary sources in Europe. The legal framework
  • Annaïg Mahé (URFIST, Paris), The market for SSH Journals in Europe
  • Jennie Grimshaw (British Library), Negotiating spaghetti junction: legal constraints on archiving government e-documents in the UK
  • Christine Madsen (OII, Oxford), The significance of “marketing” digital collections: the case of Harvard
  • Yann Moulier Boutang (Professeur de sciences Economiques - Université de Technologie de Compiègne, Directeur adjoint de Laboratoire de l’Unité de Recherche EA 22 23), Economic model(s) of Scholarly Communities: Open Source or Creative commons?
  • Francesca Di Donato (University of Pisa), The evaluation of science. From peer review to open peer review
  • Eric Meyer and Ralph Schroeder (OII, Oxford), Open Access and Online Visibility in the Age of e-Research

Notes and comments

  • For many humanities subjects, having something like the public domain calculators would help to facilitate the growth of open resources for scholarly communities built on works in which the copyright has expired.
  • Paolo’s presentation of Nietzsche Source and the Discovery project gave a compelling vision of how communities might grow around a resource for corpus-based scholarship, with users having their own virtual workspace with annotations and notes that could be shared with other users. The ‘Scholarsource’ system would have stable URLs to support accurate citation, and robust ontologies to facilitate exploration of the material. Licensing that permits re-distribution is also a good preservation strategy.
  • The term ‘open’ was often not used in the sense of the Open Knowledge Definition. Several projects used licenses with non-commercial restrictions. Some participants assumed that scholars and institutions would often prefer that their work not be exploited commercially. Even so, it would be great if public domain sources such as documents, images and records could be published under an open license. An approach which recommended open licensing for material that had not been enhanced (scans, text files …) could help to stimulate the growth of a commons that would encourage greater experimentation and collaboration than one which restricted certain kinds of re-use (cf. 7. and 8. in the OKD).
  • The importance of a close working relationship between scholarly communities and technologists. It is crucial that technical development is informed by the needs and working practices of researchers. This is something we’ve been thinking about in relation to Open Shakespeare and Open Milton. Open licensing allows developers to experiment with scholarly material to develop new tools and applications that could be of unanticipated value (e.g. semantic approaches, text analysis or visualisation).
  • Legal, technological and social obstacles to building open scholarly communities. We have various legal mechanisms and emerging technologies to facilitate such communities. Sometimes the hardest parts are social: growing the user base, increasing participation and so on. Value and limits of the ‘build it and they will come’ approach.
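
As a rough illustration of the public domain calculators mentioned in the first note, here is a minimal sketch in Python. It assumes a simple ‘life plus 70 years’ rule, with protection running to the end of the calendar year in which the term expires. Real calculators have to deal with differences between jurisdictions, anonymous and collective works, and many other special cases, none of which are handled here. The function names are hypothetical and chosen only for this example.

```python
from datetime import date
from typing import Optional


def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """First calendar year in which the work is out of copyright.

    Assumption: a 'life plus term_years' rule where protection runs to the
    end of the calendar year, so the work enters the public domain on
    1 January of the following year.
    """
    return death_year + term_years + 1


def is_public_domain(death_year: int, today: Optional[date] = None,
                     term_years: int = 70) -> bool:
    """Return True if the assumed term has expired as of 'today'."""
    today = today or date.today()
    return today.year >= public_domain_year(death_year, term_years)


# Nietzsche died in 1900, so under the assumed rule his works
# have been out of copyright since 1971.
print(public_domain_year(1900))
print(is_public_domain(1900, today=date(2008, 4, 1)))
```

A resource such as Nietzsche Source could use a check like this as a first filter, with anything borderline referred to a person for review.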

Make Textbooks Affordable, a campaign composed of Student Associations and Public Interest Research Groups from across the US, yesterday released a statement in support of open textbooks signed by 1000 academics. From the press release:

Open textbooks are complete, reviewed textbooks written by academics that can be used online at no cost and printed for a small cost. What sets them apart from conventional textbooks is their open license, which allows instructors and students flexibility to use, customize and print the textbook. Open textbooks are already used at some of the nation’s most prestigious institutions - including Harvard, Caltech and Yale - and the nation’s largest institutions - including the California community colleges and the Arizona State University system.

“Open textbooks are comparable, affordable and flexible alternatives to traditional expensive textbooks,” said Professor Linda Bisson, Chair of the Enology and Viticulture Department at the University of California, Davis. “Not only do they save students money, but they provide instructors with a high-quality textbook that they can customize to meet their needs.”

Textbooks cost students an average of $900 per year, which is a quarter of tuition at an average four-year public university and nearly three-quarters of tuition at a community college, according to a study conducted by the Government Accountability Office (GAO).

“Textbooks can price students out of higher education. With costs rising faster than inflation and tuition, some students are faced with the difficult choice to drop out, take on additional debt, or undercut their own learning by not purchasing textbooks,” said Nicole Allen, Textbooks Advocate for The Student PIRGs.

Research conducted by The Student PIRGs identifies publisher tactics as the primary cause of escalating prices. Bundling textbooks with unnecessary supplements forces students to purchase items they do not need; unnecessary new editions undermine the used book market; and withholding critical price information keeps faculty in the dark.

“As faculty members, our top priority is to choose the textbook that is best for our students. We share concerns about affordability, and face similar frustrations with publisher practices,” said Sandra Schroeder, Chair of the American Federation of Teachers Higher Education Program and Policy Council. “Open textbooks and other affordable options, when appropriate for a course, are a win-win for everyone.”

On the What are Open Textbooks? page, they mention our Open Text Book project, and the Open Knowledge Definition - which is great to see! It’s good that they emphasise the importance of licensing that permits people to “reproduce, customize, or distribute” as well as access.

However, while they allude to Creative Commons licenses, they don’t explicitly distinguish between those licenses which are open (Creative Commons Attribution and Attribution Sharealike), and those which are not (Creative Commons licenses with No Derivatives or Non-commercial options).

While the latter do afford people more choice about what can be done with their work, they create problems with interoperability and do not serve well as the basis of an ecosystem of textbooks and textbook content that may be built upon, modified and redistributed without restriction. For example, publishers may not have the incentive to add value to existing content if they would be unable to re-distribute this in a commercial context.

Nevertheless, it’s fantastic to see growing support for open textbooks!

Open Data Going Mainstream?

April 10th, 2008

Bret Taylor’s recent post entitled “We Need a Wikipedia for Data” has been garnering a lot of attention around the blogosphere. While his suggestions are not particularly novel, the post, and the attention it has garnered, are, I think, indicative of the growing interest in (open) data and its importance for the development of related services and products.

While generally in agreement with Bret’s arguments, there are a few differences that are worth raising. First, Bret appears to favour some kind of centralized repository that everyone can read from and write to:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use.

As readers of this blog will know, we’re sceptical of this ‘one ring to rule them all’ approach. In this regard, it is also important to distinguish finding material, parsing it, and plugging it together, issues that got rather run together in the surrounding discussion. As I wrote in a comment to Bret’s post:

There seem to be several distinct issues you (and your commenters) are concerned with:

1. Discoverability of datasets. For this you want a registry of some kind and this is exactly what the Comprehensive Knowledge Archive Network (CKAN) is designed to do. …

2. ‘Developing’ data particularly using many contributors and a versioning (wiki-like) model. This seems a general problem and one which I wrote about in this post on the collaborative development of data back in February last year. Since then various projects have launched or developed which attempt to address this issue, even if only partially (e.g. Freebase, Swivel, Numbrary, http://www.openeconomics.net …). This then leads into:

3. Componentizing data so that one can easily plug different datasets together rather than having to aggregate data together in one big place (crudely: ‘One Ring to Rule them All’ vs. ‘Small Pieces, Loosely Joined’). After all it seems unlikely that any one organization, however large, can hold ‘all the data’, and in any case doing so would negate the benefits of having ‘many minds’ working on a problem. It is our hope that CKAN would start to facilitate the kind of packaging that one frequently observes in software but is, as yet, fairly rare for knowledge (data/content/…). More on this can be found in this blog post on componentization plus the slides from our presentation at XTech.

To conclude, I definitely agree about the importance of having more open data and making it easier to find and use though I’m hoping that it will take a more decentralized and componentized form than simply a ‘wikipedia’ for data. More important though than any details is the fact that this kind of interest from a wider audience indicates that issues of data openness and production are going mainstream — something we as a community should strongly welcome.
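
To make the ‘Small Pieces, Loosely Joined’ idea from point 3 a little more concrete, here is a minimal sketch in Python of how small data packages might declare what they build on, so that a tool can work out which pieces to fetch and in what order (much as software packaging systems do). Everything here is an assumption made for illustration: the package names, the REGISTRY structure and the install_order function are hypothetical and do not describe how CKAN actually works.

```python
from typing import Dict, List

# Hypothetical registry: package name -> names of the packages it builds on.
REGISTRY: Dict[str, List[str]] = {
    "country-codes": [],
    "population": ["country-codes"],
    "gdp": ["country-codes"],
    "gdp-per-capita": ["gdp", "population"],
}


def install_order(name: str, registry: Dict[str, List[str]]) -> List[str]:
    """Return the package and everything it depends on, dependencies first."""
    order: List[str] = []
    in_progress: set = set()

    def visit(pkg: str) -> None:
        if pkg in order:
            return  # already scheduled
        if pkg in in_progress:
            raise ValueError(f"circular dependency involving {pkg!r}")
        if pkg not in registry:
            raise KeyError(f"unknown package {pkg!r}")
        in_progress.add(pkg)
        for dep in registry[pkg]:
            visit(dep)
        in_progress.discard(pkg)
        order.append(pkg)

    visit(name)
    return order


print(install_order("gdp-per-capita", REGISTRY))
```

The point of the sketch is that no single place needs to hold ‘all the data’: each package only has to say what it depends on, and the pieces can be maintained by different people.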

OKCon 2008

We’re pleased to announce that audio, images and slides from OKCon 2008 are now available at the Post-Event Information page.

Most of the material can be obtained from the OKF subversion repository.

If you’ve blogged the event or have pictures or the like, please let us know and we’ll post a link from the Post-Event page. We are also able to host any further documentation in the repository.

Many thanks to all of you who came to speak, present and participate! We had a great day and very much enjoyed the talks, demos and conversations that took place throughout the day.

We’ve now set up a wiki page for local Open Knowledge groups - to arrange meetups, forums and other activities:

In addition to the Cambridge group, which has been around for a few years, we are in the process of creating groups in London and Oxford. If you’d like to get involved in any of these, or you’d like to set up your own local group - don’t hesitate to get in touch!

Our second annual Open Knowledge Conference (OKCon) is taking place tomorrow. Like last year, the event will bring together individuals and groups from across the open knowledge spectrum for a day of seminars and workshops. Though we’re nearing capacity, there are still a few places left for last-minute registrants!

Details

Speakers

Session 1 (1045-1200): Transport and Environment

  • Gavin Starks (AMEE and dgen)
  • Tom Steinberg (MySociety)
  • Dr Muki Haklay (Department of Civil, Environmental and Geomatic Engineering, University College London)

Session 2 (1200-1315): Visualization and Analysis

  • Liz Turner (Freelance Designer and Visualizer Extraordinaire)
  • Gael Varoquaux (Mayavi2 - the next Generation Visualization Toolkit)
  • Martin Albrecht (SAGE the Open Source Mathematics Engine)

Session 3 (1415-1530): Education and Academia

  • Erik Duval (ARIADNE)
  • Lisa Petrides (OER Commons)
  • Dr Martin Brett (Cambridge University History Department and the Ivo Project)

Open Space

  • 1540-1640 (Room 1): Open Media
  • 1540-1640 (Room 2): Remixing, Peer Production and Open Knowledge
  • 1645-1745 (Room 1): Law, Licensing and Policy
  • 1645-1745 (Room 2): Versioning, Packaging, and Structuring Open Material
  • 1750-1830 (Room 1): Kept free for spontaneous contributions and breakout sessions

A more detailed schedule can be found at the Open Space wiki page.

Theme

‘Open Knowledge’ is material that others are free to access, reuse or re-distribute and may be anything from sonnets to statistics, genes to geodata. In recent years we’ve seen the growth of successful open knowledge projects - from peer reviewed journals to community edited encyclopaedias - but what impact can open licensing have in education, research and commerce? Is sharing the key to scaling? What kinds of business models are available to open knowledge distributors and how is open knowledge applied in different institutional and professional contexts?

Furthermore, there now exist large and growing amounts of open material but what kinds of tools are available to analyse and represent it? How can we sort, search, store it to maximise its visibility and reusability?

We’ve also witnessed in the last few years the rise of web-based services — from social networking sites to online spreadsheet packages. While we have definitions for open software and open knowledge, what is an open service and what kinds of new services can be built using open knowledge?

Organizers

OKCon is organized by the Open Knowledge Foundation in partnership with the LSE Information Systems and Innovation Group.

Given the public role of libraries and the fact that bibliographic metadata (i.e. the material in library catalogues) doesn’t seem that exciting from a commercial point of view, you might think that, of all the types of data out there, it would be bibliographic data that would be the most open. You might even think, given the public-spiritedness of librarians, that this is the kind of area where not only could it be openly available but it would be openly available (in nice little bzipped or gzipped dumps …).

In fact the situation is quite the opposite. Most libraries appear to implicitly or explicitly exert rights over their data with some libraries licensing access to their catalogue data for substantial sums of money. The following lists some of the examples (both closed and open) that we know of:

  • Library of Congress: public domain in the US (or at least free) but copyrighted outside the US. See [1] and the comments in the fred2.0 readme, which state:

    These data are works of the United States Government and as such are not subject to copyright within the United States. (17 U.S.C §105).

    The Library of Congress has copyrighted these data for use outside the United States. Contact the LC for permission prior to use or distribution of this data outside the United States. [http://www.loc.gov/cds/mds.html]

  • fred2.0 (fred2.0 CKAN package): an excellent example of the effort to make material available but unfortunately has the same restrictions as Library of Congress (from which the material is sourced).
  • British Library: closed (and apparently gets sold for substantial sums).
  • OCLC/Worldcat: closed. See the OCLC CKAN page.
  • Barton/Simile: semi-open. Sourced from OCLC. Originally taken down but now back under CC non-commercial. See [1] for further discussion.
  • OpenLibrary: in theory open (though no formal license or dump as yet and some material may have been sourced from LoC making it suspect outside of the US)
  • isbndb.com: not really fully bibliographic data and status uncertain (see isbndb.com CKAN page)
  • LibraryThing: closed. Does not seem to make data available and source would likely make this problematic (from the about page):

    LibraryThing uses Amazon and libraries that provide open access to their collections with the Z39.50 protocol. The protocol is used by a variety of desktop programs, notably bibliographic software like EndNote. LibraryThing appears to be the first mainstream web use.

As we continue to search for open sources of bibliographic data we’d love to hear from anyone who knows of examples not already on this list.

[1] http://www.bookism.org/open/2007/04/02/open-data-what-would-kilgour-think/

Yesterday Creative Commons announced that their Attribution and Attribution Sharealike licenses will feature a seal of approval and link to Freedom Defined - the Definition of Free Cultural Works. We’ve been in touch with Freedom Defined since May 2006 (we blogged about the project last year) as their aims are so similar to those of opendefinition.org and the Open Knowledge Definition.

While there was discussion last year of merging the two projects, it now looks as though they will remain complementary - with Freedom Defined focusing on cultural works, and with the Open Knowledge Definition retaining a broader conception of ‘knowledge’ that includes data (see e.g. Good news for open data).

Mike Linksvayer of Creative Commons comments:

This added signaling is part of an ongoing effort to distinguish among the range of Creative Commons licenses — never say the Creative Commons license, as there is no such thing. Our license deeds have always communicated the distinct properties of each license with icons and brief descriptions.

This is great news and will hopefully contribute to the strengthening of a more robust sense of free culture/open knowledge within the plethora of liberal licensing options that are now available!

We are pleased to announce the launch of an Advisory Council for opendefinition.org. The Council will be formally responsible for maintaining and developing the Definitions and associated material found on the Open Definition site - including the Open Knowledge Definition and the Open Service Definition. As many of you will know, these definitions aim to provide clear and succinct sets of conditions for ‘openness’ in knowledge and services.

Jordan Hatcher of opencontentlawyer.com has kindly agreed to be Chair of the Council, which includes:

  • Paul Jacobson, iCommons
  • Paul Miller, Talis
  • Peter Murray-Rust, Cambridge University
  • Rufus Pollock, Open Knowledge Foundation & Cambridge University
  • Rob Styles, Talis
  • Peter Suber, Scholarly Publishing and Academic Resources Coalition (SPARC) & Earlham College
  • Luis Villa, Columbia Law School, GNOME Foundation & Open Source Initiative
  • Jo Walsh, Open Knowledge Foundation & Open Source Geo-Spatial Foundation
  • John Wilbanks, Science Commons

More detailed biographies are available on the Advisory Council page.

It is our intention that the overall development of the material on the site will continue in the same community based and collaborative manner. The Council’s role will be to provide oversight, guidance and input into this process, not to replace it.

This is fantastic news for the definitions projects!

Tomorrow I’ll be speaking with Nate Olson at the latest Oxford Geek Night on the subject of Open Knowledge and Componentization. Here’s the blurb:

Componentization on a large scale (such as in the Debian ‘apt’ packaging system) has allowed large software projects to be amazingly productive through their use of a decentralised, collaborative, incremental development process. Componentization works so well because it allows us to ‘divide and conquer’ the organizational and conceptual problems of highly complex systems. Given this, what are the possibilities (and problems) of this approach for knowledge generally? How do we best design “knowledge APIs”, discover and distribute existing resources, and recombine decentralised datasets? In this talk we’ll discuss the answers to (some of) these questions focusing particularly on the role the Comprehensive Knowledge Archive Network can play.

So, if you’re in the Oxford vicinity and interested in Open Knowledge and related matters (there’s a good line-up of other speakers including Denise Wilton of moo.com) why not drop in to the Jericho Tavern around 7.30pm tomorrow evening?

CKAN 0.5 Released

February 1st, 2008

The Comprehensive Knowledge Archive Network (CKAN) version 0.5 has just been released.

Changes include:

  • feature to list and search tags
  • feature to make data available in machine-usable form via sql dump
  • feature to purge a revision and associated changes
  • support for reserved html characters in urls
  • upgrade to Pylons 0.9.6
  • new spam management utilities including (partial) blacklist support
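
For readers wondering what tag listing, tag search and an SQL dump look like in practice, here is a small self-contained sketch in Python using the standard sqlite3 module. It is only an illustration: the table layout and the function names are assumptions made for this example and are not CKAN’s actual schema or code.

```python
import sqlite3
from typing import Iterable, List


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical package/tag schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE package_tag (package_id INTEGER, tag TEXT);
    """)
    return conn


def add_package(conn: sqlite3.Connection, name: str, tags: Iterable[str]) -> None:
    """Register a package together with its tags."""
    cur = conn.execute("INSERT INTO package (name) VALUES (?)", (name,))
    conn.executemany(
        "INSERT INTO package_tag (package_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )


def list_tags(conn: sqlite3.Connection) -> List[str]:
    """Return every distinct tag in alphabetical order."""
    rows = conn.execute("SELECT DISTINCT tag FROM package_tag ORDER BY tag")
    return [row[0] for row in rows]


def search_by_tag(conn: sqlite3.Connection, tag: str) -> List[str]:
    """Return the names of packages carrying the given tag."""
    rows = conn.execute(
        "SELECT p.name FROM package p "
        "JOIN package_tag t ON t.package_id = p.id "
        "WHERE t.tag = ? ORDER BY p.name",
        (tag,),
    )
    return [row[0] for row in rows]


def sql_dump(conn: sqlite3.Connection) -> str:
    """Serialise the whole database as SQL text that others can reload."""
    return "\n".join(conn.iterdump())


db = make_db()
add_package(db, "fred2.0", ["bibliographic", "library"])
add_package(db, "openeconomics", ["economics"])
print(list_tags(db))
print(search_by_tag(db, "library"))
```

The dump is what makes the registry itself reusable: anyone can load the SQL text into their own database and query it however they like.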

The CKAN code is available from:

The data is available from:

We’ve currently got 135 packages. If you come across a large dataset or substantial collection, please consider registering it on CKAN!