Open Visualisation Workshop

The first Open Visualisation Workshop took place on Saturday as we mentioned last week.

Details, notes and links are available on the event’s wiki page.

The event took place at Trampoline Systems’ new site in East London. To make sure the event was as informal as billed, we left the schedule open until the day, so we could see what people were interested in doing and plan the workshop accordingly!

After introductions and some brainstorming, we had impromptu talks and demos from:

  • Martin Dittus, last.fm
  • Julie Tolmie, Centre for Computing in the Humanities, King’s College London
  • Jan Berkel, Trampoline Systems
  • Jonathan Lister, Osmosoft
  • Gregory Jordan, European Bioinformatics Institute
  • David Aanensen, Division of Epidemiology, Public Health and Primary Care, Imperial College London
  • Jonathan Gray, The Open Knowledge Foundation (me)

Most of the day was occupied by demos and discussions, so we didn’t get around to doing much tinkering with software packages. However, participants said they found it very useful to see people’s work in other fields, and were keen to continue to meet regularly. It was interesting to see how much commonality existed between visualisation work in very different fields.

Suggestions for future activities included:

  • continuing to build on the list of open source visualisation packages (on the wiki), possibly including notes, comments and example visualisations from people who have experience using these;
  • domain specific sessions (e.g. visualisation for bioinformatics);
  • a shared project to work on, using open source visualisation software to represent an open knowledge package - e.g. using Prefuse to represent data from omdb;
  • using different visualisation software to represent the same open dataset - and comparing the results;
  • making very brief screencasts of different visualisation projects with voiceovers from their developers;
  • promoting the open-visualisation mailing list to researchers, developers and practitioners - as participants weren’t aware of any other general mailing list for open-source visualisation technologies;
  • developing a wish-list of features that participants would ideally like to see in open source visualisation software.

It was suggested we have another workshop in June to keep the ball rolling. Nearly everyone there was keen - so we’ve created a doodle page to fix the date.

If you’d like to participate, please:

  1. add your name to the Open Visualisation Workshop wiki page;
  2. select which dates you are free on the doodle page;
  3. sign up to the open-visualisation mailing list.

As we mentioned last month, we’ve been organising an informal, hands-on workshop focusing on open source visualisation technologies. This will take place this coming Saturday in London. Details are as follows:

Trampoline Systems have kindly offered to host the first workshop. They use a mixture of open source and bespoke software packages to produce award-winning social computing software for large enterprises - including Channel 4, Raytheon and the UK Foreign Office. It should be interesting to see the interactive visualisations they’ve produced!

The event should be a good opportunity for people new to this area to learn a bit more about the range of open source visualisation software packages that are out there - and for open source visualisation veterans to showcase their work and to exchange their experiences.

Dispatches from Digistan

May 14th, 2008

Chris Puttick of OpenArchaeology sends news of the Digital Standards Organisation:

A new group is being formed to promote open digital standards, starting with a declaration regarding the importance of digital standards being truly open.

Part of Digistan’s effort to promote understanding, development, and adoption of open digital standards involves a clear definition of what “open” means in standards terms. Accompanied by a list of conformant open standards, this has the potential to be used as an equivalent of the opendefinition.org Open Knowledge Definition or the freedomdefined.org Free Cultural Works definition.

However, the current approach looks different and consists of “metrics” to assess relative “openness”. It’s early days and not immediately clear how this will work - can standards score negative points for unclear status on patent grants, or RAND terms? Surely positive criteria for openness in a metric would, taken as a whole, constitute an open definition? The creators hope that, by transcending debate about a single definition of open standard, this approach will promote informed discussion about the value of standards in a way that encourages users to participate.

It’s also not clear to what extent Digistan’s interest will be focused on open formats for data and digital media, and how far that will reach out to “standards” in general - which might help simplify the debate over “one definition”. As open standards are the cornerstone of a viable free software approach to open data, an effort to produce a clear open definition that different interest groups can agree on and rally around would be welcome.

Among the founders of Digistan are some FFII representatives and, interestingly, Andrew Updegrove, the standards consortium lawyer and blogger whose writings were a deep mine of useful information about the OOXML controversy. Collectively they are asking people to sign up to their Hague Declaration in support of the following (less the preamble):

We call on all governments to:
  1. Procure only information technology that implements free and open standards;
  2. Deliver e-government services based exclusively on free and open standards;
  3. Use only free and open digital standards in their own activities.

Over the past week or so there has been a flurry of posts about ‘strong’ and ‘weak’ open access, including the following:

Peter Suber and Stevan Harnad both agree:

The term “open access” is now widely used in at least two senses. For some, “OA” literature is digital, online, and free of charge. It removes price barriers but not permission barriers. For others, “OA” literature is digital, online, free of charge, and free of unnecessary copyright and licensing restrictions. It removes both price barriers and permission barriers. It allows reuse rights which exceed fair use.

There are two good reasons why our central term became ambiguous. Most of our success stories deliver OA in the first sense, while the major public statements from Budapest, Bethesda, and Berlin (together, the BBB definition of OA) describe OA in the second sense.

As you know, Stevan Harnad and I have differed about which sense of the term to prefer – he favoring the first and I the second. What you may not know is that he and I agree on nearly all questions of substance and strategy, and that these differences were mostly about the label. While it may seem that we were at an impasse about the label, we have in fact agreed on a solution which may please everyone. At least it pleases us.

We have agreed to use the term “weak OA” for the removal of price barriers alone and “strong OA” for the removal of both price and permission barriers. To me, the new terms are a distinct improvement upon the previous state of ambiguity because they label one of those species weak and the other strong. To Stevan, the new terms are an improvement because they make clear that weak OA is still a kind of OA.

On this new terminology, the BBB definition describes one kind of strong OA. A typical funder or university mandate provides weak OA. Many OA journals provide strong OA, but many others provide weak OA.

Furthermore, Peter Suber adds:

As soon as we move beyond the removal of price barriers to the removal of permission barriers, we enter the range of strong OA. Hence, an article with a CC-NC license is strong OA because it allows some copying and redistribution beyond fair use (even if it doesn’t allow all copying and redistribution). My own preference is still for the CC-BY license, but we shouldn’t speak as if CC-NC were not strong OA or as if there were just one kind of strong OA.

According to this schema, a cost-free publication counts as weak open access, and a publication licensed under a CC-NC license counts as strong open access. Stevan Harnad agrees with the distinction but suggests the need for ‘value-neutral’ terms to describe it, proposing ‘basic’ and ‘full’.
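The schema can be sketched as a small decision rule. This is only an illustration of the weak/strong distinction as described above: the names `Publication`, `Access` and `classify` are hypothetical, and the two boolean fields are a simplification of what a real license check would involve.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    NOT_OA = "not open access"
    WEAK_OA = "weak OA"
    STRONG_OA = "strong OA"


@dataclass
class Publication:
    title: str
    free_of_charge: bool         # price barriers removed
    reuse_beyond_fair_use: bool  # at least some permission barriers removed


def classify(pub: Publication) -> Access:
    """Apply the weak/strong distinction: price barriers first, then permissions."""
    if not pub.free_of_charge:
        return Access.NOT_OA
    if pub.reuse_beyond_fair_use:
        return Access.STRONG_OA
    return Access.WEAK_OA


# Hypothetical examples: a free-to-read article with no reuse rights,
# and a free-to-read article under a CC-NC license.
free_to_read = Publication("Free to read only", True, False)
cc_nc_article = Publication("CC-NC licensed", True, True)

print(classify(free_to_read).value)   # weak OA
print(classify(cc_nc_article).value)  # strong OA
```

Note that the rule treats any reuse right beyond fair use as enough for strong OA, which is why a CC-NC article and a CC-BY article land in the same category here.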

It’s worth adding to this discussion that there is also Open Definition compliant open access, which I understand is equivalent to BBB open access and which is more permissive than ‘strong’ or ‘full’ open access. As we blogged a couple of weeks back, anything with the SPARC Europe Seal will be open access in this sense.

As Peter Murray-Rust comments:

Open Source has the OSI which determines whether or not a given licence is OS. Open Knowledge after only a short time of volunteers has the OKF and has an agreed definition and a list of conformant licences.

Scholarly publications, as literary works, constitute knowledge and hence are covered by the OKD. A journal, monograph or any other publication can still be ‘open as in the OKD’ as with other forms of knowledge. Debates about open access aside, demarcating between knowledge that is ‘open’ and ‘closed’ is precisely what the OKD is there for!

It will be interesting to see what emerges as the new classificatory scheme for open access, and where OKD compliant publications sit on the spectrum. Perhaps these will be called ‘OKD/BBB compliant open access’ journals, or suchlike.

The first Open Knowledge London meetup will take place this Wednesday at the London Knowledge Lab. The meetup should be a great opportunity for informal discussion of open knowledge projects and issues. If you’d like to participate or present, please add details to the wiki page!

SPARC Europe (Scholarly Publishing and Academic Resources Coalition) and the Directory of Open Access Journals (DOAJ) have just announced a new SPARC Europe Seal for Open Access Journals.

In order for journals to be approved, they must use a Creative Commons Attribution license - which is compliant with the Open Knowledge Definition. It is great to see growing support for making scholarly publications fully open!

The announcement - which includes comments from OKF advisory board members John Wilbanks and Peter Suber - is reproduced below.

Growing numbers of peer-reviewed research journals are opening-up their content online, removing access barriers and allowing all interested readers the opportunity of reading the papers online, with over 3300 such journals listed in the DOAJ, hosted by Lund University Libraries in Sweden.

However, the maximum benefit from this wonderful resource is not being realised as confusion surrounds the use and reuse of material published in such journals. Increasingly, researchers wish to mine large segments of the literature to discover new, unimagined connections and relationships. Librarians wish to host material locally for preservation purposes. Greater clarity will bring benefits to authors, users, and journals.

In order for open access journals to be even more useful and thus receive more exposure and provide more value to the research community it is very important that open access journals offer standardized, easily retrievable information about what kinds of reuse are allowed. Therefore, we are advising that all journals provide clear and unambiguous statements regarding the copyright statement of the papers they publish. To qualify for the SPARC Europe Seal a journal must use the Creative Commons By (CC-BY) license which is the most user-friendly license and corresponds to the ethos of the Budapest Open Access Initiative.

The second strand of the Seal is that journals should provide metadata for all their articles to the DOAJ, who will then make the metadata OAI-compliant. This will increase the visibility of the papers and allow OAI-harvesters to include details of the journal articles in their services.

‘We want to build on the great work already done by the publishers of many open access journals and improve the standards of open access titles,’ said David Prosser, Director of SPARC Europe. ‘Working with the DOAJ means that we can provide help and guidance to journals who wish to move beyond the first step of free access to full open access and our long-term aim is to ensure that all journals listed in the DOAJ can attain the standards expressed within the Seal.’

‘Improving the standards of the rapidly increasing numbers of open access and contributing to the widest possible visibility, dissemination and readership of the journals is very much in line with our mission,’ said Lars Björnshauge, Director of Libraries at Lund University. ‘We are very happy to see the enormous usage of the DOAJ and the support from our membership.’

‘Legal certainty is essential to the emergence of an internet that supports research. The proliferation of license terms forces researchers to act like lawyers, and slows innovative educational and scientific uses of the scholarly canon,’ said John Wilbanks, Executive Director of Science Commons. ‘Using a seal to reward the journals who choose to adopt policies that ensure users’ rights to innovate is a great idea. It builds on a culture of trust rather than a culture of control, and it will make it easy to find the open access journals with the best policies.’

‘This is an excellent program with two important recommendations. CC-BY licenses make OA journals more useful, and interoperable metadata make them more discoverable. The recommendations are easy to adopt and will accelerate research, facilitate preservation, and make OA journal policies more open and more predictable for users. I hope all OA journals will adopt them – not to get the Seal from SPARC Europe and the DOAJ, but for the same reasons that moved these organizations to launch the program: to make OA journals more visible and useful than they already are,’ said Peter Suber, Open Access Advocate & Author of Open Access News.

Dr. Paolo D’Iorio recently invited me to attend the first meeting of an EU funded Working Group “devoted to analyzing the current debate on the legal, economic and social conditions for setting-up open scholarly communities on the web”. The meeting was part of COST:

COST – European Cooperation in the field of Scientific and Technical Research – is one of the longest-running European instruments supporting cooperation among scientists and researchers across Europe. COST is also the first and widest European intergovernmental network for coordination of nationally funded research activities.

Action 32, of which Dr. D’Iorio is Chair, is called “Open Scholarly Communities on the Web” and has two aims:

  • to create a digital infrastructure for collaborative humanities research on the Web; and
  • to establish and foster the growth of Scholarly Communities that will provide feedback to the IT developers regarding the needs and expectations of humanities researchers and will serve as a core group of early adopters.

Talks included:

  • Paolo D’Iorio (CNRS-ITEM, Paris), How to build a Scholarly Community on the Web
  • Maria Chiara Pievatolo (University of Pisa), Copyright in Europe. History and perspectives
  • Thomas Margoni (University of Trento), How to access primary sources in Europe. The legal framework
  • Annaïg Mahé (URFIST, Paris), The market for SSH Journals in Europe
  • Jennie Grimshaw (British Library), Negotiating spaghetti junction: legal constraints on archiving government e-documents in the UK
  • Christine Madsen (OII, Oxford), The significance of “marketing” digital collections: the case of Harvard
  • Yann Moulier Boutang (Professeur de sciences Economiques - Université de Technologie de Compiègne, Directeur adjoint de Laboratoire de l’Unité de Recherche EA 22 23), Economic model(s) of Scholarly Communities: Open Source or Creative commons?
  • Francesca Di Donato (University of Pisa), The evaluation of science. From peer review to open peer review
  • Eric Meyer and Ralph Schroeder (OII, Oxford), Open Access and Online Visibility in the Age of e-Research

Notes and comments

  • For many humanities subjects, having something like the public domain calculators would help to facilitate the growth of open resources for scholarly communities built on works in which the copyright has expired.
  • Paolo’s presentation of Nietzsche Source and the Discovery project gave a compelling vision of how communities might grow around a resource for corpus based scholarship - with users having their own virtual workspace with annotations and notes that could be shared with other users. The ‘Scholarsource’ system would have stable URLs to support accurate citation, and robust ontologies to facilitate exploration of the material. Licensing that permits re-distribution is also a good preservation strategy.
  • The term ‘open’ was often not used in the sense of the Open Knowledge Definition. Several projects used licenses with non-commercial restrictions. While some participants assumed that scholars and institutions would often prefer that their work was not exploited commercially - it would be great if public domain sources such as documents, images and records, could be published under an open license. An approach which recommended open licensing for material that had not been enhanced (scans, text files …) could help to stimulate the growth of a commons that would encourage greater experimentation and collaboration than one which restricted certain kinds of re-use (cf. 7. and 8. in the OKD).
  • The importance of a close working relationship between scholarly communities and technologists. It is crucial that technical development is informed by the needs and working practices of researchers. This is something we’ve been thinking about in relation to Open Shakespeare and Open Milton. Open licensing allows developers to experiment with scholarly material to develop new tools and applications that could be of unanticipated value (e.g. semantic approaches, text analysis or visualisation).
  • Legal, technological and social obstacles to building open scholarly communities. We have various legal mechanisms and emerging technologies to facilitate such communities. Sometimes the hardest parts are social: growing the user base, increasing participation and so on. Value and limits of the ‘build it and they will come’ approach.

We are currently in the process of organising an informal, hands-on workshop for those who work with, or are interested in, open-source visualisation technologies:

The event will take place somewhere in central London on a weekend in May. If you are interested in participating, please add your name to the wiki page and specify which dates you are free on the event’s doodle page.

We hope it will be a good opportunity to learn a bit more about visualisation software packages, to exchange ideas, and possibly to start to work on some new projects! If all goes well, we’ll arrange to meet up on a (semi) regular basis!

Make Textbooks Affordable, a campaign composed of Student Associations and Public Interest Research Groups from across the US, yesterday released a statement in support of open textbooks signed by 1000 academics. From the press release:

Open textbooks are complete, reviewed textbooks written by academics that can be used online at no cost and printed for a small cost. What sets them apart from conventional textbooks is their open license, which allows instructors and students flexibility to use, customize and print the textbook. Open textbooks are already used at some of the nation’s most prestigious institutions - including Harvard, Caltech and Yale - and the nation’s largest institutions - including the California community colleges and the Arizona State University system.

“Open textbooks are comparable, affordable and flexible alternatives to traditional expensive textbooks,” said Professor Linda Bisson, Chair of the Enology and Viticulture Department at the University of California, Davis. “Not only do they save students money, but they provide instructors with a high-quality textbook that they can customize to meet their needs.”

Textbooks cost students an average of $900 per year, which is a quarter of tuition at an average four-year public university and nearly three-quarters of tuition at a community college, according to a study conducted by the Government Accountability Office (GAO).

“Textbooks can price students out of higher education. With costs rising faster than inflation and tuition, some students are faced with the difficult choice to drop out, take on additional debt, or undercut their own learning by not purchasing textbooks,” said Nicole Allen, Textbooks Advocate for The Student PIRGs.

Research conducted by The Student PIRGs identifies publisher tactics as the primary cause of escalating prices. Bundling textbooks with unnecessary supplements forces students to purchase items they do not need; unnecessary new editions undermine the used book market; and withholding critical price information keeps faculty in the dark.

“As faculty members, our top priority is to choose the textbook that is best for our students. We share concerns about affordability, and face similar frustrations with publisher practices,” said Sandra Schroeder, Chair of the American Federation of Teachers Higher Education Program and Policy Council. “Open textbooks and other affordable options, when appropriate for a course, are a win-win for everyone.”

On the What are Open Textbooks? page, they mention our Open Text Book project, and the Open Knowledge Definition - which is great to see! It’s good that they emphasise the importance of licensing that permits people to “reproduce, customize, or distribute” as well as access.

However, while they allude to Creative Commons licenses, they don’t explicitly distinguish between those licenses which are open (Creative Commons Attribution and Attribution Sharealike), and those which are not (Creative Commons licenses with No Derivatives or Non-commercial options).

While the latter do afford people more choice about what can be done with their work, they create problems with interoperability and do not serve well as the basis of an ecosystem of textbooks and textbook content that may be built upon, modified and redistributed without restriction. For example, publishers may not have the incentive to add value to existing content if they would be unable to re-distribute this in a commercial context.

Nevertheless, it’s fantastic to see growing support for open textbooks!

Open Data Going Mainstream?

April 10th, 2008

Bret Taylor’s recent post entitled “We Need a Wikipedia for Data” has been garnering a lot of attention around the blogosphere. While his suggestions are not particularly novel, the post, and the attention it has garnered, are, I think, indicative of the growing interest in issues of (open) data and its importance for the development of related services and products.

While generally in agreement with Bret’s arguments, there are a few differences that are worth raising. First, Bret appears to favour some kind of centralized repository that everyone can read from and write to:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use.

As readers of this blog will know, we’re sceptical of this ‘one ring to rule them all’ approach. In this regard, it is also important to distinguish finding material, parsing it, and plugging it together, issues that got rather run together in the surrounding discussion. As I wrote in a comment to Bret’s post:

There seem to be several distinct issues you (and your commenters) are concerned with:

1. Discoverability of datasets. For this you want a registry of some kind and this is exactly what the Comprehensive Knowledge Archive Network (CKAN) is designed to do. …

2. ‘Developing’ data particularly using many contributors and a versioning (wiki-like) model. This seems a general problem and one which I wrote about in this post on the collaborative development of data back in February last year. Since then various projects have launched or developed which attempt to address this issue, even if only partially (e.g. Freebase, Swivel, Numbrary, http://www.openeconomics.net …). This then leads into:

3. Componentizing data so that one can easily plug different datasets together rather than having to aggregate data together in one big place (crudely: ‘One Ring to Rule them All’ vs. ‘Small Pieces, Loosely Joined’). After all it seems unlikely that any one organization, however large, can hold ‘all the data’, and in any case doing so would negate the benefits of having ‘many minds’ working on a problem. It is our hope that CKAN would start to facilitate the kind of packaging that one frequently observes in software but is, as yet, fairly rare for knowledge (data/content/…). More on this can be found in this blog post on componentization plus the slides from our presentation at XTech.

To conclude, I definitely agree about the importance of having more open data and making it easier to find and use though I’m hoping that it will take a more decentralized and componentized form than simply a ‘wikipedia’ for data. More important though than any details is the fact that this kind of interest from a wider audience indicates that issues of data openness and production are going mainstream — something we as a community should strongly welcome.
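The difference between a registry for finding datasets and small components that are plugged together can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: `DataPackage`, `Registry` and `join` are hypothetical names, the rows are made-up placeholder values, and nothing here reflects CKAN’s actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataPackage:
    name: str
    license: str
    key: str  # column that other packages can join on
    rows: list = field(default_factory=list)


class Registry:
    """Toy in-memory registry: it helps you find packages, it does not merge them."""

    def __init__(self):
        self._packages = {}

    def register(self, package):
        self._packages[package.name] = package

    def find(self, name):
        return self._packages[name]

    def search(self, term):
        # Case-insensitive substring match on package names.
        return sorted(n for n in self._packages if term.lower() in n.lower())


def join(left, right):
    """Plug two packages together on a shared key, keeping only matching rows."""
    if left.key != right.key:
        raise ValueError("packages do not share a join key")
    index = {row[right.key]: row for row in right.rows}
    return [
        {**row, **index[row[left.key]]}
        for row in left.rows
        if row[left.key] in index
    ]


# Two independently maintained packages with placeholder data.
registry = Registry()
registry.register(DataPackage("region-names", "CC-BY", "code",
                              [{"code": "A", "name": "Alpha"},
                               {"code": "B", "name": "Beta"}]))
registry.register(DataPackage("region-scores", "CC-BY", "code",
                              [{"code": "A", "score": 1},
                               {"code": "C", "score": 3}]))

combined = join(registry.find("region-names"), registry.find("region-scores"))
print(registry.search("region"))  # ['region-names', 'region-scores']
print(combined)                   # [{'code': 'A', 'name': 'Alpha', 'score': 1}]
```

Each package stays with whoever maintains it; the registry only records that it exists, and the join happens at the point of use rather than in one central database.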