You are browsing the archive for Sam Leon.

“Yes We Scan”

December 23, 2011 in WG Open Government Data

Take a look at the campaign being run by Carl Malamud and John Podesta called “Yes We Scan”. It’s an effort to encourage the US government to make plans to digitize the contents of all national libraries including the Library of Congress. In a letter addressed to President Barack Obama, John Podesta and Carl Malamud point to the economic, scientific and social benefits that would arise from a large scale digitization of America’s cultural riches currently held in the vaults of various national institutions.

With reference to Thomas Jefferson’s decision to donate his library to the government of the United States, the letter calls on the present administration to use the power of the internet to realise Jefferson’s vision of government as an agency that encourages the wide diffusion of knowledge for the benefit of society:

We ask your help to achieve this 21st century dream, making the vast resources of our federal government available to all on the global Internet, making access to knowledge a right for all Americans and a defining contribution for our future.

The letter directs you to a corresponding e-petition that can be viewed and signed here.

Opening up Domesday Book

December 19, 2011 in Public Domain

The following guest post is by Anna Powell-Smith from the Open Domesday project. Anna is a member of our brand new Working Group on Open Humanities.

Domesday Book might be one of the most famous government datasets ever created. Which makes it all the stranger that it’s not freely available online – at the National Archives, you have to pay £2 per page to download copies of the text.

Domesday is pretty much unique. It records the ownership of almost every acre of land in England in 1066 and 1086 – a feat not repeated in modern times. It records almost every household. It records the industrial resources of an entire nation, from castles to mills to oxen.

As an event, held in the traumatic aftermath of the Norman conquest, the Domesday inquest scarred itself deeply into the mindset of the nation – and one historian wrote that on his deathbed, William the Conqueror regretted the violence required to complete it. As a historical dataset, it is invaluable and fascinating.

In my spare time, I’ve been working on making Domesday Book available online at Open Domesday. In this, I’ve been greatly aided by the distinguished Domesday scholar Professor John Palmer, and his geocoded dataset of settlements and people in Domesday, created with AHRC funding in the 1990s.

I’m very happy to announce that we’re now able to make full-page, high-resolution images of the Domesday folios available under CC-BY-SA. You can browse or download the folios en masse at the Internet Archive (recommended), or page-by-page at Open Domesday.

I’ve also been working on a RESTful API to Domesday Book, to accompany the release of the folios. Our API supports geographic queries, and returns the economic and social statistics for each settlement in Domesday – from the number of households to (where listed) the number of pigs, sheep and oxen.

As an example, here is the folio entry for Marsh Gibbon in Buckinghamshire, still a thriving village today:

Marsh Gibbon in Domesday Book

Domesday is not often descriptive, but this entry gives us a glimpse of the state of the English population. The entry tells us that Marsh Gibbon had 11 households, including 3 slaves, and woodland on which the locals paid a tax of 30 pigs. The owner in 1066 was one Aelric, and in 1086 he still lived there “harshly and wretchedly”.
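To make the shape of this data concrete, here is a minimal Python sketch of how a settlement record and a simple geographic query might look. It is not the Open Domesday API: the `Settlement` class, its field names, the `within_box` helper and the approximate coordinates are all assumptions made for illustration. Only the household, slave and pig figures come from the entry above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Settlement:
    """Hypothetical record for one Domesday place (field names are assumptions)."""
    name: str
    county: str
    lat: float  # approximate modern latitude, for illustration only
    lng: float  # approximate modern longitude, for illustration only
    households: int
    slaves: int
    woodland_tax_pigs: int

    def non_slave_households(self) -> int:
        # Households that are not recorded as slaves.
        return self.households - self.slaves


def within_box(places: List[Settlement], south: float, west: float,
               north: float, east: float) -> List[Settlement]:
    """Return the places whose coordinates fall inside a bounding box."""
    return [p for p in places
            if south <= p.lat <= north and west <= p.lng <= east]


# Figures taken from the Marsh Gibbon entry; coordinates are rough.
marsh_gibbon = Settlement("Marsh Gibbon", "Buckinghamshire",
                          51.90, -1.06, households=11, slaves=3,
                          woodland_tax_pigs=30)

nearby = within_box([marsh_gibbon], south=51.5, west=-1.5,
                    north=52.5, east=-0.5)
```

A real client would fetch such records over HTTP rather than build them by hand; the sketch only shows the kind of statistics and bounding-box filtering described above.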

There are entries like this for nearly 15,000 places. We hope that this data release will lead to some interesting new applications (may a thousand iPad apps bloom) and research – like this population heatmap of Domesday England, created from Professor Palmer’s raw data by Andrew Bevan at University College London:

Population heatmap of Domesday England

To end on a downbeat note, it’s worrying that among historic texts, Domesday may become the exception, rather than the rule, by being available under an open licence. The only reason we are able to make the folios available at all is that Professor Palmer took his own images of the Ordnance Survey’s photozincographic copies some years ago, and has kindly agreed to release them for the benefit of others.

In particular, although the British Library has teamed up with Google to make thousands of historic texts available online, it seems the digitised copies will not be truly open, as Glyn Moody has warned.

But enough grumbling – I hope you enjoy the Domesday data. If nothing else, it’s something to show elderly relatives over Christmas! Please contact me with comments.

 

Open Data Means Better Science

December 9, 2011 in Open Science

The following post is by Jenny Molloy, coordinator of the Open Science Working Group at the Open Knowledge Foundation.

We are very pleased to announce the publication of an article detailing the working group’s aims and achievements in PLoS Biology’s Community Pages.

‘The Open Knowledge Foundation: Open Data Means Better Science’ has already had over 1800 article views and offers a fantastic opportunity to engage the biological community in the work we do and to raise awareness of the importance of open data in science.

Published in the same edition was a Perspectives piece tackling an issue that the working group has taken a great interest in – the use of non-commercial clauses in licenses for open access articles. In ‘Why Full Open Access Matters’, Professor Michael Carroll, a Creative Commons board member and Director of the Program on Information Justice and Intellectual Property at American University, states: “We are living through a moment of fundamental opportunity [for text and data mining, innovative reuse of article material]. Let’s be clear. Only those publishers willing to fully seize this opportunity deserve to call their publications ‘open access.’”

A lively discussion has been taking place on the open-science and okfn-discuss mailing lists and a plan has been compiled to survey the policies of funders and journals as well as to generate a resource pack on the use of non-commercial clauses and the downstream effects of applying such licenses. Please join the mailing list and the conversation if this is an issue you feel strongly about.

LAPSI Design Award Competition

December 5, 2011 in Uncategorized

The following post is by Claudio Artusio who works for LAPSI, the European Network on Legal Aspects of Public Sector Information.

There are still 3 weeks left to apply for the 3rd LAPSI Award for the most user-friendly design of a PSI portal in the EU (http://www.lapsi-project.eu/news/award3).

PSI (acronym for Public Sector Information) can be defined as the wide range of information that public sector bodies collect, produce, reproduce and disseminate in many areas of activity while accomplishing their institutional tasks. PSI may include (among others) social, economic, geographical, cadastral, weather, tourist, and business information.

The technological progress we experience every day in the modern digital age has drastically modified the procedures and broadened the opportunities for any citizen to reach and access information. In such a context, making information generated and collected by public sector entities available and re-usable is crucial: not only does it provide citizens with reliable knowledge of the activities of government and public sector bodies, it also gives public and private undertakings the raw material for new added-value services that they can supply to citizens.

Since PSI availability is crucial for fostering re-use initiatives, the purpose of the Award is to support any initiative that helps move PSI re-use policies forward.

The Award is open to public sector bodies, businesses and citizens who designed or manage a PSI portal in the EU.

A panel of experts will evaluate the submitted projects with regard to the user-friendly design of the portal; the originality and visual appeal of its layout; its efficacy in facilitating access to PSI; and its efficacy in fostering awareness of the legal aspects of PSI (such as competition, data protection and privacy, and intellectual property rights).

The Award has received the support of Infocamere (http://www.infocamere.it/) and as a result, the most user-friendly design of a PSI portal will be rewarded with a prize of 1,000 Euros.

Applications must be submitted in English by 23rd December 2011 (16:00 hrs CET).

The winner of the 3rd LAPSI Award will be announced during the second day of the Public Conference that will take place in Brussels on 23rd and 24th January 2012.

The Call for applications is available at: http://www.lapsi-project.eu/call2

The Submission form is available at: http://www.lapsi-project.eu/form2

Data Debate: Is transparency bad for science?

November 28, 2011 in Open Science

The following post is by Eve Jackson who works for the Index on Censorship.

Is the push for openness helping or hindering science? Index on Censorship will be debating the question on Tuesday 6 December at 6.30pm at Imperial College London, with Sir Mark Walport (Director, Wellcome Trust), George Monbiot (columnist, the Guardian), Professor David Colquhoun (UCL) and Baroness Onora O’Neill (philosopher), chaired by Jo Glanville.

Increasingly it’s argued that scientific data should be made freely available to the public, and that the science publishing model should be overhauled to enable free ‘open access’ to articles currently locked behind pay walls. Openness could also imply that researchers should offer up their data when it’s asked for, for example in Freedom of Information requests. Openness in science is on the government’s agenda too. Chancellor George Osborne is due to announce various open data initiatives tomorrow (29/11), including a new Open Data Institute.

It’s difficult to argue against the principles of openness and transparency. Both are usually seen as good in their own right, regardless of their application.

While advocating transparency and openness, however, it is critical to keep an eye on their effects. Index on Censorship has investigated the question of transparency in science in ‘Dark matter: what’s science got to hide?’, the latest issue of our magazine. The event is sponsored by SAGE and will launch the science issue. Tickets are free. You can register here or email eve[at]indexoncensorship[dot]org for more information.  

 

Work in progress: The Data Digitizer

November 17, 2011 in WG Open Data in Science

The following post is by Sam Leon, who’s just joined the OKF as a community coordinator! Read more about Sam here.

Back in July of this year a crowd of coders, scientists and new media artists gathered in Berlin for the Open Science Workshop at OKCon. One of the projects to come out of this gathering was the Data Digitizer, a tool for transcribing documents and tables that are not currently machine-readable. Suggested applications for this tool ranged from the transcription of Brazilian census data to input of tables from economics articles to allow comparisons across multiple articles that examine the same variables.

The project is still ongoing, with the code up on GitHub. You can also find an Etherpad that details what was proposed and achieved in the first session of work on the Data Digitizer. Check out how far it’s got with a little demo that is up and running here.

Work in progress: Public Domain Calculators

November 15, 2011 in Public Domain

The following post is from Primavera De Filippi, representative of Creative Commons France and coordinator of the Open Knowledge Foundation’s Public Domain Working Group.

Many people recognise the value of works which are in the public domain (e.g. the works of Shakespeare, Italian renaissance paintings, classical music, etc). However, it is often difficult for people to determine whether a work has fallen into the public domain in a particular jurisdiction. In order to address this problem, the Open Knowledge Foundation has undertaken the development of the public domain calculators, a series of software applications specifically designed to identify the legal status of a work in any given jurisdiction, by gathering relevant data concerning the work (e.g. date of publication, birth/death date of the author, etc) and processing it through a set of rules that reflect the national implementations of the copyright regime.

A new version of the public domain calculators has recently been released and is available at GitHub.
The public domain calculators are now linked with the national copyright flowcharts developed as part of the Europeana project. The back-end has been rewritten to allow for national calculators to be written as simple RDF files that subsist independently of the underlying code.

  • Graphm2rdf.rb is a converter that takes a flowchart from the Europeana project (in Graphmlz format) to produce a new flowchart (in RDF format) that describes the national copyright regime according to a specifically designed ontology Flow 0.1 (available at http://bedlam.dk/flow/0.1).
  • Metadata[n].rdf are the metadata files taken from a database describing a particular work in RDF format.
  • Map.rdf provides a link between the questions from the flowchart and the answers in the metadata files by means of specific SPARQL queries.
  • Reasoner.py is the actual public domain calculator. It takes three files as input: Flowchart.rdf (national flowchart), map.rdf (national calculator), and metadata.rdf (work metadata). It processes them and declares whether or not the work described in metadata.rdf is in the public domain according to the national copyright regime.

The advantage of this approach is that it makes it easier to create new calculators as new flowcharts get created or updated. Since the calculators are simple RDF files, no programming skills are necessary to implement them, except for some decent knowledge of SPARQL.
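The division of labour between flowchart, map and metadata can be illustrated with a minimal Python sketch. This is not the code of Reasoner.py: it uses plain Python objects in place of RDF files and SPARQL queries, and the `Question` class, the helper functions and the simplified "life plus 70 years" rule are all assumptions made for illustration rather than any country's actual flowchart.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical work metadata; the real tool reads this from metadata.rdf.
Metadata = Dict[str, int]


@dataclass
class Question:
    """A flowchart node: ask something of the metadata, then branch."""
    ask: Callable[[Metadata], bool]  # stands in for a SPARQL query in map.rdf
    if_yes: Union["Question", bool]  # next node, or a final verdict
    if_no: Union["Question", bool]


def is_public_domain(node: Union[Question, bool], work: Metadata) -> bool:
    """Walk the flowchart until a leaf (True or False) is reached."""
    while not isinstance(node, bool):
        node = node.if_yes if node.ask(work) else node.if_no
    return node


def has_known_death_year(work: Metadata) -> bool:
    return "author_death_year" in work


def author_died_over_70_years_ago(work: Metadata) -> bool:
    return work["current_year"] - work["author_death_year"] > 70


# Illustrative flowchart only. Treating an unknown death year as
# "not public domain" is a simplification, not a legal rule.
flowchart = Question(
    ask=has_known_death_year,
    if_yes=Question(ask=author_died_over_70_years_ago,
                    if_yes=True, if_no=False),
    if_no=False,
)

shakespeare = {"author_death_year": 1616, "current_year": 2011}
verdict = is_public_domain(flowchart, shakespeare)
```

Because the traversal function knows nothing about any particular country, supporting a new jurisdiction in this sketch means supplying a different `flowchart` value, which mirrors the idea of adding a new RDF file without touching the reasoner.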

The code is very recent and is still under development. Anyone interested please don’t hesitate to check it out!

Tuesday OKF Hangout

November 10, 2011 in Events

The Open Knowledge Foundation Hangout will be running this coming Tuesday (and every Tuesday after that) from 5:00pm-7:00pm (UK time). The community team (Jonathan Gray, Lucy Chambers, Kat Braybrooke and Sam Leon) will be on IRC in order to help you with any questions or suggestions you have about OKF projects, or advise you on ways of getting more involved with the organisation.

It’s also a great opportunity to meet other members of our ever growing community. Each week we’ll try to focus on a particular community issue, such as organising local meet ups or developing ideas for new OKF projects. At the next hangout we’ll be brainstorming the topics to focus on at future sessions.

We’ll be in the #OKFN room on irc.freenode.net (nicknames: jwyg, lucychambers, keyboardkat and samleon) on Tuesday from 5:00-7:00pm (UK time).

Please drop in if you have some time!