
Season’s Greetings from the Open Knowledge Foundation

December 24, 2011 in Ideas and musings

‘Tis the season to be jolly. This year, when preparing your Christmas feast, why not take some inspiration from Mrs Beeton and her legendary 1861 Book of Household Management?

 

Words of wisdom from Mrs Beeton…

> “In December, the principal household duty lies in preparing for the creature comforts of those near and dear to us, so as to meet Old Christmas with a happy face, a contented mind, and a full larder. And in stoning plums, washing currants, cutting peel, beating eggs, and mixing a pudding, a housewife is not unworthily greeting the season of good will.”

Poultry

> “The cost of poultry varies considerably, being affected both by the season of the year and the district in which it is purchased. It is well to remember that poultry almost invariably rises in price at Christmas, and also tends to be expensive when no game is on the market. These considerations borne in mind, the table below will give a reliable average of prices.”

> “Fattening Turkeys for the Table. Turkeys grow very slowly; therefore, the earlier they are hatched the better when it is necessary that they should attain their full growth by Christmas.”

Boar’s Head

> “In ancient times the boar’s head formed the most important dish, and on Christmas Day was invariably the first placed upon the table, its entrance into the hall being preceded by a body of servitors, a flourish of trumpets, and other marks of distinction. The dish itself was borne by the individual next in rank to the lord of the feast. The custom of serving a boar’s head on a silver platter on Christmas Day is still observed at some colleges and Inns of Court. So highly was the grizzly boar’s head regarded in the Middle Ages that it passed into the cognizance of some of the noblest families in the realm; thus it was not only the crest of the Nevilles and Warwicks with their collateral houses, but it was the cognizance of Richard III …”

Christmas Pudding

And if none of that takes your fancy – shake it up a little…

All images and text are from Mrs Beeton’s Household Management which is in the public domain and the full text of which is available online at the Internet Archive.

Finally, a big thank you to everyone who has been involved in and supported the Open Knowledge Foundation this year. It’s been a great year in the open data space, so Merry Christmas and a happy New Year. See you back here in 2012…

 

“Yes We Scan”

December 23, 2011 in WG Open Government Data

Take a look at the campaign being run by Carl Malamud and John Podesta called “Yes We Scan”. It’s an effort to encourage the US government to make plans to digitize the contents of all national libraries including the Library of Congress. In a letter addressed to President Barack Obama, John Podesta and Carl Malamud point to the economic, scientific and social benefits that would arise from a large scale digitization of America’s cultural riches currently held in the vaults of various national institutions.

With reference to Thomas Jefferson’s decision to donate his library to the government of the United States, the letter calls on the present administration to use the power of the internet to realise Jefferson’s vision of government as an agency that encourages the wide diffusion of knowledge for the benefit of society:

> “We ask your help to achieve this 21st century dream, making the vast resources of our federal government available to all on the global Internet, making access to knowledge a right for all Americans and a defining contribution for our future.”

The letter directs you to a corresponding e-petition that can be viewed and signed here.

We’re hiring!

December 21, 2011 in CKAN, Join us, OKF

As we head into 2012, there’s lots going on at the OKFN and we’re looking for some more people to come help us build and scale the open data ecosystem.

In particular, we’re looking for a great project manager to deliver a portfolio of CKAN-related projects, and also an awesome front end web developer who will contribute to a range of OKFN projects.

Come join our team! Find out more at http://okfn.org/jobs/

Ideas for OpenPhilosophy.org

December 20, 2011 in Bibliographic, Free Culture, Ideas and musings, Open Content, Open Data, Public Domain, WG Cultural Heritage, WG Humanities, WG Public Domain, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation. It is cross-posted from jonathangray.org.

For several years I’ve been meaning to start OpenPhilosophy.org, which would be a collection of open resources related to philosophy for use in teaching and research. There would be a focus on the history of philosophy, particularly on primary texts that have entered the public domain, and on structured data about philosophical texts.

The project could include:

  • A collection of public domain philosophical texts, in their original languages. This would include so-called ‘minor’ figures as well as well-known thinkers. The project would bring together texts from multiple online sources – from projects like Europeana, the Internet Archive, Project Gutenberg or Wikimedia Commons, to smaller online collections from libraries, archives, academic departments or individual scholars. Every edition would be rights cleared to check that it could be freely redistributed, and would be made available under an open license, with a rights waiver or with a public domain dedication.
  • Translations of public domain philosophical texts, including historical translations which have entered the public domain, and more recent translations which have been released under an open license.
  • Ability to lay out original texts and translations side by side – including the ability to create new translations, and to line up corresponding sections of the text.
  • Ability to annotate texts, including private annotations, annotations shared with specific users or groups of users, and public annotations. This could be done using the Annotator tool.
  • Ability to add and edit texts, e.g. by uploading or by importing via a URL for a text file (such as a URL from Project Gutenberg). Also ability to edit texts and track changes.
  • Ability to be notified of new texts that might be of interest to you – e.g. by subscribing to certain philosophers.
  • Stable URLs to cite texts and/or sections of texts – including guidance on how to do this (e.g. automatically generating citation text to copy and paste in a variety of common formats).

The project could also include a basic interface for exploring and editing structured data on philosophers and philosophical works:

  • Structured bibliographic data on public domain philosophical works – including title, year, publisher, publisher location, and so on. Ability to make lists of different works for different purposes, and to export bibliographic data in a variety of formats (building on existing work in this area – such as Bibliographica and related projects).
  • Structured data on secondary texts, such as articles, monographs, etc. This would enable users to browse secondary works about a given text. One could conceivably show which works discuss or allude to a given section of a primary text.
  • Structured data on the biographies of philosophers – including birth and death dates and other notable biographical and historical events. This could be combined with bibliographic data to give a basic sense of historical context to the texts.

Other things might include:

  • User profiles – to enable people to display their affiliation and interests, and to be able to get in touch with other users who are interested in similar topics.
  • Audio version of philosophical texts – such as from Librivox.
  • Links to open access journal articles.
  • Images and other media related to philosophy.
  • Links to Wikipedia articles and other introductory material.
  • Educational resources and other material that could be useful in a teaching/learning context – e.g. lecture notes, slide decks or recordings of lectures.

While there are lots of (more or less ambitious!) ideas above, the key thing would be to develop the project in conjunction with end users in philosophy departments, including undergraduate students and researchers. Having something simple that could be easily used and adopted by people who are teaching, studying or researching philosophy or other humanities disciplines would be more important than something cutting edge and experimental but less usable. Hence it would be really important to have a good, intuitive user interface and lots of ongoing feedback from users.

What do you think? Interested in helping out? Know of existing work that we could build on (e.g. bits of code or collections of texts)? Please do leave a comment below, join the discussion on the open-humanities mailing list or send me an email!

LODLAM-NZ Round Up

December 20, 2011 in External, WG Cultural Heritage, WG Humanities, WG Open Bibliographic Data

The following guest post is by Jon Voss, whose projects include History Pin and Civil War Data 150.

I recently traveled to Wellington, New Zealand to take part in the National Digital Forum of New Zealand (#ndf2011), which was held at the national museum of New Zealand, Te Papa. Following the conference, the amazing team at Digital NZ hosted and organized a Linked Open Data in Libraries, Archives & Museums unconference (#lodlam). The two events were well attended by Kiwis as well as a large number of international attendees from Australia, and a few from as far as the US, UK and Germany.

When it comes to innovative digital initiatives in cultural heritage, the rest of the world has been looking to New Zealand and Australia for some time. Federated metadata exchanges and search have been happening across institutions in projects like Digital NZ and Trove. I was able to learn more about the Digital NZ APIs as well as those from Museum Victoria, Powerhouse Museum, and State Records New South Wales. In fact, the remarkable proliferation of APIs in Australasia has allowed us to consider the possibilities of Linked Open Data to harvest and build upon data held in databases in multiple institutions.

Given the extent to which tools for opening access to data have been developed here, I was surprised by the level of frustration that exists around copyright issues. There’s a clear sense that government is moving too slowly in making materials available to the public with open licensing. We talked a lot about the idea of separately licensing metadata and assets (i.e. information about a photo vs the digital copy of the photo), as has been happening across Europe and increasingly the United States. There are strong advocates within the GLAM sector (galleries, libraries, archives & museums) here, and demonstrating use cases utilizing openly licensed metadata will go far in helping to move those conversations forward with policy makers.

To that end, a session was convened to explore the possibilities of an international LODLAM project focused on World War I, the centennial commemoration of which is fast approaching. The Civil War Data 150 project we’ve been slowly moving forward in the US may provide a rough framework to build from. At least a half dozen or more libraries, archives and museums have expressed interest in participating in a WWI project already. First steps may be identifying openly licensed datasets to be contributed, key vocabularies and ontologies to apply, and ideas for visualizations that would leverage the use of Linked Open Data. For anything to happen here, someone will need to take the lead in organizing (not me, we’re still trying to build some tools around the Civil War Data 150 concept!). Good notes were posted on the LODLAM blog about the conversation and how to convene future conversations. Anyone who gets involved with this, please spread the word and keep the LODLAM community apprised of your progress and ways to contribute.

We also had a workshop on using Google Refine by Carlos Arroyo from the Powerhouse Museum, with props to the FreeYourMetadata crew. Some lively sessions dug into just what Linked Data is, how it works, and some of its pitfalls and potential. Another session explored the importance and potential of local vocabularies, and how they can contribute to Linked Data implementations. One great example was the vocabularies surrounding Maori artifacts (Taonga) at Te Papa, and how publishing those datasets can help other museums around the world to better describe and provide digital access to Maori collections.

As I’ve attended various LODLAM meetups since June, I’ve noticed clear momentum from one to another as these conversations progress rapidly, with those further along helping those of us just learning. After LODLAM-DC I realized the importance of including library, archive, and museum vendors in all of these gatherings. At LODLAM-NZ I could see the potential of bringing together developers in the GLAM sector and those utilizing Linked Data in commercial settings. In places like San Francisco, where commercial interests are already leading the charge on Linked Data (which is not a bad thing) and there’s an active Semantic Web developer community, the GLAM sector may be playing catchup. But the sheer number of datasets potentially available as open data coming from the GLAM sector, together with the expertise of managing massive amounts of structured data, creates a space ripe for collaboration and experimentation, and these lines will continue to blur.

Open Humanities Working Group Update

December 20, 2011 in WG Humanities, Working Groups

The following update is from the Open Humanities Working Group, courtesy of James Harriman-Smith. To help you keep up with everything that’s going on across the OKF, we are publishing weekly updates from different Working Groups.

Salvete. Ahem. The latest and most important news from the Open Humanities Working Group is that we now have a blog, intended to help coordinate all of the Foundation’s projects in the humanities: http://humanities.okfn.org. This follows on from the merger of the Open Literature mailing list into the Open Humanities one.

On the site you will find:

If you have a spare moment, please do have a look and give me any feedback about content and design you have. You might also like to tell us about an upcoming event, join our mailing list, or drop in for our next general meeting on Wednesday 18th January at 5pm GMT.

Opening up Domesday Book

December 19, 2011 in Public Domain

The following guest post is by Anna Powell-Smith from the Open Domesday project. Anna is a member of our brand new Working Group on Open Humanities.

Domesday Book might be one of the most famous government datasets ever created. Which makes it all the stranger that it’s not freely available online – at the National Archives, you have to pay £2 per page to download copies of the text.

Domesday is pretty much unique. It records the ownership of almost every acre of land in England in 1066 and 1086 – a feat not repeated in modern times. It records almost every household. It records the industrial resources of an entire nation, from castles to mills to oxen.

As an event, held in the traumatic aftermath of the Norman conquest, the Domesday inquest scarred itself deeply into the mindset of the nation – and one historian wrote that on his deathbed, William the Conqueror regretted the violence required to complete it. As a historical dataset, it is invaluable and fascinating.

In my spare time, I’ve been working on making Domesday Book available online at Open Domesday. In this, I’ve been greatly aided by the distinguished Domesday scholar Professor John Palmer, and his geocoded dataset of settlements and people in Domesday, created with AHRC funding in the 1990s.

I’m very happy to announce that we’re now able to make full-page, high-resolution images of the Domesday folios available under CC-BY-SA. You can browse or download the folios en masse at the Internet Archive (recommended), or page-by-page at Open Domesday.

I’ve also been working on a RESTful API to Domesday Book, to accompany the release of the folios. Our API supports geographic queries, and returns the economic and social statistics for each settlement in Domesday – from the number of households to (where listed) the number of pigs, sheep and oxen.

As an example, here is the folio entry for Marsh Gibbon in Buckinghamshire, still a thriving village today:

Marsh Gibbon in Domesday Book

Domesday is not often descriptive, but this entry gives us a glimpse of the state of the English population. The entry tells us that Marsh Gibbon held 11 households, including 3 slaves, and that it had woodland on which the locals paid a tax of 30 pigs. The owner in 1066 was one Aelric, and he still lived there “harshly and wretchedly”.
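To give a feel for how a developer might work with this kind of data, here is a minimal Python sketch. It is not the actual Open Domesday API: the endpoint URL, the query parameter names (`lat`, `lng`, `radius`) and the record field names are all assumptions made up for illustration, and the coordinates in the usage lines are rough illustrative values. Only the Marsh Gibbon figures come from the entry described above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://example.org/api/places_near"


def build_query(lat, lng, radius_km):
    """Build a URL asking for settlements within radius_km of a point."""
    params = {"lat": lat, "lng": lng, "radius": radius_km}
    return BASE_URL + "?" + urlencode(params)


def summarise(place):
    """Turn one settlement record into a one-line description."""
    template = "{name} ({county}): {households} households, owner in 1066: {owner_1066}"
    return template.format(**place)


# Figures taken from the Marsh Gibbon entry described above; the field
# names are assumptions, not the real response format.
marsh_gibbon = {
    "name": "Marsh Gibbon",
    "county": "Buckinghamshire",
    "households": 11,
    "slaves": 3,
    "woodland_tax_pigs": 30,
    "owner_1066": "Aelric",
}

# Illustrative coordinates and radius, not values from the dataset.
print(build_query(51.9, -1.06, 10))
print(summarise(marsh_gibbon))
```

A real client would fetch the URL and decode the response instead of using a hand-written record; the point is only that each settlement reduces to a small, uniform set of statistics.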

There are entries like this for nearly 15,000 places. We hope that this data release will lead to some interesting new applications (may a thousand iPad apps bloom) and research – like this population heatmap of Domesday England, created from Professor Palmer’s raw data by Andrew Bevan at University College London:

Population heatmap of Domesday England

To end on a downbeat note, it’s worrying that among historic texts, Domesday may become the exception, rather than the rule, by being available under an open licence. The only reason we are able to make the folios available at all is that Professor Palmer took his own images of the Ordnance Survey’s photozincographic copies some years ago, and has kindly agreed to release them for the benefit of others.

In particular, although the British Library has teamed up with Google to make thousands of historic texts available online, it seems the digitised copies will not be truly open, as Glyn Moody has warned.

But enough grumbling – I hope you enjoy the Domesday data. If nothing else, it’s something to show elderly relatives over Christmas! Please contact me with comments.

 

Opening Government Data in Bulgaria

December 16, 2011 in External, Open Government Data, WG EU Open Data, WG Open Government Data

The following guest post is by Boyan Yurukov, blogger and open government data activist.

At the beginning of 2011 some open data was released by the Bulgarian government on www.parliament.bg. Visitors could export information on bills and members of parliament as XML or CSV. They could also download the votes of individual MPs or parliamentary groups as Excel files. While this data was useful and an important step forward, I found problems in the format and the exported files. Also, the website held a lot more information that could not be exported as open structured data.

So I started a project to scrape the website, fix the available data, refine, enrich and link it. After several versions of the schema, the final dataset was released at the beginning of December. It contains over 11,000 data points and over 1.12 Gb of data. The items are as follows:

  • Profiles for each MP – general biography, previous parliamentary terms, participation in parliamentary groups, committees, “friendship groups” and delegations, supported bills, absences, external consultants, questions during plenary meetings.
  • Data on bills – laws, legislative proposals, decisions and official declarations.
  • Parliamentary groups and committees – current members and member history, proposed bills, external consultants, meeting schedule, agenda, transcripts and reports.
  • Parliamentary delegations and “friendship groups” – current members and member history.
  • Parliamentary sittings – program for the sitting with questions and legislative proposals; transcripts; voting history for each MP on each discussion point.
  • Parliamentary procurements – description, topic, procurement registry code.

The dataset can be downloaded as two ZIP files together with the XSD schema. The scraping scripts are also open sourced on GitHub. You can find all this open Bulgarian Parliament data on the DataHub.

Although refined, this data is not without its flaws. Some historical data on MPs’ biographies and questions is missing. Also, the transcripts are not structured but in free text, making them almost impossible to parse. There is some hope that the parliamentary administration will release the transcripts in XML, but I’m not holding my breath. Currently the transcripts go back 20 years, and those back to the ’70s are being parsed and will be released soon. All other data dates from 2001 onwards, except individual votes, which date from 2009.

This data can be quite useful for parliamentary journalism, but in itself consists only of raw XML files. This is why another project is being set up that aims at building a platform for analyzing and visualizing the refined dataset. It will be targeted at data journalists and visualization experts. It is sponsored by the Institute for Public Environment Development and all results will be released as open data. I hope that in the first quarter of 2012 the first beta will come out.
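As a rough idea of the kind of analysis such a platform could run over the raw XML, here is a minimal Python sketch that tallies how each MP voted. The sample document, its element and attribute names (`sitting`, `vote`, `mp`, `point`, `value`) and the MP names are invented for illustration and do not reflect the published XSD schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample: the element and attribute names are assumptions and
# do not reflect the published XSD schema.
SAMPLE = """
<sitting date="2011-12-01">
  <vote mp="MP A" point="1" value="yes"/>
  <vote mp="MP A" point="2" value="no"/>
  <vote mp="MP B" point="1" value="absent"/>
  <vote mp="MP B" point="2" value="yes"/>
</sitting>
""".strip()


def tally_votes(xml_text):
    """Count how each MP voted across all discussion points."""
    root = ET.fromstring(xml_text)
    tallies = {}
    for vote in root.iter("vote"):
        # One Counter per MP, keyed by the kind of vote cast.
        tallies.setdefault(vote.get("mp"), Counter())[vote.get("value")] += 1
    return tallies


for mp, counts in sorted(tally_votes(SAMPLE).items()):
    print(mp, dict(counts))
```

With the real files, the same loop would need to follow the element names defined in the XSD schema that ships with the dataset.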

SNCF launches a debate on open transport data in France

December 15, 2011 in External, Open Data, WG Open Transport

The following guest post is by Pieter Colpaert from iRail npo and Pierre Chrzanowski, and was reviewed by Regards Citoyens. Pieter and Pierre are both members of our brand new Working Group on Open Transport – watch this space for a full announcement of the working group’s activities and details on how to get involved!

At first sight, you may think that data.sncf.com is the new open data website of the SNCF, the National Corporation of French Railways. Not yet. The company preferred to launch a consultation website before opening up its data. Anyone can add their thoughts on open transport data on data.sncf.com.

In a country struggling to involve the transport industry in the open data movement, this initiative is most welcome. After the release of data.gouv.fr, we hope transport data will soon be part of the available datasets. Until now, the lack of open transport data in France has led independent initiatives to extract the data without authorisation, leaving them in a legally insecure position. A change of approach by SNCF is therefore really welcome.

Although SNCF seems to be ready for open data, other public transport operators in France are still reluctant. RATP, the state-owned subway operator for the Paris area, recently refused to let other app developers use its map for free. This inspired CheckMyMetro, a startup which was forced to remove the RATP map from its smartphone application, to organize a subway map design contest.

As a lot of organizations are launching similar debates on open data, it is important that they use the word “open” correctly and that, in doing so, they know how to gain added value for themselves and their customers. Data.sncf.com is a great opportunity to remind the SNCF and other transport actors in Europe of the actual meaning of the word “open” and to help introduce a productive open data policy.

Open data for multimodal transport

Today, commuters use different types of transport to go to work or to travel across Europe. For them, access to timetables, network maps and real-time transport data is the key to organizing their journey or to being informed of disruptions. Multimodal transport is part of the latest European Commission transport policy, which has announced the launch of a contest for the best European multimodal journey planner. The software behind these intermodal journey planners can be as intelligent as can be, but when there is no data, the software is useless.
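To make this dependence on data concrete, here is a minimal Python sketch of an earliest-arrival search over a timetable, using a simple connection-scan approach. It is only an illustration, not how any particular journey planner works: the stops, times and modes are fictional, and the function name `earliest_arrival` is our own.

```python
# Fictional timetable: (departure stop, arrival stop, depart, arrive),
# times in minutes after midnight, mixing train and bus connections.
CONNECTIONS = [
    ("A", "B", 480, 540),  # train
    ("B", "C", 550, 600),  # bus
    ("A", "C", 500, 630),  # direct but slower train
    ("C", "D", 610, 660),  # train
]


def earliest_arrival(connections, origin, destination, start_time):
    """Scan connections in departure order and keep the earliest known
    arrival time at each stop. Returns None if the destination cannot
    be reached with the connections provided."""
    best = {origin: start_time}
    for dep_stop, arr_stop, dep, arr in sorted(connections, key=lambda c: c[2]):
        # A connection is usable only if we can be at its stop before it leaves.
        if dep_stop in best and best[dep_stop] <= dep:
            if arr < best.get(arr_stop, float("inf")):
                best[arr_stop] = arr
    return best.get(destination)


# With both rail and bus data, stop D is reachable.
print(earliest_arrival(CONNECTIONS, "A", "D", 470))

# Without the bus operator's data, the same planner finds no journey.
rail_only = [c for c in CONNECTIONS if c[:2] != ("B", "C")]
print(earliest_arrival(rail_only, "A", "D", 470))
```

In this toy timetable the only way to catch the last train is via the bus, so removing one operator's data leaves the planner with no answer at all, however clever the algorithm.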

Some countries are already doing their part. The UK Government recently committed itself to the release of high-value transport data, which also provides useful input for answering the data.sncf.com consultation. Here is the comprehensive list of transport data soon to be released:

  • Rail timetable information on a weekly basis
  • Real-time running data from Network Rail
  • Location data for the Great Britain rail network and GB rail network stations
  • Traveline National Dataset on a weekly basis (Great Britain buses)
  • Next Buses API of planned and real-time information at 350 000 GB bus stops

There are already many journey planner apps offered either by transport companies or developed by independent developer teams, but only a few can help you to organize your journey across the whole EU – Deutsche Bahn offers the closest. Furthermore, with open data, there are new services to come that transport companies did not think about.

Transport innovation through real open data

By starting a debate on open data, data.sncf.com wants to take the first steps towards clearing the path for innovative services. The definition of open data is clear and not debatable. As defined by the Open Knowledge Foundation: “A piece of content or data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and share-alike”. This means data needs to be released for free under an open license and made available in open formats. The French statements on open data also give a clear definition of what “open” means. SNCF could then choose to open its datasets either under the new French Open License or under another open license such as the ODbL, already in use in several French cities. On open formats, the 5-star ranking of the W3C is a good reference. But open transport data is also part of an industry and a new market: if we want to help developers build multimodal apps, respect for standards is required.

Let’s hope this initiative from the SNCF is the beginning of a real shift towards open transport data in France and beyond.

You can participate in the SNCF debate here.

The ePSIplatform is also working on a report on the re-use of transport data in Europe. You can reply to their questionnaire here.