These are some highly impressionistic notes from a workshop on a cross-government data mashing lab, which took place today at the Royal Society in London, organized under the aegis of the Department for Transport[1]. The purpose of the lab would be to develop tools and demonstration projects that illustrate the possibilities of data reuse, as well as the legal (IP), social (lack of availability), and technological (non-standard formats) obstacles that prevent this happening at present.

While the presentations (summarized below) were fine, the main interest came from the Q&A and a couple of conversations over lunch afterwards. In particular, I had a chance to chat extensively with John Sheridan of the OPSI, who explained that the primary obstacle to access to the large amount of PSI data (which is primarily under Crown copyright) was not to do with IP, since that is covered by the ‘open’ click-use licence, but that a) many, many documents aren’t available (online), and b) it is very hard to determine for a given document whether it is covered by the global click-use licence.

[1]: for the event announcement see:

Frank Kelly

DfT examples

  • MIDAS (motorway incident detection … system)
    • initial aim was real-time control of speed limits
    • archived since 1997 (huge amounts of data)
    • mash up data to monitor achievement of PSA target (i.e. journey time)
  • Transport direct
    • unique numbering for bus stops
    • mysociety
  • Accession
    • two hospitals, one of which is going to close: what is the optimal new bus route to maximize a social welfare function?
  • web mashing:
    • only the backend can be done by government (a lot of work, building the db)
    • front-ends aren’t so difficult: busmonster (seattle)
  • PSA target: improve road safety
    • make speed limit database freely available so that people can have it in car
    • needs map data
    • benefit to cost ratios over 100
    • but OS is a trading fund (IPRs etc.)
  • new sources:
    • oyster
    • mobile
    • norwich union: pay as you drive insurance
  • Raises issues of privacy, convenience, personalisation
    • but believes that with transport we can avoid these issues pretty easily
  • Data Grand Challenge: 3 items
    1. review evidence base informing govt’s data charging policy
    2. ensure data strategies adequately informed by understanding of science and technology
    3. govt mashing lab (? — missed this one)
  • Role of govt is not to do the innovation but to give access to the data that will permit the innovation
    • having a mash-up lab (public-private partnership) would allow us to explore these issues
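The point about unique numbering for bus stops is what makes mashing feasible: once two independently published datasets share an identifier, joining them is trivial; without it, it is painful record-linkage work. A minimal sketch of the idea (all stop codes, field names, and values here are invented for illustration, not real DfT data):

```python
# Two hypothetical datasets published independently: stop locations
# (e.g. from a gazetteer) and live arrivals (e.g. from an operator).
# They share nothing except the unique stop code.

stop_locations = {
    "490000077E": {"name": "Euston Station", "lat": 51.528, "lon": -0.133},
    "490000254B": {"name": "Waterloo Bridge", "lat": 51.508, "lon": -0.117},
}

next_arrivals = {
    "490000077E": ["18", "30", "73"],
    "490000254B": ["1", "59", "168"],
}

def mash(locations, arrivals):
    """Join the two datasets on the shared unique stop code."""
    merged = {}
    for code, loc in locations.items():
        merged[code] = {**loc, "routes": arrivals.get(code, [])}
    return merged

for code, info in mash(stop_locations, next_arrivals).items():
    print(code, info["name"], info["routes"])
```

The join itself is one line; the hard, government-side work is the "backend" of the talk: agreeing and maintaining the unique numbering in the first place.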

Brief Q&A

  • Under the umbrella of govt we can access e.g. OS data and do stuff. This will allow us to demonstrate the value of this, and we can move on to widening access later.

John Darlington, Imperial College

  • web 2.0
  • multiple convergence
  • set up internet centre last year with seedcorn funding from imperial college
    • promote studies with stakeholders in these areas
    • technology program
    • economic and social issues
    • set up industrial forum — with everyone talking under the Chatham House Rule, a lot could be achieved
  • new specialization:
    • systems/hardware ->
    • utility computing ->
    • service hosters ->
    • service developers ->
    • users
  • producers and consumers cross-over (everyone is both a consumer and a producer)
  • the promise of the semantic web: semantic interoperability (easily share across various communities)
  • conclusion
    • publicize data with appropriate slas
    • provide mashup and market toolkits
    • leave it to the internet

Nigel Shadbolt, University of Southampton (AKT)

  • 65 research staff, 8 million, 6 years
  • controlled vocabularies (rdf)
  • endorsed by w3c now
  • the sum is greater than the parts: show what different funding bodies are using
  • CS AKTive: aggregate data on who is doing what, and where, in CS labs in the UK
  • work on medical datasets and with military
  • AKTive PSI
    • draw together a set of heterogeneous info from a selection of public sector bodies
  • life in two london boroughs
    • don’t mess with existing workflow
    • automate data extraction
    • ontologies not high level but specific to each dataset
    • then interface in some manner
    • camden provided property gazetteer
    • working with os (they have things we want and we have things they want)
  • drivers for change
    • nature, abundance and ubiquity of data
    • increased experience of mash-ups and rich data exposure
    • realisation of the potential for serendipitous reuse of data
    • regulatory and legislative changes
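The AKTive PSI idea — per-dataset ontologies plus controlled vocabularies so heterogeneous public-sector data can be drawn together — can be illustrated without any RDF tooling. Represent each source as subject–predicate–object triples, map each source’s local field names onto a shared vocabulary, and the merged graph becomes queryable across sources. A toy sketch (all field names, vocabulary terms, and data are invented):

```python
# Two hypothetical sources describing the same property, each with
# its own local field names (the "specific ontologies" of the talk).

camden = [("prop1", "addr", "12 High St"), ("prop1", "use", "school")]
ons = [("prop1", "address", "12 High Street"), ("prop1", "population", "450")]

# A controlled vocabulary: map each source's local predicates onto
# shared terms so the merged data is interoperable.
vocab = {"addr": "hasAddress", "address": "hasAddress",
         "use": "hasUse", "population": "hasPopulation"}

def to_shared(triples):
    """Rewrite a source's triples into the shared vocabulary."""
    return {(s, vocab[p], o) for s, p, o in triples}

# Merging is now just set union over the normalized triples.
graph = to_shared(camden) | to_shared(ons)

def query(graph, predicate):
    """All (subject, object) pairs for a shared-vocabulary predicate."""
    return sorted((s, o) for s, p, o in graph if p == predicate)

print(query(graph, "hasAddress"))
```

In practice this is what RDF and real ontologies buy you, with URIs in place of the bare strings; the sketch only shows why a shared vocabulary, not a shared schema, is the crux.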

Chris Lightfoot

Chris re-presented the slides produced for Travel-time Maps and their Uses.


ONS person:

  • are we duplicating private sector (credit agencies do data mashing)
  • are we duplicating public sector (HM Revenue and Customs)
  • what about privacy issues (will we get access to govt confidential information)

Matthew Locke, BBC. BBC Backstage

  • backstage
  • interesting story about the BBC not even being able to use its own TV scheduling data, so they end up rebuying it from tvanytime
  • reboot:
  • innovation labs:
    • more about the indie community than the lead-user community
  • open dialogue with innovation communities — end goal is commissioning new services for
  • had ring-fenced budget to commission projects from backstage or labs (have to embed that commitment politically, o/w you get the innovation but it doesn’t migrate into what you actually do)
  • what we’ve learnt: pros
    • shift to user-driven design
    • encourages open structures across inventories, assets, networks
    • encourages innovation as a social, collaborative, iterative process
    • perpetual beta — closing gap between research and implementation
  • what we’ve learnt: cons
    • need to manage expectations — can’t turn off the tap
    • need to ensure that lead-users within the organization can participate
    • organizational hacking — so you can bring stuff in
    • change culture: move towards perpetual release rather than targets
  • 4 roles for hosts:
    1. hosting conversations
    2. host data
    3. own sandpit space
    4. own commissioning process

John Sheridan, Office of Public Sector Information

  • example of reuse of data from his experience
    • wanted to integrate QCA openquals database (list of FE courses) with metatagging software that he had
    • developers were keen. talked for about 6 months
    • QCA said they had got some consultants in to advise them and that was the end of the story
  • types of availability
    1. free: the click-use system provides a licence to reuse Crown copyright material through an online licensing process
    2. charge-for data
  • there is a reuse revolution
  • merger with national archives
  • streamlining licensing

Paul Drummond, DfT, Transport Direct

  • what is transport direct
  • Industry supported us but it would not have happened without government support
  • 3 types of users supported by user interface:
    1. active
    2. passive
    3. constrained
  • little data under DfT control
  • 141 local authorities
  • buy in data
  • free for service users
  • stakeholder engagement (appears to mean engagement with the industries that control the data)
  • setting up data licenses involved a lot of time and a lot of effort (more than we had envisaged)
  • developed new data standards which are internationally recognized

Peter Miller, Ito!

  • Peter is from the transport sector and his colleague is from the film industry
  • lots of pretty pictures showing geo+transport data
  • want data:
    • subsidised bus data
    • mobile phone data
    • have census data
    • next week they will get taxi data for a major town for a whole year

Rufus Pollock is Founder and President of Open Knowledge.
