by jwalsh

Open Data talk at Census Microdata workshop

May 25, 2011 in Talks, Uncategorized

Jo Walsh, Service Manager at EDINA and a member of the Open Knowledge Foundation board, writes: Yesterday I gave a last-minute talk on open data and the work of OKF and EDINA to a Census Microdata workshop in Edinburgh.

The slides consist of screenshots with links and cover the following.

CKAN – the Data Hub and the place to get all sorts of data that may be relevant to demographic analysis. A CKAN search for ‘census’ currently returns 27 relevant datasets. There are many CKANs: some are run by governments (including data.gov.uk) and many more are run by community groups. Open Data Search looks at many different sources of open data including (I think) the network of CKANs.

Note that CKAN includes datasets that are not open, but one day may be open. So there is a companion service, “Is it Open Data?”; one can write to data providers through it, and the questions and answers are recorded in public. So if there are datasets which may be non-commercial or research-only that you really want to see opened, try “Is it Open Data?”.

Now that we have got data, what are we going to do with it? Get the Data may help – this is a Stack Overflow-style site intended for data wranglers rather than programmers. Ask questions here and look for relevant answers…

So all these things are projects of the Open Knowledge Foundation which builds infrastructure / tools and also does some open knowledge production – for example Open Shakespeare, which has done some print editions, and Where Does My Money Go?, visualising public spending and contributions in novel interactive ways.

Open Data Commons is another OKF project – it publishes the PDDL and ODbL licenses (inspired by free software licenses), which can be used to preserve the future freedom of your data.

OpenStreetMap is one open data project that’s now moving to the ODbL license – take the data and adapt it, but if you make improvements they should be contributed back to the original project. Taginfo shows that some people are adding census data to OSM.

EDINA, the JISC datacentre based at the University of Edinburgh, provides several open data services which may support demographic data analysis and visualisation. Open Boundaries is the open data side of the long-standing UKBORDERS service for research access to boundary data. Digimap OpenStream provides web map tile services based on Ordnance Survey Open Data. Both these services are currently available to anyone with an .ac.uk email address. And the Unlock place search and text mining service provides some global open data coverage, free for anyone to use.

by jwalsh

Sustaining open data business

May 22, 2011 in Bibliographic, Business, Open Access, Open Data

Jo Walsh, who works as a project manager at EDINA and sits on the Open Knowledge Foundation board, writes:

These thoughts on sustaining open data business were provoked by ORCID, a not-for-profit business set up by a group of large academic publishers and a few leading universities. Its aim is to provide a central directory of researchers, with profiles describing them.

ORCID is committing to provide open source software but not necessarily open data, offering only some limited “non-commercial” use of the service. Researchers can open their data by “claiming” it, but how many of them are going to do that? Do many more than 15% of academics publish their work in their local open access institutional repository?

I want to illustrate that it is perfectly possible, if not necessary, to support a business publishing open data. Strategies for successful open data companies:

  • Charge for quality – as geonames.org offer a cleaned-up, more authoritative version of a somewhat crowdsourced database.
  • Charge for high volume – as SimpleGeo offer 10K calls per day to the service and charge a small fee after that.
  • Charge for private data storage – as Talis offer free triplestores for linked open data, and charge for a private data service.
  • Charge for analytical capacity – Fortius One offer the free GeoCommons web map making service and charge for the GeoIQ analysis package.

Of course one can always do consultancy and custom development to cover costs. Another route is establishing a namespace and becoming a reference point for others: GeoNames linked data is used because it is widely used, and it is widely used because it arrived early in the domain.

In a survey of potential users, the largest group of prospective ORCID users thought the data would only really be useful as open data. Charging for institutional access and sponsorship are seen as ways to sustain it. Yet there are plenty of ways to sustain open data business, for-profit or not or in between. We might yet get a system that really serves academic publication rather than markets to it.

by jwalsh

Notes and reflections from #ScotGovCamp

August 1, 2010 in Events, Ideas and musings, Open Government Data

Yesterday I went to ScotGovCamp in Edinburgh and had a lovely time. Spent more of it chatting in the hallway than participating in the sessions; but have detailed notes from the Open Data session led by Chris Taggart of Openly Local, and scatterings from elsewhere.

Open Data

Chris cites his membership of OKF’s Open Government Data Working Group, the London Datastore advisory body, and the Westminster Local Public Data Panel. Good, we now know we are dealing with a pretty serious guy.

His focus has been on the “English Experience” and he’s come to make contacts in Scotland. Citing as recent developments with impact yet to be fully felt, the Ordnance Survey Open Data release and the disclosure of Westminster MPs’ expenses. Looking for “drivers and levers” that will surface as yet unseen issues in local government.

It’s much less clear (at least here in the UK) how local, as opposed to central, communications and decision-making networks actually work. Local authorities are in an unclear legal situation – European PSI law should oblige local government to publish more data, but the knowledge of the law is often just not there (people are too busy).

OpenlyLocal has been going for a mere 15 months. It was inspired by a Manchester version of They Work For You and by the ScraperWiki project. OpenlyLocal collects information about local government data sources and, critically, the people involved: the social networks involved in decision making at council level. The site now has some amount of data (scraped from websites and republished as Linked Data) for 158 councils in England and Wales – but for only 4 in Scotland. One ultimate aim is to encourage local authorities to re-adopt the data, and the practices, being created by Chris and the contributors to OpenlyLocal. Other motivations for publishing local administration info as pure data:

  • Accessibility concerns. Publication of data, as opposed to pictures of data (like PDFs) avoids accessibility concerns. Creation of interfaces to data is expensive and incurs a maintenance burden…
  • Possible to tie in to other hyperlocal resources – a good example in Edinburgh is Greener Leith
  • Creation of an index, or directory, to existing council resources, that is easier to explore than a conventional website

Chris outlined 4 key reasons why open local data is important (though the reasons seem to alter with every re-telling).

  1. Transparency – we can see for ourselves, and draw our own conclusions.
  2. Engagement – citing Planning Alerts – casual engagement is possible, you don’t need to be obsessive
  3. Equality – “open data is about equality of access, because all this data is currently available for a price, and that’s not right”
  4. Relevance – to local temporal reality of affairs – less decoupled synthesis of prepared or reported data – just data.

“Quality of data is important and opening that helps (and is used as a blocker) but not as important as other points”

Can we make interfaces that work for our grandparents?

“There’s a much bigger step between creating nothing, and creating something, than between creating something stupid, and creating something great… just make a start, somewhere, anywhere.”

To local administrations – “it should cost nothing to release open data. If it doesn’t cost nothing, you’ve got a really bad outsourcing deal”.

To everyone else – “Fundamentally, it’s our data.”

Questions about quality

Recently, I’ve been thinking a lot about data quality within the geo ghetto, so it surprised me to hear several audience questions from local administrative workers, directly asking about data quality. How imperfect/unreliable/uncertain is the data? Given inevitable uncertainty, how is this doubt stopping us (or the decision makers for whom we are responsible) from opening data?

Data quality problems can have severe cost and social effects – one case cited was a database recording details of children, in which 5% of dates of birth were wrong, so 5% of people are being treated administratively as children when they are not, or treated as adults when they are not (at least according to the administrative definitions, processes etc).

It’s quite possible to measure quality, to test and to describe it. Data package tests, like software package tests, extracting what’s useful from the formal standards thinking on quality. But this is too much of a digression, some of which is here, some of which is on the way.
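As a rough illustration of what a "data package test" might look like, here is a minimal Python sketch that flags implausible dates of birth and reports a failure rate. The record layout, the `dob` field name and the 120-year cut-off are all assumptions made up for this example; nothing here comes from the database discussed in the session.

```python
from datetime import date

def check_dates_of_birth(records, today, max_age_years=120):
    """Return (record, reason) pairs for dates of birth that fail basic checks.

    The "dob" key and the max_age_years threshold are illustrative assumptions.
    """
    failures = []
    for rec in records:
        dob = rec.get("dob")
        if not isinstance(dob, date):
            failures.append((rec, "missing or not a date"))
        elif dob > today:
            failures.append((rec, "in the future"))
        elif today.year - dob.year > max_age_years:
            failures.append((rec, "implausibly old"))
    return failures

def failure_rate(records, today):
    """Share of records failing the checks, as a fraction between 0 and 1."""
    if not records:
        return 0.0
    return len(check_dates_of_birth(records, today)) / len(records)

# Hypothetical sample records, invented for this sketch.
sample = [
    {"id": 1, "dob": date(2005, 3, 1)},
    {"id": 2, "dob": date(2031, 1, 1)},
    {"id": 3, "dob": None},
    {"id": 4, "dob": date(1850, 1, 1)},
]
reference_day = date(2010, 8, 1)
print(failure_rate(sample, reference_day))
```

A test like this only catches values that are obviously impossible. A date of birth that is wrong but plausible would need cross-checking against another source, which is where describing quality becomes as important as measuring it.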

Law and Computers

An interesting session which I only caught the end of, and which is more fully described on the ScotGovCamp blog by my EDINA colleague, Nicola Osborne. My notes say this:

German reform in the early 19C. | Biblical census. 
Legislation | Standards | Influence
e-Care records, ATOS Origin
distributed versioning in citizen data - propagation,
provenance, merging.
Robot Queen? Automaton?
Target specification - e.g. music education,
department of education directive.
specifications, models, records management 
overspecification in law, cost, fear.
Westminster Information Act
(ontology-like)

Cuts

Dropped into the session on cuts, which wasn’t all gloom and doom, but more vendor optimism about shared services. Asked vendors about whether they made free software, or could find a place where business benefit to themselves and organisational benefit to their (public administration) clients could be created by freeing their software (in parallel to building shared hosted services). Not sure there was an answer.

Wondering about open demographic data, social credit data, and what’s the non-proprietary answer to Experian.

Good comments from Chris Taggart in this session too – “specialising in one thing, as a service provider. Low barriers to entry – low barriers to exit equally important”. Wondering about a JISC-like body for stewardship of shared services for local authorities. Would probably become a beast.

Fragments of insight

The big consultancies that form consortia to do government work, work by mimicry – by mirroring the hierarchical administrative structures that they are serving. But then internally, they actually do iterative micro-procurements – as in EU consortia the bulk of the actual work is done by very small providers. Many large and small companies work across local authorities, and it would be fascinating to see the map of who and where they are, which Chris is beginning to derive from spending data.

Shadow networks, shadow systems form, inevitably, in organisations at scale. But a paradox – the more superficial openness there is (coming from cultural change, or coming from legal or quasi-legal mandates, or meeting in the middle) the less is actually recorded. Data implies audit, audit invokes fear of loss. So organisation becomes about emotional concerns – perhaps it would be helpful to recognise this more?

Note, I corrected a bit of this, Equality rather than Quality, with which I must be temporarily obsessed. Thanks Chris for notes. Thanks Tim Howgego for insights.

by jwalsh

Dig the new breed, Part III – wrapping it all up

June 11, 2010 in External, Ideas and musings, Uncategorized, WG Archaeology

This is the third in the amazing series of guest blogs from Ant Beck on the impact of linked open data for archaeology.

Part 1: New approaches to archaeological data analysis, as seen in the DART and STAR projects
Part 2: Considering the ethics of sharing archaeological knowledge

OK, to recap we have:

  • A scientific movement that advocates open approaches to data, theory and practice
  • Emerging foundational interoperability using semantic web technology
  • The potential to remove a barrier and facilitate the submission of primary data

These three powerful factors could prove to be highly disruptive. In combination they have the potential to turn archaeological data and data repositories from static siloed islands (containing data that is increasingly stale) into an interlinked network of data nodes that reflect changes dynamically.

The linch-pin is the use of triplestores (RDF databases) that provide persistent identifiers. Persistent identifiers allow us to refer to a digital object (a statement, a file or set of files) in perpetuity, even if the underlying storage location moves. This means links between objects are persistent: therefore, when an observation or interpretation changes, its effects are propagated through to all the data/events that link to it. I see organisations such as the ADS, Talis (an innovative semantic web technology provider whose Talis Platform includes a free RDF hosting service for open data) and national heritage bodies providing such services.

Some open science projects are likely to adopt RDF as their de facto data sharing format. RDF triples (subject, predicate, object) provide a schema-transparent mechanism for data storage. They are not ideal for all data types (raster data structures for example) but when used with ontologies and SKOS, as demonstrated by STAR, they are powerful analytical, search and inference tools.
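To make the triple idea a little more concrete, here is a minimal sketch in plain Python, without any RDF library. Every identifier in it (`ctx:101`, `find:7`, `pot:TypeIIb`, the `arch:` predicates) is invented for illustration and is not taken from STAR or any real vocabulary.

```python
# A toy in-memory triple store: each statement is (subject, predicate, object).
# All identifiers are hypothetical examples, not real project URIs.
triples = {
    ("ctx:101", "rdf:type", "arch:Context"),
    ("ctx:101", "arch:contains", "find:7"),
    ("find:7", "arch:classifiedAs", "pot:TypeIIb"),
    ("pot:TypeIIb", "skos:broader", "pot:TypeII"),
}

def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# New knowledge is simply appended; no table or schema has to change.
triples.add(("find:7", "arch:datedTo", "period:Roman"))

for statement in match(triples, s="find:7"):
    print(statement)
```

The point of the sketch is the shape of the data: a new kind of statement about `find:7` can be added without migrating anything, which is roughly what "schema-transparent" means in practice.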

So, what is the importance of storing heritage data in RDF? Well, it depends which point of view you take. From a data management perspective there is no longer any need to migrate data formats. However, to facilitate re-use, different “views” of the RDF model can be generated and incorporated into traditional analytical software, such as GIS. Importantly, analysis stops being a “knowledge backwater”: new knowledge can be appended back into the triplestore.

Linked Data concepts in archaeology

From a data curation, re-use and analysis perspective the quality of the data has the potential to be dramatically improved. Deposition is no longer the final act of the excavation process: rather it is where the dataset can be integrated with other digital resources and analysed as part of the complex tapestry of heritage data. The data does not have to go stale: as the source data is re-interpreted and interpretation frameworks change these are dynamically linked through to the archives, hence, the data sets retain their integrity in light of changes in the surrounding and supporting knowledge system.

An example is probably useful at this juncture. In addition to many other things, pottery provides essential dating evidence for archaeological contexts. However, pottery sequences are developed on a local basis by individuals with imperfect knowledge of the global situation. This means there is overlap, duplication and conflict between different pottery sequences which are periodically reconciled (your Type IIb sherd is the same as my Type IVd sherd and we can refine the dating range… Hurrah… now let’s have another beer). This is the perennial process of lumping and splitting inherent in any classification system. Updated classifications and probable dates allow us to re-examine our existing classifications. One can reason over the data to find out which contexts, relationships and groups are impacted by a change in the dating sequences either by proxy or by logical inference (a change in the date of a context produces a logical inconsistency with a stratigraphically related group).

While we’re on the topic of stratigraphy, an area of notorious tedium and poor quality data (often with conflicting relationships), RDF allows rapid logical consistency checking, as stratigraphic relationships are basically a graph and RDF triples form a graph database.

Publicly deposited RDF data should be linked data: this means that all the primary data archives are linked to their supporting knowledge frameworks (such as a pottery sequence). When a knowledge framework changes the implications are propagated through to the related data dynamically. This means that policy, development control and research decisions are based upon data that reflects the most up-to-date information and knowledge… cool huh.
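The stratigraphic consistency check mentioned above can be sketched as cycle detection on a directed graph: if context A is recorded as above B, B above C and C above A, the record contradicts itself. This is only an illustrative sketch in plain Python. The `strat:above` predicate and the context identifiers are assumptions made up for the example, and a real triplestore would more likely do this with a query or a reasoner.

```python
from collections import defaultdict

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished

def find_cycle(triples, predicate="strat:above"):
    """Return one cycle of contexts as a list, or None if the record is consistent.

    Only triples using the given (hypothetical) predicate are considered.
    """
    graph = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            graph[s].append(o)

    colour = defaultdict(int)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we have looped back.
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical stratigraphic records, invented for this sketch.
consistent = [
    ("ctx:1", "strat:above", "ctx:2"),
    ("ctx:2", "strat:above", "ctx:3"),
]
conflicting = consistent + [("ctx:3", "strat:above", "ctx:1")]

print(find_cycle(consistent))
print(find_cycle(conflicting))
```

The recursive walk is fine for a sketch; a very deep stratigraphic sequence would want an iterative version to stay clear of Python's recursion limit.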

Incorporating excavation data into RDF means that ontology and SKOS can be used to dynamically repurpose the data for policy formulation, planning impact, regional heritage control and mitigation purposes in conjunction with the data in the Sites and Monuments Record (SMR). Raw data can be integrated from multiple different sources with different degrees of spatial and attribute granularity and, where appropriate, generalised so that the data is fit for the end users’ purpose. From a policy perspective curatorial officers no longer have to battle to stop datasets becoming stale and add new datasets to the local SMR. The SMR will remain an essential dataset: even though it is a generalised resource it is the only location of a digital record for resources that are unlikely to be digitised in the future (unless there is a very unlikely reverse in funding patterns). Thus the curatorial officer can develop more effective regional research agendas based upon up-to-date and accurate data.

This has the potential to change the way Historic Environment Information Resources (HEIRs) are managed by curatorial officers and transform how developers (property and software), policy makers and the general public engage with and consume any data. They will be able to support innovative access to primary linked data resources by researchers, planners and most importantly the public. This is a significant and important change in role. In addition the heritage data can be mashed up with other data resources to produce tailor made resources for different end-user communities – following the model successfully employed by data.gov.uk.

Data re-use and mashups are also important for those undertaking research and analysis. The big difference will be for those who undertake research or collect data that transcends different traditional analytical scales. For example, the National Mapping Programme which aims to “enhance the understanding of past human settlement, by providing primary information and synthesis for all archaeological sites and landscapes visible on aerial photographs or other airborne remote sensed data” will provide deeper insights when it is integrated with other data. However, this integration can occur in real time and add tangible interpretative depth. If an interpreter is digitising data from an aerial photograph and they see two ditches cutting one another they are unlikely to be able to tell the relative stratigraphic sequence of the two features. Direct access to excavation or other data will allow the full relationships and their interpretative relevance to be deduced during data collection.

In the longer term consumers of archaeological data will be more used to dealing with primary data, will become more aware of its potential and demand more of the resource. This should produce a ground-up re-appraisal of recording systems and a better understanding of archaeological hermeneutics. The interpretative interplay between theory, practice and data as part of a dynamic knowledge system is essential. Although this has been recognised, in reality theory, practice and data have never really been joined up. We don’t have to use a one-size-fits-all approach to conducting excavations, but we can tailor bespoke systems that address local, regional and national research challenges. We can generate interesting and provocative data that can be used to test theory and inform practice and move away from recording systems mired in the theoretical and intellectual paradigms of the mid-70s.

The virtuous circle is re-established; theory will influence practice, which will change the nature of the data, which will impact on interpretative frameworks, which will provide a body of knowledge against which theory can be tested.

Final comments

There is a new breed: there are people and organisations who don’t want to do what’s always been done. People who are empowered and don’t believe that established institutions and hierarchies are the gatekeepers of progress: organisations that can, and want to, change the way we ‘play the game’, people who want to collaborate. Organisations that want to share. Open approaches can help to make all this happen. This is all facilitated by disruptive technology which is increasingly mature, broadly available for free (or at a low cost) and with low barriers of use and re-use. In the nearly twenty years of studying and working in the heritage sector I’ve seen it change dramatically. I feel we are on the cusp of changing the way we engage with our data which could profoundly alter the way we understand the past, how we can communicate this in the present and how we can sustainably manage a complex resource for the future.

by jwalsh

Dig the new breed, Part II – open archaeology and ethics

June 11, 2010 in External, Ideas and musings, WG Archaeology

The second in this great series of three guest blogs by Ant Beck. See Part 1 for applications of linked data and remote sensing in archaeology. Part 3 will wrap things up and talk about the disruptive implications of linked open data for the impact of archaeology.

Open Science provides the framework for producing transparent and reproducible science by providing open access to raw data, algorithms and interpretations. Efforts such as STAR and STELLAR provide the foundation from which fine granularity excavation data can be made available as part of the semantic web and feed into Open Science analysis. This provides answers to the questions of how and why we should have open access to archaeological data. However, it does not provide answers to what data should be opened or if archaeological data should be opened at all. We move into the sphere of ethics and open archaeology.

Treasure seeking - CC-BY-SA-NC

Recently I have chatted to a number of people and organisations who want to open up heritage data. The conversations tend to have an ethical component. Like other disciplines, such as ecology, there are potential ethical issues in making heritage data open. The oft touted reason, in the UK at least, is that if access is given to this information then it will be exploited by “night hawkers” (irresponsible metal-detectorists) and other “treasure hunters” and sites (a term I don’t really like) will be destroyed.

This argument is polarised and plays to the lowest common denominator: it is based on the premise that “accessible knowledge will inevitably be abused” and eschews any of the benefits that data sharing can provide. Nor does it consider the nuanced ethical arguments concerning re-appropriation of artifacts collected under imperialist regimes or the ethical conundrum surrounding research into aboriginal or other indigenous communities (which, now that I’ve raised them, I won’t comment on further). The Portable Antiquities Scheme has done much to improve this argument.

The elephant in the room in this debate concerns those archaeologists who have sat on their archive for decades. We know of its significance but it is not available for academic and research analysis and does not inform the planning process. This has enormous impact on local planning policy, public and academic understanding, theory, practice etc. The 1990 introduction of Planning Policy Guidance 16 (PPG16: essentially commercial archaeology) in the UK, and the later Planning Policy Statement 5, have improved the situation a bit.

But I find the situation somewhat paradoxical. The UK curatorial systems expect that a generalised summary, or synthesis, of any investigation is deposited with the regional curatorial officers. This data is entered into the Sites and Monuments Record (SMR) and is publicly accessible. Therefore, the public has access to a generalised dataset. The expectations for primary, or raw, data are different: it’s considered ethically appropriate to deposit fine granularity data (i.e. non-generalised, primary data, such as those from excavation) with the Archaeology Data Service (ADS); however, there are issues raised if an individual wants to do this outside such formal structures (although the Perry Oaks Project have released redacted versions of their site data).

Is this an issue of ethics, or where formal and informal work practices collide; or is this simply an issue of cost, where individuals and organisations have the will but not the finances? Alternatively, and possibly most likely, do archaeologists just feel uncomfortable making their fine grained data available to a mass audience without going through a representative authority such as the ADS? My feeling is that within the archaeology domain there is an informal belief that if data is deposited with a repository then the repository also takes the ethical responsibility if the data is released. Deposition so that data is available in perpetuity is part of business and academic best practice; however, deposition does not necessarily mean release and subsequent consumption by other parties (public or otherwise).

Whatever the answer the point remains: archaeologists, for right or wrong, consider the implications of placing fine grained data in the public domain and “Ethical considerations” have been identified as a “barrier” to deposition. However, there appears to be limited guidance as to how to resolve these issues. This means that many archaeologists are re-inventing the wheel. The challenge is to provide some supporting “thing” that makes it easy for individuals and organisations to get to a clear, and hopefully unambiguous, ethical position. Such a “thing” will reduce uncertainty thereby removing one of the barriers to data sharing. The current default position is the equivalent of doing nothing: surely this must change.

Supporting “stuff” which is recognised and approved by national heritage organisations and standards bodies will act as an important lubricant to help individuals and groups to release data through informal channels. It should be recognised that the relationship between the “citizen”, the archaeologists and heritage data will change: citizen science and citizen data will play more of a role in heritage than ever before. Hence, a focus on the informal is important: we don’t want more grey data, do we? The Portable Antiquities Scheme is the “poster boy” for archaeological approaches to citizen science – although they do have a range of different user access levels.

I raised this as a topic for the Archaeology working group at the Open Knowledge Foundation. Response so far has been positive and has spilled over to colleagues in the curatorial sector and beyond (the discussion thread can be found here). We’ll be setting up a meeting to discuss these issues later in 2010. Both the Archaeology Data Service and the University of Leeds have kindly offered a venue.

There’s also a start at creating an ethics statement on open access to raw archaeological data – a statement that should be supportable by institutions and individual researchers alike. If you’d like to get involved, please join the Open Archaeology working group and mailing list – involvement could be helping to craft the ethics statement, asking your institution to contribute its own statement, helping to plan and document the workshop.

by jwalsh

Open Knowledge Scotland, this Thursday May 13th, 3-7pm Edinburgh #okscotland

May 11, 2010 in Events, OKScotland, Open Data

After having a good look at Inspace set up for a talk session, decided to up the maximum capacity, so there’s no longer a waiting list for OKScotland.

If you’re planning to attend on Thursday afternoon then please do register now.

Here’s the draft OKScotland schedule for the afternoon – note that there is coffee chat and registration from 2:30.

We got Rufus Pollock up to Edinburgh to offer some introductory messages, then three themed sessions of short talks – Open Research Practise, Open Data for Scotland, Open Knowledge and Linked Data.

We’ll have an intensive “clinic” from Charlotte Waelde and Andres Guadamuz at the SCRIPT law and technology research centre. We’ll end the evening with drinks, possibly on the roof terrace, Edinburgh permitting.

by jwalsh

Open Knowledge Scotland, May 13th, 3-7pm, Edinburgh

April 8, 2010 in Events, OKCon, OKScotland, Open Data, Open Government Data

Open Knowledge Scotland “brings together interested parties from across the open knowledge spectrum based in Scottish educational institutions, Scottish research organisations, Scottish local and national government, and members of the public for the purposes of teaching, learning and discussion”.

OKCon in London is now in its fifth year. It seemed like time to put together a spin-off event. OKScotland will be a sort of mini-OKCon – an afternoon and early evening event, starting at 3pm (2.30 for coffee) and ending with drinks.

There will be a fair bit of Open Space, with lightning talks by attendees, and a couple of longer “clinic” sessions, including one on open data licensing issues with Charlotte Waelde and Andres Guadamuz from the SCRIPT law and technology research centre.

We should have an interesting mix, with folk from the Scottish Government, the National Library of Scotland, and the Scottish OpenStreetmap community offering short talks, with others on open science, environmental data, and social science research themes. There’s still space for more short talks – if you’re local and motivated, please add your talk title while registering for OK Scotland.

Thanks for support and inspiration from IDEA lab at the School of Informatics, University of Edinburgh and EDINA, the JISC National Data Centre also hosted at Edinburgh.

by jwalsh

A free software model for open knowledge

March 17, 2010 in CKAN, datapkg, Events, OKF, OKF Projects, Open Data Commons, Open Knowledge Definition, Talks

Notes describing the talk on the work of the Open Knowledge Foundation given last week at Jornadas SIG Libre.

OKF activity graph

I was happily surprised to be asked to give this open knowledge talk at an open source software conference. But it makes sense – the free software movement has created the conditions in which an open data movement is possible. There is lots to learn from open source process, in both a technical and organisational sense.

In English we have one word “free” where Spanish, like most languages, has two, gratis and libre, signifying separately “free of cost” and “freedom to”. The Open Source Initiative coined Open Source as a branding or marketing exercise to avoid the primary meaning “free of cost”. So whenever I say “open” I want you to hear the word “libre”. [Later I was told that libre can have meaning in at least 15 different ways.]

The best way to talk about the work of the Open Knowledge Foundation is to look at its projects, which form an open knowledge stack similar to the OSGeo software stack.

Open Definition

The Open Knowledge Definition is based on the OSI Open Source Software Definition (which OSGeo uses as a reference for acceptable software licenses). No restrictions on field of endeavour – non-commercial-use licenses are not open as in the OKD. An open data license will pass the cake test.

Open Data Commons

Open Data Commons is run by Jordan Hatcher, who started work on the Open Database License with support from Talis, and later developed it through extensive negotiation with the OpenStreetmap community. ODbL is a ShareAlike license for data that gets around two problems: the inapplicability of copyright to facts, and the greediness of the ShareAlike clause when it comes to use of maps in PDFs, etc.

PDDL is a license that implements the Science Commons protocol for open access data, explicitly placing it in the public domain.

The Panton Principles are four precepts for publishers of scientific research data who wish that data to be freely reusable. Being openly able to inspect, critique and re-analyse data is critical to the effectiveness of scientific research.

Open Data Grid

The Open Data Grid is a project in early incubation, based on the Tahoe distributed filesystem. It’s in need of development effort on Tahoe to really get going. The aim is to provide secure storage for open datasets around the edges of infrastructure that people are already running.

People are handwaving about the Cloud, but storage and backup are not problems that it is really meant to solve. People make different claims about the Cloud – cheaper, greener, more efficient, more flexible. Can we get these things in other ways?

There is a saying, “never underestimate the bandwidth of a truck full of DAT tapes”

Comprehensive Knowledge Archive Network (CKAN)

CKAN is inspired by free software package repositories: Perl’s CPAN, R’s CRAN, Python’s PyPI. It provides a wiki-like interface to create minimal metadata for packages, with a versioned domain model and an HTTP API.
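To make “wiki-like” and “versioned” a little more concrete, here is a minimal sketch in Python. It is not CKAN’s actual domain model or API: the `PackageRecord` class, its methods, and the package name are all hypothetical, invented only to show the idea that every edit to a metadata record is kept as a revision.

```python
from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    """Hypothetical minimal metadata record that keeps every revision."""
    name: str
    revisions: list = field(default_factory=list)

    def edit(self, author, **changes):
        # Each edit stores a full copy of the metadata, wiki-style,
        # so any earlier state can be read back later.
        current = dict(self.revisions[-1]["metadata"]) if self.revisions else {}
        current.update(changes)
        self.revisions.append({"author": author, "metadata": current})
        return len(self.revisions) - 1  # revision number of this edit

    def at(self, revision):
        # Metadata as it stood at the given revision number.
        return self.revisions[revision]["metadata"]

    def latest(self):
        return self.at(-1)


# Hypothetical usage: two people edit the same record in turn.
record = PackageRecord("uk-census-example")
first = record.edit("alice", title="Example census tables", license="unknown")
second = record.edit("bob", license="odc-pddl")
```

After the second edit, `record.at(first)` still shows the license as it was first entered, while `record.latest()` shows the corrected value.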

CKAN supports groups, which can curate a package namespace – e.g. climate data – and assess priorities for turning its entries into fully installable packages.

CKAN’s open source code is being used in the data package catalogue for the data.gov.uk project, part of the Making Public Data Public effort in the UK.

datapkg

The Debian of Data – datapkg takes Debian’s apt tool as inspiration for fully automatable install of data packages, with dependencies between them. This is currently at a usable alpha stage, with a Python implementation.
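The apt-style idea of installing a package together with whatever it depends on can be sketched as below. This is not datapkg’s own code or interface; the package names and the `install_order` helper are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data packages, each mapped to the packages it depends on.
DEPENDENCIES = {
    "population-by-ward": {"ward-boundaries", "census-tables"},
    "ward-boundaries": {"country-outline"},
    "census-tables": set(),
    "country-outline": set(),
}


def install_order(target, dependencies):
    """Return target and everything it needs, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            # Record this package and queue its dependencies for a visit.
            needed[name] = dependencies[name]
            stack.extend(dependencies[name])
    # static_order() raises CycleError if packages depend on each other in a loop.
    return list(TopologicalSorter(needed).static_order())


order = install_order("population-by-ward", DEPENDENCIES)
```

The resulting list always puts a package after the packages it needs, so an installer could simply walk it from start to finish.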

Where Does My Money Go?

The next challenge really is to bring the concerns and the solutions to a mainstream public. Agustín Lobo spoke of “a personal consciousness but not an institutional consciousness” when it comes to open source and open data. Media coverage and exemplary government implementations help to create this kind of consciousness.

Pressure for increased open access is coming from academia – for the research data underlying papers, for the right to data mine and correlate different sources, for library data open for re-use. Pressure is also coming from within museums, libraries and archives – memory institutions who want to increase exposure to their collections with new technology, and recognise that open data, linked to a network of resources, will work for sustainability and not against it.

The next generation of researchers, who are kids in school now, will grow up with an expectation that code and data are naturally open. It will be interesting to see what they make!

Meanwhile OpenStreetmap is feeding several startups, and more commercial presence in the open data space will be of benefit. It illustrates that one does not have to be proprietary to be commercial.

Now higher-profile government projects opening data are helping to bring open data into the mainstream. To what extent is open a fashionable position, and to what extent is open reflected throughout the way of working?

Open process: early release, public sharing of bugs, public discussion of plans – everything in Nat Torkington’s post on Truly Open Data. The opportunity to fail in public, to learn from others’ problems, and to collaborate out of self-interest.


I had a great time at SIG Libre 10. Oscar Fonts’ talk on OpenSearch Geospatial interfaces to popular services has me itching to add an OpenSearch +Geo interface to CKAN, as well as to work on getting the apparent version skew in the Geo extensions resolved amicably.

Genís Roca spoke thought-provokingly on Retorno y rentabilidad (there isn’t really an equivalent English word – “rentability” – less exploitative or focused than profitability). Rentability, especially for online services, can come in ways that sustain an organisation predictably, and don’t involve fishing in the pockets of ultimate end-users.

Ivan Sanchez showed areas of OpenStreetmap Spain with a stunning level of detail, trees and fences, MasterMap-quality coverage. I’m inspired to pick up JOSM and Merkaartor to add building-level detail from out-of-copyright 1:500 Edinburgh town plans at the National Library of Scotland’s map services.

Agustín Lobo talked about the distributed work and cross-institutional support and benefit of the R project, and the impact of open source on open access to data in science. He mentioned a Nature open peer review experiment that was discarded – I am thinking it wasn’t curated enough. The talk helped me to connect the OKF’s work to the rest of the Jornadas.

The shiny slides, which many people asked for details of, were made with prezi.com and should show embedded in this page, I hope. I stupidly forgot to put URLs on the slides, which is partly why I have written this blog post.


by jwalsh

Response to the consultation on opening access to Ordnance Survey data

March 15, 2010 in Open Geodata, Open Government Data, Policy

The Open Source Geospatial Foundation, or OSGeo, founded in 2006, is a not-for-profit organization whose mission is to support and promote the collaborative development of open geospatial technologies and data.

The Open Knowledge Foundation (OKF) is a not-for-profit organization founded in 2004 and dedicated to promoting open knowledge in all its forms.

What follows is a shared response to some of the questions raised by the consultation on the future of the Ordnance Survey’s data licensing and pricing model. This was sent using Ernest Marples’ open UK Geographic Data Consultation response service. See also the Simply Understand digestible, short version of the consultation document. March 17th, this Wednesday, is the closing day of the response period.

Geographic information is critical to making effective use of open government data. Everything happens somewhere; to find data, and analyse it, location is invaluable context.

The Making Public Data Public programme is part of a global trend among administrations to provide state-collected information to citizens, free of cost or constraints.



by jwalsh

The cake test of freedom

March 15, 2010 in Ideas and musings, Open Definition, Open/Closed

At last week’s Jornadas SIG Libre in Girona, Ivan Sanchez of the Spanish OpenStreetmap community told me about the cake test of data freedom.

What is the cake test? Easy: geographic data, or a map, is open only if someone can make you a gift of a cake with your map on it.

The cake test is inspired by the dissident test and the desert island test used by the Debian community to gauge software freedom for packages to be included in a free and open distribution.

For data to pass the cake test, you must be able to freely share the data with someone (the baker) who can re-use it for a profitable activity (the baking of cakes) and is then freely able to redistribute the resulting derived work (the cake).
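Spelled out as a check, the three steps look something like the sketch below. The `LicenseTerms` fields are a hypothetical simplification of my own, not taken from any real license text or from the Open Knowledge Definition; actual licenses need reading, not three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical summary of what a data license allows."""
    allows_sharing: bool                    # you can pass the data to the baker
    allows_commercial_use: bool             # the baker can bake for profit
    allows_redistributing_derivatives: bool  # the cake can be given away


def passes_cake_test(terms):
    # Every step in the story must be permitted for the test to pass.
    return (terms.allows_sharing
            and terms.allows_commercial_use
            and terms.allows_redistributing_derivatives)


# Hypothetical examples: fully open terms, and non-commercial-only terms.
open_terms = LicenseTerms(True, True, True)
non_commercial_terms = LicenseTerms(True, False, True)
```

Here `passes_cake_test(open_terms)` holds, while the non-commercial terms fail at the baker, who cannot charge for the cake.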

The cake test can apply to all kinds of information resources, not just geodata. A resource that passes the cake test will be open in the sense of the Open Knowledge Definition. You could print a research paper onto a cake, a chart based on a dataset, some code describing an algorithm. Obviously a map just looks prettier on a cake.

The objective of the Cake Test is quite simple:

If a layperson cannot decide, or cannot decide easily, whether they may give away a cake, then the data or the maps cannot be freely used.

And you could be sure that if two datasets each passed the cake test, then it should be fine to give someone a cake decorated with parts of both of them – that is the intention of the data makers.

Is it open data? Does the data pass the cake test?