

Joint Submission to UN Data Revolution Group

Rufus Pollock - October 16, 2014 in Featured, News, Open Data, Open Government Data, Policy

The following is the joint Submission to the UN Secretary General’s Independent Expert Advisory Group on a Data Revolution from the World Wide Web Foundation, Open Knowledge, Fundar and the Open Institute, October 15, 2014. It derives from and builds on the Global Open Data Initiative’s Declaration on Open Data.

To the UN Secretary General’s Independent Expert Advisory Group on a Data Revolution

Societies cannot develop in a fair, just and sustainable manner unless citizens are able to hold governments and other powerful actors to account, and participate in the decisions fundamentally affecting their well-being. Accountability and participation, in turn, are meaningless unless citizens know what their government is doing, and can freely access government data and information, share that information with other citizens, and act on it when necessary.

A true “revolution” through data will be one that enables all of us to hold our governments accountable for fulfilling their obligations, and to play an informed and active role in the decisions fundamentally affecting our well-being.

We believe such a revolution requires ambitious commitments to make data open; to invest in the ability of all stakeholders to use data effectively; and to protect the rights to information, free expression, free association and privacy, without which data-driven accountability will wither on the vine.

In addition, opening up government data creates new opportunities for SMEs and entrepreneurs, drives improved efficiency and service delivery innovation within government, and advances scientific progress. The initial costs (including any lost revenue from licenses and access charges) will be repaid many times over by the growth of knowledge and innovative data-driven businesses and services that create jobs, deliver social value and boost GDP.

The Sustainable Development Goals should include measurable, time-bound steps to:

1. Make data open by default

Government data should be open by default, and this principle should ultimately be entrenched in law. Open means that data should be freely available for use, reuse and redistribution by anyone for any purpose and should be provided in a machine-readable form (specifically it should be open data as defined by the Open Definition and in line with the 10 Open Data Principles).

  • Government information management (including procurement requirements and research funding, IT management, and the design of new laws, policies and procedures) should be reformed as necessary to ensure that such systems have built-in features ensuring that open data can be released without additional effort.
  • Non-compliance, or poor data quality, should not be used as an excuse for non-publication of existing data.
  • Governments should adopt flexible intellectual property and copyright policies that encourage unrestricted public reuse and analysis of government data.

2. Put accountability at the core of the data revolution

A data revolution requires more than selective release of the datasets that are easiest or most comfortable for governments to open. It should empower citizens to hold government accountable for the performance of its core functions and obligations. However, research by the Web Foundation and Open Knowledge shows that critical accountability data such as company registers, land records, and government contracts are least likely to be freely available to the public.

At a minimum, governments endorsing the SDGs should commit to the open release by 2018 of all datasets that are fundamental to citizen-state accountability. This should include:

  • data on public revenues, budgets and expenditure;
  • who owns and benefits from companies, charities and trusts;
  • who exercises what rights over key natural resources (land records, mineral licenses, forest concessions etc) and on what terms;
  • public procurement records and government contracts;
  • office holders, elected and unelected, their declared financial interests, and details of campaign contributions;
  • public services, especially health and education: who is in charge and responsible, how they are funded, and data that can be used to assess their performance;
  • constitution, laws, and records of debates by elected representatives;
  • crime data, especially those related to human rights violations such as forced disappearance and human trafficking;
  • census data;
  • the national map and other essential geodata.

  • Governments should create comprehensive indices of existing government data sets, whether published or not, as a foundation for new transparency policies, to empower public scrutiny of information management, and to enable policymakers to identify gaps in existing data creation and collection.

3. Provide no-cost access to government data

One of the greatest barriers to access to ostensibly publicly-available information is the cost imposed on the public for access, even when the cost is minimal. Most government information is collected for governmental purposes, and the existence of user fees has little to no effect on whether the government gathers the data in the first place.

  • Governments should remove fees for access, which skew the pool of who is willing (or able) to access information and preclude transformative uses of the data that in turn generate business growth and tax revenues.

  • Governments should also minimise the indirect cost of using and re-using data by adopting commonly owned, non-proprietary (or “open”) formats that allow potential users to access the data without the need to pay for a proprietary software license.

  • Such open formats and standards should be commonly adopted across departments and agencies to harmonise the way information is published, reducing the transaction costs of accessing, using and combining data.

4. Put the users first

Experience shows that open data flounders without a strong user community, and the best way to build such a community is by involving users from the very start in designing and developing open data systems.

  • Within government: The different branches of government (including the legislature and judiciary, as well as different agencies and line ministries within the executive) stand to gain important benefits from sharing and combining their data. Successful open data initiatives create buy-in and cultural change within government by establishing cross-departmental working groups or other structures that allow officials the space they need to create reliable, permanent, ambitious open data policies.

  • Beyond government: Civil society groups and businesses should be considered equal stakeholders alongside internal government actors. Agencies leading on open data should involve and consult these stakeholders – including technologists, journalists, NGOs, legislators, other governments, academics and researchers, private industry, and independent members of the public – at every stage in the process.

  • Stakeholders both inside and outside government should be fully involved in identifying priority datasets and designing related initiatives that can help to address key social or economic problems, foster entrepreneurship and create jobs. Government should support and facilitate the critical role of both private sector and public service intermediaries in making data useful.

5. Invest in capacity

Governments should start with initiatives and requirements that are appropriate to their own current capacity to create and release credible data, and that complement the current capacity of key stakeholders to analyze and reuse it. At the same time, in order to unlock the full social, political and economic benefits of open data, all stakeholders should invest in rapidly broadening and deepening capacity.

  • Governments and their development partners need to invest in making data simple to navigate and understand, available in all national languages, and accessible through appropriate channels such as mobile phone platforms where appropriate.

  • Governments and their development partners should support training for officials, SMEs and CSOs to tackle lack of data and web skills, and should make complementary investments in improving the quality and timeliness of government statistics.

6. Improve the quality of official data

Poor quality, coverage and timeliness of government information – including administrative and sectoral data, geospatial data, and survey data – is a major barrier to unlocking the full value of open data.

  • Governments should develop plans to implement the PARIS21 Busan Action Plan for Statistics (2011), which calls for increased resources for statistical and information systems, tackling important gaps and weaknesses (including the lack of gender disaggregation in key datasets), and fully integrating statistics into decision-making.

  • Governments should bring their statistical efforts into line with international data standards and schemas, to facilitate reuse and analysis across various jurisdictions.

  • Private firms and NGOs that collect data which could be used alongside government statistics to solve public problems in areas such as disease control, disaster relief, urban planning, etc. should enter into partnerships to make this data available to government agencies and the public without charge, in fully anonymized form and subject to robust privacy protections.

7. Foster more accountable, transparent and participatory governance

A data revolution cannot succeed in an environment of secrecy, fear and repression of dissent.

  • The SDGs should include robust commitments to uphold fundamental rights to freedom of expression, information and association; foster independent and diverse media; and implement robust safeguards for personal privacy, as outlined in the UN Covenant on Civil and Political Rights.

  • In addition, in line with their commitments in the UN Millennium Declaration (2000) and the Declaration of the Open Government Partnership (2011), the SDGs should include concrete steps to tackle gaps in participation, inclusion, integrity and transparency in governance, creating momentum and legitimacy for reform through public dialogue and consensus.


This submission derives from and follows on from the Global Open Data Initiative’s Global Open Data Declaration, which was jointly created by Fundar, the Open Institute, Open Knowledge, the World Wide Web Foundation and the Sunlight Foundation, with input from civil society organizations around the world.

The full text of the Declaration can be found here:

Announcing the Open Science Podcasts

Joris Pekel - May 31, 2012 in Open Science

Since the launch of the Panton Principles, we have held several Panton Discussions with different people talking about Open Data in science.

Panton Discussions

A couple of them have been recorded on video. These recordings have now also been made available as podcasts, allowing you to listen to them while travelling, working or just relaxing.

The Open Knowledge Foundation is involved in lots of interesting discussions. In the future, we will make sure to bring an audio or video recording device to as many of these discussions as possible to make them available for everybody. For now, please enjoy this wealth of interesting discussions and topics.

All recordings are available under a CC-BY license.

For future podcasts, we are looking for people who are interested in helping with sound equalising, publishing and maybe even creating a true radio show out of it with a short introduction etc. If you are interested in this, please get in touch with joris.pekel [at]

Dig the New Breed: How open approaches can empower archaeologists - Part I

Rufus Pollock - June 10, 2010 in External, WG Archaeology

Very happy to post the first in an amazing series of OKFN guest blogs by Ant Beck, a member of the Open Archaeology working group. Ant discusses the DART project and the STAR project, both of which employed Linked Data in a heritage context. Later we’ll get into the ethics of open heritage, and a vision for the future of archaeological data.

The title “Dig the New Breed” is taken from the presentation I gave at the Open Knowledge Conference 2010. I did this for two reasons: it’s a terrible play on words (dig is employed as a synonym for “excavation” and “to like”) and I like name-checking “The Jam”. As this series of posts has taken form, it’s changed from being a piece about Open Science and Ethics into something about how disruptive technologies can be implemented to transform how the heritage sector operates.

STAR & STELLAR – Anyone for linked heritage data?


Image: DART Project Flickr page

I recently attended a STAR project workshop and saw a glimpse of the future. The Semantic Technologies for Archaeological Resources (STAR) project investigated “the potential of semantic terminology tools for widening and improving access to heritage resources, exploring the possibilities of combining a high level, core ontology with domain thesauri and natural language processing techniques”. The project has looked at extracting structured knowledge from “grey literature” using Natural Language Processing (NLP) tools – all very worthy and interesting, but not something I’m directly excited by, as “grey literature” is essentially tertiary data (an extraction of synthetic data derived from the primary record). In addition, they have developed an RDF-based approach to querying data stored in heterogeneous excavation databases. WOW!

And in case you missed that… querying data stored in heterogeneous excavation databases. Essentially, they have resolved syntactic (platform/format), schematic (structural) and semantic (language) heterogeneities by mapping key fields (i.e. a sub-set of the source data) to the English Heritage extension of the CIDOC Conceptual Reference Model (CRM), extracting the data as RDF, and providing semantic interoperability through Knowledge Organization Systems (KOS) represented in SKOS format, drawn from standard heritage thesauri. In essence, they extract RDF from relational databases using hand-crafted mappings to both SKOS vocabularies and an ontology, articulating semantics and canonical concepts respectively.

The combination of RDF, ontology and SKOS has allowed the team to produce a demonstrator capable of cross-searching different excavation databases with “difficult queries”. The team demonstrated that they could address questions such as: show me contexts that satisfy the following criteria:

  • Roman corn drying ovens with palaeobotanical analysis
  • Charred plant remains and charcoal from four-post structures
  • Post holes that contain ritual deposits

Granted, there are limitations: it currently supports a sub-set of the data collected during excavation, and the RDF model is viewed as an interim tool, with users going back to the source databases to conduct further analysis. However, the concept has been definitively demonstrated. Great stuff! The impact of this work is profound: the SKOS and ontology will allow inferencing/reasoning over the data, which will transform the way the data can be re-used, analysed and generalised (more on this in a bit).

The Glamorgan team have a follow-on project called Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR), funded by the AHRC. One of the aims of STELLAR is to develop “best practice guidelines and tools … both for mapping/extracting archaeological data as RDF and for generating archaeological Linked Data”. This will take the research developed in STAR and provide tools so that it can be deployed to mainstream archaeological data. I’m really looking forward to seeing the roll-out of this technology.

DART and Open Science


DART is an acronym for Detection of Archaeological Residues using remote sensing Techniques. DART is a three-year Science and Heritage initiative funded by AHRC and EPSRC, led by the School of Computing at the University of Leeds. The project aims to improve the understanding of the physical, chemical, biological and environmental factors that determine whether an archaeological feature (pit, ditch, posthole etc.) can be detected by a sensor (camera, Ground Penetrating Radar, etc.). DART brings together consultants and researchers from the areas of computer vision, geophysics, remote sensing, knowledge engineering and soil science.

Archaeological sites and features are created by localized processes of formation and deformation. There are a range of imaging instruments that can be used to detect these archaeological residues, although the knowledge required to determine what, when, how and why to use each different type of sensor is patchy. Seasonal, environmental and vegetation dynamics play a part, although the complexities of their interaction, and how they modify the “contrast signatures” derived from the existing formation and deformation processes, are uncertain.

This is important, so I will provide an example: as a mud-brick built farmstead erodes, the silt, sand, clay, large clasts and organics in the mud-brick, along with other anthropogenic debris, are incorporated into the soil. This produces a localised variation in soil size and structure. This in turn impacts on drainage and localised crop stress and vigour. These localised variations can all provide measurable differences, or contrasts, that indicate the presence of archaeology.

For example, archaeological residues can affect drainage of the soil, which then affects the appearance of crops. Different drainage characteristics result in different soil moisture retention properties, and local variations in crop stress/vigour can be observed as differences in crop height or crop colour (essentially crop marks). Archaeological contrasts can be expressed through, for example, variations in chemistry, magnetic field, resistance, topography, temperature and spectral reflectance.

The DART project is trying to identify physical, chemical and biological contrast factors that may allow us to detect archaeological residues (both directly and by proxy) under different land-use and environmental conditions. We address the following research issues:

  • What are the factors that produce archaeological contrasts?
  • How do these contrast processes vary over space and time?
  • What processes cause these variations?
  • How can we best detect these contrasts (sensors and conditions)?

DART is committed to open science principles and aims to act as an exemplar for how data, tools, and analysis can be made available to the wider academic, heritage and general community. Data, software, algorithms and services developed throughout the project will be made available for re-use with appropriate open licences.

Licensing is an issue, as license incompatibility can severely restrict re-use. Science Commons is establishing protocols in this area. Publicly accessible dissemination is preferred; however, where necessary, domain-specific or institutional repositories will be utilised for long-term preservation. Cameron Neylon is part of the project consortium and provides a steer on these issues.

The whole point of taking an open science position on this project is to maximise the benefit and impact. The research problem is large and complex: one project will not solve it. Inevitably, the science will need refining; adequate articulation will require long term data collection under different conditions, followed by iterative hypothesis testing and modelling. The challenge is to get this information in the quickest, cheapest and easiest ways. An Open Science approach means that DART is openly collaborating with researchers and individuals throughout the world. The body of work developed within DART can be easily re-used by others: our results can be tested, as the data and algorithms will be in the public domain, which means that they can be rapidly evaluated and easily re-used. Unlocking the “body of knowledge” and “know how” surrounding a programme of research should significantly reduce the barriers to re-use. This may generate a critical mass of surrounding research, which can only improve the underlying models and science. Providing scientists with the methodology of how to make the wheel will not only stop us reinventing it, but will also improve the manufacturing process.
