We need open carbon emissions data now!

May 13, 2013 in Access to Information, Campaigning, Featured, Featured Project, Open Data, Policy, WG Sustainability, Working Groups

Last week the average concentration of carbon dioxide in the atmosphere reached 400 parts per million, a level which is said to be unprecedented in human history.

Leading scientists and policy makers say that we should be aiming for no more than 350 parts per million to avoid catastrophic runaway climate change.

But what’s in a number? Why is the increase from 399 to 400 significant?

While the actual change is mainly symbolic (and some commentators have questioned whether we’re hovering above or just below 400), the real story is that we are badly failing to cut emissions fast enough.

Given the importance of this number, which represents humanity’s progress towards tackling one of the biggest challenges we currently face, the fact that it has been making the news around the world is very welcome indeed.

Why don’t we hear about the levels of carbon dioxide in the atmosphere from politicians or the press more often? While there are regularly headlines about inflation, interest and unemployment, numbers about carbon emissions rarely receive the level of attention that they deserve.

We want this to change. And we think that having more timely and more detailed information about carbon emissions is essential if we are to keep up pressure on the world’s governments and companies to make the cuts that the world needs.

As our Advisory Board member Hans Rosling puts it, carbon emissions should be on the world’s dashboard.

Over the coming months we are going to be planning and undertaking activities to advocate for the release of more timely and granular carbon emissions data. We are also going to be working with our global network to catalyse projects which use it to communicate the state of the world’s carbon emissions to the public.

If you’d like to join us, you can follow #OpenCO2 on Twitter or sign up to our open-sustainability mailing list.

Image credit: Match smoke by AMagill on Flickr. Released under Creative Commons Attribution license.

Global Community Stories #3

May 13, 2013 in Featured, OKF Australia, OKF Austria, OKF Belgium, OKF Brazil, OKF Greece, OKF Nepal, OKF Spain, OKF Switzerland, OKFN Local

Open Data Maker Vienna - April 2013

For your delectation, we bring you the third installment of Global Community Stories – a round up of the fantastic projects and activities of our Local Groups across the world, including a Wikipedia Editathon for girls in Nepal, a multitude of events in Belgium, Big Data Week across Spain, a Swiss Government pilot project, a multicultural open data event in Edinburgh, and a tiny town in Austria taking the lead in releasing data sets – the race is on!

Following the incredibly kind donation of OpenBelgium.be to our Open Knowledge community by Wunderkraut, OKF Belgium is preparing to take on maintenance of the site and grow the community that they began. They’ve been busy developing other collaborations too: a meet-up with Random Hacks of Kindness is coming up on June 1-2, and work on appsforgeo.be is under way. Their impressive upcoming events include a fully booked master class on Open Culture data, a presentation to civil servants at the Flemish government, Apps for Flanders on June 14, and a General Assembly in June. They’ve been keeping an eye on the public sphere too, and are organising a debate on new business models to allow financial sustainability through art, following a lawsuit by the Belgian copyright organisation Sabam against ISPs for not wanting to cooperate on a copyright tax on internet subscriptions.

In Austria, the OKF community is supporting the fight for a freedom of information act…

Together with other civil society initiatives, the Austrian Chapter of OKFN is supporting this movement by organising a series of workshops for all stakeholders on the upcoming freedom of information law, reaching out to civil servants, citizens and politicians. They’ll be providing an opportunity for every stakeholder group to discuss and define their point of view, empowering change-makers across the sphere to broaden their influence, and they’ll be looking to develop the debate around freedom of information in much the same way as the topic of open data was discussed some years ago.

One little village in Austria deserves a special mention – Engerwitzdorf, a town of only 8000 inhabitants, has released 116 data sets – more than the entire federal government of Austria! They’ve been honoured for their work by being nominated for the Document Freedom Award by the Free Software Foundation Europe – congratulations! OKF Austria will be joining in the celebrations by organising Engerwitzdorf’s first OKF MeetUp.

In Switzerland, government data is being made more accessible…

In Switzerland, the OKF Swiss Chapter has been developing a pilot project called Open Government Data at the Confederation – or, OGD@ Federation for short. Through the project, a group of government agencies will be attempting to bundle their data together via an open source platform, and they’ll be presenting this on May 22. We’ll keep you updated with how it goes, and for readers in Switzerland, you can register here.

OKF Spain has been expanding rapidly…

..having reached 149 members on their mailing list and recently having organised a successful Big Data Week in Madrid and Barcelona! It doesn’t sound like they’re resting on their laurels though, as they have another three-day event about data journalism coming up in Barcelona, Madrid, Sevilla and Valladolid, which will include a hackathon, a barcamp and several workshops. They have an impressive line-up of speakers too, including James Ball from the Guardian, Manuel Aristarán from the Knight Foundation, and OKF Central’s own Michael Bauer, so if you can, swing by!

They also undertook the invaluable task of translating into Spanish Laura’s blog post, “Open Knowledge: much more than Open Data” – which has now become “Conocimiento Abierto: Mucho más que Open Data.” This is a wonderful way of getting our message out to a whole new audience – thanks!

Laura’s post was also a hit with our OKF Greece Chapter, who kindly translated it into Greek. Translations of posts on okfn.org into any language at all are very much welcome; if you do any translations, please do let us know so we can publicise them too. We very much appreciate your efforts!

OKF Greece have also been busy organising an #OpenHealth event, and also took part in a Wikimedia workshop together with the Greek Wikipedia community. They recently completed the incredibly useful task of translating the Open Spending handbook into Greek, and you can now find the OKF Greece group on Facebook, too!

In Scotland, Germans and Brits came together…

Last week, the University of Edinburgh hosted a wonderfully multicultural German-British Open Data event. Scholarship holders from the Foundation of German Business came together for a weekend of talks under the title “Open Data – Better Society?”, and you can find a great round-up of the talks and conclusions on the OKF Scotland blog.

OKF Nepal have been focusing on getting girls into ICT…

OKF Nepal recently teamed up with Wikipedia Nepal to organise a Wikipedia Editathon, which took place on the International Day of Girls in ICT. A truly great initiative, addressing a key issue facing the tech movement. OKFN Nepal’s Prakash Neupane also took to the stage to explain the Open Knowledge Foundation’s mission, and from the photos it looks like all involved had a wonderful time. We look forward to hearing about the next event!

Congratulations all, for some incredible activities from across the globe!

(And keep an eye out for some exciting upcoming events – OKF Brazil are organising an event on Open Science at the beginning of June, and OKF Australia are organising a Beautiful Data GovHack at the end of May!)

Announcing CKAN 2.0

May 10, 2013 in CKAN, Featured, Featured Project, News, OKF Projects, Open Data, Open Government Data, Releases, Technical

CKAN is a powerful, open source, open data management platform, used by governments and organizations around the world to make large collections of data accessible, including the UK and US government open data portals.

Today we are very happy and excited to announce the final release of CKAN 2.0. This is the most significant piece of CKAN news since the project began, and represents months of hectic work by the team and other contributors since before the release of version 1.8 last October, and of the 2.0 beta in February. Thank you to the many CKAN users for your patience – we think you’ll agree it’s been worth the wait.

[Screenshot: Front page]

CKAN 2.0 is a significant improvement on 1.x versions for data users, programmers, and publishers. Enormous thanks are due to the many users, data publishers, and others in the data community, who have submitted comments, code contributions and bug reports, and helped to get CKAN to where it is. Thanks also to OKF clients who have supported bespoke work in various areas that has become part of the core code. These include data.gov, the US government open data portal, which will be re-launched using CKAN 2.0 in a few weeks. Let’s look at the main changes in version 2.0. If you are in a hurry to see it in action, head on over to demo.ckan.org, where you can try it out.

Summary

CKAN 2.0 introduces a new sleek default design, and easier theming to build custom sites. It has a completely redesigned authorisation system enabling different departments or bodies to control their own workflow. It has more built-in previews, and publishers can add custom previews for their favourite file types. News feeds and activity streams enable users to keep up with changes or new datasets in areas of interest. A new version of the API enables other applications to have full access to all the capabilities of CKAN. And there are many other smaller changes and bug fixes.

Design and theming

The first thing that previous CKAN users notice will be the greatly improved page design. For the first time, CKAN’s look and feel has been carefully designed from the ground up by experienced professionals in web and information design. This has affected not only the visual appearance but many aspects of the information architecture, from the ‘breadcrumb trail’ navigation on each page, to the appearance and position of buttons and links to make their function as transparent as possible.

[Screenshot: dataset page]

Under the surface, an even more radical change has affected how pages are themed in CKAN. Themes are implemented using templates, and the old templating system has been replaced with the newer and more flexible Jinja2. This makes it much easier for developers to theme their CKAN instance to fit in with the overall theme or branding of their web presence.

Authorisation and workflow: introducing CKAN ‘Organizations’

Another major change affects how users are authorised to create, publish and update datasets. In CKAN 1.x, authorisation was granted to individual users for each dataset. This could be augmented with a ‘publisher mode’ to provide group-level access to datasets. A greatly expanded version of this mode, called ‘Organizations’, is now the default system of authorisation in CKAN. This is much more in line with how most CKAN sites are actually used.

[Screenshot: Organizations page]

Organizations make it possible for individual departments, bodies, groups, etc, to publish their own data in CKAN, and to have control over their own publishing workflow. Different users can have different roles within an Organization, with different authorisations. Linked to this is the possibility for each dataset to have different statuses, reflecting its progress through the workflow, and to be public or private. In the default set-up, Organization user roles include Members (who can read the Organization’s private datasets), Editors (who can add, edit and publish datasets) and Admins (who can add and change roles for users).
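As a rough illustration of the default roles just described, here is a minimal Python sketch. It is not CKAN’s own code: the names `ROLE_PERMISSIONS`, `can` and `visible_datasets` are hypothetical, and the sketch assumes that roles are cumulative (Editors can also read private datasets, Admins can also edit), which the description above does not state.

```python
# Hypothetical sketch of the default Organization roles described above.
# This is NOT CKAN's internal implementation; the permission names and the
# assumption that roles are cumulative are for illustration only.
ROLE_PERMISSIONS = {
    "member": {"read_private"},
    "editor": {"read_private", "create", "edit", "publish"},
    "admin": {"read_private", "create", "edit", "publish", "manage_roles"},
}


def can(role, action):
    """Return True if the given role grants the given action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def visible_datasets(datasets, role):
    """Keep public datasets, plus private ones if the role may read them."""
    return [d for d in datasets if not d["private"] or can(role, "read_private")]
```

A user with no role in the Organization (here `role=None`) would see only the public datasets, while a Member would see both public and private ones.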

More previews

In addition to the existing image previews and table, graph and map previews for spreadsheet data, CKAN 2.0 includes previews for PDF files (shown below), HTML (in an iframe), and JSON. Additionally there is a new plugin extension point that makes it possible to add custom previews for different data types, as described in this recent blog post.

[Screenshot: PDF preview]

News feeds and activity streams

CKAN 2.0 provides users with ways to see when new data or changes are made in areas that they are interested in. Users can ‘follow’ datasets, Organizations, or groups (curated collections of datasets). A user’s personalised dashboard includes a news feed showing activity from the followed items – new datasets, revised metadata and changes or additions to dataset resources. If there are entries in your news feed since you last read it, a small flag shows the number of new items, and you can opt to receive notifications of them via e-mail.

Each dataset, Organization etc also has an ‘activity stream’, enabling users to see a summary of its recent history.

[Screenshot: News feed]

Programming with CKAN: meet version 3 of the API

CKAN’s powerful application programming interface (API) makes it possible for other machines and programs to automatically read, search and update datasets. CKAN’s API was previously designed according to REST principles. RESTful APIs are deservedly popular as a way to expose a clean interface to certain views on a collection of data. However, for CKAN we felt it would be better to give applications full access to CKAN’s own internal machinery.

A new version of the API – version 3 – trialled in beta in CKAN 1.8, replaced the REST design with remote procedure calls, enabling applications or programmers to call the same procedures as CKAN’s own code uses to implement its user interface. Anything that is possible via the user interface, and a good deal more, is therefore possible through the API. This proved popular and stable, and so, with minor tweaks, it is now the recommended API. Old versions of the API will continue to be provided for backward compatibility.
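To give a feel for the remote-procedure-call style, here is a small Python sketch of how a client might address an action and unpack the reply. It assumes the usual version 3 layout, in which each action (such as `package_search`) lives under `/api/3/action/` and the JSON reply carries `success` and `result` fields; the helper names `action_url` and `unwrap` are our own and are not part of CKAN.

```python
import json
from urllib.parse import urlencode


def action_url(site_url, action, **params):
    """Build the URL for an API v3 action call (hypothetical helper)."""
    url = "{}/api/3/action/{}".format(site_url.rstrip("/"), action)
    if params:
        # Sort parameters so the resulting URL is predictable
        url += "?" + urlencode(sorted(params.items()))
    return url


def unwrap(response_text):
    """Return the 'result' of an action reply, or raise if it reports failure."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


# Example: the URL a client could request to search datasets on the demo site
print(action_url("http://demo.ckan.org", "package_search", q="spending", rows=5))
```

The sketch only builds the request URL and parses a reply; fetching it over HTTP is left to whichever client library you prefer.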

Documentation, documentation, documentation

CKAN comes with installation and administration documentation which we try to keep complete and up-to-date. The major changes in the rest of CKAN have thus required a similarly concerted effort on the documentation. It’s great when we hear that others have implemented their own installation of CKAN, something that’s been increasing lately, and we hope to see even more of this. The docs have therefore been overhauled for 2.0. CKAN is a large and complex system to deploy, and work on improving the docs continues: version 2.1 will be another step forward. Where people do run into problems, help remains available as usual on the community mailing lists.

… And more

There are many other minor changes and bug fixes in CKAN 2.0. For a full list, see the CKAN changelog.

Installing

To install your own CKAN, or to upgrade an existing installation, you can install it as a package on Ubuntu 12.04 or do a source installation. Full installation and configuration instructions are at docs.ckan.org.

Try it out

You can try out the main features at demo.ckan.org. Please let us know what you think!

Announcing the Open Humanities Award Winners

May 8, 2013 in Featured, Open GLAM, Open Humanities

Earlier this year, as part of the DM2E project, we put out a call to humanities academics and technologists to see if they could come up with innovative ideas for small technology projects that would further humanities research by using open content, open data and/or open source.

We’re very pleased to announce that the winners are Dr Bernhard Haslhofer (University of Vienna) and Dr Robyn Adams (Centre for Editing Lives and Letters, University College London). Both winners will receive financial support to help them undertake the work they proposed and will be blogging about the progress of their project. You can follow their progress via the DM2E blog.


Award 1: Semantic tagging for old maps… and other things

The first Award goes to Dr Bernhard Haslhofer of Vienna University. His project will involve building on an open source web application he has been working on called Maphub.

Dr Haslhofer told us a little bit about the inspiration for his project:

“People love old maps” is a statement that we heard a lot from curators in libraries. This combined with the assumption that many people also have knowledge to share or stories to tell about historical maps, was our motivation to build Maphub.

In essence Maphub is an open source Web application that, first of all, pulls out digitized historical maps from closed environments, adds zooming functionality, and assigns Web URIs so that people can talk about them online. It also supports two main use cases:

(i) georeferencing maps by linking points on the map to Geonames locations; (ii) commenting on maps or map regions by creating annotations. While users are entering their comments, Maphub analyzes the entered text on the fly and suggests so-called semantic tags, which the user accepts or rejects.

Semantic tags appear like “normal” tags on the user interface, but are in fact links to DBpedia resources. In that way, the user links her annotations and therefore also the underlying historical map with resources from two open data sources. Besides consuming open data during the annotation authoring process, Maphub also contributes collected knowledge back as open data by exposing all annotations following the W3C Open Annotation specification. In that way, Maphub supports people in a loop of using and producing open data in the context of historical maps.
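To make the idea of a semantic tag concrete, here is a simplified Python sketch of what such an annotation might look like as data. It is not Maphub’s actual output: the function `make_annotation` and the example map URI are hypothetical, and the structure only loosely follows the Open Annotation vocabulary (`oa:Annotation`, `oa:hasBody`, `oa:hasTarget`, `oa:SemanticTag`).

```python
import json


def make_annotation(map_uri, comment, tag_uris):
    """Build a simplified, Open Annotation-style record for a map comment."""
    # The free-text comment entered by the user
    bodies = [{"@type": "cnt:ContentAsText", "cnt:chars": comment}]
    # Each accepted semantic tag is simply a link to a DBpedia resource
    bodies += [{"@id": uri, "@type": "oa:SemanticTag"} for uri in tag_uris]
    return {
        "@type": "oa:Annotation",
        "oa:hasTarget": {"@id": map_uri},
        "oa:hasBody": bodies,
    }


# Hypothetical map URI; the tag points at a DBpedia resource
annotation = make_annotation(
    "http://example.org/maps/1",
    "This map shows Vienna and its surroundings.",
    ["http://dbpedia.org/resource/Vienna"],
)
print(json.dumps(annotation, indent=2))
```

Because the tag is a URI rather than a plain string, anyone consuming the annotation can follow it to further open data about the tagged subject.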

Dr Haslhofer looks forward to seeing how collaborations will blossom between these various web annotation systems:

We believe that people also love other things on the Web and that Web annotation tools should support semantic tagging as well. Therefore, we will make it available as a plugin for Annotorious. Annotorious is a JavaScript image annotation library that can be used in any Website, and is also compatible with the Open Knowledge Foundation’s Annotator.

Annotorious and Maphub have common origins and the Open Humanities will support us in unifying parallel development streams into a single, reusable annotation tool that works for digitized maps but also for other media. We will also conduct another user study to inform the design of that function for other application contexts.


Award 2: Joined Up Early Modern Diplomacy: Linked Data from the Correspondence of Thomas Bodley

The second award goes to Dr Robyn Adams of the Centre for Editing Lives and Letters, University College London. The project will re-purpose the open resource that Dr Adams has been building with a team of others: the Diplomatic Correspondence of Thomas Bodley.

The project will use ‘additional’ information that was encoded into the digitisation of early modern letters that took place at the Centre for Editing Lives and Letters. In the initial incarnation of the project this data, which included biographical and geographical information contained within the letters, was encoded but not used.

Dr Adams told us a little bit about what she plans on doing with the money from the Awards:

With the prize funding from the Open Humanities Awards, we propose to mine the data that was generated but not fully used in the first phase of the project. This data is a rich source of biographical and geographical information, the visualization of which evokes the complex and diverse texture of the late sixteenth-century European diplomatic and military landscape. Bodley’s position in The Hague as the only English representative on the Dutch Council of State put him at the centre of a heterogeneous nexus of correspondents at a time long before the Republic of Letters burgeoned in the subsequent century.

The project will interrogate three data fields within the larger data set of Bodley’s diplomatic correspondence in order to generate visualizations; the network of correspondents and recipients, and the people and places mentioned within the letters. These visualizations will be incorporated into the project website, where they will enhance and extend the knowledge derived from the existing corpus of correspondence. The visualizations, which will have scope to be playful while drawn from scrupulous scholarship, will offer an alternative pathway for scholars and the interested public to understand that in this period especially, the political, university and kinship networks were fundamental to advancement and prosperity.

“In mapping the relational activity between data sets,” Dr Adams went on, “I hope to further illuminate and reanimate Bodley’s position within the Elizabethan compass. Furthermore, I hope to demonstrate that fruitful routes of enquiry can result if scholars commit to going the extra mile to encode and record data in their research that may not have immediate relevance to their own studies.”


We offer our heartiest congratulations to both Dr Haslhofer and Dr Adams, both of whom will be presenting their work at the forthcoming Web as Literature conference at the British Library and this year’s OKCon in Geneva. Follow the progress of the Awards recipients via the DM2E project website.

OKCon 2013 Call for Proposals – out now!

May 7, 2013 in Events, Featured, Join us, OKCon, Talks

  • Event. OKCon 2013 – 17th-18th September 2013, Geneva, Switzerland.
  • Call for Proposals. Find the call, FAQs and the submission form on the OKCon 2013 Call for Proposal webpage.
  • Deadline. The deadline to submit your proposals is May 24th, 23:59:59 GMT. Results will be published by 17th June, 23:59:59 GMT.
  • Tickets. Early Bird tickets are on sale until 23rd June!

Following the announcement of the dates for this year’s Open Knowledge Conference (OKCon), we have been asked by many people from our community how they can get involved. We are now glad and excited to give you the news: the Call for Proposals is launched today!

OKCon 2013 will be an intense 2-day event (taking place on 17th-18th September in Geneva, Switzerland). Its programme will be curated in part directly by the organisers, nominating Invited Speakers, and partly together with you – our community – thanks to your proposals.

We have identified six specific topics to discuss and explore on this year’s theme of Open Data – Broad, Deep, Connected which we hope will inspire and excite you as much as it does us:

  • Open Data, Government and Governance
  • Open Development and Sustainability
  • Open Science and Research
  • Open Culture
  • Technology, Tools and Business
  • Evidence and Stories

We have compiled a how-to guide, with FAQs and the submission form – please find them all on the OKCon 2013 Call for Proposal webpage. We are looking forward to your ideas!

The Call for Proposals starts today (7th May) and ends on 24th May, at 23:59:59 GMT. Read all about OKCon’s Call for Proposals and more on the conference website.

Open Knowledge: much more than open data

May 1, 2013 in Featured, Ideas and musings, Join us, OKF, Open Data, Our Work

We’ve often used “open knowledge” simply as a broad term to cover any kind of open data or content from statistics to sonnets, and more. However, there is another deeper, and far more important, reason why we are the “Open Knowledge” Foundation and not, for example, the “Open Data” Foundation. It’s because knowledge is something much more than data.

Open knowledge is what open data becomes when it’s useful, usable and used. At the Open Knowledge Foundation we believe in open knowledge: not just that data is open and can be freely used, but that it is made useful – accessible, understandable, meaningful, and able to help someone solve a real problem. Open knowledge should be empowering – it should enable citizens and organizations to understand the world, create insight and effect positive change.

It’s because open knowledge is much more than just raw data that we work both to have raw data and information opened up (by advocating and campaigning) and to create the tools that turn that raw material into knowledge that people can act upon. For example, we build technical tools and open source software to help people work with data, and we create handbooks which help people acquire the skills they need to do so. This combination, that we are both evangelists and makers, is extremely powerful in helping us change the world.

Achieving our vision of a world transformed through open knowledge, a world where a vibrant open knowledge commons empowers citizens and enables fair and sustainable societies, is a big challenge. We firmly believe it can be done, with a global network of amazing people and organisations fighting for openness and making tools and more to support the open knowledge ecosystem, although it’s going to take a while!

We at the Open Knowledge Foundation are committed to this vision of a global movement building an open knowledge ecosystem, and we are here for the long term. We’d love you to join us in improving the world through open knowledge; there will be many different ways you can help coming up during the months ahead, so get started now by keeping in touch – by signing up to receive our Newsletter, or finding a local group or meetup near you.

Welcoming Greece Local Group as Open Knowledge Foundation Chapter

April 29, 2013 in Featured, OKF, OKF Greece, OKFN Local

It’s with great excitement that we can announce that OKFN Greece, after 1.5 years as a Local Group in our global network, have established themselves as an official Chapter of the Open Knowledge Foundation. This means that our Greek friends are now through their own legal entity a more integral part of the organization.

The last year and a half has been fast-paced for the Local Group in Greece, and their progression towards becoming a Chapter is nothing less than exemplary.

Getting started by bringing people together

They started in 2011 by organizing several Meetups, including invited guests such as former OKF Community Manager Kat Braybrooke and Dr. Soren Auer, coordinator of the LOD2 Project and member of the OKFN advisory board, to get things started. On the side they also initiated collaborations with Creative Commons Hellas (via Marinos Papadopoulos) and the Wikimedia Greece Community (via Kostas Stampoulis).

Additionally, the group initiated various mini hack-days. A spending visualization hack-day was organized to coincide with a visit from the OKF’s Open Spending Project Coordinator Lucy Chambers, which led to the production of several interesting sets of visualization samples. A Wikipedia in Medicine hack-day was held later at the Aristotle University of Thessaloniki Medical School to train and encourage medical scientists to contribute valuable and accurate open medical content to Wikipedia.

Connecting with stakeholders

As a means to connect with other networks, OKFN Greece has participated in a series of networking events across the country, including the Free and Open Source Software Communities Meeting (Serres, May 2012), Ignite Athens Show (Athens, October 2012), e-Learning Expo (Athens, October 2012) and Wikimedia Greece Community Conference (Athens, April 2013), and co-organized #OpenHealth (Thessaloniki, April 2013).

Developing projects in many fields

OKFN Greece has lately developed the Greek version of DBpedia Spotlight and also published the Greek versions of the Wordnet and Wiktionary linked datasets. The DayLikeToday is a timeline visualization which presents what happened on a day like today, using Wikipedia’s data via DBpedia.

Other projects include publishing a huge dataset containing the bibliographic information of the Veria public library as a linked open dataset, which is part of the cloud diagram and particularly the Greek sub-cloud (http://open-data.okfn.gr/linked-data), based on the work of the group’s members – with all source code released under an open license on the OKFN Greece GitHub.

Their latest work is the Greek open data hub, which was praised by the Vice-President of the European Commission, Neelie Kroes. Lastly, the translation of the Open Data Handbook (printed booklet funded by the mEducator project) was a great occasion for the group to join the linguistic linked data group. Subsequently CKAN and the OpenSpending platform were also translated into Greek.

New local Working Groups

Most recently, as the group’s activities started to grow and become more complex, they took the decision to split up the workload into a few working groups, exactly as we do with the Working Groups of the main OKF organization. The aim of OKF Greece working groups is to provide a support mechanism, a space for reflection, and a space for the development and promotion of tools from different communities with common interests in open data and open knowledge throughout Greece. The working groups will remain closely involved in the international OKFN, sharing their ideas with the main OKF Working Groups.

Moving towards a bright future

OKFN Greece wants to play a central role in the open knowledge landscape of the future – in Greece and beyond. As an official Chapter of the Open Knowledge Foundation they now have a much firmer foundation on which to participate in local decision-making processes together with the Greek authorities and the state of Greece. All in all the future looks bright – congrats and good work, OKFN Greece!

What Do We Mean By Small Data

April 26, 2013 in Featured, Ideas and musings, Labs, Open Data, Small Data

Earlier this week we published the first in a series of posts on small data: “Forget Big Data, Small Data is the Real Revolution”. In this second in the series, we discuss small data in more detail providing a rough definition and drawing parallels with the history of computers and software.

What do we mean by “small data”? Let’s define it crudely as:

“Small data is the amount of data you can conveniently store and process on a single machine, and in particular, a high-end laptop or server”

Why a laptop? What’s interesting (and new) right now is the democratisation of data and the associated possibility of a large-scale distributed community of data wranglers working collaboratively. What matters here then is, crudely, the amount of data that an average data geek can handle on their own machine, their own laptop.

A key point is that the dramatic advances in computing, storage and bandwidth have far bigger implications for “small data” than for “big data”. The recent advances have increased the realm of small data, the kind of data that an individual can handle on their own hardware, far more relatively than they have increased the realm of “big data”. Suddenly working with significant datasets – datasets containing tens of thousands, hundreds of thousands or millions of rows – can be a mass-participation activity.

(As should be clear from the above definition – and any recent history of computing – small (and big) are relative terms that change as technology advances. For example, in 1994 a terabyte of storage cost several hundred thousand dollars; today it’s under a hundred. This also means today’s big is tomorrow’s small.)
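The crude definition above can be turned into a back-of-the-envelope check. The Python sketch below is only illustrative: the function names are our own, and the figures it relies on (an average of 20 bytes per cell, 8 GiB of RAM, and using at most half of it) are assumptions chosen for the example, not measurements.

```python
def estimated_size_bytes(rows, columns, bytes_per_cell=20):
    """Rough in-memory size of a table; bytes_per_cell is an assumed average."""
    return rows * columns * bytes_per_cell


def is_small_data(rows, columns, ram_bytes=8 * 1024**3, headroom=0.5):
    """True if the table fits within a fraction (headroom) of the machine's RAM."""
    return estimated_size_bytes(rows, columns) <= ram_bytes * headroom


# A million rows of 20 columns is about 400 MB under these assumptions,
# comfortably "small"; a billion such rows is not.
print(is_small_data(1_000_000, 20))
print(is_small_data(1_000_000_000, 20))
```

As the hardware assumptions change, so does the answer, which is exactly the sense in which small and big are relative terms.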

Our situation today is similar to microcomputers in the late 70s and early 80s or the Internet in the 90s. When microcomputers first arrived, they seemed puny in comparison to the “big” computing and “big” software then around and there was nothing strictly they could do that existing computing could not. However, they were revolutionary in one fundamental way: they made computing a mass-participation activity. Similarly, the Internet was not new in the 1990s – it had been around in various forms for several decades – but it was at that point it became available at a mass-scale to the average developer (and ultimately citizen). In both cases “big” kept on advancing too – be it supercomputers or the high-end connectivity – but the revolution came from “small”.

This (small) data revolution is just beginning. The tools and infrastructure to enable effective collaboration and rapid scaling for small data are in their infancy, and the communities with the capacities and skills to use small data are in their early stages. Want to get involved in the small data revolution? Sign up now.

This is the second in a series of posts about the power of Small Data – follow the Open Knowledge Foundation blog, Twitter or Facebook to learn more and join the debate at #SmallData on Twitter.

Just 5 days to go for The Public Domain Review Fundraiser!

April 25, 2013 in Featured, Public Domain, Public Domain Review

The Public Domain Review Fundraiser ends on Wednesday 1st May, just 5 days away!

Since we launched the fundraising campaign 7 weeks ago we’ve seen a fantastic response, which has so far got us to an amazing 98% of our target… very very nearly there. We are making a final push in these remaining days to raise the last few hundred dollars, and we hope to make a substantial leap past our goal too!

If you haven’t donated yet but you’d like to be part of the amazing drive we are seeing to keep the project alive, then wait no longer! The time has come.

To learn more about the campaign and make your donation visit:

http://publicdomainreview.org/support/

And remember that if you donate $40 or more you’ll have the opportunity to receive our beautiful Public Domain Review Tote Bag!

Please also continue to spread the word as much as you can!


Frictionless Data: making it radically easier to get stuff done with data

April 24, 2013 in Featured, Ideas and musings, Labs, Open Data, Open Standards, Small Data, Technical

Frictionless Data is now in alpha at http://data.okfn.org/ – and we’d like you to get involved.

Our mission is to make it radically easier to make data used and useful – our immediate goal is to make it as simple as possible to get the data you want into the tool of your choice.

This isn’t about building a big datastore or a data management system – it’s simply saving people from repeating all the same tasks of discovering a dataset, getting it into a format they can use, cleaning it up – all before they can do anything useful with it! If you’ve ever spent the first half of a hackday just tidying up tabular data and getting it ready to use, Frictionless Data is for you.

Our work is based on a few key principles:

  • Narrow focus — improve one small part of the data chain, standards and tools are limited in scope and size
  • Build for the web – use formats that are web “native” (JSON) and work naturally with HTTP (plain-text, CSV is streamable etc)
  • Distributed not centralised — designed for a distributed ecosystem (no centralized, single point of failure or dependence)
  • Work with existing tools — don’t expect people to come to you, make this work with their tools and their workflows (almost everyone in the world can open a CSV file, every language can handle CSV and JSON)
  • Simplicity (but sufficiency) — use the simplest formats possible and do the minimum in terms of metadata but be sufficient in terms of schemas and structure for tools to be effective
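
To illustrate the “CSV is streamable” point, here is a minimal Python sketch using only the standard library. It processes rows one at a time rather than loading the whole file. The sample data and the `total_population` function are hypothetical, invented for this example, and not part of Frictionless Data itself.

```python
import csv
import io

def total_population(lines):
    """Sum the 'population' column, reading one row at a time.

    `lines` can be any iterable of text lines: an open file, a decoded
    HTTP response, or (as here) an in-memory buffer.
    """
    reader = csv.DictReader(lines)
    total = 0
    for row in reader:  # only the current row is held in memory
        total += int(row["population"])
    return total

# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO(
    "country,population\n"
    "AA,100\n"
    "BB,250\n"
)
print(total_population(sample))
```

Because the function only needs an iterable of lines, the same code should work unchanged on a file far larger than the machine’s memory.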

We believe that making it easy to get and use data and especially open data is central to creating a more connected digital data ecosystem and accelerating the creation of social and commercial value. This project is about reducing friction in getting, using and connecting data, making it radically easier to get data you need into the tool of your choice. Frictionless Data distills much of our learning over the last 7 years into some specific standards and infrastructure.

What’s the Problem?

Today, when you decide to cook, the ingredients are readily available at local supermarkets or even already in your kitchen. You don’t need to travel to a farm, collect eggs, mill the corn, cure the bacon etc – as you once would have done! Instead, thanks to standard systems of measurement, packaging, shipping (e.g. containerization) and payment, ingredients can get from the farm direct to your local shop or even your door.

But with data we’re still largely stuck at this early stage: every time you want to do an analysis or build an app you have to set off around the internet to dig up data, extract it, clean it and prepare it before you can even get it into your tool and begin your work proper.

What do we need to do for working with data to be like cooking today – where you get to spend your time making the cake (creating insights) not preparing and collecting the ingredients (digging up and cleaning data)?

The answer: radical improvements in the “logistics” of data associated with specialisation and standardisation. By analogy with food, we need standard systems of “measurement”, packaging, and transport so that it’s easy to get data from its original source into the application where you can start working with it.

[Figure: Frictionless Data idea]

What’s Frictionless Data going to do?

We start with an advantage: unlike for physical goods, transporting digital information from one computer to another is very cheap! This means the focus can be on standardizing and simplifying the process of getting data from one application to another (or one form to another). We propose work in 3 related areas:

  • Key simple standards. For example, a standardized “packaging” of data that makes it easy to transport and use (think of the “containerization” revolution in shipping)
  • Simple tooling and integration – you should be able to get data in these standard formats into or out of Excel, R, Hadoop or whatever tool you use
  • Bootstrapping the system with essential data – we need to get the ball rolling

[Figure: Frictionless Data components diagram]

What’s Frictionless Data today?

1. Data

We have some exemplar datasets which are useful for a lot of people – these are:

  • High Quality & Reliable

    • We have sourced, normalized and quality checked a set of key reference datasets such as country codes, currencies, GDP and population.
  • Standard Form & Bulk Access

    • All the datasets are provided in a standardized form and can be accessed in bulk as CSV together with a simple JSON schema.
  • Versioned & Packaged

    • All data is in data packages and is versioned using git so all changes are visible and data can be collaboratively maintained.

2. Standards

We have two simple data package formats, described as ultra-lightweight, RFC-style specifications. They build heavily on prior work. Simplicity and practicality were guiding design criteria.

[Figure: Frictionless Data package standard diagram]

Data package: minimal wrapping, agnostic about the data it is “packaging”, designed for extension. This flexibility is good as it can be used as a transport for pretty much any kind of data, but it also limits integration and tooling. Read the full Data Package specification.
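
To give a feel for how minimal the wrapping is, here is a rough Python sketch of a package descriptor and of reading it back. It assumes the descriptor is a JSON file (conventionally `datapackage.json`) with a `name` and a list of `resources`. The dataset name, title and path are invented for illustration, and the exact keys should be checked against the full specification rather than taken from this sketch.

```python
import json

# Hypothetical descriptor for an imaginary dataset; the name, title and
# path below are invented for illustration only.
descriptor = {
    "name": "example-country-codes",
    "title": "Example country codes",
    "resources": [
        {"path": "data/country-codes.csv", "format": "csv"},
    ],
}

def resource_paths(descriptor_text):
    """Return the path of every resource listed in a descriptor."""
    package = json.loads(descriptor_text)
    return [resource["path"] for resource in package.get("resources", [])]

# Serialise the descriptor as it might appear on disk, then read it back.
text = json.dumps(descriptor, indent=2)
print(text)
print(resource_paths(text))
```

Since the descriptor is plain JSON, any language with a JSON parser can find the data files in a package without special tooling.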

Simple data format (SDF): focuses on tabular data only and extends data package (data in simple data format is a data package) by requiring data to be “good” CSVs and the provision of a simple JSON-based schema to describe them (“JSON Table Schema”). Read the full Simple Data Format specification.
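
As an illustration of how a schema can describe a CSV, here is a small Python sketch that checks rows against a schema made of named, typed fields. The schema below is written in the spirit of JSON Table Schema but is a simplified, hypothetical example: the `check_csv` helper and the three supported types are assumptions of this sketch, not the specification.

```python
import csv
import io

# Hypothetical schema: a list of fields, each with a name and a type.
schema = {
    "fields": [
        {"name": "code", "type": "string"},
        {"name": "population", "type": "integer"},
    ]
}

# Only these three types are handled in this sketch.
CASTS = {"string": str, "integer": int, "number": float}

def check_csv(lines, schema):
    """Return a list of error messages; an empty list means the data passed."""
    expected = [field["name"] for field in schema["fields"]]
    reader = csv.reader(lines)
    header = next(reader, None)
    if header != expected:
        return [f"header {header} does not match schema fields {expected}"]
    errors = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            errors.append(f"line {line_no}: expected {len(expected)} values, got {len(row)}")
            continue
        for field, value in zip(schema["fields"], row):
            try:
                CASTS[field["type"]](value)  # try to parse the value as its declared type
            except ValueError:
                errors.append(f"line {line_no}: {value!r} is not a valid {field['type']}")
    return errors

good = io.StringIO("code,population\nAA,100\nBB,250\n")
bad = io.StringIO("code,population\nAA,lots\n")
print(check_csv(good, schema))
print(check_csv(bad, schema))
```

The first call should report no problems, while the second should flag the non-numeric population value.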

3. Tools

It’s early days for Frictionless Data, so we’re still working on this bit! But there’s a need for validators, schema generators, and all kinds of integration. You can help out – see below for details or check out the issues on github.
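
To show the kind of tool we have in mind, here is a rough Python sketch of a schema generator that guesses a type for each CSV column. It is not an official Frictionless Data tool. The `infer_type` and `generate_schema` names, the three-type vocabulary and the output layout are all assumptions made for this example.

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest of integer, number or string that fits every value."""
    if not values:
        return "string"  # no data to go on, so fall back to the loosest type
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "string"

def generate_schema(lines):
    """Build a simple schema from CSV text lines.

    Assumes the first row is a header and that every row has the same
    number of values as the header. All rows are read into memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns so each column can be inspected as a whole.
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }

sample = io.StringIO("country,gdp,population\nAA,1.5,100\nBB,2,250\n")
print(generate_schema(sample))
```

A real generator would need to handle dates, missing values and much larger files, but even a guess like this can save a publisher from writing a schema by hand.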

Doesn’t this already exist?

People have been working on data for a while – doesn’t something like this already exist? The crude answer is yes and no. People, including folks here at the Open Knowledge Foundation, have been working on this for quite some time, and there are already some parts of the solution out there. Furthermore, many of these ideas are directly borrowed from similar work in software. For example, the Data Packages spec (first version in 2007!) builds heavily on packaging projects and specifications like Debian and CommonJS.

Key distinguishing features of Frictionless Data:

  • Ultra-simplicity – we want to keep things as simple as they possibly can be. This includes formats (JSON and CSV) and a focus on end-user tool integration, so people can just get the data they want into the tool they want and move on to the real task
  • Web orientation – we want an approach that fits naturally with the web
  • Focus on integration with existing tools
  • Distributed and not tied to a given tool or project – this is not about creating a central data marketplace or similar setup. It’s about creating a basic framework that would enable anyone to publish and use datasets more easily and without going through a central broker.

Many of these are shared with (and derive from) other approaches but as a whole we believe this provides an especially powerful setup.

Get Involved

This is a community-run project coordinated by the Open Knowledge Foundation as part of Open Knowledge Foundation Labs. Please get involved:

  • Spread the word! Frictionless Data is a key part of the real data revolution – follow the debate on #SmallData and share our posts so more people can get involved

