Take a CKAN Tour

Heather Leson - May 1, 2014 in CKAN, Events, OKF Projects

From baby name datasets and apps via the South Australian government to the new City of Surrey, B.C. (Canada) site, there are many instances of CKAN around the world. CKAN is the data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. It is used by various levels of government, civil societies and organizations to make their data transparent and available.

In this 1-hour video hangout, Irina Bolychevsky, Services Director, gives us an overview of CKAN with live demos of several CKAN sites including data.gov.uk, publicdata.eu and data.glasgow.gov.uk. She also answers community questions.

Get Involved

CKAN has a wide community of contributors working to remix and extend the software. Two examples of code that folks have contributed are ckanext-spatial and ckanext-realtime (GitHub links).

The CKAN core committers host regular online developer meetings every Tuesday and Thursday, 13:00 – 14:00 EDT, to review pull requests and discuss architecture. You can join in via the ckan-dev mailing list, the #ckan IRC channel on freenode (the place to get the Google Hangout link for meetings!) and by commenting on GitHub tickets. All welcome.

Community questions tend to be asked on Stack Overflow using the CKAN tag. You can also file issues or contribute code on GitHub.

Contact us

If you want to talk about CKAN development, please come and say hi on the ckan-dev mailing list or the #ckan IRC channel on irc.freenode.org. If you have service inquiries, you can reach out to the team: services at ckan dot org

Upcoming Community Sessions: CKAN, Community Feedback

Heather Leson - April 28, 2014 in CKAN, Events, Network, Open Knowledge Foundation Local Groups, Our Work, Working Groups

Happy week! We are hosting two Community Sessions this week. You have expressed an interest in learning more about CKAN. As well, we are continuing our regular Community Feedback sessions.

Boy and the world image

Take a CKAN Tour:

This week we will give an overview and tour of CKAN – the leading open source open data platform used by the national governments of the US, UK, Brazil, Canada, Australia, France, Germany, Austria and many more. This session will cover why data portals are useful, what they provide and showcase examples and best practices from CKAN’s varied user base! Bring your questions on how to get started and best practices.

Guest: Irina Bolychevsky, Services Director (Open Knowledge). Questions are welcome via G+ or Twitter.

  • Date: Wednesday, April 30, 2014
  • Time: 7:30 PT / 10:30 ET / 14:30 UTC / 15:30 BST / 16:30 CEST
  • Duration: 1 hour
  • Register and Join via G+ (The Hangout will be recorded.)
Community Feedback Session

We promised to schedule another Community Feedback Session. It is hard to find a common time for folks. We will work on timeshifting these for next sessions. This is a chance to ask questions, give input and help shape Open Knowledge.

Please join Laura, Naomi and me for the next Community Feedback Session. Bring your ideas and questions.

  • Date: Wednesday, April 30, 2014
  • Time: 9:00 PT / 12:00 EDT / 16:00 UTC / 17:00 BST / 18:00 CEST
  • Duration: 1 hour
  • Join via Meeting Burner

We will use Meeting Burner and IRC. (Note: We will record both of these.)

How to join Meeting Burner (audio instructions):

  • Option 1: Dial in to the following conference line: 1- (949) 229 – 4400 #
  • Option 2: Join the conference bridge with your computer’s microphone/speakers or headset.

How to join IRC: http://wiki.okfn.org/How_to_use_IRC/_Clients_and_Tips

More about the new Open Knowledge Brand

Host a Community Session in May

We are booking Community Sessions for May. These Open Knowledge online events can be in a number of forms: a scheduled IRC chat, a community google hangout, a technical sprint or an editathon. The goal is to connect the community to learn and share their stories and skills. If you would like to suggest a session or host one, please contact heather dot leson at okfn dot org.

More details about Community Sessions

(Photo: Heather Leson (San Francisco))

Building an archaeological project repository II: Where are the research data repositories?

Guest - April 17, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing

Data repository as research tool

In a previous post, we examined why Open Science is necessary to take advantage of the huge corpus of data generated by modern science. In our project Detection of Archaeological residues using Remote sensing Techniques, or DART, we adopted Open Science principles and made all the project’s extensive data available through a purpose-built data repository built on the open-source CKAN platform. But with so many academic repositories, why did we need to roll our own? A final post will look at how the portal was implemented.

DART: data-driven archaeology

DART’s overall aim is to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties, etc.). DART is a data-rich project: over a 14-month period, in-situ soil moisture, soil temperature and weather data were collected at least once an hour; ground-based geophysical surveys and spectro-radiometry transects were conducted at least monthly; aerial surveys collecting hyperspectral, LiDAR and traditional oblique and vertical photographs were taken throughout the year; and laboratory analyses and tests were conducted on both soil and plant samples. The data archive itself is of the order of terabytes.

Analysis of this archive is ongoing; meanwhile, this data and other resources are made available through open access mechanisms under liberal licences and are thus accessible to a wide audience. To achieve this we used the open-source CKAN platform to build a data repository, DARTPortal, which includes a publicly queryable spatio-temporal database (on the same host), and can support access to individual data as well as mining or analysis of integrated data.
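Because the repository is built on CKAN, its catalogue can in principle be searched by a program as well as through the web pages. The sketch below is an illustration, not DART's own code: it builds a request for `package_search`, a standard call in CKAN's Action API, and reads dataset titles from the JSON reply. The base URL `https://example.org` and the helper names are placeholders assumed for illustration; substitute the address of a real CKAN site.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, rows=5):
    """Build a package_search URL for CKAN's Action API (version 3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Pull dataset titles out of a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    # Fall back to the machine name when a dataset has no title.
    return [d.get("title") or d.get("name") for d in response["result"]["results"]]


def search(base_url, query, rows=5):
    """Run the search over HTTP (needs network access to a CKAN site)."""
    with urlopen(search_url(base_url, query, rows)) as resp:
        return dataset_titles(json.load(resp))


# Hypothetical site address, used here only to show the URL that is built.
print(search_url("https://example.org", "soil moisture", rows=3))
```

Keeping URL construction and response parsing apart from the network call means both can be checked without contacting a server.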

This means we can share the data analysis and transformation processes and demonstrate how we transform data into information and synthesise this information into knowledge (see, for example, this IPython notebook which dynamically exploits the database connection). This is the essence of Open Science: exposing the data and processes that allow others to replicate and more effectively build on our science.

Lack of existing infrastructure

Pleased though we are with our data repository, it would have been nice not to have to build it! Individual research projects should not bear the burden of implementing their own data repository framework. This is much better suited to local or national institutions where the economies of scale come into their own. Yet in 2010 the provision of research data infrastructure that supported what DART did was either non-existent or poorly advertised. Where individual universities provided institutional repositories, these were focused on publications (the currency of prestige and career advancement) and not on data. Irrespective of other environments, none of the DART collaborating partners provided such a data infrastructure.

Data sharing sites like Figshare did not exist – and when they did appear, the size of our hyperspectral data, in particular, was quite rightly a worry. This situation is slowly changing, but it is still far from ideal. The positions taken by Research Councils UK and the Engineering and Physical Sciences Research Council (EPSRC) on improving access to data are key catalysts for change. The EPSRC statement is particularly succinct:

Two of the principles are of particular importance: firstly, that publicly funded research data should generally be made as widely and freely available as possible in a timely and responsible manner; and, secondly, that the research process should not be damaged by the inappropriate release of such data.

This has produced a simple economic issue – if research institutions can not demonstrate that they can manage research data in the manner required by the funding councils then they will become ineligible to receive grant funding from that council. The impact is that the majority of universities are now developing their own, or collaborating on communal, data repositories.

But what about formal data deposition environments?

DART was generously funded through the Science and Heritage Programme supported by the UK Arts and Humanities Research Council (AHRC) and the EPSRC. This means that these research councils will pay for data archiving in the appropriate domain repository, in this case the Archaeology Data Service (ADS). So why produce our own repository?

Deposition to the ADS would only have occurred after the project had finished. With DART, the emphasis has been on re-use and collaboration rather than primarily on archiving. These goals are not mutually exclusive: the methods adopted by DART mean that we produced data that is directly suitable for archiving (well documented ASCII formats, rich supporting description and discovery metadata, etc) whilst also allowing more rapid exposure and access to the ‘full’ archive. This resulted in DART generating much richer resource discovery and description metadata than would have been the case if the data was simply deposited into the ADS.

The point of the DART repository was to produce an environment which would facilitate good data management practice and collaboration during the lifetime of the project. This is representative of a crucial shift in thinking, where projects and data collectors consider re-use, discovery, licences and metadata at a much earlier stage in the project life cycle: in effect, to create dynamic and accessible repositories that have impact across the broad stakeholder community rather than focussing solely on the academic community. The same underpinning philosophy of encouraging re-use is seen at both Figshare and DataHub. Whilst formal archiving of data is to be encouraged, if it is not re-useable, or more importantly easily re-useable, within orchestrated scientific workflow frameworks, then what is the point?

In addition, it is unlikely that the ADS will take the full DART archive. It has been said that archaeological archives can produce lots of extraneous or redundant ‘stuff’. This can be exacerbated by the unfettered use of digital technologies – how many digital images are really required for the same trench? Whilst we have sympathy with this argument, there is a difference between ‘data’ and ‘pretty pictures’: as data analysts, we consider that a digital photograph is normally a data resource and rarely a pretty picture. Hence, every image has value.

This is compounded when advances in technology mean that new data can be extracted from ‘redundant’ resources. For example, Structure from Motion (SfM) is a Computer Vision technique that extracts 3D information from 2D images. From a series of overlapping photographs, SfM techniques can be used to extract 3D point clouds and generate orthophotographs from which accurate measurements can be taken. In the case of SfM there is no such thing as redundancy, as each image becomes part of a ‘bundle’ and the statistical characteristics of the bundle determine the accuracy of the resultant model. However, one does need to be pragmatic, and it is currently impractical for organisations like the ADS to accept unconstrained archives. That said, it is an area that needs review: if a research object is important enough to have detailed metadata created about it, then it should be important enough to be archived.

For DART, this means that the ADS is hosting a subset of the archive in long-term re-use formats, which will be available in perpetuity (which formally equates to a maximum of 25 years), while the DART repository will hold the full archive in long-term re-use formats until we run out of server money. We are in discussion with Leeds University to migrate all the data objects over to the new institutional repository with sparkling new DOIs, and we can transfer the metadata held in CKAN over to Open Knowledge’s public repository, the DataHub. In theory nothing should be lost.

How long is forever?

The point on perpetuity is interesting. Collins Dictionary defines perpetuity as ‘eternity’. However, the ADS defines ‘digital’ perpetuity as 25 years. This raises the question: is it more effective in the long term to deposit in ‘formal’ environments (with an intrinsic focus on preservation format over re-use), or in ‘informal’ environments (with a focus on re-use and engagement over preservation: Flickr, Wikimedia Commons, the DART repository based on CKAN, etc.)? Both Flickr and Wikimedia Commons have been around for over a decade. Distributed peer-to-peer sharing, as used in Git, produces more robust and resilient environments which are equally suited to longer-term preservation. Whilst the authors appreciate that the situation is much more nuanced, particularly with the introduction of platforms that facilitate collaborative workflow development, this does have an impact on long-term deployment.

Choosing our licences

Licences are fundamental to the successful re-use of content. Licences describe who can use a resource, what they can do with this resource and how they should reference any resource (if at all).

Two lead organisations have developed legal frameworks for content licensing, Creative Commons (CC) and Open Data Commons (ODC). Until the release of CC version 4, published in November 2013, the CC licence did not cover data. Between them, CC and ODC licences can cover all forms of digital work.

At the top level the licences are permissive public domain licences (CC0 and PDDL respectively) that impose no restrictions on the licensee’s use of the resource. ‘Anything goes’ in a public domain licence: the licensee can take the resource and adapt it, translate it, transform it, improve upon it (or not!), package it, market it, sell it, etc. Constraints can be added to the top level licence by employing the following clauses:

  • BY – By attribution: the licensee must attribute the source.
  • SA – Share-alike: if the licensee adapts the resource, they must release the adapted resource under the same licence.
  • NC – Non-commercial: the licensee must not use the work within a commercial activity without prior approval. Interestingly, in many areas of the world, the use of material in university lectures may be considered a commercial activity. The non-commercial restriction is about the nature of the activity, not the legal status of the institution doing the work.
  • ND – No derivatives: the licensee cannot derive new content from the resource.

Each of these clauses decreases the ‘open-ness’ of the resource. In fact, the NC and ND clauses are not intrinsically open (they restrict both who can use and what you can do with the resource). These restrictive clauses have the potential to produce licence incompatibilities which may introduce profound problems in the medium to long term. This is particularly relevant to the SA clause. Share-alike means that any derived output must be licensed under the same conditions as the source content. If content is combined (or mashed up) – which is essential when one is building up a corpus of heritage resources – then content created under an SA clause cannot be combined with content that includes a restrictive clause (BY, NC or ND) that is not in the source licence. This licence incompatibility has a significant impact on the nature of the data commons. It has the potential to fragment the data landscape, creating pockets of knowledge which are rarely used in mainstream analysis, research or policy making. This will be further exacerbated when automated data aggregation and analysis systems become the norm. A permissive licence without clauses like Non-commercial, Share-alike or No-derivatives removes such licence and downstream re-user fragmentation issues.
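The share-alike rule described above can be expressed as a small check. This is a deliberately simplified sketch of that one rule, not a legal test: licences are reduced to their clause codes, and the function names and the label format (`CC-BY-SA` and so on) are assumptions made for illustration.

```python
# Clause codes taken from the list above; SA is handled separately because
# it is the clause that triggers the compatibility check.
RESTRICTIVE = frozenset({"BY", "NC", "ND"})
KNOWN = RESTRICTIVE | {"SA"}


def clauses(label):
    """Reduce a label such as 'CC-BY-SA' to its set of clause codes."""
    return frozenset(part for part in label.upper().split("-") if part in KNOWN)


def sa_allows_combination(sa_licence, other_licence):
    """Apply the rule from the text: share-alike content cannot be combined
    with content carrying a restrictive clause (BY, NC or ND) that the
    share-alike source licence does not itself carry."""
    source = clauses(sa_licence)
    if "SA" not in source:
        raise ValueError("the first licence must include the SA clause")
    extra = (clauses(other_licence) & RESTRICTIVE) - source
    return not extra


print(sa_allows_combination("CC-BY-SA", "CC-BY"))     # True
print(sa_allows_combination("CC-BY-SA", "CC-BY-NC"))  # False: NC is not in the source licence
print(sa_allows_combination("CC-BY-SA", "CC0"))       # True: no restrictive clauses to clash
```

An automated aggregator would need a check of this kind for every pair of sources it merges, which is why clause-free licences make large-scale re-use so much simpler.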

For completeness, specific licences have been created for Open Government Data. The UK Government Data Licence for public sector information is essentially an open licence with a BY attribution clause.

At DART we have followed the guidelines of The Open Data Institute and separated out creative content (illustrations, text, etc.) from data content. Hence, the DART content is either CC-BY or ODC-BY respectively. In the future we believe it would be useful to drop the BY (attribution) clause. This would stop attribution stacking: if the resource you are using is a derivative of a derivative of a derivative of a … (you get the picture), at what stage do you stop attributing? Anything which requires bureaucracy, such as attributing an image in a PowerPoint presentation, inhibits re-use (one should always assume that people are intrinsically lazy). There is a post advocating ccZero+ by Dan Cohen. However, impact tracking may mean that the BY clause becomes a default for academic deposition.

The ADS uses a more restrictive bespoke default licence which does not map to national or international licence schemes (they also don’t recognise non CC licences). Resources under this licence can only be used for teaching, learning, and research purposes. Of particular concern is their use of the NC clause and possible use of the ND clause (depending on how you interpret the licence). Interestingly, policy changes mean that the use of data under the bespoke ADS licence becomes problematic if university teaching activities are determined to be commercial. It is arguable that the payment of tuition fees represents a commercial activity. If this is true then resources released under the ADS licence can not be used within university teaching which is part of a commercial activity. Hence, the policy change in student tuition and university funding has an impact on the commercial nature of university teaching which has a subsequent impact on what data or resources universities are licensed to use. Whilst it may never have been the intention of the ADS to produce a licence with this potential paradox, it is a problem when bespoke licences are developed, even if they were originally perceived to be relatively permissive licences. To remove this ambiguity it is recommended that submissions to the ADS are provided under a CC licence which renders the bespoke ADS licence void.

In the case of DART, these licence variations with the ADS should not be a problem. Our licences are permissive (by attribution is the only clause we have included). This means the ADS can do anything they want with our resources as long as they cite the source. In our case this would be the individual resource objects or collections on the DART portal. This is a good thing, as the metadata on the DART portal is much richer than the metadata held by the ADS.

Concerns about opening up data, and responses which have proved effective

Christopher Gutteridge (University of Southampton) and Alexander Dutton (University of Oxford) have collated a Google doc entitled ‘Concerns about opening up data, and responses which have proved effective‘. This document describes a number of concerns commonly raised by academic colleagues about increasing access to data. For DART two issues became problematic that were not covered by this document:

  • The relationship between open data and research novelty and the impact this may have on a PhD submission.
  • Journal publication – specifically that a journal won’t publish a research paper if the underlying data is open.

The former point is interesting – does the process of undertaking open science, or at least providing open data, undermine the novelty of the resultant scientific process? With open science it could be difficult to directly attribute the contribution, or novelty, of a single PhD student to an openly collaborative research process. However, that said, if online versioning tools like Git are used, then it is clear who has contributed what to a piece of code or a workflow (the benefits of the BY clause). This argument is less solid when we are talking solely about open data. Whilst it is true that other researchers (or anybody else for that matter) have access to the data, it is highly unlikely that multiple researchers will use the same data to answer exactly the same question. If they do ask the same question (and making the optimistic assumption that they reach the same conclusion), it is still highly unlikely that they will have done so by the same methods; and even if they do, their implementations will be different. If multiple methods using the same source data reach the same conclusion then there is an increased likelihood that the conclusion is correct and that the science is even more certain. The underlying point here is that 21st-century scientific practice will substantially benefit from people showing their working. Exposure of the actual process of scientific enquiry (the algorithms, code, etc.) will make the steps between data collection and publication more transparent, reproducible and peer-reviewable – or, quite simply, more scientific. Hence, we would argue that open data and research novelty is only a problem if plagiarism is a problem.

The journal publication point is equally interesting. Publications are the primary metric for academic career progression and kudos. In this instance it was the policy of the ‘leading journal in this field’ that they would not publish a paper from a dataset that was already published. No credible reasons were provided for this clause – which seems draconian in the extreme. It does indicate that no one-size-fits-all approach will work in the academic landscape. It will also be interesting to see how this journal, which publishes work which is mainly funded by EPSRC, responds to the EPSRC guidelines on open data.

This is also a clear demonstration that the academic community needs to develop new metrics that are more suited to 21st century research and scholarship by directly linking academic career progression to other sources of impact that go beyond publications. Furthermore, academia needs some high-profile exemplars that demonstrate clearly how to deal with such change. The policy shift and ongoing debate concerning ‘Open access’ publications in the UK is changing the relationship between funders, universities, researchers, journals and the public – a similar debate needs to occur about open data and open science.

The altmetrics community is developing new metrics for “analyzing, and informing scholarship” and has described its ethos in its manifesto. The Research Councils and Governments have taken a much greater interest in the impact of publicly funded research. Importantly, public, social and industry impact are as important as academic impact. It is incumbent on universities to respond to this by directly linking academic career progression through to impact and by encouraging improved access to the underlying data and processing outputs of the research process through data repositories and workflow environments.

Skillshares and Stories: Upcoming Community Sessions

Heather Leson - April 3, 2014 in CKAN, Events, Network, OKF Brazil, OKF Projects, Open Access, Open Knowledge Foundation Local Groups, School of Data

We’re excited to share with you a few upcoming Community Sessions from the School of Data, CKAN, Open Knowledge Brazil, and Open Access. As we mentioned earlier this week, we aim to connect you to each other. Join us for the following events!

What is a Community Session: These online events can be in a number of forms: a scheduled IRC chat, a community google hangout, a technical sprint or hackpad editathon. The goal is to connect the community to learn and share their stories and skills.

We held our first Community Session yesterday (see our Wiki Community Session notes). The remaining April events will be online via G+. These sessions will be a public Hangout on Air. The video will be available on the Open Knowledge YouTube Channel after the event. Questions are welcome via Twitter and G+.

All these sessions are Wednesdays at 10:30 – 11:30 am ET/ 14:30 – 15:30 UTC.

Mapping with Ketty and Ali: a School of Data Skillshare (April 9, 2014)

Making a basic map from spreadsheet data: We’ll explore tools like QGIS (a free and open-source Geographic Information System) and TileMill (a tool to design beautiful interactive web maps). Our guest trainers are Ketty Adoch and Ali Rebaie.

To join the Mapping with Ketty and Ali Session on April 9, 2014

Q & A with Open Knowledge Brazil Chapter featuring Everton (Tom) Zanella Alvarenga (April 16, 2014)

Around the world, local groups, Chapters, projects, working groups and individuals connect to Open Knowledge. We want to share your stories.

In this Community Session, we will feature Everton (Tom) Zanella Alvarenga, Executive Director.

Open Knowledge Foundation Brazil is a newish Chapter. Tom will share his experiences growing a chapter and community in Brazil. We aim to connect you to community members around the world. We will also open up the conversation to all things Community. Share your best practices.

Join us on April 16, 2014 via G+

Take a CKAN Tour (April 23, 2014)

This week we will give an overview and tour of CKAN – the leading open source open data platform used by the national governments of the US, UK, Brazil, Canada, Australia, France, Germany, Austria and many more. This session will cover why data portals are useful, what they provide and showcase examples and best practices from CKAN’s varied user base! Our special guest is Irina Bolychevsky, Services Director (Open Knowledge Foundation).

Learn and share your CKAN stories on April 23, 2014

(Note: We will share more details about the April 30th Open Access session soon!)

Building an archaeological project repository I: Open Science means Open Data

Guest - February 24, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing.

In 2010 we authored a series of blog posts for the Open Knowledge Foundation subtitled ‘How open approaches can empower archaeologists’. These discussed the DART project, which is on the cusp of concluding.

The DART project collected large amounts of data, and as part of the project, we created a purpose-built data repository to catalogue this and make it available, using CKAN, the Open Knowledge Foundation’s open-source data catalogue and repository. Here we revisit the need for Open Science in the light of the DART project. In a subsequent post we’ll look at why, with so many repositories of different kinds, we felt that to do Open Science successfully we needed to roll our own.

Open data can change science

Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories – and of the experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge. (The Royal Society, Science as an open enterprise, 2012)

The Royal Society’s report Science as an open enterprise identifies how 21st century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society. This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).

The underlying rationale of Open Data is this: unfettered access to large amounts of ‘raw’ data enables patterns of re-use and knowledge creation that were previously impossible. The creation of a rich, openly accessible corpus of data introduces a range of data-mining and visualisation challenges, which require multi-disciplinary collaboration across domains (within and outside academia) if their potential is to be realised. An important step towards this is creating frameworks which allow data to be effectively accessed and re-used. The prize for succeeding is improved knowledge-led policy and practice that transforms communities, practitioners, science and society.

The need for such frameworks will be most acute in disciplines with large amounts of data, a range of approaches to analysing the data, and broad cross-disciplinary links – so it was inevitable that they would prove important for our project, Detection of Archaeological residues using Remote sensing Techniques (DART).

DART: data-driven archaeology

DART aims to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties, etc.). The data collected by DART is of relevance to a broad range of different communities. Open Science was adopted with two aims:

  • to maximise the research impact by placing the project data and the processing algorithms into the public sphere;
  • to build a community of researchers and other end-users around the data so that collaboration, and by extension research value, can be enhanced.

‘Contrast dynamics’, the type of data provided by DART, is critical for policy makers and curatorial managers to assess both the state and the rate of change in heritage landscapes, a need wrapped up in national commitments to the European Landscape Convention (ELC). Making the best use of the data, however, depends on openly accessible dynamic monitoring, along similar lines to that proposed by the European Space Agency for the Global Monitoring for Environment and Security (GMES) satellite constellations. What is required is an accessible framework which allows all this data to be integrated, processed and modelled in a timely manner. The approaches developed in DART to improve the understanding and enhance the modelling of heritage contrast detection dynamics feed directly into this long-term agenda.

Cross-disciplinary research and Open Science

Such approaches cannot be undertaken within a single domain of expertise. This vision can only be built by openly collaborating with other scientists and building on shared data, tools and techniques. Important developments will come from the GMES community, particularly from precision agriculture, soil science, and well documented data processing frameworks and services. At the same time, the information collected by projects like DART can be re-used easily by others. For example, DART data has been exploited by the Royal Agricultural University (RAU) for use in such applications as carbon sequestration in hedges, soil management, soil compaction and community mapping. Such openness also promotes collaboration: DART partners have been involved in a number of international grant proposals and have developed a longer term partnership with the RAU.

Open Science advocates opening access to data, and other scientific objects, at a much earlier stage in the research life-cycle than traditional approaches. Open Scientists argue that research synergy and serendipity occur through openly collaborating with other researchers (more eyes/minds looking at the problem). Of great importance is the fact that the scientific process itself is transparent and can be peer reviewed: as a result of exposing data and the processes by which these data are transformed into information, other researchers can replicate and validate the techniques. As a consequence, we believe that collaboration is enhanced and the boundaries between public, professional and amateur are blurred.

Challenges ahead for Open Science

Whilst DART has not achieved all its aims, it has made significant progress and has identified some barriers to achieving such open approaches. Key to this is the articulation of issues surrounding data-access (accreditation), licensing and ethics. Who gets access to data, when, and under what conditions, is a serious ethical issue for the heritage sector. These are obviously issues that need co-ordination through organisations like Research Councils UK with cross-cutting input from domain groups. The Arts and Humanities community produce data and outputs with pervasive social and ethical impact, and it is clearly important that they have a voice in these debates.

“Share, improve and reuse public sector data” – French Government unveils new CKAN-based data.gouv.fr

Guest - December 26, 2013 in CKAN, OKF France, Open Data, Open Government Data

This is a guest post from Rayna Stamboliyska and Pierre Chrzanowski of the Open Knowledge Foundation France

Etalab, the Prime Minister’s task force for Open Government Data, unveiled on December 18 the new version of the data.gouv.fr platform (1). OKF France salutes the work the Etalab team has accomplished, and welcomes the new features and the spirit of the new portal, rightly summed up in the website’s baseline, “share, improve and reuse public sector data”.

OKF France was represented at the data.gouv.fr launch event by Samuel Goëta in the presence of Jean-Marc Ayrault, Prime Minister of France, Fleur Pellerin, Minister Delegate for Small and Medium Enterprises, Innovation, and the Digital Economy, and Marylise Lebranchu, Minister of the Reform of the State. Photo credit: Yves Malenfer/Matignon

Etalab has indeed chosen to offer a platform resolutely turned towards collaboration between data producers and re-users. The website now enables everyone not only to improve and enhance the data published by the government, but also to share their own data; to our knowledge, a world first for a governmental open data portal. In addition to “certified” data (i.e., released by departments and public authorities), data.gouv.fr also hosts data published by local authorities, delegated public services and NGOs. Last but not least, the platform also identifies and highlights other, pre-existing, Open Data portals such as nosdonnees.fr (2). A range of content publishing features, a wiki and the possibility of associating reuses such as visualizations should also allow for a better understanding of the available data and facilitate outreach efforts to the general public.

We at OKF France also welcome the technological choices Etalab made. The new data.gouv.fr is built around CKAN, the open source software whose development is coordinated by the Open Knowledge Foundation. All features developed by the Etalab team will be available for other CKAN-based portals (e.g., data.gov or data.gov.uk). In turn, Etalab can more easily adopt innovations implemented by others.

The new version of the platform clearly emphasises the quality rather than the quantity of datasets. Re-users had been expecting this paradigm shift. On the one hand, datasets with local coverage have been pooled, thus providing nation-wide coverage. On the other hand, the rating system favours datasets with the widest geographical and temporal coverage as well as the highest granularity.

Screenshot from data.gouv.fr home page

The platform will continue to evolve and we hope that other features will soon complete this new version, for example:

  • the ability to browse data by facets (data producers, geographical coverage or license, etc.);
  • a management system for “certified” (clearly labelled institutional producer) and “non-certified” (data modified, produced, added by citizens) versions of a dataset;
  • a tool for previewing data, as natively proposed by CKAN;
  • the ability to comment on the datasets;
  • a tool that would allow users to enquire about a dataset directly with the relevant public administration.

Given this new version of data.gouv.fr, it is now up to the producers and re-users of public sector data to demonstrate the potential of Open Data. This potential can only be fully met with the release of fundamental public sector data as a founding principle for our society. Thus, we are still awaiting the opening of business registers, detailed expenditures as well as non-personal data on prescriptions issued by healthcare providers.

Lastly, through the new data.gouv.fr, administrations are no longer solely responsible for the common good that is public sector data. Now this responsibility is shared with all stakeholders. It is thus up to all of us to demonstrate that this is the right choice.


(1) This new version of data.gouv.fr is the result of codesign efforts that the Open Knowledge Foundation France participated in.

(2) Nosdonnees.fr is co-managed by Regards Citoyens and OKF France.

Read Etalab’s press release online here

2013 – A great year for CKAN

Darwin Peltan - December 24, 2013 in CKAN

2013 has seen CKAN and the CKAN community go from strength to strength. Here are some of the highlights.

Screenshot from CKAN demo site

February

May

June

July

August

  • CKAN 2.1 released with new capabilities for managing bulk datasets amongst many other improvements

September

October

  • Substantial new version of CKAN’s geospatial extension, including pycsw and MapBox integration and revised and expanded docs.

November

  • Future City Glasgow launch open.glasgow.gov.uk prototype as part of their TSB funded Future Cities Demonstrator programme

December

Looking forward

The CKAN community is growing incredibly quickly so we’re looking forward to seeing what people do with CKAN in 2014.

So if your city, region or state hasn’t already done so, why not make 2014 the year that you launch your own CKAN powered open data portal?

Download CKAN or contact us if you need help getting started.

This post was cross posted from the CKAN blog

Open Data professional services now available on G-Cloud 4

Open Knowledge - November 8, 2013 in CKAN, Our Work, Services

We are pleased to announce that the Open Knowledge Foundation are now an approved supplier on the G-Cloud 4 Services Framework.

This means that it’s now even easier for UK government organisations to commission the Open Knowledge Foundation.

We are offering a range of services via G-Cloud including setting up and deploying a CKAN open data portal – perfect if you want to start publishing open data quickly.

We can also offer technical consultancy and support if you need bespoke features developed for your open data portal.

Our full list of services available via G-Cloud 4 is:

If you have any questions or would like to work with us via G-Cloud please get in touch on services@okfn.org.

CKAN hackathon: Hello from Ireland!

Denis Parfenov - October 7, 2013 in CKAN, OKF Ireland, Sprint / Hackday

This post was written by Denis Parfenov, our Ireland Ambassador, and Flora Fleischer, a member of the new Local Group for Ireland.

Last Saturday, the ‘Open Data Ireland’ community and the Open Knowledge Foundation Network held a ‘CKAN Hackathon’. This event was kindly sponsored by Fingal County Council, ESRI Ireland and The Irish Organisation for Geographic Information.

Dublin Castle

Developers, designers, journalists, academics, policy makers, creative thinkers, civil servants, entrepreneurs and active citizens all came together to revive open data in Ireland and to establish an epicenter for encouragement and development of open knowledge in Ireland by launching the official Open Knowledge Foundation Ireland Local Group.

Groups were formed around 4 specific tasks:

(1) deploying a Central Open Data Portal that provides the people of Ireland with a single access point to the information collected by their government,
(2) auditing and validating existing public domain data for inclusion in such a portal,
(3) preparing the Open Knowledge Foundation Ireland recommendations for inclusion in the first Irish Open Government Partnership National Action Plan, and
(4) creating an educational hub about the power of open data.

On the day, Group 1 managed to secure hosting and deploy CKAN 2.1, and link temporarily to a new portal site (http://ckan.curatedublin.com/) until it redirects to data.opendata.ie. It now comes complete with filestore, datastore, harvester and spatial extensions! Existing and new data sets have been transferred to the new portal. It’s still a work in progress but people in Ireland can now access a list of 275 open data sets about Ireland via the search function.
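For readers who would rather query such a portal from a script than through the web search box, the following is a minimal sketch in Python. It assumes the portal exposes CKAN's standard Action API (version 3) and its `package_search` action, as stock CKAN 2.x installations do. The helper names (`build_search_url`, `dataset_titles`, `search_portal`) are made up for this illustration, and the temporary portal address mentioned above may no longer be reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(portal_url, query, rows=10):
    """Build a package_search URL for CKAN's Action API (v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(payload):
    """Return (total_count, titles) from a decoded package_search response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    # Fall back to the dataset's URL name when no human-readable title is set.
    titles = [d.get("title") or d["name"] for d in result["results"]]
    return result["count"], titles


def search_portal(portal_url, query, rows=10):
    """Query a live portal; needs network access and a reachable CKAN site."""
    with urlopen(build_search_url(portal_url, query, rows)) as response:
        return dataset_titles(json.load(response))
```

Calling `search_portal("http://ckan.curatedublin.com/", "schools")` would, if the site is up, return the number of matching datasets and the titles of the first ten.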

The second group started early on the day to search, audit and validate all available data pertaining to Ireland. The group searched relentlessly and identified 166 open data sets, 16 open data catalogues and 29 open data APIs available from various websites in and about Ireland. They worked together with Group 1 to determine the metadata requirements, and then proceeded to review and validate the information and usability of each data set. The group explored potential use cases of how the data available through the portal can be combined to find answers to questions that could enhance the lives of the people in Ireland, such as which local school to choose. For people who were new to the Open Data Ireland community this task was a great hands-on way of learning about the issues around open data.

The third group came up with some new recommendations after collectively reviewing the draft report on the consultation for Ireland’s participation in the Open Government Partnership. The group talked through the advantages and challenges around making data public. In their submission to the OGP National Action Plan they recommend the creation of an Open Data Institute Ireland linked to the already well established Open Data Institute in the UK, to catalyse the evolution of an open data culture to create economic, environmental, and social value. (See why we need an Open Data Institute in Ireland.)

Thanks go to ODI’s partner, long-term friend and supporter of the Open Data community in Ireland, CTO of ‘Open Data Solutions’ Jason Hare (Raleigh, NC) for attending and supporting the group in preparation of the submission.

The fourth group did a great job at setting up an intuitive and contemporary website to help the average citizen to understand what open data is, what it can do for us and how we can be empowered by it. The site also gives practical tips on how to get involved. The team set up a Google website, and migration to opendata.ie is a work in progress.

The last group made sure that we were capturing this very, very successful CKAN Hackathon for the outside world. The group never failed to fill in and support other groups, providing assistance whenever necessary. A great job was done making it a fun and successful event!

Thanks to everyone who participated in the CKAN Hackathon ‘in the room’ or online! Together, we co-founded the Open Knowledge Foundation Local Initiative in Ireland on September 28th, 2013!

We now have a Flickr site capturing the event in pictures, and if you’d like to catch up on what was happening on Twitter while we were hard at work, you can do that too, at Storify.

The next ‘Open Data Ireland’ meetup will take place in TCube on Thursday, 24th October 2013. Doors open at 18:30.

Images: Dublin Castle by Wojtek Gurak, CC-BY-NC; CKAN Hackathon by OKF Ireland, CC-BY-NC-SA

CKAN Hackathon and Local Group launch, Dublin

Denis Parfenov - September 27, 2013 in CKAN, OKF Ireland, Sprint / Hackday

The following is cross-posted from the Open Government Partnership blog

A CKAN hackathon is taking place on Saturday, 28th September at TCube in Dublin, bringing together IT specialists, political representatives and members of the public with an interest in making data open.

Developers, designers, journalists, academics, policy makers, creative thinkers, civil servants, entrepreneurs and interested parties are invited to the event which aims to provide the people of Ireland with a single access point to the information collected by their government by deploying a Central Open Data Portal. Open, usable and available knowledge will lead to greater transparency for Irish citizens and accountability from Irish representatives.

We strongly believe that comprehensive and meaningful information has the potential to empower better evidence-based decision-making for all of us: about the food we buy and eat, the services we enlist, choices about healthcare and education that we make, the pension plans we decide to invest in, and the public representatives we elect. Better information empowers us to be better consumers, clients, patients, students, investors and active citizens.

The event is co-organised by the ‘Open Data Ireland’ community and the Open Knowledge Foundation with the support of Fingal County Council, ESRI Ireland and The Irish Organisation for Geographic Information (IRLOGI).

The hackathon will review information that is already publicly available and launch a local Open Knowledge Foundation Network Local Group which will encourage the development of open knowledge in Ireland.

You can register for the event here, and follow #okfnIRL for updates on the Open Knowledge Foundation’s activities in Ireland.
