
Upcoming Community Sessions: CKAN, Community Feedback

Heather Leson - April 28, 2014 in CKAN, Events, Network, Open Knowledge Foundation Local Groups, Our Work, Working Groups

Happy week! We are hosting two Community Sessions this week. You have expressed an interest in learning more about CKAN, and we are also continuing our regular Community Feedback sessions.

(Image: Boy and the World)

Take a CKAN Tour:

This week we will give an overview and tour of CKAN – the leading open source open data platform used by the national governments of the US, UK, Brazil, Canada, Australia, France, Germany, Austria and many more. This session will cover why data portals are useful and what they provide, and showcase examples and best practices from CKAN's varied user base! Bring your questions on how to get started.

Guest: Irina Bolychevsky, Services Director (Open Knowledge). Questions are welcome via G+ or Twitter.

  • Date: Wednesday, April 30, 2014
  • Time: 7:30 PT / 10:30 ET / 14:30 UTC / 15:30 BST / 16:30 CEST
  • Duration: 1 hour
  • Register and Join via G+ (The Hangout will be recorded.)

Community Feedback Session

We promised to schedule another Community Feedback Session. It is hard to find a common time that works for everyone, so we will shift the times of future sessions. This is a chance to ask questions, give input and help shape Open Knowledge.

Please join Laura, Naomi and me for the next Community Feedback Session. Bring your ideas and questions.

  • Date: Wednesday, April 30, 2014
  • Time: 9:00 PT / 12:00 EDT / 16:00 UTC / 17:00 BST / 18:00 CEST
  • Duration: 1 hour
  • Join via Meeting Burner

We will use Meeting Burner and IRC. (Note: We will record both of these.)

How to join Meeting Burner (audio instructions):

  • Option 1: Dial in to the conference line: 1-(949) 229-4400 #
  • Option 2: Join the conference bridge with your computer's microphone/speakers or headset.

How to join IRC: http://wiki.okfn.org/How_to_use_IRC/_Clients_and_Tips

More about the new Open Knowledge Brand

Host a Community Session in May

We are booking Community Sessions for May. These Open Knowledge online events can be in a number of forms: a scheduled IRC chat, a community google hangout, a technical sprint or an editathon. The goal is to connect the community to learn and share their stories and skills. If you would like to suggest a session or host one, please contact heather dot leson at okfn dot org.

More details about Community Sessions

(Photo: Heather Leson (San Francisco))

Building an archaeological project repository II: Where are the research data repositories?

Guest - April 17, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing

(Image: DART UML diagram, 2011–2013)

Data repository as research tool

In a previous post, we examined why Open Science is necessary to take advantage of the huge corpus of data generated by modern science. In our project Detection of Archaeological residues using Remote sensing Techniques, or DART, we adopted Open Science principles and made all the project’s extensive data available through a purpose-built data repository built on the open-source CKAN platform. But with so many academic repositories, why did we need to roll our own? A final post will look at how the portal was implemented.

DART: data-driven archaeology

DART's overall aim is to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties etc). DART is a data-rich project: over a 14-month period, in-situ soil moisture, soil temperature and weather data were collected at least once an hour; ground-based geophysical surveys and spectro-radiometry transects were conducted at least monthly; aerial surveys collecting hyperspectral, LiDAR and traditional oblique and vertical photographs were taken throughout the year; and laboratory analyses and tests were conducted on both soil and plant samples. The data archive itself is on the order of terabytes.

Analysis of this archive is ongoing; meanwhile, this data and other resources are made available through open access mechanisms under liberal licences and are thus accessible to a wide audience. To achieve this we used the open-source CKAN platform to build a data repository, DARTPortal, which includes a publicly queryable spatio-temporal database (on the same host), and can support access to individual data as well as mining or analysis of integrated data.

This means we can share the data analysis and transformation processes and demonstrate how we transform data into information and synthesise this information into knowledge (see, for example, this IPython notebook, which dynamically exploits the database connection). This is the essence of Open Science: exposing the data and processes that allow others to replicate and more effectively build on our science.
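
Because DARTPortal is a standard CKAN instance, this kind of programmatic access goes through CKAN's Action API. The sketch below shows the general pattern only; the portal URL and resource identifier are placeholders rather than DART's actual endpoints.

```python
# A minimal sketch (not the actual DARTPortal endpoints) of pulling data from
# a CKAN portal via the standard Action API. The portal URL and resource_id
# below are placeholders.
import requests

CKAN_URL = "https://example-dart-portal.org"  # placeholder portal address

# Search the catalogue for datasets whose metadata mentions "soil moisture".
search = requests.get(
    f"{CKAN_URL}/api/3/action/package_search",
    params={"q": "soil moisture", "rows": 5},
).json()

if search["success"]:
    for dataset in search["result"]["results"]:
        print(dataset["name"], "-", dataset.get("title", ""))

# If a resource has been loaded into CKAN's DataStore, its rows can be
# queried directly and fed straight into an analysis notebook.
records = requests.get(
    f"{CKAN_URL}/api/3/action/datastore_search",
    params={"resource_id": "PLACEHOLDER-RESOURCE-ID", "limit": 10},
).json()

if records["success"]:
    for row in records["result"]["records"]:
        print(row)
```

Two calls of this kind are all a notebook needs to move from "data in the repository" to "data in an analysis", which is what makes the repository a research tool rather than just an archive.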

Lack of existing infrastructure

Pleased though we are with our data repository, it would have been nice not to have to build it! Individual research projects should not bear the burden of implementing their own data repository framework. This is much better suited to local or national institutions, where economies of scale come into their own. Yet in 2010 the provision of research data infrastructure that supported what DART did was either non-existent or poorly advertised. Where individual universities provided institutional repositories, these were focused on publications (the currency of prestige and career advancement) and not on data. In any case, none of the DART collaborating partners provided such a data infrastructure.

Data-sharing sites like Figshare did not exist when DART started – and once they did, the size of our hyperspectral data, in particular, was quite rightly a worry. This situation is slowly changing, but it is still far from ideal. The positions taken by Research Councils UK and the Engineering and Physical Sciences Research Council (EPSRC) on improving access to data are key catalysts for change. The EPSRC statement is particularly succinct:

Two of the principles are of particular importance: firstly, that publicly funded research data should generally be made as widely and freely available as possible in a timely and responsible manner; and, secondly, that the research process should not be damaged by the inappropriate release of such data.

This has produced a simple economic issue – if research institutions cannot demonstrate that they can manage research data in the manner required by the funding councils, then they will become ineligible to receive grant funding from that council. The impact is that the majority of universities are now developing their own data repositories, or collaborating on communal ones.

But what about formal data deposition environments?

DART was generously funded through the Science and Heritage Programme supported by the UK Arts and Humanities Research Council (AHRC) and the EPSRC. This means that these research councils will pay for data archiving in the appropriate domain repository, in this case the Archaeology Data Service (ADS). So why produce our own repository?

Deposition to the ADS would only have occurred after the project had finished. With DART, the emphasis has been on re-use and collaboration rather than primarily on archiving. These goals are not mutually exclusive: the methods adopted by DART mean that we produced data that is directly suitable for archiving (well documented ASCII formats, rich supporting description and discovery metadata, etc) whilst also allowing more rapid exposure and access to the ‘full’ archive. This resulted in DART generating much richer resource discovery and description metadata than would have been the case if the data was simply deposited into the ADS.
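
To make "rich supporting description and discovery metadata" concrete, here is a purely illustrative sketch of how a richly described dataset record is registered through CKAN's package_create action. Every value shown (URL, API key, organisation, licence identifier, field contents) is an invented placeholder, not one of DART's actual records.

```python
# Illustrative only: registering a richly described dataset via CKAN's
# package_create action. All values below are invented placeholders.
import requests

CKAN_URL = "https://example-dart-portal.org"
API_KEY = "PLACEHOLDER-API-KEY"  # a real key belongs to a CKAN user account

dataset = {
    "name": "soil-moisture-site-a-2012",   # URL-safe identifier
    "title": "Hourly soil moisture, Site A, 2012",
    "notes": "In-situ soil moisture logged hourly; plain CSV with ISO 8601 timestamps.",
    "license_id": "odc-by",                # licence id as configured on the portal
    "owner_org": "dart-project",           # hypothetical organisation
    "tags": [{"name": "soil-moisture"}, {"name": "archaeology"}],
    "extras": [                            # free-form discovery metadata
        {"key": "temporal_coverage", "value": "2012-01-01/2012-12-31"},
        {"key": "spatial_coverage", "value": "Site A study area (placeholder)"},
    ],
}

response = requests.post(
    f"{CKAN_URL}/api/3/action/package_create",
    json=dataset,
    headers={"Authorization": API_KEY},
)
print(response.json()["success"])
```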

The point of the DART repository was to produce an environment which would facilitate good data management practice and collaboration during the lifetime of the project. This is representative of a crucial shift in thinking, where projects and data collectors consider re-use, discovery, licences and metadata at a much earlier stage in the project life cycle: in effect, to create dynamic and accessible repositories that have impact across the broad stakeholder community rather than focussing solely on the academic community. The same underpinning philosophy of encouraging re-use is seen at both Figshare and the DataHub. Whilst formal archiving of data is to be encouraged, if it is not re-usable, or more importantly easily re-usable, within orchestrated scientific workflow frameworks, then what is the point?

In addition, it is unlikely that the ADS will take the full DART archive. It has been said that archaeological archives can produce lots of extraneous or redundant ‘stuff’. This can be exacerbated by the unfettered use of digital technologies – how many digital images are really required for the same trench? Whilst we have sympathy with this argument, there is a difference between ‘data’ and ‘pretty pictures’: as data analysts, we consider that a digital photograph is normally a data resource and rarely a pretty picture. Hence, every image has value.

This is compounded when advances in technology mean that new data can be extracted from 'redundant' resources. For example, Structure from Motion (SfM) is a Computer Vision technique that extracts 3D information from 2D images. From a series of overlapping photographs, SfM techniques can be used to extract 3D point clouds and generate orthophotographs from which accurate measurements can be taken. In the case of SfM there is no such thing as redundancy, as each image becomes part of a 'bundle' and the statistical characteristics of the bundle determine the accuracy of the resultant model. However, one does need to be pragmatic, and it is currently impractical for organisations like the ADS to accept unconstrained archives. That said, it is an area that needs review: if a research object is important enough to have detailed metadata created about it, then it should be important enough to be archived.

For DART, this means that the ADS is hosting a subset of the archive in long-term re-use formats, which will be available in perpetuity (which formally equates to a maximum of 25 years), while the DART repository will hold the full archive in long-term re-use formats until we run out of server money. We are in discussion with Leeds University to migrate all the data objects over to the new institutional repository with sparkling new DOIs, and we can transfer the metadata held in CKAN over to Open Knowledge's public repository, the DataHub. In theory nothing should be lost.

How long is forever?

The point on perpetuity is interesting. Collins Dictionary defines perpetuity as 'eternity'. However, the ADS defines 'digital' perpetuity as 25 years. This raises the question: is it more effective in the long term to deposit in 'formal' environments (with an intrinsic focus on preservation format over re-use), or in 'informal' environments with a focus on re-use and engagement over preservation (Flickr, Wikimedia Commons, the DART repository based on CKAN, etc.)? Both Flickr and Wikimedia Commons have been around for over a decade. Distributed peer-to-peer sharing, as used in Git, produces more robust and resilient environments which are equally suited to longer-term preservation. Whilst the authors appreciate that the situation is much more nuanced, particularly with the introduction of platforms that facilitate collaborative workflow development, this does have an impact on long-term deployment.

Choosing our licences

Licences are fundamental to the successful re-use of content. Licences describe who can use a resource, what they can do with this resource and how they should reference any resource (if at all).

Two lead organisations have developed legal frameworks for content licensing: Creative Commons (CC) and Open Data Commons (ODC). Until the release of CC version 4, published in November 2013, the CC licence did not cover data. Between them, CC and ODC licences can cover all forms of digital work.

At the top level the licences are permissive public domain licences (CC0 and PDDL respectively) that impose no restrictions on the licensee's use of the resource. 'Anything goes' in a public domain licence: the licensee can take the resource and adapt it, translate it, transform it, improve upon it (or not!), package it, market it, sell it, etc. Constraints can be added to the top-level licence by employing the following clauses:

  • BY – By attribution: the licensee must attribute the source.
  • SA – Share-alike: if the licensee adapts the resource, they must release the adapted resource under the same licence.
  • NC – Non-commercial: the licensee must not use the work within a commercial activity without prior approval. Interestingly, in many areas of the world, the use of material in university lectures may be considered a commercial activity. The non-commercial restriction is about the nature of the activity, not the legal status of the institution doing the work.
  • ND – No derivatives: the licensee cannot derive new content from the resource.

Each of these clauses decreases the 'open-ness' of the resource. In fact, the NC and ND clauses are not intrinsically open (they restrict both who can use the resource and what can be done with it). These restrictive clauses have the potential to produce licence incompatibilities, which may introduce profound problems in the medium to long term. This is particularly relevant to the SA clause. Share-alike means that any derived output must be licensed under the same conditions as the source content. If content is combined (or mashed up) – which is essential when one is building up a corpus of heritage resources – then content created under an SA clause cannot be combined with content that includes a restrictive clause (BY, NC or ND) that is not in the source licence. This licence incompatibility has a significant impact on the nature of the data commons. It has the potential to fragment the data landscape, creating pockets of knowledge which are rarely used in mainstream analysis, research or policy making. This will be further exacerbated when automated data aggregation and analysis systems become the norm. A permissive licence without clauses like Non-commercial, Share-alike or No-derivatives removes such licence and downstream re-user fragmentation issues.
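
The share-alike incompatibility described above can be made concrete with a deliberately simplified model. This is an illustration of the reasoning only, not legal advice and not a complete statement of CC/ODC compatibility rules.

```python
# Toy model of the clause logic described above: ND blocks derivatives
# outright, and a share-alike licence cannot absorb clauses it does not
# already carry. Illustrative only; not a legal compatibility checker.

def can_combine(clauses_a: set, clauses_b: set) -> bool:
    """Can two resources with these clause sets be mashed up into one work?"""
    # No-derivatives forbids building the combined work at all.
    if "ND" in clauses_a or "ND" in clauses_b:
        return False
    # Share-alike forces the derivative onto that licence, so the other
    # resource may not add clauses the share-alike licence lacks.
    if "SA" in clauses_a and not clauses_b <= clauses_a:
        return False
    if "SA" in clauses_b and not clauses_a <= clauses_b:
        return False
    return True

print(can_combine({"BY"}, {"BY"}))               # True: plain attribution
print(can_combine({"BY", "SA"}, {"BY"}))         # True: BY fits inside BY-SA
print(can_combine({"BY", "SA"}, {"BY", "NC"}))   # False: SA cannot absorb NC
print(can_combine({"BY"}, {"BY", "ND"}))         # False: ND blocks derivatives
```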

For completeness, specific licences have been created for Open Government Data. The UK Government Data Licence for public sector information is essentially an open licence with a BY attribution clause.

At DART we have followed the guidelines of the Open Data Institute and separated out creative content (illustrations, text, etc.) from data content. Hence, the DART content is either CC-BY or ODC-BY respectively. In the future we believe it would be useful to drop the BY (attribution) clause. This would stop attribution stacking (if the resource you are using is a derivative of a derivative of a derivative, at what stage do you stop attributing?), and anything which requires bureaucracy, such as attributing an image in a PowerPoint presentation, inhibits re-use (one should always assume that people are intrinsically lazy). There is a post advocating ccZero+ by Dan Cohen. However, impact tracking may mean that the BY clause becomes a default for academic deposition.

The ADS uses a more restrictive bespoke default licence which does not map to national or international licence schemes (they also don't recognise non-CC licences). Resources under this licence can only be used for teaching, learning and research purposes. Of particular concern is their use of the NC clause and possible use of the ND clause (depending on how you interpret the licence). Interestingly, policy changes mean that the use of data under the bespoke ADS licence becomes problematic if university teaching activities are determined to be commercial. It is arguable that the payment of tuition fees represents a commercial activity. If this is true, then resources released under the ADS licence cannot be used within university teaching that is part of a commercial activity. Hence, the policy change in student tuition and university funding has an impact on the commercial nature of university teaching, which has a subsequent impact on what data or resources universities are licensed to use. Whilst it may never have been the intention of the ADS to produce a licence with this potential paradox, it is a problem when bespoke licences are developed, even if they were originally perceived to be relatively permissive. To remove this ambiguity, it is recommended that submissions to the ADS are provided under a CC licence, which renders the bespoke ADS licence void.

In the case of DART, these licence variations with the ADS should not be a problem. Our licences are permissive (attribution is the only clause we have included). This means the ADS can do anything they want with our resources as long as they cite the source. In our case this would be the individual resource objects or collections on the DART portal. This is a good thing, as the metadata on the DART portal is much richer than the metadata held by the ADS.

Concerns about opening up data, and responses which have proved effective

Christopher Gutteridge (University of Southampton) and Alexander Dutton (University of Oxford) have collated a Google doc entitled 'Concerns about opening up data, and responses which have proved effective'. This document describes a number of concerns commonly raised by academic colleagues about increasing access to data. For DART, two issues became problematic that were not covered by this document:

  • The relationship between open data and research novelty and the impact this may have on a PhD submission.
  • Journal publication – specifically that a journal won’t publish a research paper if the underlying data is open.

The former point is interesting – does the process of undertaking open science, or at least providing open data, undermine the novelty of the resultant scientific process? With open science it could be difficult to directly attribute the contribution, or novelty, of a single PhD student to an openly collaborative research process. That said, if online versioning tools like Git are used, then it is clear who has contributed what to a piece of code or a workflow (the benefits of the BY clause). This argument is less solid when we are talking solely about open data. Whilst it is true that other researchers (or anybody else for that matter) have access to the data, it is highly unlikely that multiple researchers will use the same data to answer exactly the same question. If they do ask the same question (and making the optimistic assumption that they reach the same conclusion), it is still highly unlikely that they will have done so by the same methods; and even if they do, their implementations will be different. If multiple methods using the same source data reach the same conclusion then there is an increased likelihood that the conclusion is correct and that the science is even more certain. The underlying point here is that 21st-century scientific practice will substantially benefit from people showing their working. Exposure of the actual process of scientific enquiry (the algorithms, code, etc.) will make the steps between data collection and publication more transparent, reproducible and peer-reviewable – or, quite simply, more scientific. Hence, we would argue that open data and research novelty is only a problem if plagiarism is a problem.

The journal publication point is equally interesting. Publications are the primary metric for academic career progression and kudos. In this instance it was the policy of the 'leading journal in this field' that they would not publish a paper from a dataset that was already published. No credible reasons were provided for this clause – which seems draconian in the extreme. It does indicate that no one-size-fits-all approach will work in the academic landscape. It will also be interesting to see how this journal, which publishes work that is mainly funded by the EPSRC, responds to the EPSRC guidelines on open data.

This is also a clear demonstration that the academic community needs to develop new metrics that are more suited to 21st-century research and scholarship, by directly linking academic career progression to other sources of impact that go beyond publications. Furthermore, academia needs some high-profile exemplars that demonstrate clearly how to deal with such change. The policy shift and ongoing debate concerning 'Open Access' publications in the UK is changing the relationship between funders, universities, researchers, journals and the public – a similar debate needs to occur about open data and open science.

The altmetrics community is developing new metrics for "analyzing, and informing scholarship" and have described their ethos in their manifesto. The Research Councils and governments have taken a much greater interest in the impact of publicly funded research. Importantly, public, social and industry impact are as important as academic impact. It is incumbent on universities to respond to this by directly linking academic career progression to impact and by encouraging improved access to the underlying data and processing outputs of the research process through data repositories and workflow environments.

Skillshares and Stories: Upcoming Community Sessions

Heather Leson - April 3, 2014 in CKAN, Events, Network, OKF Brazil, OKF Projects, Open Access, Open Knowledge Foundation Local Groups, School of Data

We’re excited to share with you a few upcoming Community Sessions from the School of Data, CKAN, Open Knowledge Brazil, and Open Access. As we mentioned earlier this week, we aim to connect you to each other. Join us for the following events!

What is a Community Session: These online events can be in a number of forms: a scheduled IRC chat, a community google hangout, a technical sprint or hackpad editathon. The goal is to connect the community to learn and share their stories and skills.

We held our first Community Session yesterday (see our Wiki Community Session notes). The remaining April events will be online via G+. These sessions will be public Hangouts on Air. The video will be available on the Open Knowledge YouTube channel after each event. Questions are welcome via Twitter and G+.

All these sessions are Wednesdays at 10:30–11:30 am ET / 14:30–15:30 UTC.

Mapping with Ketty and Ali: a School of Data Skillshare (April 9, 2014)

Making a basic map from spreadsheet data: we'll explore tools like QGIS (a free and open-source Geographic Information System) and TileMill (a tool to design beautiful interactive web maps). Our guest trainers are Ketty Adoch and Ali Rebaie.

To join the Mapping with Ketty and Ali Session on April 9, 2014

Q & A with the Open Knowledge Brazil Chapter featuring Everton (Tom) Zanella Alvarenga (April 16, 2014)

Around the world, local groups, Chapters, projects, working groups and individuals connect to Open Knowledge. We want to share your stories.

In this Community Session, we will feature Everton (Tom) Zanella Alvarenga, Executive Director.

Open Knowledge Foundation Brazil is a newish Chapter. Tom will share his experiences growing a chapter and community in Brazil. We aim to connect you to community members around the world. We will also open up the conversation to all things Community. Share your best practices!

Join us on April 16, 2014 via G+

Take a CKAN Tour (April 23, 2014)

This week we will give an overview and tour of CKAN – the leading open source open data platform used by the national governments of the US, UK, Brazil, Canada, Australia, France, Germany, Austria and many more. This session will cover why data portals are useful, what they provide and showcase examples and best practices from CKAN’s varied user base! Our special guest is Irina Bolychevsky, Services Director (Open Knowledge Foundation).

Learn and share your CKAN stories on April 23, 2014

(Note: We will share more details about the April 30th Open Access session soon!)


The School of Data Journalism 2014!

Milena Marin - April 3, 2014 in Data Journalism, Events, Featured, School of Data


We’re really excited to announce this year’s edition of the School of Data Journalism, at the International Journalism Festival in Perugia, 30th April – 4th May.

It's the third time we've run it (how time flies!), together with the European Journalism Centre, and it's amazing seeing the progress that has been made since we started out. Data has become an increasingly crucial part of any journalist's toolbox, and its rise is only set to continue. The Data Journalism Handbook, which was born at the first School of Data Journalism in Perugia, has become a go-to reference for all those looking to work with data in the news, a fantastic testament to the strength of the data journalism community.

As Antoine Laurent, Innovation Senior Project Manager at the EJC, said:

“This is really a must-attend event for anyone with an interest in data journalism. The previous years’ events have each proven to be watershed moments in the development of data journalism. The data revolution is making itself felt across the profession, offering new ways to tell stories and speak truth to power. Be part of the change.”

Here’s the press release about this year’s event – share it with anyone you think might be interested – and book your place now!


PRESS RELEASE FOR IMMEDIATE RELEASE

April 3rd, 2014

Europe’s Biggest Data Journalism Event Announced: the School of Data Journalism

The European Journalism Centre, Open Knowledge and the International Journalism Festival are pleased to announce the 3rd edition of Europe’s biggest data journalism event, the School of Data Journalism. The 2014 edition takes place in Perugia, Italy between 30th of April – 4th of May as part of the International Journalism Festival.

#ddjschool #ijf13

A team of about 25 expert panelists and instructors from the New York Times, the Daily Mirror, Twitter, Ask Media, Knight-Mozilla and others will lead participants in a mix of discussions and hands-on sessions focusing on everything from cross-border data-driven investigative journalism and emergency reporting to using spreadsheets, social media data, data visualisation and mapping techniques for journalism.

Entry to the School of Data Journalism panels and workshops is free. Last year's edition featured a stellar team of panelists and instructors, attracted hundreds of journalists, and was fully booked within a few days. The year before saw the launch of the seminal Data Journalism Handbook, which remains the go-to reference for practitioners in the field.

Antoine Laurent, Innovation Senior Project Manager at the EJC said:

“This is really a must-attend event for anyone with an interest in data journalism. The previous years’ events have each proven to be watershed moments in the development of data journalism. The data revolution is making itself felt across the profession, offering new ways to tell stories and speak truth to power. Be part of the change.”

Guido Romeo, Data and Business Editor at Wired Italy, said:

“I teach in several journalism schools in Italy. You won’t get this sort of exposure to such teachers and tools in any journalism school in Italy. They bring in the most avant garde people, and have a keen eye on what’s innovative and new. It has definitely helped me understand what others around the world in big newsrooms are doing, and, more importantly, how they are doing it.”

The full description of the sessions and the (free) registration can be found at http://datajournalismschool.net. You can also find all the details on the International Journalism Festival website: http://www.journalismfestival.com/programme/2014

ENDS

Contacts: Antoine Laurent, Innovation Senior Project Manager, European Journalism Centre, laurent@ejc.net; Milena Marin, School of Data Programme Manager, Open Knowledge Foundation, milena.marin@okfn.org

Notes for editors

Website: http://datajournalismschool.net
Hashtag: #DDJSCHOOL

The School of Data Journalism is part of the European Journalism Centre’s Data Driven Journalism initiative, which aims to enable more journalists, editors, news developers and designers to make better use of data and incorporate it further into their work. Started in 2010, the initiative also runs the website DataDrivenJournalism.net as well as the Doing Journalism with Data MOOC, and produced the acclaimed Data Journalism Handbook.

About the International Journalism Festival (www.journalismfestival.com) The International Journalism Festival is the largest media event in Europe. It is held every April in Perugia, Italy. Entry to all sessions is free for all attendees. It is an open invitation to listen to and network with the best of world journalism. The leitmotiv is one of informality and accessibility, designed to appeal to journalists, aspiring journalists and those interested in the role of the media in society. Simultaneous translation into English and Italian is provided.

About Open Knowledge (www.okfn.org) Open Knowledge, founded in 2004, is a worldwide network of people who are passionate about openness, using advocacy, technology and training to unlock information and turn it into insight and change. Our aim is to give everyone the power to use information and insight for good. Visit okfn.org to learn more about the Foundation and its major projects including SchoolOfData.org and OpenSpending.org.

About the European Journalism Centre (www.ejc.net) The European Journalism Centre is an independent, international, non-profit foundation dedicated to maintaining the highest standards in journalism in particular and the media in general. Founded in 1992 in Maastricht, the Netherlands, the EJC closely follows emerging trends in journalism and watchdogs the interplay between media economy and media culture. It also hosts more than 1,000 journalists each year in seminars and briefings on European and international affairs.

Happy Spring Cleaning, Community Style

Heather Leson - April 1, 2014 in Community Stories, Events, Featured, Network, OKF Projects, OKFestival, Open Knowledge Foundation, Open Knowledge Foundation Local Groups, Our Work, Working Groups


Crazy about happy? Call it spring fever, but I am slightly addicted to the beautiful creativity of people around the world and their Happy videos (map). We are just one small corner of the Internet and want to connect you to Open Knowledge. To do this, we, your community managers, need to bring in the Happy. How can we connect you, respond to your feedback, continue the spirit of global Open Data Day, and celebrate our upcoming 10-year anniversary as Open Knowledge? Tall order, but consider this.

Open Knowledge is a thriving network. We exist because of all of you and the incremental efforts each of you make on a wide range of issues around the world. The way forward is to flip the community around: we will focus on connecting you to each other. Call it inspired by Happy or the Zooniverse mission, but we heard your input in the community survey and want to act on it.

(Photo: Coffee smiley by SpaceAgeBoy)

So, here are 4 key ways we aim to connect you:

1. Community Tumblr

Greece, MENA, and Tanzania – these are just some of the locations of Open Knowledge Stories on the Community Tumblr. We know that many of you have stories to tell. Have something to say or share? Submit a story. Just one look at the recent WordPress post about 10 moments around the world gives me confidence that the stories and impact exist; we just need to share more of them.

The Open Knowledge Community Tumblr

2. Wiki Reboot

As with every spring cleaning, you start by dusting a corner and end up at the store buying bookshelves and buckets of paint. The Open Knowledge wiki has long been riddled with spam and dust bunnies. We've given it a firm content kick to make it your space. We are inspired by the OpenStreetMap community wiki.

What next? Hop on over and create your Wiki User account – Tell us about yourself, See ways to Get Involved and Start Editing. We think that the wiki is the best way to get a global view of all things Open Knowledge and meet each other. Let’s make this our community hub.

3. Community Sessions

We have a core goal to connect you to each other. This April we are hosting a number of online community events to bring you together. Previously, we had great success with a number of online sessions around Open Data Day and OKFestival.

The Community Sessions can be in a number of forms: a scheduled IRC chat, a community Google hangout, a technical sprint or hackpad editathon. We are using the wiki to plan. All events will be announced on the blog and be listed in the main Open Knowledge events calendar.

Wiki planning for the Community Sessions:

The first session is Wednesday, April 2, 2014 at 14:30 UTC/10:30 ET. We will host an IRC chat all about the wiki. To join, hop onto irc.freenode.net #okfn. IRC is a free text-based chat service.

4. OKFestival

OKFestival is coming soon. You told us that events are one of the biggest ways that you feel connected to Open Knowledge. As you may know, there are regular online meetups for the School of Data, CKAN and OpenSpending communities. Events bring all of us together around places and ideas.

Are you planning your own events where you live or on a particular open topic? We can help in a few ways:

  • Let us know about the events you're running! Let's discover together how many people are joining Open Knowledge events all around the world!
  • Never organized an event before, or curious to try a new type of gathering? Check out our Events Handbook for tips and tricks, and contact our Events Team if you have questions or feedback about it.
  • Want to connect with other community members to talk about your events, share skills, or create international series of events together? Ping our global mailing list!

Have some ideas on how we can bring on the happy more? Drop us a line on the okfn-discuss mailing list or reach out directly – heather DOT leson AT okfn DOT org.

(Photo by SpaceAgeBoy)

Building an archaeological project repository I: Open Science means Open Data

Guest - February 24, 2014 in CKAN, Open Science, WG Archaeology

This is a guest post by Anthony Beck, Honorary fellow, and Dave Harrison, Research fellow, at the University of Leeds School of Computing.

In 2010 we authored a series of blog posts for the Open Knowledge Foundation subtitled ‘How open approaches can empower archaeologists’. These discussed the DART project, which is on the cusp of concluding.

The DART project collected large amounts of data, and as part of the project, we created a purpose-built data repository to catalogue this and make it available, using CKAN, the Open Knowledge Foundation’s open-source data catalogue and repository. Here we revisit the need for Open Science in the light of the DART project. In a subsequent post we’ll look at why, with so many repositories of different kinds, we felt that to do Open Science successfully we needed to roll our own.

Open data can change science

Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories – and of the experimental and observational data on which they are based – permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge. (The Royal Society, Science as an open enterprise, 2012)

The Royal Society’s report Science as an open enterprise identifies how 21st century communication technologies are changing the ways in which scientists conduct, and society engages with, science. The report recognises that ‘open’ enquiry is pivotal for the success of science, both in research and in society. This goes beyond open access to publications (Open Access), to include access to data and other research outputs (Open Data), and the process by which data is turned into knowledge (Open Science).

The underlying rationale of Open Data is this: unfettered access to large amounts of ‘raw’ data enables patterns of re-use and knowledge creation that were previously impossible. The creation of a rich, openly accessible corpus of data introduces a range of data-mining and visualisation challenges, which require multi-disciplinary collaboration across domains (within and outside academia) if their potential is to be realised. An important step towards this is creating frameworks which allow data to be effectively accessed and re-used. The prize for succeeding is improved knowledge-led policy and practice that transforms communities, practitioners, science and society.

The need for such frameworks will be most acute in disciplines with large amounts of data, a range of approaches to analysing the data, and broad cross-disciplinary links – so it was inevitable that they would prove important for our project, Detection of Archaeological residues using Remote sensing Techniques (DART).

DART: data-driven archaeology

DART aimed to develop analytical methods to differentiate archaeological sediments from non-archaeological strata, on the basis of remotely detected phenomena (e.g. resistivity, apparent dielectric permittivity, crop growth, thermal properties etc). The data collected by DART is of relevance to a broad range of different communities. Open Science was adopted with two aims:

  • to maximise the research impact by placing the project data and the processing algorithms into the public sphere;
  • to build a community of researchers and other end-users around the data so that collaboration, and by extension research value, can be enhanced.

‘Contrast dynamics’, the type of data provided by DART, is critical for policy makers and curatorial managers to assess both the state and the rate of change in heritage landscapes, and helps to address European Landscape Convention (ELC) commitments. Making the best use of the data, however, depends on openly accessible dynamic monitoring, along the lines of that developed for the Global Monitoring for Environment and Security (GMES) satellite constellations under development by the European Space Agency. What is required is an accessible framework which allows all this data to be integrated, processed and modelled in a timely manner.

The approaches developed in DART to improve the understanding and modelling of heritage contrast detection dynamics feed directly into this long-term agenda.

Cross-disciplinary research and Open Science

Such approaches cannot be undertaken within a single domain of expertise. This vision can only be built by openly collaborating with other scientists and building on shared data, tools and techniques. Important developments will come from the GMES community, particularly from precision agriculture, soil science, and well documented data processing frameworks and services. At the same time, the information collected by projects like DART can be re-used easily by others. For example, DART data has been exploited by the Royal Agricultural University (RAU) for use in such applications as carbon sequestration in hedges, soil management, soil compaction and community mapping. Such openness also promotes collaboration: DART partners have been involved in a number of international grant proposals and have developed a longer term partnership with the RAU.

Open Science advocates opening access to data, and other scientific objects, at a much earlier stage in the research life-cycle than traditional approaches. Open Scientists argue that research synergy and serendipity occur through openly collaborating with other researchers (more eyes/minds looking at the problem). Of great importance is the fact that the scientific process itself is transparent and can be peer reviewed: as a result of exposing data and the processes by which these data are transformed into information, other researchers can replicate and validate the techniques. As a consequence, we believe that collaboration is enhanced and the boundaries between public, professional and amateur are blurred.

Challenges ahead for Open Science

Whilst DART has not achieved all its aims, it has made significant progress and has identified some barriers to achieving such open approaches. Key to this is the articulation of issues surrounding data access (accreditation), licensing and ethics. Who gets access to data, when, and under what conditions, is a serious ethical issue for the heritage sector. These are obviously issues that need co-ordination through organisations like Research Councils UK, with cross-cutting input from domain groups. The Arts and Humanities community produces data and outputs with pervasive social and ethical impact, and it is clearly important that they have a voice in these debates.

Gauging the needs and challenges of the global open data community

Guest - February 21, 2014 in Global Open Data Initiative, Open Government Data


This is a guest blog post by Julia Keserü, International Policy Manager at the Sunlight Foundation, which partners with the Open Knowledge Foundation, among others, in the Global Open Data Initiative. Originally featured on the blog of the initiative.

A few months back, the Global Open Data Initiative (GODI) sought input from the transparency community to learn more about the needs and challenges associated with open data. We wanted to know what definitions, guidelines and resources the community relies on, what is missing to improve the work of our fellow practitioners and how a global initiative might be helpful to boost reform.

Through a survey and a series of interviews, we gathered anecdotes, lessons and inspiration from about 80 individuals in 32 different countries with diverse professional backgrounds – research/education, business/consulting, advocacy. What follows is a summary of our most interesting findings. For more, take a look at the full report here.

Open data – standards, guides and definitions

Most interviewees agreed that the basic definition of open data is government proactively publishing data online. However, in many countries, data is frequently perceived as a product of civil society organizations' efforts – through freedom of information requests or website scraping – rather than a timely and trustworthy resource provided by governments. Practical openness is also seen as being contingent on the usability of data to those who are seeking to create change with it.

Despite widespread agreement that standards are important, in practice the interviewees did not seem to be overly focused on them. In some regions, such as Latin America, practitioners are often unaware that open data standards and guidelines exist, due in part to the limited availability of Spanish-language resources. Many noted that the term open data is too dry and technical, which might impede evangelizing efforts.

The community

Global networks seem to play an extremely important role in sharing knowledge and learning from each others’ experiences. Many are eager for GODI to help connect the different strands of the open data movement and provide a place for people to come and find potential partners and collaborators. A few mentioned a need to connect those working on open data at the national level to the international conversation and spread the word beyond the existing transparency community.

Interacting with governments

As expected, knowledge of open data is typically isolated within relevant departments and branches of government. Opening up data for ensuring transparency and accountability is still too often met with resistance and suspicion. Several organizations and individuals noted that their ability to interact and engage with public officials diminishes notably when they are seeking politically sensitive datasets — like company registers, budgets, or campaign finance information. There was widespread agreement that achieving data disclosure policies required a combination of both legislative and persuasive tactics.

Challenges

Unsurprisingly, the challenges faced by the majority of people we heard from could be boiled down to politics, access to data, data quality, and engagement. Many faced political resistance from governments unwilling to release data in the first place, and the lack of good freedom of information laws in many countries is still inhibiting the development of open data.

On top of these, there is a certain confusion around open data and big data, and the community is in desperate need of credible impact studies that can provide a strong theory of change. Some regions, such as the African continent, are historically known to be burdened by issues of poor infrastructure and connectivity – data needs to be presented in more innovative ways there.

Opportunities for the open data community

There was a general consensus that a better networked global open data community could improve the way organizations collaborate, find partners, and prevent duplicating efforts. Many agreed that a large civil society alliance could offer the clout necessary to push for national agendas around open government. It could also help the reform agenda by articulating an open data solution that fits into the domain of transparency and create a feedback loop for accountability.

And lastly: the open data community would benefit immensely from a more clearly defined evidence base and theory of change associated with open data. We need proof that open data can be valuable in a variety of country contexts and for a variety of reasons such as economic development, accountable government or more effective public sector management.

Enter the Partnership for Open Data’s Impact Stories Competition!

Rahul Ghosh - February 20, 2014 in Open Data, Partnership for Open Data

We want to know how opening up data impacts those in developing countries. The Partnership for Open Data (POD) is a partnership of institutions to research, support, train and promote open data in the context of low- and middle-income countries. We invite you to share with us your stories about how open data has positively impacted you, or those around you: technologically, politically, commercially, environmentally, socially, or in any other way.


How has Open Data impacted you and your community?

Over the last decade, Open Data initiatives have become increasingly popular with both governments and civil society organisations. Through these initiatives they hope to tap into open data's potential benefits: innovation, more cost-effective delivery of better services, combating climate change, improving urban planning, and reducing corruption, to name just a few of the possibilities.

This is the chance to tell your inspiring stories, and get them published.

A grand prize of $1000 (USD) is on offer and there are 2 x $500 (USD) runner-up prizes!

This competition is being run by the Open Knowledge Foundation as part of the Partnership for Open Data, a joint initiative of the Open Knowledge Foundation, the Open Data Institute, and The World Bank.

Click to Enter the Competition.

Closing Date: 24th March 2014

Winners will be announced within three weeks of the closing date. Terms and Conditions apply.

Mapping the Open Spending Data Community

Neil Ashton - January 6, 2014 in Featured, Open Spending

Mapping the Open Spending Data Community

We’re pleased to announce the official release of “Mapping the Open Spending Data Community” by Anders Pedersen and Lucy Chambers, an in-depth look at how citizens, journalists, and civil society organisations around the world are using data on government finances to further their civic missions.

The investigation began in 2012 with three goals:

  • To identify Civil Society organisations (CSOs) around the world who are interested in working with government financial data
  • To connect these CSOs with each other, with open data communities, and with other key stakeholders to exchange knowledge, experiences, and best practices in relation to spending data
  • To discover how CSOs currently work with spending data, how they would like to use it, and what they would like to achieve

This report is the result. It brings together key case studies from organisations who have done pioneering work in using technology to work with public finance data across budgets, spending, and procurement, and it presents a curated selection of tools and other advice in an appendix.

As part of this research, we’ve also produced a four-part video series “Athens to Berlin“, which you can watch to meet some of the fascinating characters in the world of CSOs working with government spending data and to learn firsthand about their successes and their challenges.

Originally Published on the Open Spending Blog Jan 3rd, 2014

Extended: Open Data Scoping Terms of Reference

Heather Leson - December 31, 2013 in Open Data, Open Data Partnership For Development

The Open Data Partnership for Development Scoping Terms of Reference deadline has been extended until January 13, 2014. We have received some great submissions and want to give more people the best opportunity to tackle the project. Truly, we recognize that the holiday season is a busy time.

The Open Data Partnership for Development Scoping Terms of Reference opened on December 11, 2013 and will close on January 13, 2014 at 17:00 GMT.

Updated Open Data Partnership for Development – Scoping Terms of Reference

Help us build a snapshot of current open data activity to guide our decisions for the Open Data Partnership for Development programmes. Proposals for a Scoping Analysis will address two objectives:

  • (i) identify potential funders and the key delivery partners in the Open Data ecosystem, and
  • (ii) map the existing efforts to support open data in developing countries and their status.

More about Open Data Partnership for Development

Happy New Year.