csv,conf,v3 - Open Knowledge Blog

The third manifestation of everyone’s favorite community conference about data—csv,conf,v3—happened earlier this May in Portland, Oregon. The conference brought together data makers/doers/hackers from various backgrounds to share knowledge and stories about data in a relaxed, convivial, alpaca-friendly (see below) environment. Several Open Knowledge International staff working across our Frictionless Data, OpenSpending, and Open Data for Development projects made the journey to Portland to help organize, give talks, and exchange stories about our lives with data. Thanks to Portland and the Eliot Center for hosting us. And, of course, thanks to the excellent keynote speakers Laurie Allen, Heather Joseph, Mike Bostock, and Angela Bassa who provided a great framing for the conference through their insightful talks.

Here’s what we saw.

Talks We Gave

The first priority for the team was to present on the current state of our work and Open Knowledge International’s mission more generally.

In his talk, Continuous Data Validation for Everybody, developer Adrià Mercader updated the crowd on the launch and motivation of goodtables.io:

It was a privilege to be able to present our work at one of my favourite conferences. One of the main things attendees highlight about csv,conf is how diverse it is: many different backgrounds were represented, from librarians to developers, from government workers to activists.

Across many talks and discussions, the need to make published data more useful to people came up repeatedly. Specifically, how could we as a community help people publish better quality data?

Our talk introducing goodtables.io presented what we think will be a dominant approach to approaching this question: automated validation. Building on successful practices in software development like automated testing, goodtables.io integrates within the data publication process to allow publishers to identify issues early and ensure data quality is maintained over time.

The talk was very well received, and many people reached out to learn more about the platform. Hopefully, we can continue the conversation to ensure that automated (frictionless) data validation becomes the standard on all data publication workflows.

David Selassie Opoku presented When Data Collection Meets Non-technical CSOs in Low-Income Areas:

csv,conf was a great opportunity to share highlights of the OD4D (and School of Data) team’s data collection work. The diverse audience seemed to really appreciate insights on working with non-technical CSOs in low-income areas to carry out data collection. In addition to highlighting the lessons from the work and its potential benefit to other regions of the world, I got to connect with data literacy organisations such as Data Carpentry who are currently expanding their work in Africa and could help foster potential data literacy training partnerships.

As a team working with CSOs in low-income areas like Africa, School of Data stands to benefit from continuing conversations with data “makers” in order to present potential use cases. A clear example I cited in my talk was Kobo Toolbox, which continues to mitigate several daunting challenges of data collection through abstraction and simple user interface design. Staying in touch with the csv,conf community may highlight more such scenarios which could lead to the development of new tools for data collection.

Paul Walsh, in his talk titled Open Data and the Question of Quality (slides) talked about lessons learned from working on a range of government data publishing projects and we can do as citizens to demand better quality data from our governments:

Talks We Saw

Of course, we weren’t there only to present; we were there to learn from others as well. Before the conference, through our Frictionless Data project, we have been lucky to be in contact with various developers and thinkers around the world who also presented talks at the conference. Eric Busboom presented Metatab, an approach to packaging metadata in spreadsheets. Jasper Heefer of Gapminder talked about DDF, a data description format and associated data pipeline tool to help us live a more fact-based existence. Bob Gradeck of the Western Pennsylvania Regional Data Center talked about data intermediaries in civic tech, a topic near and dear to our hearts here at Open Knowledge International.

Favorite Talks

Paul’s:

“Data in the Humanities Classroom” by Miriam Posner
“Our Cities, Our Data” by Kate Rabinowitz
“When Data Collection Meets Non-technical CSOs in Low Income Areas” by David Selassie Opoku

David’s:

“Empowering People By Democratizing Data Skills” by Erin Becker
“Teaching Quantitative and Computational Skills to Undergraduates using Jupyter Notebooks” by Brian Avery
“Applying Software Engineering Practices to Data Analysis” by Emil Bay
“Open Data Networks with Fieldkit” by Eric Buth

Jo’s:

“Smelly London: visualising historical smells through text-mining, geo-referencing and mapping” by Deborah Leem
“Open Data Networks with Fieldkit” by Eric Buth
“The Art and Science of Generative Nonsense” Mouse Reeve
“Data Lovers in in a Dangerous Time” by Bendan O’Brien

Data Tables

This csv,conf was the first csv,conf to have a dedicated space for working with data hands-on. In past events, attendees left with their heads buzzing full of new ideas, tools, and domains to explore but had to wait until returning home to try them out. This time we thought: why wait? During the talks, we had a series of hands-on workshops where facilitators could walk through a given product and chat about the motivations, challenges, and other interesting details you might not normally get to in a talk. We also prepared several data “themes” before the conference meant to bring people together on a specific topic around data.

In the end, these themes proved a useful starting point for several of the facilitators and provided a basis for a discussion on cultural heritage data following on from a previous workshop on the topic. The facilitated sessions went well.

Our own Adam Kariv walked through Data Package Pipelines, his ETL tool for data based on the Data Package framework. Jason Crawford demonstrated Fieldbook, a tool for managing easily managing a database in-browser as you would a spreadsheet. Bruno Vieira presented Bionode, going into fascinating detail on the mechanics of Node.js Streams. Nokome Bentley walked through a hands-on introduction to accessible, reproducible data analysis using Stencila, a way to create interactive, data-driven documents using the language of your choice to enable reproducible research. Representatives from data.world, an Austin startup we worked with on an integration for Frictionless Data also demonstrated uploading datasets to data.world. The final workshop was conducted by several members of the Dat team, including co-organizer Max Ogden, with a super enthusiastic crowd.

Competition from the day’s talks was always going to be fierce, but it seems that many attendees found some value in the more intimate setting provided by Data Tables.

Thanks

If you were there at csv,conf in Portland, we hope you had a great time. Of course, our thanks go to the Gordon and Betty Moore Foundation and to Sloan Foundation for enabling me and my fellow organizers John Chodacki, Max Ogden, Martin Fenner, Karthik, Elaine Wong, Danielle Robinson, Simon Vansintjan, Nate Goldman and Jo Barratt who all put so much personal time and effort to bringing this all together.