Have you ever looked back at a graph of fluorescence change in neurons or gene expression data in C. elegans from years ago and wondered how exactly you got that result? Would you have enough findable notes at hand to repeat that experiment? Do you have a quick, repeatable method for preparing your data to be published with your manuscripts (as required by many journals and funders)? If these questions give you pause, we are interested in helping you!

For many data users, getting insight from data is not always a straightforward process. Data is often hard to find, archived in difficult-to-use formats, poorly structured, or incomplete. These issues create friction and make it difficult to use, publish, and share data. The Frictionless Data initiative aims to reduce friction in working with data, with the goal of making it effortless to transport data among different tools and platforms for further analysis.

The Frictionless Data for Reproducible Research project, part of the Open Knowledge Foundation and funded by the Sloan Foundation, is focused on helping researchers and the research community resolve data workflow issues.

Over the last several years, Frictionless Data has produced specifications, software, and best practices that address identified needs for improving data-driven research, such as generalized, standard metadata formats, interoperable data, and open-source tooling for data validation.

For researchers, Frictionless Data tools, specifications, and software can be used to:

    • Improve the quality of your dataset
    • Quickly find and fix errors in your data
    • Package your data together with the contextual information others need to understand it, all in one container, before you share it
    • Write a schema – a blueprint that tells others how your data is structured and what type of content to expect in each field (see the sketch after this list)
    • Facilitate data reuse by creating machine-readable metadata
    • Make your data more interoperable so you can import it into various tools like Excel, R, or Python
    • Publish your data to repositories more easily
    • Explore our open-source repositories
    • Read more about how to get started with our Field Guide tutorials
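
To make the schema and packaging points above concrete, here is a minimal sketch, assuming the datapackage and goodtables Python libraries and a hypothetical CSV file named sleep_assay.csv (newer Frictionless Python tooling offers equivalent functionality):

    from datapackage import Package
    from goodtables import validate

    # Infer a Table Schema (field names, types) directly from the CSV itself
    package = Package()
    package.infer('sleep_assay.csv')
    print(package.descriptor['resources'][0]['schema'])

    # Save machine-readable metadata (datapackage.json) alongside the data
    package.save('datapackage.json')

    # Check the table for structural and content errors (blank rows, bad values, ...)
    report = validate('sleep_assay.csv')
    print(report['valid'])  # True when no errors are found

The resulting datapackage.json travels with your data, so collaborators (and future you) can see exactly how the table is structured and re-run the same validation.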

Importantly, these tools can be used on their own or adapted into your own personal and organisational workflows. For instance, neuroscientists can use Frictionless Data tooling and specs to keep track of imaging metadata from the microscope, to analysis software, to publication; to streamline an ephys data workflow from voltage recording, to tabular data, to analyzed graph; or to make data more easily shareable for smoother publishing with a research article.

We want to learn about your multifaceted workflow and help make your data more interoperable across the various formats and tools you use.

We are looking for researchers and research-related groups to join Pilots, and are particularly keen to work with: scientists creating data, data managers in research groups, statisticians and data scientists, data wranglers working with databases, publishers, and librarians who help researchers manage their data or teach data best practices. The primary goal of this work is to collaborate with scientists and scientific data to enact exemplary data practices, supported by Frictionless Data specifications and software, and to deliver on the promise of data-driven, reproducible research. We will work with you, integrating with your current tools and methodologies, to enhance your workflows and increase the efficiency and accuracy of your data-driven research.

Want to know more? Through our past Pilots, we worked directly with organisations to solve real problems managing data:

  • In an ongoing Pilot with the Biological and Chemical Oceanography Data Management Office (BCO-DMO), we helped BCO-DMO develop a data management UI, called Laminar, which incorporates Frictionless Data Package Pipelines on the backend. BCO-DMO’s data managers can now receive data in various formats, import the data into Laminar, run several pipeline processes, and then host the clean, transformed data for other scientists to (re)use. The next step in the Pilot is to incorporate GoodTables into the Laminar pipeline to validate the data as it is processed, which will help ensure data quality and improve the processing experience for the data managers.
  • In a Pilot with the University of Cambridge, we worked with Stephen Eglen to capture complete metadata about retinal ganglion cells in a data package. This metadata included the type of ganglion cell, the species, the radius of the soma, citations, and raw images. 
  • Collaborating with the Cell Migration Standard Organization (CMSO), we investigated the standardization of cell tracking data. CMSO used the Tabular Data Package to make it easy to import their data into a Pandas dataframe (in Python) to allow for dynamic data visualization and analysis (see the sketch below).
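
To give a flavour of that last example, here is a minimal sketch of loading a Tabular Data Package into pandas, assuming the datapackage Python library and a hypothetical package whose datapackage.json describes a resource named 'tracks':

    import pandas as pd
    from datapackage import Package

    # Open the package via its descriptor and pick out one tabular resource
    package = Package('datapackage.json')
    resource = package.get_resource('tracks')   # 'tracks' is a hypothetical resource name
    rows = resource.read(keyed=True)            # list of {field: value} rows, typed per the schema

    df = pd.DataFrame(rows)                     # ready for visualization and analysis in pandas
    print(df.head())

Because the schema travels with the data, the values arrive already typed and labelled, rather than as raw strings you have to clean up by hand.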

To find out more about Frictionless Data, visit frictionlessdata.io or email the team at frictionlessdata@okfn.org.

Lilly is the Product Manager for the Frictionless Data for Reproducible Research project. She has her PhD in neuroscience from Oregon Health and Science University, where she researched brain injury in fruit flies and became an advocate for open science and open data. Lilly believes that the future of research is open, and is using Frictionless Data tooling within the researcher community to make science more reproducible.