We are excited to share project updates from our 2020 Frictionless Data Tool Fund! Our five grantees are about halfway through their projects and have written the updates below to share with the community. These grants were awarded to projects that use Frictionless Data to improve reproducible data workflows in various research contexts. Read on to find out what they have been working on and how you can contribute!
Carles Pina Estany: Schema Collaboration
The goal of the schema-collaboration Tool Fund project is to create an online platform where data managers and researchers can collaborate on describing their data by writing Frictionless Data Package schemas. The basics can be seen and tested on the online instance of the platform: the data manager can create a package, assign data packages to researchers, add comments, and send a link to the researchers, who then use datapackage-ui to edit the package and save it, making it available to the data manager. The next steps are to add extra fields to datapackage-ui and to work on the integration between schema-collaboration and datapackage-ui to make maintenance easier. Carles also plans to offer the data package as a PDF export to help data managers and researchers spot errors. Progress can be followed through the project wiki, and feedback is welcome through GitHub issues.
Read more about Carles’ project here: https://frictionlessdata.io/blog/2020/07/16/tool-fund-polar-institute/
Simon Tyrrell: Frictionless Data for Wheat
As part of the Designing Future Wheat project, Simon and his team have repositories containing a wide variety of heterogeneous data. They are trying to standardise how these datasets and their associated metadata are exposed. The first of their portals stores its data in an iRODS (https://irods.org/) repository. They have recently completed the additions to their web module, eirods-dav, which uses the files, folders and metadata stored within this repository to automatically generate Data Packages for the datasets. The next step is to expand the data that is added to the Data Packages and, in the same way, to automatically expose tabular data as Tabular Data Packages. The eirods-dav GitHub repository is at https://github.com/billyfish/eirods-dav and any feedback or queries are very welcome.
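To give a feel for what generating a Data Package from files, folders and metadata can look like, here is a minimal Python sketch. It is not the eirods-dav implementation: it walks a local folder rather than an iRODS repository, and the function names `build_data_package` and `slug` and the shape of the `metadata` argument are assumptions made for illustration. Only the descriptor keys (`resources`, `name`, `path`, `bytes`, `mediatype`) come from the Data Package specification.

```python
import json
import mimetypes
import re
from pathlib import Path


def slug(text):
    """Turn a relative path into a lowercase name using only a-z, 0-9, '.', '_' and '-'."""
    return re.sub(r"[^a-z0-9._-]+", "-", text.lower()).strip("-")


def build_data_package(folder, metadata):
    """Build a Data Package descriptor listing every file found under `folder`.

    `metadata` is a hypothetical dict of package-level properties
    (for example name, title, licenses) gathered from the repository.
    """
    root = Path(folder)
    resources = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        resource = {
            "name": slug(rel),
            "path": rel,
            "bytes": path.stat().st_size,
        }
        # Record a media type only when the file extension is recognised.
        media_type, _ = mimetypes.guess_type(path.name)
        if media_type:
            resource["mediatype"] = media_type
        resources.append(resource)

    descriptor = dict(metadata)
    descriptor["resources"] = resources
    return descriptor


if __name__ == "__main__":
    # Describe the current directory and print the descriptor as JSON.
    package = build_data_package(".", {"name": "example-dataset"})
    print(json.dumps(package, indent=2))
```

Writing the returned dictionary to a `datapackage.json` file next to the data would give a descriptor that other Frictionless tools could then read and validate.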
Read more about Simon’s project here: https://frictionlessdata.io/blog/2020/08/17/frictionless-wheat/
Stephen Eglen: Analysis of spontaneous activity patterns in developing neural circuits using Frictionless Data tools
Stephen and Alexander have been busy over the summer integrating the Frictionless tools into a workflow for analysing electrophysiological datasets. They have written converters that read in their ASCII- and HDF5-based data and convert them to Frictionless containers. Along the way, they have given helpful feedback to the team about the core packages. They have settled on the Python interface as the most feature-rich implementation to work with. Alexander has now completed his analysis of the data, and they are currently working on a manuscript to highlight their research findings.
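As a rough illustration of what such a converter does, the Python sketch below turns a small ASCII recording into CSV text plus a tabular resource descriptor. The input layout (one "channel time" pair per line, with `#` comment lines), the column names and the function name `ascii_to_resource` are all assumptions made for this example; the project's real file formats and converters are not described here, and HDF5 input is not covered.

```python
import csv
import io


def ascii_to_resource(text, name, path):
    """Convert whitespace-delimited 'channel time' lines into CSV text and a resource descriptor.

    The two-column layout is a hypothetical format chosen for this sketch.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        channel, time_s = line.split()
        rows.append((channel, float(time_s)))

    # Write the parsed rows as CSV with a header row.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["channel", "time"])
    writer.writerows(rows)

    # Describe the CSV with a Table Schema so the column types travel with the data.
    resource = {
        "name": name,
        "path": path,
        "profile": "tabular-data-resource",
        "schema": {
            "fields": [
                {"name": "channel", "type": "string"},
                {"name": "time", "type": "number"},
            ]
        },
    }
    return buffer.getvalue(), resource


if __name__ == "__main__":
    sample = "# channel time\nch_1 0.5\nch_2 1.25\n"
    csv_text, resource = ascii_to_resource(sample, "spikes", "spikes.csv")
    print(csv_text)
    print(resource)
```

Keeping the schema next to the converted file is the point of the exercise: a later analysis step can check the column types instead of guessing them from the raw text.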
Read more about Stephen’s project here: https://frictionlessdata.io/blog/2020/08/03/tool-fund-cambridge-neuro/
Asura Enkhbayar: Metrics in Context
How much do we know about the measurement tools used to create scholarly metrics? While data models and standards are neither new nor uncommon in the scholarly space, “Metrics in Context” is all about the very apparatuses we use to capture the scholarly activity embedded in those metrics. To use citations and altmetrics confidently in research assessment or in hiring and promotion decisions, we need to be able to provide standardized descriptions of the digital infrastructure involved and of the acts of capturing. Asura is currently refining the conceptual model for scholarly events in the digital space so that it can account for various types of activities (both traditional and alternative scholarly metrics). After a review of the existing digital landscape of scholarly infrastructure projects, he will dive into the implementation using Frictionless. You can find more details in the open roadmap on GitHub, and feel free to submit questions and comments as issues!
Read more about Asura’s project here: https://frictionlessdata.io/blog/2020/09/17/tool-fund-metrics/
Nikhil Vats: Adding Data Package Specifications to InterMine’s im-tables
Nikhil is working with InterMine to add data package specifications to im-tables (a library to query biological data) so that users can export metadata along with query results. Right now, the metadata contains field names, their description links, types, paths, class description links and primary key(s). Nikhil is currently figuring out ways to get links for data sources, attribute descriptions and class descriptions from their FAIR terms (or description links). Next steps for the project include building the frontend for this feature in im-tables and adding the remaining required information about the data, such as the result file format (CSV, TSV, etc.), to the datapackage.json (metadata) file. You can contribute to this project by opening an issue here or reaching out at chat.intermine.org.
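To make the exported metadata more concrete, here is a minimal Python sketch of a datapackage.json that carries field names, types, paths, description links and a primary key for one result file. It is not the im-tables implementation. The function name `query_result_package`, the example columns and URLs, and the custom field properties `path`, `descriptionLink` and `classDescriptionLink` are assumptions for illustration; `name`, `type`, `format`, `schema`, `fields` and `primaryKey` are standard Data Package and Table Schema properties.

```python
import json


def query_result_package(name, path, file_format, columns, primary_key):
    """Build a Data Package descriptor for one exported query result file."""
    fields = []
    for column in columns:
        field = {"name": column["name"], "type": column["type"]}
        # Custom (non-standard) properties, named here only for illustration.
        for key in ("path", "descriptionLink", "classDescriptionLink"):
            if key in column:
                field[key] = column[key]
        fields.append(field)

    return {
        "name": name,
        "resources": [
            {
                "name": name + "-results",
                "path": path,
                "format": file_format,  # e.g. "csv" or "tsv"
                "profile": "tabular-data-resource",
                "schema": {"fields": fields, "primaryKey": primary_key},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical columns; the names and the link are placeholders.
    columns = [
        {
            "name": "Gene.primaryIdentifier",
            "type": "string",
            "path": "Gene.primaryIdentifier",
            "classDescriptionLink": "https://example.org/terms/gene",
        },
        {"name": "Gene.length", "type": "integer", "path": "Gene.length"},
    ]
    package = query_result_package(
        "gene-query", "results.csv", "csv", columns, ["Gene.primaryIdentifier"]
    )
    print(json.dumps(package, indent=2))
```

Recording the file format on the resource is what lets a consumer pick the right parser for a CSV or TSV export without inspecting the file.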
Read more about Nikhil’s project here: https://frictionlessdata.io/blog/2020/07/10/tool-fund-intermine/
Lilly is the Product Manager for the Frictionless Data for Reproducible Research project. She has her PhD in neuroscience from Oregon Health and Science University, where she researched brain injury in fruit flies and became an advocate for open science and open data. Lilly believes that the future of research is open, and is using Frictionless Data tooling within the researcher community to make science more reproducible.