How Open Data Editor makes genomic data approachable for our bioinformatics community

In collaboration with Nyasita Ondari, Michael Landi and David Kiragu

When we first came across Open Data Editor (ODE), my team and I were intrigued by its potential to simplify how researchers interact with complex genomic datasets. Genomic data, by its nature, is messy and requires a good amount of wrangling before it can be meaningfully analysed. For novice researchers starting out in bioinformatics or computational biology, and for those without strong computational backgrounds, the process can be incredibly intimidating. With ODE, we saw a chance to lower that entry barrier.

At the Bioinformatics Hub of Kenya initiative (BHKi), we work with a broad range of learners, from undergraduate students to PhD researchers and even lecturers, many of whom don’t have access to institutional datasets or lack pathways to secure funding to generate new ones. It is therefore essential for us to teach them how to use open data or reuse existing data as a core part of our peer mentorship and open science sensitisation efforts. We applied to explore ODE not just to test its capabilities with tabular genomic data and community data, but also to evaluate how it could serve as a teaching and onboarding tool for reusing (open) data in our community. Our main question was: “Could this platform make data cleaning and exploration approachable and less intimidating for new learners in our BHKi community?”

My experience with ODE was genuinely refreshing. I tested it using health and environmental datasets from my PhD—sources like DEFRA and the Global Burden of Disease (GBD). As someone new to working with GIS data and patient health records, I initially found tools like ArcGIS intimidating. It was hard to grasp how the data should be structured or layered, and the relationships between datasets weren’t obvious. ODE simplified everything. It made the data feel approachable and helped me visualise its potential. I related this to how early career researchers in our community — our target users — might gain confidence through such a tool. I also appreciated how ODE highlighted the importance of metadata: once I corrected formatting issues, data clarity and usability improved dramatically.

Other members of my team also shared their experiences. Kiragu explored a large taxon report table from a metagenomic pipeline (167 columns, 23,869 rows) and a smaller parsed FASTA file from the NCBI virus database (approximately 100 rows, with 10 metadata columns and a full sequence column), and reported that ODE handled both datasets without errors, which points to its reliability and potential utility in bioinformatics analysis pipelines. Nyasita, another colleague, corroborated this after using ODE on the community data we have collected over the past five years, finding it useful for tidying and organising tabular data. Mike, who used the tool with fieldwork records, found it useful for standardising data collection protocols.
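The post does not say how the FASTA file was parsed into a table before it was opened in ODE. As a minimal sketch, assuming plain FASTA input and only the Python standard library, flattening records into a CSV with one row per sequence might look like the following. The function names and the columns (`id`, `description`, `length`, `sequence`) are illustrative choices, not the team's actual pipeline, and the example records are made up.

```python
import csv
import io

# Hypothetical column layout: a few metadata columns plus the full sequence.
COLUMNS = ["id", "description", "length", "sequence"]


def _make_row(header, chunks):
    """Build one table row from a FASTA header line and its sequence lines."""
    seq_id, _, description = header.partition(" ")
    sequence = "".join(chunks)
    return {
        "id": seq_id,
        "description": description,
        "length": len(sequence),
        "sequence": sequence,
    }


def fasta_to_rows(handle):
    """Yield one dict per FASTA record read from a text file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_row(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_row(header, chunks)


def rows_to_csv(rows):
    """Serialise the rows as CSV text that a tabular editor can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up records, for illustration only.
example = ">seq1 hypothetical virus A\nACGT\nACG\n>seq2 hypothetical virus B\nTTGA\n"
rows = list(fasta_to_rows(io.StringIO(example)))
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping the sequence in its own final column mirrors the layout described above, and it also makes it easy to drop or truncate that column when long strings are hard to read in a table view.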

However, we also noticed some shortcomings. Long text strings such as DNA sequences didn’t display well in the metadata view, and ODE struggled to handle a complex multi-table format with internal links used for metabolomics data. Although these may be isolated cases, they could limit usability in genomics. To address them, we would recommend a few more features to support multi-table schemas.

Additionally, a downloadable log file detailing the flagged issues and corrective actions would greatly enhance reproducibility, especially in real-world research projects and workflows where documenting cleaning steps is critical. Lastly, exporting the metadata schema in XML format would be invaluable for bioinformatics users like us, as it is both structured and machine-readable.
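To make the XML suggestion concrete, here is a rough sketch of what such an export could look like, written with Python's standard `xml.etree.ElementTree` module. It is not an existing ODE feature: the input dictionary layout (a `fields` list with `name`, `type`, and an optional `description`), the XML element names, and the example field names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def schema_to_xml(schema):
    """Convert a simple schema dict into an XML string.

    The dict layout and the element names are hypothetical; they only
    illustrate the kind of structured export suggested in the text.
    """
    root = ET.Element("schema")
    fields_el = ET.SubElement(root, "fields")
    for field in schema.get("fields", []):
        field_el = ET.SubElement(
            fields_el,
            "field",
            name=field["name"],
            type=field.get("type", "string"),  # default to string if untyped
        )
        if "description" in field:
            ET.SubElement(field_el, "description").text = field["description"]
    return ET.tostring(root, encoding="unicode")


# Made-up schema for a taxon report table.
example_schema = {
    "fields": [
        {"name": "taxon_id", "type": "integer"},
        {"name": "taxon_name", "type": "string", "description": "Scientific name"},
        {"name": "read_count", "type": "integer"},
    ]
}

xml_text = schema_to_xml(example_schema)
print(xml_text)
```

An export along these lines would let downstream bioinformatics tools validate column names and types without opening the editor.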

Throughout this pilot, the OKFN coaching team was very responsive and encouraging. Whenever we raised challenges or needed clarification, Romina in particular was just an email away, which really enhanced our experience and kept us motivated. The enthusiasm of the team was also amazing. The way Romina received and followed up on our experiences and narratives to make the tool better was inspiring. It was also encouraging to share our story and motivation with Lucas and to see him translate them into blog posts without taking away from our experiences. Overall, the environment created by the whole team gave us the confidence to express our experience freely. The mini grant we received also enabled us to bring ODE to our community and allowed participants to explore the tool hands-on during our “ODE hack day.”

Open Data Editor in Action: Enhancing genomic data literacy among researcher communities in Kenya

Bioinformatics Hub of Kenya initiative (BHKi) was able to find errors in over-complex spreadsheets and use ODE’s metadata panel to standardise schemas for future surveys.

About the Open Data Editor

The Open Data Editor (ODE) is Open Knowledge’s new open source desktop application for nonprofits, data journalists, activists, and public servants, designed to help them detect errors in their datasets. It’s a free tool for people working with tabular data (Excel, Google Sheets, CSV) who don’t know how to code or don’t have the programming skills to automate the data exploration process.

Simple, lightweight, privacy-friendly, and built for real-world challenges like offline work and low-resource settings, ODE is part of Open Knowledge’s initiative The Tech We Want — our ambitious effort to reimagine how technology is built and used.

And there’s more! ODE comes with a free online course that can help you improve the quality of your datasets and make your work easier.

Download Open Data Editor

↪ Take the course: Learn how to use ODE

All of Open Knowledge’s work with the Open Data Editor is made possible thanks to a charitable grant from the Patrick J. McGovern Foundation. Learn more about its funding programmes here.