Reflecting on the first cohort of Frictionless Data Reproducible Research fellows

It is truly bittersweet to say that we are at the end of the first cohort of the Frictionless Data Reproducible Research fellows.

Over the past nine months, I have had the pleasure of working with Monica Granados, Selene Yang, Daniel Ouso and Lily Zhao during the fellows programme. Combining their diverse backgrounds (from government data to mapping data, from post-PhD to graduate student), they have spent many hours together learning how to advocate for open science and how to use the Frictionless Data code and tools to make their data more reproducible.

Together, they have also written several blogposts, presented a talk and given a workshop. And they did all of this during a global pandemic! I feel lucky to have worked with them, and will be eagerly watching their contributions to the open science space.

Each fellow wrote a final blogpost reflecting on their time with the programme. You can read the originals here, and I have also republished them below:

Lily Zhao: Reflecting on my time as a fellow

As one of the inaugural Reproducible Research Fellows of Frictionless Data, I am eager to share my experience of the program with you about working with Sele, Ouso and Monica under the leadership of Lilly Winfree this year. I could not have asked for a better group of individuals to work remotely with.

Sele, Ouso, Monica and I spent the last nine months discussing common issues in research reproducibility and delving into the philosophy behind open data science. Together we learned to apply Frictionless Data tools to our own data and mastered techniques for streamlining the reproducibility of our own research process. Lilly was an excellent mentor throughout the program and was always there to help with any issues we ran into.

This was also one of my first experiences working entirely remotely on a team across many time zones. Through the use of Google hangout, Zoom and Slack the entire process was easier than I ever thought it could be. It is wonderful that through technology we are able to collaborate across the world easier than ever before.

We were also able to give multiple presentations together. Monica and I were joint speakers as part of the csv conference where we talk about our experience as fellows, and our experience using Frictionless Data tools. With so many people on the Zoom call it really felt like were part of a large community.

The four of us also led a hands-on workshop introducing the Data Package Creator and GoodTables web interface tools. This was especially fun for me because we used a subset of my French Polynesia interview data as practice data for all workshop participants. Many of the questions asked by participants mirrored questions the four of us had already worked through together, so it was great to be able to share what we had learned with others.

I look forward to sharing these tools and the philosophy of open data science throughout my career and am very grateful to the Open Knowledge Foundation for this amazing learning opportunity. If you would like to learn more about my experience in the Frictionless Data Fellows program please feel free to reach out to me personally!

Monica, Sele, Lilly, Ouso and I on our most recent conference call 🙂

Monica Granados: Gimme Dat Data (in a validated Data Package)

As a scientist I collect a lot of data. Especially about animals that live in the water – fish, mussels, crayfish. This data is not only useful to me but it can be used by others to improve the power in their studies, increase geographic range or phylogenetic diversity for example.

Prior to the Frictionless Data for Reproducible Research Fellowship, I had my data on GitHub along with a script that would use rcurl to pull the data from the repository. While the repository was accompanied by a README, the file didn’t have much information other than the manuscript which included the data. This structure facilitated reproducibility but not reusability. Conceivably if you wanted to use my data for your own experiments you could have contextualised the data by using the relevant manuscript, but it still would have been a challenge without any metadata, not to mention any potential structural errors you could have encountered that I didn’t catch when I uploaded the data.

It was through the introduction of Frictionless Tools, however that I realised that there was more I could do to make my science even more transparent, reproducible and reusable. The fellowship syllabus was structured in such a way that by learning about the tools we learned what the tools were facilitating – better data sharing. The fellows would learn how to use the tool through a self guided lesson and then answer questions on Slack which asked us to interrogate why the tool was built the way it was. These lessons were also supported by calls with the full cohort of fellows where we discussed what we had learned, problems we were encountering as we used the tools with our own data and reviewed papers on open science. The fellowship culminated with a workshop delivered by all four fellows attended by over 40 participants and a presentation at csv,conf.

Now when I share data as a data package I know I have validated by tabular data for structural errors and the file contains metadata that contextualises the data.

Having the opportunity to be a part of the inaugural cohort has been a wonderful experience. I learned new tools and information that I will take and share for the rest of my career, but also gained new colleagues and open science friends in my fellow fellows.

Daniel Ouso: Better Data, one resource at a time – my fellowship experience

Getting into the Frictionless Data fellowship

My background is largely in molecular biology, particularly infection diagnostics targeting arthropod viruses, bacteria and protozoa. I have a relatively shorter bioinformatics experience, but this is the direction am passionate to build my research occupation in. I first heard about Frictionless data from the African Carpentries instructors’ mailing list. It was the inaugural fellowship call that had been shared by Anelda. I caught it at the nick of time; deadline submission! By the way, you can watch for annual calls and other interesting stuff by following @frictionlessd8a. The call for the second cohort just closed in June and was open from late April. The fellowship starts in September.

On-boarding

Lilly arranged the first-time meeting to usher me into the fellow, after a few email correspondence. I got introduced Jo Barrat who patiently took me through my paces completing logistical preliminaries. I was really looking forward to getting started. The on-boarding enabled acquaintance with the rest of the fellows, awesome people. I was excited!

Context

Overall, the world is in search of and is promoting better ways to work with data, whether it is collecting data or accessibility or novel ways to analyse high-throughput data or dedicated workflows to publish data alongside accustomed scientific publishing or moving/working with data across frameworks or merely storage and security. All these, plus other factors, provide avenues to exhaustively interrogate data in multiple ways, thus promoting improved data usefulness. This has been arguably under-appreciated in times past. Frictionless data, through its Progressive Data Toolkit and with the help of organisations like OKF and funding by Sloan Foundation, is dedicated to alleviating hindrances to some of the aforementioned efforts. People empowerment is a core resource to the #BetterData dream.

The fellowship

An aspect of any research is the collection of data, which is applied to test hypotheses under study. The underlying importance of data, good data for that matter, in research is therefore unquestionable. Approaches to data analysis may differ from field to field, yet there are conventional principles that do not discriminate fields; such are the targets to Frictionless Data. I jump at the opportunity to learn ways to ramp up my data workflow efficiency, with a touch of research openness and reproducibility.

The journey took off withdrawing a meticulous roadmap, which I found very helpful, and seem to end with this – sharing my experience. In between exciting things happened. In case one was coming in a little rusty with their basic Python/R, they were catered for early on, though you didn’t exactly need them to use the tools. To say, literally, ZERO programming skills prerequisite. There were a plethora of resources, and help from the fellows, not to mention from the ever welcoming Lilly. The core sections of the Fellowship were prefaced by grasping basic components like the JSON schema data interchange format. Following were the core tools and their specifications. The Data Package Creator tool is impressively emphatic on capturing metadata, a backbone theme for reproducibility. I found Table Schema and Schema specifications initially confusing. Other fellows and I have previously shared on the Data Package Creator and GoodTables, tools for creating and validating data packages respectively. These tools are very progressive, continually incorporating feedback from the community, including fellows, to improve user experience. So don’t be surprised at a few changes since the fellows’ blogs. In fact, a new entrant, which I only knew of recently, is the DataHub tool – “Is a useful solution for sharing datasets, and discovering high-quality datasets that others have produced”. I am yet to check it out.

Besides the main focus of the fellowship, I got to learn a lot covering organisational skills and tools such as GitHub projects, Toggl for time-monitoring, general remote working, among others. I got introduced to new communities/initiatives such as PREreview; my first time to participate in open research reviewing. The fellows were awesome to work with and Lilly Winfree provided the best mentorship.

Sometimes problems are foreseen and contingencies planned, other times unforeseen surprises rear their heads into our otherwise “perfect” plan. Guess what? You nailed it! COVID-19. Such require adaptability akin to that of the fictional El Professor in the Money Heist. Since we could not organise the in-person seminar and/or workshops as part of the fellowship, we collectively adopted a virtual workshop. It went amazingly well.

What next

Acquired knowledge and skills become more useful when implemented. My goal is to apply them in every opportune opening and to keep learning other integrative tools. Yet, there is also this about knowledge; it is to be spread. I hope to compensate for suspended social sessions and to keep engagement with @frictionlessd8a to continue open and reproducible research advocacy.

Conclusion

Tools that need minimal to no coding experience support well the adoption of good data hygiene practices, more so in places with scanty coding expertise. The FD tools will surely help your workflows with some greasing regardless of your coding proficiency, especially for tabular data. This is especially needful seeing the deluge of data persistently churned out from various sources. Frictionless Data is for everyone working with data; researcher, data scientist or data engineers. The ultimate goal is to work with data in an open and reproducible way, which is consistent with modern scientific research practice. A concerted approach is also key, I am glad to have represented Africa in the fellowship. Do not hesitate to reach out if you think I can be resourceful to your cause.

Sele Yang: A seguir reproduciendo conocimiento!

Termina un gran proceso para la primera cohorte del Frictionless Data for Reproducible Research Fellowship. Un proceso de grandísimos y valiosos aprendizajes que, sí y sólo sí pudieron darse, gracias al trabajo colaborativo entre todas las personas que participaron.

En un inicio, recuerdo el gran miedo (que de alguna forma todavía persiste, pero más levemente) de no contar con las habilidades técnicas requeridas para poder llevar a cabo mi proyecto, pero poco a poco fui conociendo y sintiéndome apoyada por mis compañeros y compañeras, que con muchísima paciencia me llevaron de la mano para no perderme en el proceso.

Recorrí gracias a este equipo, las playas de M’orea con los datos de Lily, aprendí de formas de investigación por fuera de mi campo de experiencia con Ouso y Mónica. Reconocí el gran trabajo que realizan investigadores e investigadoras para defender el conocimiento más abierto, equitativo y accesible.

Si bien nuestro recorrido compartido termina acá, puedo resaltar que a pesar de la crisis que nos llevó a cambiar muchas acciones con COVID-19 durante nuestro programa, logramos encontrarnos aunque fuese virtualmente, no sólo para compartir entre nosotres, sino también con una gran audiencia para nuestro taller sobre el uso de herramientas y metodologías del programa. Una gran actividad para reforzar la importancia de compartir conocimiento, y hacerlo más accesible, mucho más en tiempos de crisis.

Agradezco al Open Knowledge Foundation por haber llevado a cabo este programa, y les invito a todas las personas a que recorran la información que produjimos durante estos meses de trabajo. Termino este proceso de aprendizaje con la convicción todavía más fuerte sobre lo necesario qué son los procesos colaborativos que buscan aperturar y democratizar la ciencia y el conocimiento. Mucho más en estos tiempos en los que la colaboración y puesta en común del aprendizaje nos hará más fuertes como sociedad.