Working with UNHCR to better collect, archive and re-use data about some of the world’s most vulnerable people

Since 2018, the team at Open Knowledge Foundation has been working with the Raw Internal Data Library (RIDL) project team at UNHCR to build an internal library of data to support evidence-based decision making by UNHCR and its partners.

What’s this about?

The United Nations High Commissioner for Refugees (UNHCR) is a global organisation ‘dedicated to saving lives, protecting rights and building a better future for refugees, forcibly displaced communities and stateless people’.

Around the world, at least 82 million people have been forced to flee their homes. Many of these people are refugees and asylum seekers. Over half are internally displaced within the borders of their own countries. The vast majority of these people are hosted in developing countries. Learn more here.

UNHCR has a presence in 125 countries, with more than 90% of staff based in the field. An important dimension of their work involves collecting and using data – to understand what’s happening, to which people, where it’s happening and what should be done about it.

In the past, managing this data has been a huge challenge. Data was collected in a decentralised manner. It was then stored, archived, and processed in a decentralised manner. This meant that much of the value of this data was lost. Insights were undiscovered. Opportunities missed.

In 2019, UNHCR released its Data Transformation Strategy 2020-2025, with the vision of UNHCR becoming ‘a trusted leader on data and information related to refugees and other affected populations, thereby enabling actions that protect, include and empower’.

The Raw Internal Data Library (RIDL) supports this strategy by creating a safe, organised place for UNHCR to store its data, with metadata that helps staff find the data they need and enables them to re-use it in multiple types of analysis.

Since 2018, the team at Open Knowledge Foundation has been working with the RIDL team to build this library using CKAN, the open source data management system.

OKF spoke with Mariann Urban at UNHCR Global Data Service about the project to learn more.

Here is an extract of that interview, which has been edited for length and clarity.

Hi Mariann. Can you start by telling us why data is important for UNHCR?

MU/UNHCR: That’s a great question. Pretty much everyone at UNHCR now recognises that good data is the key to achieving meaningful solutions for displaced people. It’s important to enable evidence-based decision making and to deliver our mandate. And also, it helps us raise awareness and demonstrate the impact of our work. Data is at the foundation of what UNHCR does. It’s also important for building strong partnerships with governments and other organisations. When we share this data, anonymised where necessary, it allows our partners to design their programmes better. Data is critical to generate better knowledge and insights. Secondary usage includes indicator baseline analysis, trend analysis, forecasting, modeling etc. Data is really valuable!

What kinds of datasets does UNHCR collect and use?

MU/UNHCR: We have people working in countries all over the world, most of them in the field. Every year UNHCR spends a huge amount of money collecting data. It’s a huge investment. Much of this data collection happens at the field level, organised by our partners in operations. They collect a multitude of operational data each year.

You must have lots of interesting data. Can you give us an example of one important dataset?

MU/UNHCR: One of the most valuable datasets is our registration data. Registering refugees and asylum seekers is the primary responsibility of governments. But if they require help, UNHCR provides support in that area.

In the past, how was data collected, archived and used at UNHCR?

MU/UNHCR: Let me give you an example of how it used to be. Let’s imagine there was a data collection exercise in Cameroon. Our colleagues finished the exercise, and the data stayed in the partner organisation, or sometimes with the actual person collecting the data. It was stored on hard drives, shared drives, email accounts etc. Then, the next person who wanted to work with the data, or with a similar dataset, probably had no access to it to use as a baseline or for trend analysis.

That sounds like a problem.

MU/UNHCR: Yes! This was the problem statement that led to the idea of the Raw Internal Data Library (RIDL). Of course, we already have corporate data archiving solutions. But we realised we needed something more.

Tell us more about RIDL.

MU/UNHCR: The main goal of RIDL is to stop data loss. We know that the organisation cannot capitalise on data if they are lost or forgotten, if they are not stored in an interoperable, machine-readable format, or if they lack the minimum set of metadata needed to ensure appropriate further use.

RIDL is built on CKAN. Why is that?

MU/UNHCR: Our team had some experience with CKAN, which is already used in the humanitarian data community. UNHCR has been an active user of OCHA’s Humanitarian Data Exchange (HDX) platform to share aggregate data externally, and we collaborate closely with its technical team. After some market research, we realised that CKAN was also a good solution for an internal library: the data is internal, but it needs to be visible to a lot of people inside the organisation.

What about external partners and the media? Can they access RIDL datasets?

MU/UNHCR: There are some complicated issues around privacy and security. Some of the data we collect is extremely sensitive. We have to be strong custodians of this data to ensure it is used appropriately. Once we analyse the data, we can take the next step and share it externally, of course. Sometimes our data include personal identifiers, so they must be cleaned and anonymised to ensure that data subjects are not identifiable. Once we have a dataset that is anonymised, we use our Microdata Library to publish it externally. Thus RIDL is the first step in a long chain of sharing our data with partners, governments, researchers and the media.

RIDL is a technological solution. But I imagine there is some cultural change required for UNHCR to reach its vision of becoming a data-enabled organisation.

MU/UNHCR: Yes of course, achieving these aspirations is not just about getting the technology right. We also have to make cultural, procedural and governance changes to become a data-enabled organisation. It’s a huge project. It needs a culture shift in UNHCR – because even if it’s internal, it’s a bit of work to convince people to upload. The metadata is always visible for everyone internally, but the actual data itself can be restricted and only visible following a request and evaluation. We want to be a trusted leader, but we also want to use that data to arrive at a better solution for refugees, to enrich our partnerships, and to enable evidence-based decision making – which is what we always aim to do.

Thanks for sharing your insights with us today, Mariann.

MU/UNHCR: No problem. It’s been a pleasure.

Find out more

Open Knowledge Foundation is working with UNHCR to deliver the Raw Internal Data Library (RIDL). If you work outside of UNHCR, you can access UNHCR’s Microdata Library here. Learn more about CKAN here.