‘When AI Meets Open Data’: Reflections for Open Government Communities

I had the pleasure of being invited to the 2025 Open Government Partnership (OGP) Global Summit to participate in a panel titled “When AI Meets Open Data”, along with Josema Alonso (SEMIC-EU), Alicia Garcia de Blas de la Serna (Ministry for the Digital Transformation and of the Civil Service, Spain), and Renato Berrino Malaccorto (Open Data Charter).

Thank you to Francisco Javier García-Vieira and Red.es for the invitation and excellent moderation. The panel took place in Vitoria-Gasteiz on 8 October 2025.

The following is a summary of my intervention.

Introduction

Before delving into the main topic, let’s establish three key points to guide the conversation:

What do we mean by AI? While AI has experienced a boom in the last 15 years, it has been around for decades. We are now witnessing the generative AI wave (triggered by the rise of LLMs), which followed the previous wave of predictive AI (triggered by the rise of machine learning around 2015). It’s important to remember that “AI” can mean many different things.
We must understand the current context and how it shapes expectations for what AI can do:
1. Currently, there is a great deal of speculative investment in AI, with a well-documented global race for dominance. As a result, even minor developments are magnified and used as a mechanism of power projection between competitors.
2. Overall, outside the tech circles developing these technologies, there is a widespread lack of understanding of how these technologies work and what they are truly capable of.
3. This context reminds me of the Big Data era a decade ago, when business speeches were filled with promises of how it would change our lives, only for it to become a “boring” technology powering a small subset of today’s digital world.
Artificial Intelligence, as we are experiencing it now, is neither artificial nor intelligent. It is a technological tool—a powerful one, capable of remarkable things, but a tool nonetheless.

Boring vs. Shiny

Now that we’ve set the context, let’s explore what happens when Artificial Intelligence meets Open Data. I will distinguish between boring applications and shiny ones, using several use cases focused on CKAN and Open Data Editor – two applications I work with that form the backbone of current open data infrastructure.

Boring AI Use Cases

I use the term “boring” in a light-hearted way to describe AI use cases that were groundbreaking around 2015 but have since become normal, expected features in modern systems. These are stable, consolidated software features that provide real value but no longer attract significant investor interest.

Open data portals (like CKAN) can benefit hugely from several such AI applications:

Automatic Metadata Tagging: AI can classify datasets and suggest tags and categories, helping publishers fill in metadata.
Translations: Many data portals support multiple languages. AI-powered translation can save countless hours of human work and lower the barrier to publishing data.
Classification and Recommendations: Similar to how online shops suggest products based on purchase history, open data portals could use AI to suggest similar datasets. This would give discoverability and navigation a significant boost.
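
To make the recommendation idea concrete, here is a minimal sketch of how "similar datasets" could be computed from metadata alone. It is only an illustration under stated assumptions: the catalogue entries are made up, the `title` and `notes` fields are borrowed from the way CKAN describes a dataset, and a plain bag-of-words cosine similarity stands in for whatever model a real portal would choose.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue entries (made-up data, for illustration only).
DATASETS = [
    {"id": "air-quality-2024", "title": "Air quality measurements 2024",
     "notes": "Hourly air pollution readings from city monitoring stations"},
    {"id": "air-quality-2023", "title": "Air quality measurements 2023",
     "notes": "Hourly air pollution readings from city monitoring stations"},
    {"id": "bus-stops", "title": "Bus stops",
     "notes": "Locations of public transport bus stops in the city"},
    {"id": "library-loans", "title": "Library loans",
     "notes": "Monthly book loans per public library branch"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(dataset):
    """Build a word-count vector from the dataset title and description."""
    return Counter(tokenize(dataset["title"] + " " + dataset["notes"]))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target_id, datasets, top_n=2):
    """Return the ids of the datasets most similar to target_id."""
    vectors = {d["id"]: vectorize(d) for d in datasets}
    target = vectors[target_id]
    scored = [
        (cosine(target, vec), ds_id)
        for ds_id, vec in vectors.items()
        if ds_id != target_id
    ]
    # Highest score first; ties are broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Datasets with no words in common are never recommended.
    return [ds_id for score, ds_id in scored[:top_n] if score > 0]


suggestions = recommend("air-quality-2024", DATASETS)
print(suggestions)
```

A production system would more likely rely on embeddings or usage data, but the shape of the feature stays the same: metadata goes in, a ranked list of related datasets comes out.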

Shiny AI Use Cases

The current wave of Generative AI, particularly around LLMs, has created a new hype cycle. This technology promises to finally deliver a natural language interface to the digital world. Are we on the doorstep of true accessibility to information and the long-promised citizen engagement with open data portals? We shall see. To get there, we must first overcome a major technical barrier: hallucinations.

The term “hallucinations” is the ultimate marketing strategy to mask the fact that, by nature, LLMs are systems that make mistakes. Their outputs are based on regurgitated results from extremely synthesized information, which can often be flaky. Whether due to flawed probability calculations, biased sources, outdated training data, or information that simply doesn’t exist, the outputs will never be 100% accurate. Given this inherent limitation, the development community is now focused on creating systems that return verifiable results or can query real data sources instead of relying solely on what the LLM “knows.”

There has been significant progress in this direction. Take the Brave search engine, for example: it returns a summarized result alongside links to the source web pages. Open data portals are following a similar path, aiming to provide citizens with a search interface that, given a question in natural language, returns an answer citing the specific dataset used as a source. This is an exciting development, but it carries a risk: hallucinations could damage your portal’s reputation and further undermine the already fragile trust in government institutions.
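
As a rough sketch of what "an answer citing the specific dataset" could look like in code, the example below deliberately leaves the LLM out and shows only the contract around it: every answer carries the id of the dataset it came from, and the system declines to answer when nothing in the catalogue matches. The catalogue, the figures, and the keyword-overlap matching are all hypothetical stand-ins, not how any existing portal works.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-catalogue with made-up figures, for illustration only.
CATALOG = {
    "population-by-district": {
        "description": "Resident population by district",
        "rows": {"north": 12000, "south": 8500},
    },
    "library-branches": {
        "description": "Public library branches by district",
        "rows": {"north": 3, "south": 2},
    },
}


@dataclass
class Answer:
    text: str
    source_dataset: str  # every answer must point back to its source


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def answer(question) -> Optional[Answer]:
    """Answer from catalogue data only, or return None (no guessing)."""
    q = tokens(question)

    # Step 1: pick the dataset whose description best matches the question.
    best_id, best_overlap = None, 0
    for ds_id, ds in CATALOG.items():
        overlap = len(q & tokens(ds["description"]))
        if overlap > best_overlap:
            best_id, best_overlap = ds_id, overlap
    if best_id is None:
        return None  # nothing relevant: decline rather than invent an answer

    # Step 2: read the value from the real data, never from model memory.
    for key, value in CATALOG[best_id]["rows"].items():
        if key in q:
            return Answer(text=f"{key}: {value}", source_dataset=best_id)
    return None


found = answer("What is the population of the north district?")
missing = answer("Who won the football match?")
print(found)
print(missing)
```

The design choice that matters here is the `None` branch: a portal that says "I could not find this in our data" protects its reputation far better than one that produces a confident but unverifiable answer.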

Conclusions

AI and open data are a natural and powerful combination with many use cases to explore. The current wave of LLMs promises to dramatically improve how citizens access the information provided on open data portals, but the community is still working to make this a reliable reality. We are beginning to see pilots and demos, but as of this writing, none of the major portals have fully implemented these features.

At the Open Knowledge Foundation, we will be working on this during 2026. We will share not only the code but also our learnings on how to integrate AI into open data portals effectively. Stay tuned.