
This is the eighteenth conversation of the 100+ Conversations to Inspire Our New Direction (#OKFN100) project.
Since 2023, we have been meeting with over 100 people to discuss the future of open knowledge, shaped by a diverse set of visions from artists, activists, academics, archivists, thinkers, policymakers, data scientists, educators, and community leaders from around the world.
How can openness accelerate and strengthen struggles against the complex challenges of our time? This is the key question behind conversations like the one you can read below.
*
In today’s conversation, we speak to Andrés Vázquez, a senior developer at the Open Knowledge Foundation who specialises in CKAN – the world’s most widely used open-source data management system.
Vázquez is an advocate for civic and government technology and has been a contributor to Spanish-language open data communities for many years. He is also a member of Open Data Córdoba. He has recently worked on implementing open data portals at various levels, ranging from local governments to global multilateral institutions. He currently lives in Mendiolaza, Argentina.
In this interview, conducted during a meeting in Bologna, Italy, held to reactivate the CKAN community, he reflects on the present and future of CKAN and of Spanish-language open data communities, as well as on avenues for innovation in open data.
We hope you enjoy reading it.
*
Lucas Pretti: CKAN is the industry standard for open data portals. I would compare it to WordPress. Around 40% of websites are said to use that platform, and it is estimated that between 50% and 70% of open data portals in countries and large organisations use CKAN. But is CKAN up to the task, given the current challenges?
Andrés Vázquez: Any government, organisation or company planning to open up their data will come across CKAN. Just as anyone who wants to create a website comes across WordPress and considers it as an option, the same is true of CKAN in the field of open data.
Anyone who wants to take this approach seriously will most likely end up choosing it, because it is free software and there is an ecosystem of companies offering services around CKAN. Despite its technical complexity, it is the most established tool.
Hundreds of portals around the world use it. There is no exact official number, although there is a website that attempts to keep track of all of them. What we can say is that it is currently used by governments around the world, including the United States, Argentina and my own province of Córdoba.
In that sense, yes, CKAN is to open data what WordPress is to the web. It is a ready-to-use technology that allows you to set up an open data portal quickly.
Lucas Pretti: Could you tell me a little more about the pros and cons of CKAN in the current landscape? You mentioned some technical aspects earlier; I’m interested in understanding the advantages and disadvantages it offers today.
Andrés Vázquez: I’ll tell you from my experience. I was in charge of an open government project in a very important city in my province, and I had to decide whether or not to use CKAN. Thanks to that experience, I can clearly point out the pros and cons.
The first advantage is that it is free software with an active community behind it. It is not only free, but also designed to be extensible: customisations and modifications can be kept separate from the core code, which provides a high degree of flexibility. It is not “canned” software that you have to use as it is; rather, it offers many possibilities for adaptation. This is undoubtedly the biggest advantage.
Another positive aspect is the community: many people work with CKAN, so it is easy to find suppliers and support.
One drawback is that the community mainly works in English. Not so many companies offer services in Spanish, although at Open Knowledge, we have promoted projects in this language because we have capacity in both English and Spanish. The situation is better in Portuguese, because the Brazilian community has always been one of the most active and robust open data communities. However, outside of English, the ecosystem is generally weaker.
Another disadvantage is related to CKAN’s architecture: it requires several different internal services, so it’s not a simple application that only requires a database. This can complicate implementation. We have encountered governments that wanted to use CKAN but were unable to do so and ended up opting for other solutions.
In any case, there is an ongoing effort within the CKAN community to drastically simplify the installation process, with the goal of making it as easy as installing WordPress. That is not yet the case, which is a weakness.
Lucas Pretti: I wanted to discuss the topic of language. You are a long-standing contributor to and participant in the open data movement, particularly in the Spanish-speaking world. You mentioned that the CKAN community could develop further. But beyond CKAN, what is your view on the current state and maturity of open data communities in Spanish, not only in Latin America, but worldwide?
Andrés Vázquez: Good question. I started participating in the open data community in the middle of the last decade, at a time when it was experiencing a significant boom. In Argentina, many data portals were launched, including ones for cities and provinces. If I’m not mistaken, the national government was also launching initiatives. In fact, I would say that open data became fashionable, driven by a strong demand for transparency.
I have always believed that open data should be considered part of a city’s or country’s infrastructure, just like a bridge. If there is high-quality open data, preferably in real time, entrepreneurs, journalists and universities can create a lot of value from it. While citizens certainly have a right to open data, I have always emphasised its value as infrastructure.
Within the community, there were different positions: some were more focused on transparency and fighting corruption with a combative discourse, while others, like me, were more focused on the potential of data as a development asset.
Over time, as data portals were increasingly adopted simply because they were trendy, disappointment set in: the promise that all data would be open was not fulfilled. The trend passed, and the initial momentum faded.
Today, I would say we are in a calmer state. Governments continue to open up data, but there is no longer strong social demand for it or for more transparency. Recently, someone wrote to tell me that the open data we had promoted in my previous place of work had not been updated in years. This made me a little sad because it reflects what happened: there was a boom, then a bust. Now, we need to reactivate those communities and find new meaning for them.
Lucas Pretti: You are now entering the world of education with your academy. This makes me think about the younger generation who are just starting out in this field. Is there a generation interested in open data, one that values and understands it? Or not so much? I am referring to students of computer science, sociology and public policy, among others. What is your outlook on this scenario, and what concerns you?
Andrés Vázquez: As you say, I am starting a personal venture related to teaching programming. There, I want to encourage the use of open data to foster active citizenship, encouraging students to examine, analyse and draw conclusions from the data.
I don’t have much direct contact with young computer science graduates, so I don’t want to speak on their behalf. However, I can say that when I worked in government and local organisations, I participated in many events and talks with young people.
I always said the same thing: technologies come and go, capturing all the attention – blockchain, Bitcoin, NFTs and now artificial intelligence, for example – but they are often just distractions. What’s important is paying attention to what really has value. That’s why, whenever I talk to young programmers, I encourage them to pursue what makes sense to them.
I believe open data provides a great opportunity for young people who want to make a real impact. For example, ten years ago I analysed electoral rolls and found irregularities such as addresses with more voters than expected. At the time, no one paid much attention, but the issue has recently reappeared on a local council’s agenda and caused alarm. Some residents took an interest and realised that technology could help solve the problem. The same is true of monitoring government budgets and expenditure: these are issues that spark curiosity and demonstrate how technology can be connected to everyday life and the desire to participate actively in society.
This is why, in the course I am going to teach, we will probably use open data as a starting point to spark that curiosity. If students are interested, it makes much more sense for them to work with data from their own community than with Bitcoin or other passing fads. This has enormous potential to motivate younger generations.
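The electoral-roll check described above comes down to counting registered voters per address and flagging unusually high counts, which makes it a good first exercise with open data. The Python sketch below illustrates that idea under stated assumptions: the roll is a made-up sample, and the `max_expected` threshold is a hypothetical cut-off, not the one used in the original analysis.

```python
from collections import Counter

# Made-up sample of an electoral roll: one row per registered voter.
ROLL = [
    {"voter": "A", "address": "Calle 1 100"},
    {"voter": "B", "address": "Calle 1 100"},
    {"voter": "C", "address": "Calle 2 200"},
    {"voter": "D", "address": "Calle 1 100"},
    {"voter": "E", "address": "Calle 1 100"},
]


def flag_crowded_addresses(roll, max_expected=3):
    """Return {address: count} for addresses with more voters than max_expected.

    max_expected is an assumed threshold; a real analysis would justify it,
    for example from typical household sizes in the area.
    """
    counts = Counter(row["address"] for row in roll)
    return {addr: n for addr, n in counts.items() if n > max_expected}


print(flag_crowded_addresses(ROLL))
```

A flagged address is a lead for further checking, not proof of wrongdoing; some addresses, such as care homes, can legitimately host many voters.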
Lucas Pretti: Let me return to a weakness of CKAN that you mentioned: its technical complexity and dependence on other services. I know that discussions are under way at Open Knowledge, together with Patricio Del Boca, about a “CKAN Lite”, or “CKAnito” in Spanish. Where does that stand? What is your vision for achieving a simpler CKAN?
Andrés Vázquez: Yes. In the context of the discussions we are promoting at Open Knowledge under the initiative The Tech We Want, we are mainly looking for simplicity.
Our goal is to make CKAN available in the simplest possible form, so that governments, organisations and communities – even those that do not speak English – can quickly deploy it and start uploading data.
Ideally, we would like to deliver a turnkey solution: you press a button and have a CKAN instance ready, with minimal configuration and the internal services already set up. This would be a significant advance, as it would minimise technical and language barriers.
We often see valuable data scattered and disorganised among journalists, academics and researchers. If there were an easy-to-use tool to consolidate this information, it would have an enormous impact. A recent example is the open data portal of the National University of Córdoba, on which I have been working in recent months. Very few universities publish data, partly because it is not easy to do so. This portal has enabled researchers to start uploading information, and it would be ideal if more institutions followed suit.
At Open Knowledge, we always advocate for this kind of initiative. I believe that, once we have the simplified technology we want, we will see many more projects like this. Currently, technical complexity is a drawback, but we are working on solving this problem, which is not as difficult as it seems. Once solved, I am sure it will encourage many more communities to participate.
Lucas Pretti: During a recent internal discussion, I was intrigued when you mentioned MCP (Model Context Protocol) as it was a term I had never heard before. I understood it to be a way of guiding artificial intelligence to return not only plausible answers, but also substantial and valuable information, rather than hallucinations. Could you explain a little more about what MCP is, and comment on any other barriers or possibilities for innovation in this field?
Andrés Vázquez: That’s an excellent question. I have been working with governments, either directly or indirectly, for a long time, and something I have always noticed is that, when a technology becomes fashionable, governments want to adopt it right away. It’s very difficult to escape that logic.
I remember once, at a meeting, coming to the conclusion that more than 90% of that city’s technological problems could be solved with tools that had been around for decades. Just because something is fashionable does not mean it solves real problems. In other words, much of the technology that has been available for 20 years could cover most of a city’s needs, but it is often not utilised.
This does not mean that new technologies are without value. Blockchain technology, for example, can be used to certify that data has not been modified and that it was indeed published by a government. Similarly, artificial intelligence is such an overwhelming technology that it seems to occupy every space. However, we still see ineffective implementations. Today, for instance, I interacted with an AI bot from my bank and it failed to understand my query. The bank is spending money on a solution that is not effective.
So, it’s not about rejecting artificial intelligence, but thinking clearly about when it can really add value. At Open Knowledge, for instance, we explore ways to use it responsibly. Personally, I don’t immediately believe what AI tells me unless I can verify it elsewhere. The big problem is that AI isn’t designed to tell the truth, but to produce answers that make sense. And that can be dangerous.
This brings us to the idea that data is the closest thing we have to verifiable facts. We believe that artificial intelligence, which excels at conversation, can be enhanced by reliable data. And that’s where MCP servers come in.
MCP servers are small applications that support artificial intelligence: they tell the AI, “I know how to answer these types of questions.” Consider the case of the National University of Córdoba. In Argentina, there are recurring narratives blaming foreigners for various problems, including taking up “too many” places at public universities. Many people might ask an AI how many foreign students there are at the university. This is where an MCP server can add value: we created an application that connects the AI to the official database. Instead of inventing an answer, the AI provides the exact number with a link to the dataset so that it can be checked.
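The grounding idea above can be sketched in plain Python, without the official MCP SDK. The sketch assumes a hypothetical in-memory dataset and a placeholder URL in place of a real CKAN resource; what it shows is a tool that returns a figure computed from the data together with a link to its source, and that returns nothing, rather than a guess, when no tool covers the question.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an official dataset published on a CKAN portal.
# A real tool would read these rows from the portal instead of hard-coding them.
DATASET_URL = "https://data.example.org/dataset/enrolled-students"  # placeholder
ROWS = [
    {"student_id": 1, "nationality": "Argentina"},
    {"student_id": 2, "nationality": "Peru"},
    {"student_id": 3, "nationality": "Argentina"},
    {"student_id": 4, "nationality": "Bolivia"},
]


@dataclass
class GroundedAnswer:
    value: int
    source: str  # link the user can follow to verify the figure


def count_foreign_students(rows, home_country="Argentina"):
    """Count rows whose nationality differs from home_country."""
    n = sum(1 for row in rows if row["nationality"] != home_country)
    return GroundedAnswer(value=n, source=DATASET_URL)


# Each tool advertises the question it can answer.
TOOLS = {
    "How many foreign students are enrolled?": count_foreign_students,
}


def answer(question):
    """Return a grounded answer, or None when no tool covers the question."""
    tool = TOOLS.get(question)
    if tool is None:
        return None  # better to admit there is no data than to invent a number
    return tool(ROWS)


result = answer("How many foreign students are enrolled?")
print(result.value, result.source)
```

In an actual MCP server, the model chooses a tool from its declared name and description rather than by exact question match; the dictionary lookup here only stands in for that step.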
Ideally, I would like AI to work in a way that not only provides a number, but also offers a link to the dataset, an explanation from the relevant expert, and the original source. This would generate much more trust.
CKAN’s extensibility allows us to envisage this type of application. Imagine that when someone uploads a dataset, they also define the questions it answers and how queries should be formulated. This would give AI direct access to verifiable information, enabling it to provide much richer and more reliable answers.
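One way to picture this is a dataset record that carries its own questions, each with a query template. The sketch below assumes a hypothetical `questions` metadata field (CKAN does not provide one by default; a portal would have to add it through a schema extension) and turns a declared question into a request body for CKAN's DataStore `datastore_search` action, without sending anything. The portal URL and resource id are placeholders.

```python
import json

# Hypothetical dataset record: the "questions" field is NOT part of CKAN's
# default schema; a portal would need to add it via a schema extension.
DATASET = {
    "name": "enrolled-students",
    "resource_id": "00000000-0000-0000-0000-000000000000",  # placeholder id
    "questions": [
        {
            "text": "How many students are enrolled from a given country?",
            # Template for a DataStore query; {country} is filled in per request.
            "query": {"filters": {"nationality": "{country}"}, "limit": 0},
        },
    ],
}


def build_request(portal_url, dataset, question_text, **params):
    """Return (url, JSON body) for a datastore_search call answering the question.

    Nothing is sent; the caller decides how to POST the body to the portal.
    """
    for question in dataset["questions"]:
        if question["text"] == question_text:
            template = question["query"]
            filters = {field: value.format(**params)
                       for field, value in template["filters"].items()}
            body = {
                "resource_id": dataset["resource_id"],
                "filters": filters,
                "limit": template["limit"],  # 0: only the total count is wanted
            }
            url = f"{portal_url.rstrip('/')}/api/3/action/datastore_search"
            return url, json.dumps(body)
    raise KeyError(f"dataset does not declare: {question_text!r}")


url, body = build_request(
    "https://data.example.org", DATASET,
    "How many students are enrolled from a given country?", country="Peru",
)
print(url)
print(body)
```

Setting `limit` to 0 is meant to ask only for the number of matching records, which `datastore_search` reports in its `total` field, so an AI could quote that count alongside the dataset link.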
Additionally, other interesting projects are underway, such as using embeddings to automatically relate seemingly unrelated datasets and identify common concepts between them. This would make searching within CKAN much more powerful.
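Relating datasets through embeddings usually means representing each dataset's title and description as a vector and comparing vectors with cosine similarity. The sketch below uses made-up three-dimensional vectors in place of output from a real embedding model, and the 0.8 threshold is an arbitrary assumption.

```python
import math
from itertools import combinations

# Made-up vectors standing in for embeddings of dataset titles/descriptions.
# A real system would compute these with an embedding model.
EMBEDDINGS = {
    "school-enrolment": [0.9, 0.1, 0.0],
    "university-students": [0.8, 0.2, 0.1],
    "bus-routes": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_pairs(embeddings, threshold=0.8):
    """Return (name1, name2, score) for dataset pairs at or above threshold."""
    pairs = []
    for (n1, v1), (n2, v2) in combinations(embeddings.items(), 2):
        score = cosine(v1, v2)
        if score >= threshold:
            pairs.append((n1, n2, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


print(related_pairs(EMBEDDINGS))
```

With these toy vectors, only the two education datasets are paired; the bus-routes vector points in a different direction.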
The important thing is that, as free software, CKAN offers the possibility of expansion and adaptation. And that is key. I don’t know of any other open data technology with this level of flexibility, enabling anyone with the necessary technical knowledge to expand its capabilities and generate innovations like these, which are already fully achievable today.
Lucas Pretti: CKAN is now 20 years old. It was created by Open Knowledge in 2005. Meanwhile, the pace of scientific and technological change is faster than ever and shows no signs of slowing down. However, as you said before, the newest solutions are often not necessary; rather, it is older technologies that continue to be very useful. So, in that context, how can CKAN remain as relevant in the next decade as it has been until now?
Andrés Vázquez: I think there are several possible answers because this is a multidimensional issue.
Firstly, CKAN will remain relevant as long as there is a community of users interested in open data and participating in public debate. If universities, journalists, tech entrepreneurs, and civil society organisations demand data, that demand will keep CKAN relevant. Ultimately, when someone thinks about opening up data, CKAN is always the first option that comes to mind.
Another key aspect is funding. Currently, the project operates under a model in which the Open Knowledge Foundation acts as the main organisation, collaborating with partner companies. However, the Foundation alone does not have sufficient funding to sustain the entire project. While the companies provide support, I believe that establishing a stable and sustainable economic foundation is crucial for CKAN’s future.
In fact, CKAN was declared a digital public good by the Digital Public Goods Alliance (DPGA) in 2023, which is an important recognition. As you said, it is unusual for software more than 10 years old to remain so relevant. I like to think of the internet meme of a large technological infrastructure supported by a small, almost invisible component. CKAN is a bit like that component within the global open data infrastructure: small in comparison, but essential.
The challenge is to make the project self-sustaining so that it does not depend solely on CKAN instances running in different places, but rather on an active community that responds to demands, receives proposals for improvement, listens to criticism and drives development forward. To remain relevant, we need to reach that point in the coming years.
I also believe that CKAN should continue to embrace new technologies. This doesn’t mean jumping on every bandwagon, but rather seriously considering how to integrate innovations such as artificial intelligence in a useful way, as we discussed earlier. This should be done in a way that makes sense and adds real value. This will also help CKAN maintain its central position in the open data ecosystem.
Lucas Pretti: Finally, I would like to address the issue of open government. With the Open Government Partnership (OGP) global summit approaching, I have been studying related issues in recent weeks. I found a quote in a Brazilian document from the Fundação Escola Nacional de Administração Pública (Enap), which said that people often confuse open government with transparency alone.
In other words, when someone says, “My data is open, there it is, so the government is open”, that’s not really the case. Open government is also about social participation, collaboration and broader aspects. I think CKAN contributes to this limited view because it essentially allows people to say, “I have my data portal, therefore I am open.” But in reality, that’s not enough. This issue deserves deeper reflection.
Andrés Vázquez: Yes, absolutely. The concept of open government was hotly debated around ten years ago when it started to become popular, with different intellectual approaches emerging. In my early work, I subscribed to the idea that open government encompasses not only open data, but also citizen participation and collaboration.
While it is essential for a government to publish an open data portal, this alone is not enough. Open data is one of the pillars of open government. However, for a government to be truly open, citizens must feel that they can participate and express their opinions in spaces where they will be heard. This cannot be achieved with a portal alone; other actions are required.
There is also the dimension of collaboration, which I understand to be an even more advanced form of participation, whereby relevant community members have a say in government decisions. For example, I am thinking of universities, trade unions and professional associations – entities that have a deep understanding of a community’s reality and are not necessarily part of the government. These actors must have spaces in which to express their opinions.
While CKAN helps with one of the pillars of open government – open data – it is not open government in itself. CKAN is a tool for open data, but much more is needed from governments to truly be able to call themselves open.







