Picture: Daniel Mietchen/Public Domain

This is the sixth conversation of the 100+ Conversations to Inspire Our New Direction (#OKFN100) project.

Starting in January 2023, we are meeting with more than 100 people to discuss the future of open knowledge, shaped by a diverse set of visions from artists, activists, scholars, archivists, thinkers, policymakers, data scientists, educators, and community leaders from all over the world.

The Open Knowledge Foundation team wants to identify and discuss issues sensitive to our movement and use this effort to constantly shape our actions and business strategies to best deliver what the community expects of us and our network: a pioneering organisation that has been defining the standards of the open movement for two decades.

Another goal is to include the perspectives of people from diverse backgrounds, especially those from marginalised communities, dissident identities, and those whose geographic location is outside of the world’s major financial powers.

How can openness accelerate and strengthen the struggles against the complex challenges of our time? This is the key question behind conversations like the one you can read below.


This week we had the chance to talk with one of the most active voices in the field of open science, whose influence has reached generations of researchers in many geographies. Dr. Peter Murray-Rust is a chemist, a professor in molecular informatics at the University of Cambridge, and a historic open access and open data activist in academia. 

In his career, he’s been particularly interested in promoting open knowledge through research groups and communities, such as the Blue Obelisk community for open-source chemical software, and through the Semantic Web. He was among the proposers of the World Wide Molecular Matrix back in 2002 and, together with Henry Rzepa, created the Chemical Markup Language.

As an open data activist, he has helped develop and publish manifestos such as the Panton Principles and the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. In 2014, he was awarded a Fellowship by the Shuttleworth Foundation to develop the automated mining of science from the literature.

Peter is also a member of the Open Definition Advisory Council – which is one of the reasons why we are publishing this conversation today. The OKFN team is preparing the second round of consultations on the Open Definition review, with the aim of updating it to meet current challenges and finding a broader and more diverse consensus around it. The session will take place in person at RightsCon Costa Rica on 7 June. This conversation helps to contextualise the work done previously and seeks to put our brains together to think about some of the topics that have already emerged in the previous session at MozFest and in the discussion forum.

With each conversation, more people join the project. On this occasion, we had the pleasure of having the participation of Adrià Mercader, OKFN’s Technical Lead based in Tarragona, Catalonia; Nikesh Balami, OKFN’s International Open Data Lead based in Kathmandu, Nepal; and Lucas Pretti, OKFN’s Communications Lead based between São Paulo, Brazil and Madrid, Spain.

We hope that Peter’s insights will serve as an inspirational source for the important discussions ahead.


Lucas Pretti: In an email message you sent me while we were arranging this conversation, you said the following, “We are literally fighting for the soul of Open and we are not winning”. Could you elaborate on this feeling of loss in more detail? Why are we losing?

Peter Murray-Rust: The idea of public open started with software, with Richard Stallman’s freedoms and similar things in the 80s. I would say that until the cloud and the mega-corporations came along, open software was a success. A little over a decade ago there was big momentum, but around 2013–14 it died, and the corporates came to realise that there were vast amounts of wealth to be made by enclosing open. We’ve seen it in all areas. There were great inroads into the public domain and massive investments in lawyers to sue people who don’t support corporates. My own area is scholarly publications, which is one of the worst: a complete travesty of everything.

The whole thing at best is incredibly messy today. You cannot rely on something a hundred years old being in the public domain, as it could well be owned by a corporation. In any area where you land, you don’t know how you’re going to go out, as there’s no roadmap. For example, who owns the map of Britain? Who owns the postcodes of Britain? Who owns timetables? Who owns this sort of thing? There’s actually no expectation that they will be open. It’s all scattered all over. 

At least I find that most governments are largely on the side of open. I saw recently that the US government is now funding open knowledge networks. Nothing to do with OKFN, but it’s the technology to build knowledge graphs from all public knowledge. This is really good if they’re able to make it happen. 

But in general terms, we’re no nearer knowing whether we’re going to win these battles. A lot of these things are worthy endeavours, but they’re not backed by the force of law. Until we have the force of law, and corporations are fined for this sort of thing, we’re never going to get compliance. 

Adrià Mercader: I agree, but I want to offer my perspective as a counter-argument. I joined Open Knowledge in 2011, at the beginning, when expectations were at their peak. If we judge the current situation by the standards of the hopes we had back then, it’s difficult not to be a bit disappointed.

But maybe we need to rethink and reframe what the long-term goal is. I don’t think that the landscape of open is worse now than it was in 2010 by any measure, especially in certain areas or topics like government. There’s innovation, and individuals and groups within different governments in the world pushing for more transparency and better data publication. Now people know what open data is, and I think that there’s more data literacy in general across different sectors. People are more aware of why data is important, what data formats are, etc. There’s been a progression.

So I wanted to ask you specifically in the context of academia. Do people starting out in academia today, coming from higher education, have an increased awareness of topics like open access, licensing, or even data literacy skills? Do you think this has improved over the years?

Peter Murray-Rust: I don’t teach or research in the UK anymore. My current experience is with Indian undergraduates. Unfortunately, there’s not a high entry point with these concepts there, but in general terms, I would say that the generation who are graduating are much more aware of these ideas, much more likely to publish open-source software and to understand the value of open data. Some disciplines have managed it well, like bioscience, a lot of the geosciences, and astronomy. Engineering and chemistry are still very fossilised in that regard, though.

Universities are conservative organisations. Just because they do cutting-edge research doesn’t mean that they are innovative as organisations. Universities are often symbiotic with major corporations. If you take journals and books, these are now run by huge corporates, same for teaching. A lot of learning societies are also incredibly conservative and very possessive.

With the advances in knowledge management, I expect that this is going to lead to more and more commoditised education, and undergraduates will be inducted into this process. I’m a bit of a pessimist here. As soon as corporates are included in the system, any values are lost, because corporates no longer have human values. Their values are determined by shareholder value, which is computable by machines. It really doesn’t matter whether those machines are AI or collections of human bureaucrats. Everything in academia is getting more and more mechanised.

Adrià Mercader: You touched on probably everyone’s favourite subject right now. Artificial intelligence has a lot of potential but also presents many dangers. I’m keen to hear your thoughts on AI. Could you reflect on the impacts on academia, but also think about challenges like climate change?

Peter Murray-Rust: GPT and its peers will soon be indistinguishable from a very competent human in many different areas. I haven’t used it a lot, but I can assure you that it’s really an automated collection of semi-public human knowledge, based on language models and on sticking sentences together. I don’t know whether it’s going to become sentient, intelligent or whatever, but I am clear that it’s going to be controlled by corporates. And those corporates will surveil and control their users. Every time somebody uses it, the AI will know more about that person and will develop its intelligence based on that interaction. I’m sure Google has been doing this for years, and that’s the main use of Bing for Microsoft.

That reminds me of Microsoft Academic Graph, which was a rival to Google Scholar that aimed to index the world’s scholarly literature. About a year ago they open-sourced it, and Microsoft Academic Graph is now managed by OpenAlex. This is one of the most innovative and challenging open communities I know, run by Heather Piwowar and Jason Priem out of Vancouver. They spent years fighting the corporates to create an index of scholarly publishing using a semantic map of academic publications. It’s all open source (I use ‘open source’ simply to mean conformant with the Open Definition) and anyone can use it. For all I know, Jason is putting it into AI now…

What I don’t know about AI is how high the startup costs are for people who don’t have billion-dollar revenues. How much do you need to spend on compute banks under the Arctic ice? Can you get into it at a medium level? I know Mozilla is thinking about this with their Mozilla.ai project, but there’s still no one standing up to be the public AI of 2023. I have suggested that this is something that CERN might do, as they have played a big role in managing the sort of peripherals of knowledge. The Open Hardware Licence was developed by CERN, for example. And more recently, I think they’re looking at the possibility of open in AI.

If this continues to be run by corporations, for which the only goal is growing their market share, I expect to see many lawsuits involving AI and people who are misusing it. Copyright is going to be a major problem in AI.

Lucas Pretti: Here is a great opportunity to discuss the Open Definition. As you know, in 2023 we are reigniting the discussions in order to review the definition and find a broader and more diverse consensus around it. One of the emerging tendencies is what people are calling “responsible licences”, with RAIL being the main one for AI today – you probably remember that it was mentioned by someone in our MozFest session in March.

The current Open Definition includes the expression “for any purpose” as one of the variables that define open content, with which we all theoretically agree. But the enclosures are coming so strong that it might be useful to review this generalisation and add some sort of protection depending on the usage. What do you think? How can the Open Definition be useful to address that contradiction?

Peter Murray-Rust: Yes, it’s a very, very important topic. First of all, the Open Definition doesn’t stand alone. There are many laws that have more power. So, for example, if you use the Open Definition to publish personal information, you can’t claim you’re allowed to do that, as you still answer to the country whose privacy laws you’ve broken.

Coming down to “for any purpose”, there’s the need to realise that what the definition is doing is helping organisations create legally actionable documents. The philosophy of the definition is to help decide whether a proposed licence is conformant with open or not. But the final licence is always negotiated in some jurisdiction.

Now, one of the things that I think you have to do is to get not just me, but some of the original members of the Open Definition Advisory Council into the review, to make sure the licence is really actionable in a legal sense. I know cases in the spirit of “do no evil” that are actually positively harmful, because they’re not actionable and lead to messy court cases. So you have to talk to people who understand the legal aspects better than I do, or you can run into the same problem. That would be one thing.

The other thing is the environment of the Open Definition licence right now. In the 2000s it worked mainly because there was mostly static information circulating. In other words, you had a map, a corpus of documents, or a piece of music, and you could apply the Open Definition to that particular object. Now we’re moving into a realm where the coherence of an object is fragmented. Today many websites are assembled from components drawn from other databases, and we need to discuss what is protectable or relevant to declare as an open element. Another real problem, which is slightly different, is that you can end up with open material in a closed environment.

For example, I put my slides up on SlideShare, and then SlideShare came along and implemented a new policy saying that everybody has to sign up to SlideShare. The corporates will always enclose the mechanism of accessing open objects, and that problem hasn’t been solved. The only way it could be solved is to have a trusted organisation actually possessing all of the open content available, like the Internet Archive or something of that sort. By the way, the Internet Archive is being sued by whoever. Goodness!

Lucas Pretti: Yes, there’s a huge campaign today, #EmpoweringLibraries, defending the Internet Archive and the right of libraries and librarians to own and lend digital documents. That’s what you’re referring to, right?

Peter Murray-Rust: Yes. I think that academic libraries in particular have completely dropped the ball over the last 20 years… They should have been protecting the material, but what they have been doing is paying publishers for subscriptions and paywalls. They’re building their own piddling repositories which aren’t properly used and aren’t federated. They should have stood for a national repository of academia in every country.

Well, some countries have done that, like Brazil and the Netherlands. But in the UK it is a total mess. Today, librarians are ultra-scared of transgressing any rule. If a publisher makes a rule, they adhere to it and don’t challenge it. The same goes for copyfraud – it’s a disaster! Anybody can claim copyright on anything. The only recourse you have is to hire a lawyer, sue them, and you might get back the monetary value of your contribution to the document. But it would be symbolic. Nobody ever gets punitive damages in this sort of area.

Nikesh Balami: It’s great to hear all these things directly from you. It’s all very in line with the narratives and discussions we have in forums and events in the open movement. We are always discussing the gaps, safety, priorities, and people defining openness in their own ways. And the losses.

But how can we turn this game around? How can we engage more of the younger generations and make sure the narrative keeps up with the technologies that have been shifting?

Peter Murray-Rust: One of the things I haven’t mentioned, and which I feel very strongly about, is that many of the current systems of scholarly publishing are utterly inequitable and neocolonialist, and they’re getting worse. The only way to challenge that would be to get enough people in the North to take it on board as a crusade – and there aren’t enough at the moment. The US is trying to do that with its latest policies and releases. The UK government is a disaster, so it’s not going to do anything. The EU is heavily lobbied by corporates, so it’s going to come up with something in the middle.

I actually think that the big potential growing point is Latin America as a focus for the Global South. There’s a tradition in Latin America that scholarly publication is a public good. That has been strong with SciELO and related things, and people like Arianna Becerril-Garcia are taking it to the next level. Arianna is a computer scientist and professor of computer science at UNAM, in Mexico, and she has been spearheading open in Latin America through projects like Redalyc, an archive platform, and AmeliCA, a publishing platform and framework for academic publications. They are now expanding it to Namibia, looking to do the same sort of thing with Namibia as an African hub.

We need to link up these kinds of people in Latin America, Southern Sub-Saharan Africa, and the Indo-Pacific states – India, Indonesia, etc. We’ve gotta have a unified critical mass which is seen to be doing things better. That’s the only way we’re going to win. If we do that, we might have enough of a critical mass to challenge the North. What I’m saying is that we’re not going to win on principles or on price, because the North has billions of dollars to give to publishers. The article processing charges are totally iniquitous.

By the way, do you know what it costs in dollars to publish an open-access paper in the world’s most recognised scientific journal (I can’t say the name here, but it’s easy to deduce)? Have a guess. I want to hear your guess.

Adrià Mercader: A thousand dollars.

Lucas Pretti: I would say $500, something like this.

Nikesh Balami: I was thinking around $300, something below 500.

Peter Murray-Rust: The answer is $12,000. It’s the price for glory. It’s not advertising. It’s saying to the author, “If you publish in my magazine and you’ve got $12,000, then your career will advance much faster than your rivals’”.

Lucas Pretti: And you can get a Nobel maybe in 15 years…

Adrià Mercader: It’s actually more perverse than that. The whole academic system is built on what journals you publish in. My partner is a scientist, and the funding she receives depends on the impact factor of whatever journal she’s publishing in. It’s like a racket, basically.

Peter Murray-Rust: Exactly. It’s corrupt. It’s totally corrupt. Impact factors are generally produced by algorithms, but for particularly important journals impact factors are negotiable. If the Open Knowledge Foundation is about fighting injustice, this typifies one of the major global injustices in the world: access to knowledge. I would say that part of OKFN’s role should be to discover and formalise injustices, particularly with respect to the Global South, and put them in front of people who can make a difference.