Do we trust the plane or the pilot? The problem with ‘trustworthy’ AI

On April 8th, 2019, the High-Level Expert Group on AI, a committee set up by the European Commission, presented the Ethics Guidelines for Trustworthy Artificial Intelligence. The guidelines define trustworthy AI through three components and seven key requirements. Such AI should be lawful, ethical and robust, and should meet the following requirements:

Human agency and oversight
Technical robustness and safety
Privacy and data governance
Transparency
Diversity, non-discrimination and fairness
Societal and environmental well-being
Accountability

The concept has inspired other actors such as the Mozilla Foundation, which has built on it and written a white paper clarifying its own vision. Both the ethics guidelines and Mozilla’s white paper are valuable efforts in the fight for a better approach to what we at Open Knowledge call Public Impact Algorithms:

“Public Impact Algorithms are algorithms which are used in a context where they have the potential for causing harm to individuals or communities due to technical and/or non-technical issues in their implementation. Potential harmful outcomes include the reinforcement of systemic discrimination (such as structural racism or sexism), the introduction of bias at scale in public services or the infringement of fundamental rights (such as the right to dignity).”

The problem does not lie in the definition of trustworthiness: the ethical principles and key requirements are sound and comprehensive. Instead, it arises from the aggregation behind a single label of concepts whose implementation presents extremely different challenges.

Going back to the seven requirements listed above, two dimensions are mixed together: the technical performance of the AI, and the effectiveness of the oversight and accountability ecosystem which surrounds it. The requirements fall overwhelmingly under the Oversight and Accountability category.

Technical performance	Oversight and Accountability
Technical robustness and safety	Human agency and oversight
Transparency	Privacy and data governance
	Transparency
	Diversity, non-discrimination and fairness
	Societal and environmental well-being
	Accountability

Talking about ‘trustworthy AI’ emphasizes the tool while de-emphasizing the accountability ecosystem, which becomes a bullet point, all but ensuring that it will not be given the attention it deserves.

Building a trustworthy plane

The reason no one uses the expression ‘trustworthy’ plane(1) or car(2) is not that trust is unimportant to the aviation or automotive industries. It is that trust is not a useful concept for legislative or technical discussions. Instead, more operational terms such as safety, compliance or suitability are used. Trust exists in the discourse around these industries, but it is placed in the ecosystem of practices, regulations and actors which drive the industry: for the civil aviation industry this includes the quality of pilot training, the oversight of airplane design, and the safety standards written into legislation(3).

The concept of ‘trustworthy AI’ displaces trust from the ecosystem to the tool. This has several potential consequences:

Trust could become embedded in the discourse and legislation on the issue, pushing to the side other concepts that are more operational (safety, privacy, explicability) or essential (power, agency(4)).
Trustworthy AI could become an all-encompassing label, akin to an organic fruit label, which would legitimize AI-enabled tools, cutting off discussions about the suitability of the tool for specific contexts or questions about whether these tools should be deployed at all. Why do the hard work of building accountable processes when a label can be used as a shortcut?
Minorities and disenfranchised groups would again be left out of the conversation: the trust that a public official places in an AI tool will be extended by default to their constituents.

This scenario can already be seen in the European Commission’s white paper on AI(5): its vision completely obscures the idea that some AI applications may not be desirable; it outlines an ecosystem made of labels, risk levels(6) and testing centers, which would presumably give a definitive assessment of AI tools before their deployment; and it uses the concept of ‘trust’ as a tool for accelerating the development of AI rather than as a way to examine the technology on its merits. Trust as the oil in the AI industry’s engine.

We should not trust AI

Behind Open Knowledge’s Open AI and Algorithms programme is the core belief that we can’t and shouldn’t trust Public Impact Algorithms by default. Instead, we need to build an ecosystem of regulation, practices and actors in which we can place our trust. The principles behind this ecosystem will resonate with the definition of ‘trustworthy’ AI given above: human agency and oversight, privacy, transparency, accountability… But while a team of computer science researchers may achieve a breakthrough in explainable deep learning, the work needed to set up and maintain this ecosystem will not come from a breakthrough: it will be a years-long, multi-stakeholder and cross-sector effort that will face its share of opponents and headwinds. This work cannot, and should not, simply be a bullet point under a meaningless label.

Concretely, this ecosystem would emphasize:

Meaningful transparency: at the design level (explainable statistical model vs black box algorithms)(7), before deployment (clarifying goals, indicators, risks and remediations)(8) and during the tool’s lifecycle (open performance data, audit reports).
Mandatory auditing: although algorithms deployed in public services should be open source, intellectual property laws mean that some of them will not be. The second-best option should consequently be to mandate auditing by regulators (who would have access to source code) and by external auditors using APIs designed to monitor key indicators (some of them mandated by law, others defined with stakeholders)(9).
Clear redress and accountability processes: multiple actors intervene between the design and the deployment of an AI-enabled tool. Who is accountable for what will have to be clarified.
Stakeholder engagement: algorithms used in public services should be proactively discussed with the people they will affect, and the possibility of not deploying the tool should be on the table.
Privacy by design: the implementation of algorithms in the public sector often leads to more data centralisation and sharing, with little oversight or even impact assessment.

These and other aspects of this ecosystem will be refined and extended as the public debate continues. But we need to make sure that the ethical debates and the ecosystem issue are not sidelined by an all-encompassing label which will hide the complexity and nuance of the issue. An algorithm may well be trustworthy in a certain context (clean structured data, stable variables, competent administrators, suitable assumptions) while being harmful in others, however similar they might be.

(1) The aviation industry talks about ‘airworthiness’ which is technical jargon for safety and legal compliance https://www.easa.europa.eu/regulations#regulations-basic-regulation
(2) The automotive industry mainly talks about safety https://ec.europa.eu/growth/sectors/automotive/legislation/motor-vehicles-trailers_en
(3) This is why aviation authorities generally do not re-certify a plane validated by the USA’s Federal Aviation Administration (FAA): they trust its oversight. The Boeing 737 MAX scandal led to a breach of trust, and certification agencies around the world asked to re-certify the plane themselves. https://en.wikipedia.org/wiki/Boeing_737_MAX_groundings
(4) I purposefully did not mention fairness here. See this paper discussing the problems with using fairness in the AI debate: https://www.cs.cornell.edu/~red/fairness_equality_power.pdf
(5) It was published in February 2020, which means that its authors already had access to the draft version of the Ethics Guidelines for Trustworthy AI https://algorithmwatch.org/en/story/ai-white-paper/
(6) See also the report from the German government’s Data Ethics Commission, which defines five risk levels https://algorithmwatch.org/en/germanys-data-ethics-commission-releases-75-recommendations-with-eu-wide-application-in-mind/
(7) Too little scrutiny is given to the relative performance of black box algorithms vs explainable statistical models. This paper discusses the issue: https://hdsr.mitpress.mit.edu/pub/f9kuryi8/release/5
(8) As of October 2020, Amsterdam (The Netherlands), Helsinki (Finland) and Nantes (France) are the only governments to have deployed algorithm registers. But in all cases, the algorithms were deployed before being publicized.
(9) Oversight through investigation will still be needed. AlgorithmWatch has several projects in that direction, including a report on Instagram. This kind of work relies on volunteers sharing data about their social media feeds. Mozilla is also involved in helping them structure this kind of ‘data donation’ project https://algorithmwatch.org/en/story/instagram-algorithm-nudity/