Should we be concerned that the decisions of AIs are inscrutable?

Facial recognition software now works even with masks. Photo by Kim Kyung-Hoon/Reuters

by John Zerilli


Machine learning is a black box – even when the decision is correct, how the algorithm arrived at it can be a mystery

You apply for a job via an online recruitment portal. At some point, you’re prompted to upload a current CV. Three weeks later, you receive the familiar email: ‘Due to the high number of exceptional candidates applying for this position, we regret to inform you that you were not successful in being shortlisted on this occasion …’ The odds being what they are, the news doesn’t come as a surprise. Sure, there’s always the immediate pang of disappointment and, more often than not, a bit of soul-searching. But, on some level, you knew this was coming, and bounce back within a day or two.

There are many reasons not to take job rejections personally, but there’s one in particular you might not consider: you might have been screened out by an algorithm that taught itself to filter candidates by gender, surname or ethnicity – in other words, by factors that have nothing to do with your ability to do the job. Even if you’re unfazed by the spectre of runaway robots enslaving humanity, this little tale shows how the ascendancy of machine learning (ML) comes with risks that should vex even the most sanguine observer of technology.

ML is only one type of artificial intelligence (AI), but it’s probably the most active area of AI research today. One of ML’s most ubiquitous incarnations, so-called ‘supervised’ ML, is already applied in a wide range of fields: HR recruitment, as we saw, but also criminal justice and policing, credit scoring, welfare assessment, immigration border control, medicine, fraud and tax-evasion detection, weather forecasting – basically any situation where the ability to predict outcomes is useful.

For all the techno-utopian fanfare they’re apt to generate, most ML systems are still in their infancy. Whether they’re struggling to tell kittens apart or being thrown by so-called ‘adversarial’ cases – where an image-recognition system is derailed by the addition of an unexpected item to a scene – we’re a long way from truly intelligent machines. But that doesn’t mean we can rest easy, or shrink from the many important choices this technology forces us to confront right now.

As anyone in the field will tell you, the machinations of ML systems can be inherently difficult to interpret, particularly those of deep neural networks, a special class of ML systems that boast exceptional performance. In the argot of the ML community, deep neural networks are black boxes – devices whose inner workings are bafflingly complex and opaque, even to the initiated. But does this opacity really matter? Should we give even a moment’s thought to the inscrutability of systems that automate human decisions, so long as they’re demonstrably better than we would be: more accurate, less biased and more efficient?

In the world of big tech, you can expect to hear the following sort of argument: there are lots of useful systems we don’t fully understand – but who cares, so long as they work? Insisting on understanding something as a precondition to using it would be crazy. That’s certainly true in medicine, where the mechanisms behind many life-saving drugs are incompletely understood. But it’s also true in the history of technology. Hans Lippershey didn’t need to know about the properties of the visible spectrum before he could invent (and use) a telescope in the 1600s. And later, in the 19th century, Charles Babbage’s famed Analytical Engine uncannily anticipated the architecture of the modern digital computer, incorporating stored memory, looping and conditional branching. Yet Babbage knew nothing of the later advances in mathematical logic that would ultimately make digital computers possible in the 20th century.

Imagine a recidivism risk-assessment tool that was basically infallible, a kind of Casio-cum-Oracle-of-Delphi

This line of thinking finds support in some areas of epistemology too, the branch of philosophy concerned with the nature of knowledge. A common starting point is the view that knowledge is justified true belief – the idea that you know something not just when you believe it, and it happens to be true, but when you’re also justified in some way. So when are you justified in believing something? A ‘reliabilist’ would answer: when your belief is underpinned by some reliable process of belief formation, even if you can’t explain why the process is reliable. A reliabilist could argue, for example, that we were justified in believing what our eyes were telling us about the outside world before we had a detailed science of optics, because the human visual system is obviously adaptive and has served us well. Similarly, perhaps we’re right to accept the outputs of ML systems that have shown themselves to be reliable, even when we don’t understand them. Arguably a reliabilist principle lies behind the use of juries in many legal systems: their deliberations are black boxes (the 12 jurors don’t explain their decisions to anyone), but their verdicts are thought to be formed by a reliable process, so explanations aren’t required.

However, there’s a danger of carrying reliabilist thinking too far. Compare a simple digital calculator with an instrument designed to assess the risk that someone convicted of a crime will fall back into criminal behaviour (‘recidivism risk’ tools are being used all over the United States right now to help officials determine bail, sentencing and parole outcomes). The calculator’s outputs are so dependable that an explanation of them seems superfluous – even for the first-time homebuyer whose mortgage repayments are determined by it. One might take issue with other aspects of the process – the fairness of the loan terms, the intrusiveness of the credit rating agency – but you wouldn’t ordinarily question the engineering of the calculator itself.

That’s utterly unlike the recidivism risk tool. When it labels a prisoner as ‘high risk’, neither the prisoner nor the parole board can be truly satisfied until they have some grasp of the factors that led to that label, and the relative weight of each factor. Why? Because the assessment is one in which any answer will necessarily be imprecise: it involves calculating probabilities on the basis of limited and potentially poor-quality information whose very selection is value-laden.

But what if systems such as the recidivism tool were in fact more like the calculator? For argument’s sake, imagine a recidivism risk-assessment tool that was basically infallible, a kind of Casio-cum-Oracle-of-Delphi. Would we still expect it to ‘show its working’?

This requires us to think more deeply about what it means for an automated decision system to be ‘reliable’. It’s natural to think that such a system would make the ‘right’ recommendations, most of the time. But what if there were no such thing as a right recommendation? What if all we could hope for were only a right way of arriving at a recommendation – a right way of approaching a given set of circumstances? This is a familiar situation in law, politics and ethics. Here, competing values and ethical frameworks often produce very different conclusions about the proper course of action. There are rarely unambiguously correct outcomes; instead, there are only right ways of justifying them. This makes talk of ‘reliability’ suspect. For many of the most morally consequential and controversial applications of ML, to know that an automated system works properly just is to know and be satisfied with its reasons for deciding.

You might wonder, though, about decisions that really do have unequivocally correct answers. If an ML tool could be shown to get the right answer virtually 100 per cent of the time, might we dispense with explanations then? Would we care how the tool gets its answers, if it’s as reliable as the calculator in always giving the right ones?

It’s tempting to think we wouldn’t care, but a little reflection shows this isn’t credible. The reason we don’t mind how the calculator arrives at its answer isn’t because there’s a right answer. It’s because no matter which procedure the calculator happens to employ, there’s no risk of it mistreating the person whose circumstances depend on it. There’s no way a calculator’s reasoning can break the law by falling foul of antidiscrimination provisions. There’s no sense in which the calculator can misconstrue its jurisdiction, or take irrelevant matters into account.

Even when an ML tool gets the right answer and remains compliant with the law, it could mistreat someone

By contrast, for the decisions that ML systems are increasingly tasked to handle – even low-level and apparently mechanical ones, such as passport verification or CV screening – there’s a real risk of mistreatment. If facial structure happens to correlate to any degree with sexual orientation, say, and if sexual orientation in turn predicts some variable of interest, then the use of facial-recognition technology becomes instantly suspect and possibly unlawful, however accurate it is. So too, a Delphic recidivism risk tool that correctly computes whether or not someone will have brushes with the law can do so in ways that mistreat that person.
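To see how this sort of correlational chain can compromise a system that never explicitly uses a protected attribute, consider a minimal synthetic sketch in Python. Everything in it is invented for illustration: it simply assumes that a visible feature and the outcome of interest each track a protected attribute about 80 per cent of the time.

```python
# A purely synthetic sketch, not a model of any real system.
# Assumption (invented for illustration): a visible feature correlates with a
# protected attribute, and the protected attribute predicts the outcome.
import random

random.seed(1)
N = 10_000

protected = [random.random() < 0.5 for _ in range(N)]                  # protected attribute
feature = [p if random.random() < 0.8 else not p for p in protected]   # visible proxy for it
outcome = [p if random.random() < 0.8 else not p for p in protected]   # variable of interest

# A 'classifier' that only ever sees the visible feature, never the protected attribute.
predicted = feature

accuracy = sum(pr == out for pr, out in zip(predicted, outcome)) / N
print(f'Accuracy from the visible feature alone: {accuracy:.2f}')      # roughly 0.68, well above chance
```

The system never sees the protected attribute, yet it predicts the outcome well above chance precisely because the visible feature is standing in for that attribute – which is why accuracy alone can’t settle whether anyone has been mistreated.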

Even when you think an ML tool gets the right answer every time and remains fully compliant with the law, there are all sorts of ways it could still mistreat someone. Imagine you’re an HR consultant, with two colleagues who are going to assist you in hiring for a particular position in a company. There are 30 applicants in total, but assume that there’s a standout candidate everyone agrees should get the job. The goal of the recruitment process, then, is to ensure that candidate gets selected.

You might decide to lighten the workload as follows: you divide the 30 applications into three random piles of 10, distribute one pile to each of your colleagues, and reserve the final pile of 10 for yourself. This spreads the workload evenly among the three of you. Each of you longlists and scores the three best candidates from your own pile. At a later meeting, you compare scores and decide which five of the nine longlisted to shortlist. You conduct the interviews, and the best candidate gets the job.

An alternative but more laborious approach involves each of you reviewing and scoring all 30 candidates: you each independently longlist five, for a combined longlist of anywhere between five candidates (if you all happen to agree) and 15 (if there’s no agreement at all). Because there’s a standout candidate, we assume you’d all agree that at least that person should be longlisted, so the combined longlist can in fact contain at most 13 candidates (5 + 4 + 4, since the standout appears on all three lists). As before: you meet, compare scores, jointly shortlist five, conduct the interviews, and the best candidate gets the job – the same candidate who would have pulled ahead under the less laborious approach.

Both of these procedures are a kind of algorithm. And both algorithms arrive at the correct result. So which algorithm should you adopt? The more expeditious one is tempting, but it poses a problem. The first algorithm is quicker, for sure, but it’s also procedurally unfair. Say your pile of 10 happened to have all the most able candidates, including the winning one, so that the other two piles had the candidates who were far less capable. The first algorithm would eliminate most of the very able candidates at the longlisting stage, while including some of the weakest candidates, because of their presence in the other two piles that must contribute to the longlist. The shortlist of five from nine will presumably include the three who were longlisted from the able pile. But two other candidates in the shortlist will be far weaker than any two that could have been selected from those eliminated in the able pile. This means that the first algorithm pits the stellar candidate against two of the weakest candidates at interview, while eliminating two of the strongest. Something’s gone wrong. It’s no answer that the stellar candidate would win in any case. The real point is that the first algorithm mistreats two able candidates.
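To make the comparison concrete, here is a minimal sketch in Python of the two procedures, using hypothetical numeric ability scores for the 30 candidates (no such single score exists in a real hiring process; the numbers, pile sizes and cut-offs simply follow the thought experiment above, with the strongest ten candidates landing in one pile).

```python
# Hypothetical ability scores; the standout candidate scores 99.
able_pile = list(range(90, 100))              # your pile: the ten most able candidates
other_piles = [list(range(1, 11)),            # colleague 1's pile
               list(range(11, 21))]           # colleague 2's pile
all_candidates = able_pile + other_piles[0] + other_piles[1]   # 30 in total

# Algorithm 1: each reviewer longlists the top three from their own pile,
# then the panel shortlists the top five of the resulting nine.
longlist_1 = []
for pile in [able_pile] + other_piles:
    longlist_1 += sorted(pile, reverse=True)[:3]
shortlist_1 = sorted(longlist_1, reverse=True)[:5]

# Algorithm 2: every reviewer scores all 30 and longlists their top five;
# since the scores agree here, the combined longlist is just the overall
# top five, which becomes the shortlist.
shortlist_2 = sorted(all_candidates, reverse=True)[:5]

print(shortlist_1)   # [99, 98, 97, 20, 19] – two weak candidates get interviews
print(shortlist_2)   # [99, 98, 97, 96, 95] – the five most able get interviews
```

Both shortlists contain the standout candidate, so both procedures ‘get the right answer’; but under the first, the candidates scored 96 and 95 are eliminated in favour of far weaker ones – exactly the mistreatment described above.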

My guess is that we’ll all very much want to know why our super-clever ML systems decide as they do, regardless of how prescient they prove to be and how lawful their decisions are. Ultimately this is because the potential of a decision to mistreat us isn’t governed by the same factors that determine its accuracy. Explanations in some form are probably here to stay.

Published in association with the Leverhulme Centre for the Future of Intelligence, an Aeon+Psyche Partner.


14 June 2021