Applying Descartes’s sceptical puzzle to deepfake videos reveals that the challenge they present is one we can rise to
The Apollo 11 Moon landing in 1969 was an extraordinary scientific achievement. It was also an extraordinary epistemological achievement. Not only did a man walk on the Moon but, almost contemporaneously, much of humanity came to know via a live broadcast that a man had walked on the Moon. The event was a dramatic illustration of the epistemic power of video footage. Some argue that this power is imperilled in the age of generative AI. According to these commentators, tools for generating ‘deepfakes’ – highly realistic synthetic images or recordings – threaten to induce widespread deception and, over time, pervasive distrust of seemingly reliable footage. If lifelike videos can be conjured up at will, then such footage is no longer good evidence.
Although deepfakes came to prominence in the form of nonconsensual pornography, their deceptive potential was quickly recognised. In an article for The Atlantic in 2018, Franklin Foer speculated: ‘We’ll shortly live in a world where our eyes routinely deceive us. Put differently, we’re not so far from the collapse of reality.’ Deception is just the start of the challenge. Foer went on to predict: ‘Fabricated videos will create new and understandable suspicions about everything we watch.’ Some will be tempted to exploit these suspicions. A public figure, confronted with compromising video evidence, may insist that the video in question is a deepfake. The legal scholars Bobby Chesney and Danielle Citron refer to this enhanced ability to dismiss unwelcome video footage as the ‘liar’s dividend’. The philosopher Regina Rini has speculated that deepfakes will compromise the ability of videos to function as an ‘epistemic backstop’, an authoritative form of evidence against which assertions, photographs and other lesser forms of evidence can be checked. By compromising this epistemic backstop, deepfakes threaten to reduce the costs of deception. Concerns along these lines constitute what Joshua Habgood-Coote has called the ‘epistemic apocalypse narrative’.
This apocalyptic narrative is intuitively compelling, especially given prominent concerns about fake news, disinformation and other sources of deception. Moreover, concerns about the epistemic impacts of deepfakes resemble long-standing sceptical worries that have animated the history of philosophy. In his book The Significance of Philosophical Scepticism (1984), Barry Stroud introduces the following thought experiment. Imagine a man waking up to find himself locked in a room full of televisions displaying moving images. Can he know that the events depicted on the screens correspond to real events in the outside world? It seems not. His confinement prevents him from verifying the correspondence between reality and what is depicted on screen. He might assess the mutual consistency of the on-screen events, but such consistency is no sure indicator of the reality beyond the screens. Thus, even if the screens display live streams of outside events, he cannot know this. The resources available to him provide no means of bridging on-screen representation and external reality.
A perfect being, Descartes reasoned, would not allow for systematic deception
The epistemological threat posed by deepfakes is less dire than that faced by the man in Stroud’s thought experiment. We have independent means of checking whether at least some videos correspond to reality. Still, our predicament is not altogether dissimilar. Much of our knowledge, especially of distant events, is unavoidably mediated by video footage and similar representations. Although we differ from Stroud’s imagined character in that no physical locks prevent us from investigating the correspondence between these representations and the real world, such investigations are hardly practical, and we cannot consistently conduct them. One could not, for example, personally check the provenance of all the video clips that appear on the nightly news or on one’s social media feeds. Moreover, especially for videos that purport to represent distant events, the most obvious means of assessing their veracity would be to compare them with other videos purporting to represent the same events. But generative AI could, in principle, be used to mass-produce mutually consistent videos falsely purporting to represent the same event. Thus, even such consistency would indicate little about the veracity of the footage. There is consequently a gap between the videos that purport to represent reality, and on which we base many of our beliefs, and reality itself. Is it possible to bridge this gap?
Insight into this question can be derived from an unlikely source. Stroud’s thought experiment concerning the locked room is intended as an analogy for a sceptical puzzle articulated clearly by the 17th-century philosopher René Descartes, but familiar even to Descartes’s ancient predecessors. The puzzle arises when one thinks, as Descartes did, that knowledge of the world is mediated by mental representations of external objects. Descartes asked how one can know that these mental representations accurately reflect the external world, or indeed how one can know that there is any world at all beyond one’s own mind. Descartes sharpened this challenge by arguing that, on the face of things, the same experiences commonly taken to represent external reality could be figments of dreams or generated by a deceptive demon.
Beyond clearly articulating the challenge, Descartes claimed to be the first philosopher ever to overturn the doubt of the sceptics. His proposal was to defeat these doubts by discovering a guarantor for the reliability of his perceptions. Descartes thought that, by proving that God exists, he could prove that his perceptions were, by and large, accurate. A perfect being, Descartes reasoned, would not allow for systematic deception. Although Descartes’s theological response to scepticism is widely regarded as circuitous and indeed viciously circular, an analogous strategy can be fruitfully applied to the sceptical challenge posed by deepfakes. Just as dreams or a deceptive demon might give rise to nonveridical experiences, deepfakers might produce lifelike but inauthentic videos. Thus, nothing internal to a given video guarantees its accuracy. But it does not follow that no such guarantee exists. Instead, the reliability of video footage can and should be assessed according to the social context in which it is encountered. We may not have God to rely on, but we can rely on trusted sources as guarantors of the accuracy of video footage.
The idea here is that the evidential power of a given video is not strictly determined by its content but depends also on the broader context in which it is encountered. The relevant context includes the source of the video and the audience’s background beliefs. Thus, for example, a video broadcast by a reputable news organisation is powerful evidence for the real occurrence of the events it purports to depict. Video footage shared by an anonymous or pseudonymous account on social media, without the backing of more reputable sources, provides limited evidence – even if it appears realistic. In this sense, the evidential force of video footage is like that of testimony. Testimony from a source who is known to be both honest and competent on the topic is a powerful form of evidence. Testimony from one who is either dishonest or incompetent, or indeed one whose reliability is unknown, amounts to relatively weak evidence.
Deepfakes ought not lead one to a universal scepticism toward video footage
Some philosophers have argued that this comparability of video evidence to testimony is itself a novelty, brought about by innovations, like deepfakes, that grant individuals an enhanced ability to manipulate video footage. These philosophers suggest that, while video recordings once conferred perceptual knowledge, they can now confer only something like testimonial knowledge. In less technical terms, watching video footage of an event was once a way of witnessing that event unfold, but is now more akin to being told that the event occurred. But this is an oversimplification. Deepfakes offer new ways of manipulating videos, but videos have always been manipulable. The term ‘cheap fakes’ was coined by Britt Paris and Joan Donovan to describe a range of relatively low-tech techniques for manipulating videos, including altering the speed, using lookalikes, and misrepresenting the surrounding context. The possibility of manipulating video footage in these ways is inevitable and ever-present, and thus, even before the emergence of deepfakes, video footage never spoke entirely for itself. Instead, assessing the evidential import of video footage has always required attention to the surrounding social context.
Videos are not alone in this regard. Photographs are also regarded as powerful evidence, despite their evident manipulability. Nor is this manipulability attributable to Photoshop or to any specific technological innovation. In the early days of photography, techniques for doctoring photographs served to compensate for technical limitations that otherwise prevented photographs from accurately reflecting real scenes. More recently, Stalin had his political enemies airbrushed out of official photos, and he was hardly alone in this practice. Reasons of politics, pride and sheer pettiness have often inspired authoritarian leaders to interfere in the photographic record. Practices of this sort should give one pause about taking historical photos at face value, but they do not warrant blanket scepticism toward the photographic record. Rather, our credulity toward photographs ought to depend on the reliability of the sources of those photographs. Similarly, deepfakes ought not lead one to a universal scepticism toward video footage. Instead, confidence in video footage should be sensitive to the reliability of its sources.
In contrast to more familiar epistemological puzzles, like those articulated by Stroud and Descartes, the epistemological challenge posed by deepfakes is one we have the resources to solve. But to say that the challenge can be solved is not to say that the challenge will consistently be solved. Trust is a double-edged sword. Insofar as reliance on trusted sources is the solution to the epistemological threat of deepfakes, it is a matter of great concern that trust can be, and sometimes is, improperly withheld, misdirected or exploited. Footage from the first manned Moon landing quickly generated collective knowledge of the event, partly because the footage was broadcast by widely trusted sources. But not everyone believed their eyes or the sources of the footage. Conspiracy theories quickly emerged, according to which the footage was fabricated, possibly with the technical assistance of Stanley Kubrick, who had recently directed 2001: A Space Odyssey (1968). The episode illustrates how, against a backdrop of distrust, technical innovations can affect the perceived evidential weight of video footage.
Moreover, just as unwarranted distrust can deprive legitimate video footage of its evidential power, unwarranted trust can bolster the deceptive power of even crudely manipulated videos. Indeed, at present, most visual misinformation takes the form of cheap fakes, rather than deepfakes. When an audience places undue trust in a video’s source, it may fail to notice even obvious indications that the video has been manipulated. This is especially true for audiences ideologically motivated to believe the content of the video.
The epistemological threat posed by deepfakes is not illusory, but nor is it insurmountable. Recognising this is essential to preventing the epistemic apocalypse narrative from becoming a self-fulfilling prophecy of paranoia. Avoiding this outcome will require addressing underlying pathologies of trust and distrust that allow misinformation to thrive and that compromise the authority of legitimate information.