Deepfake Tools of the Trade

Katerina Cizek
Published in Immerse · Dec 21, 2021 · 11 min read


The following is an excerpt from a new report, JUST JOKING! Deepfakes, Satire and the Politics of Synthetic Media, co-produced by Sam Gregory at WITNESS and Katerina Cizek at Co-Creation Studio at MIT Open Documentary Lab. The report was written by Henry Ajder and Joshua Glick.

Tools, Memes, and Tough Cases

The development of accessible, easy-to-use apps has helped spur deepfakes’ proliferation across the Internet.

Wombo creates lip-synching face portraits that can perform an array of songs. Reface superimposes a face onto existing gifs. MyHeritage’s “deep nostalgia” feature animates eyes, faces, and mouths from old photographs. Sway uses motion filters to transform a static pose into a dancing or stunt-filled sequence. FaceApp allows users to age and contort the image of a face. Zao draws on a film and TV clip library to enable voice modulation and faceswaps.

These apps have facilitated the synthetic “memeification” of well-known figures and of everyday people, who use the tools on friends and family. Trading on the amusement of variation and requiring just enough insider knowledge, deepfake memes attract a subculture of interest among like-minded viewers and fan communities, especially on platforms such as TikTok. Making these memes requires as little as one image of the targeted individual, and the user interfaces are designed to feel intuitive.

Apps like Wombo, MyHeritage, Reface and FaceApp have helped spur deepfakes’ proliferation across the Internet.

This level of accessibility does come with constraints. Most apps offer only a limited array of pre-selected templates, which makes it easy for users to replicate and modify pre-existing media. Some makers and viewers consider the low production values of such app-generated deepfakes to be part of the aesthetic. The glitchy, grainy image quality feeds into the absurdist fantasies that the memes conjure, particularly when applied to well-known Hollywood stars or cartoon characters.

Software packages such as DeepFaceLab facilitate the making of more elaborate and polished videos. This approach tends to require more expertise, training data, and expensive hardware. The maker plays an active role during each step of production rather than simply uploading a picture into an app. Nonetheless, the prime interest with most of these more professionally rendered videos remains the same: inserting movie stars, musicians, and politicians into well-known slices of screen culture.
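
Under the hood, faceswap tools of this kind typically rely on an autoencoder with a shared encoder and one decoder per identity: both faces are compressed into a common latent space, and swapping means decoding one person’s pose and expression with the other person’s decoder. The sketch below (in PyTorch) is a minimal illustration of that architecture, not DeepFaceLab’s actual code; the layer sizes, crop resolution, and training loop are simplified assumptions.

```python
# Minimal conceptual sketch of the shared-encoder / per-identity-decoder
# autoencoder behind faceswap tools. Sizes are illustrative only.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses a 64x64 RGB face crop into a shared latent code."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs a face from the latent code; one decoder per identity."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # identity A (source), identity B (target)
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters()),
    lr=1e-4,
)
loss_fn = nn.MSELoss()

def train_step(faces_a, faces_b):
    """Each decoder learns to reconstruct its own identity from the shared code."""
    opt.zero_grad()
    loss = loss_fn(decoder_a(encoder(faces_a)), faces_a) + \
           loss_fn(decoder_b(encoder(faces_b)), faces_b)
    loss.backward()
    opt.step()
    return loss.item()

# The swap: encode a frame of identity A, decode with B's decoder, so
# B's face appears with A's pose and expression. (Dummy frame shown.)
with torch.no_grad():
    swapped = decoder_b(encoder(torch.rand(1, 3, 64, 64)))
```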

Nicolas Cage was one of the first celebrities to surface as a deepfake meme, in 2018. The actor’s manic eccentricities, span of roles (including his starring performance in the all-too-relevant Face/Off), and off-screen antics built on his already highly visible presence within Internet culture. The flurry of “Cagefakes” reflects the entertainment value of AI-enabled media and the ease with which it can be made by enthusiasts. Another popular fan-centric deepfake meme along these lines recast Back to the Future with Tom Holland and Robert Downey Jr.

A more elaborate and participatory project is Baka Mitai, in which any face can be animated to sing along to the karaoke song “Baka Mitai” from the video game Yakuza 0. It was initially created using open-source software and YouTube tutorials, but after the meme went viral on TikTok, an app version emerged. Almost anyone can now create Baka Mitai-esque videos lip-synced to other popular songs or featuring custom face movements.
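
The open-source technique behind Baka Mitai animates a single still image by following motion cues extracted from a driving video. The toy sketch below captures that idea in its crudest form, tracking a few points through a driving clip with OpenCV and warping the still image to follow them; real systems use learned facial keypoints and generative warping networks, and the file names here are placeholders.

```python
# Toy illustration of driving-video animation: track points through a
# driving clip and warp a still source image to follow their motion.
import cv2
import numpy as np

source = cv2.imread("source_face.jpg")       # the still portrait to animate
cap = cv2.VideoCapture("driving_video.mp4")  # the performance to transfer

ok, first = cap.read()
first_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
# Pick trackable corners in the first driving frame (a real pipeline
# would use a face-landmark model instead of generic corners).
pts0 = cv2.goodFeaturesToTrack(first_gray, maxCorners=3,
                               qualityLevel=0.3, minDistance=30)
assert pts0 is not None and len(pts0) >= 3, "need 3 trackable points"

prev_gray, prev_pts = first_gray, pts0
writer = cv2.VideoWriter("animated.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                         30, (source.shape[1], source.shape[0]))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
    if pts is None or status.sum() < 3:
        break
    # Affine transform mapping the first frame's points to the current
    # ones; applying it to the source makes the portrait follow the motion.
    M = cv2.getAffineTransform(pts0[:3, 0].astype(np.float32),
                               pts[:3, 0].astype(np.float32))
    warped = cv2.warpAffine(source, M, (source.shape[1], source.shape[0]))
    writer.write(warped)
    prev_gray, prev_pts = gray, pts
writer.release()
```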

While beloved by fans, deepfake entertainment can cause confusion. In early 2021, strikingly realistic deepfakes of Tom Cruise created by expert VFX designer Chris Ume and professional Tom Cruise impersonator Miles Fisher went viral on TikTok. The videos of Cruise playing golf, performing a magic trick, and imitating a snapping turtle were not intended to deceive and were collectively published on the account DeepTomCruise. However, the precision of Fisher’s impersonations, combined with Ume’s expert AI work, tricked some viewers into thinking the videos were authentic when they began to circulate more widely.

In other instances, deepfake tools have stoked legal controversy. The Canadian psychologist Jordan Peterson threatened a lawsuit over NotJordanPeterson, an AI-powered text-to-speech web app that could make the combative right-wing culture warrior and free-speech advocate say anything users typed. The site’s owners quickly capitulated and took it offline. On another occasion, Jay-Z unsuccessfully sought the removal of deepfakes from YouTube that simulated him reciting Shakespearean soliloquies and singing Billy Joel’s “We Didn’t Start the Fire.” The hip hop artist’s DMCA takedown notice claiming that the deepfakes violated copyright was ultimately rejected by Google. The creator, Vocal Synthesis, had previously made similar deepfakes depicting Queen Elizabeth II reading lyrics by the punk band the Sex Pistols and conservative pundit Tucker Carlson reading the Unabomber Manifesto.

In a voice deepfake created by Vocal Synthesis, we can hear Jay-Z rapping Shakespeare’s “To Be or Not To Be” and Billy Joel’s “We Didn’t Start the Fire.” The hip hop artist unsuccessfully sought the removal of the videos, claiming copyright infringement.

Framing the Conversation: Use and Abuse of Accessible Design

Given the popularity of homemade deepfakes and the ease with which they can circulate, these apps call for more attention. It is perhaps ironic that Peterson, a free-speech absolutist, reacted so strongly to the technologies themselves, claiming that “the sanctity of your voice, and your image, is at serious risk.” Still, NotJordanPeterson’s seemingly limitless possibilities do raise questions of ethical design. As Henry Ajder and Nina Schick wrote in an article for Wired, the more open an app, the more susceptible it is to malicious uses.

In confronting questions of design protocol, app makers will need to grapple with terms-of-service agreements, potential limits on the use of a person’s likeness, and restrictions on who can make deepfakes. This is especially true since media literacy remains underdeveloped among makers and consumers alike, young and old. Finding the right balance between creative interests on the one hand, and individuals’ rights and restrictions over production on the other, will be crucial. WITNESS has begun to workshop a draft code of ethics that incorporates rights, potential harms, technological advances, and creative freedoms.

Jay-Z’s resistance to being re-created via AI also points to how, in a different scenario, bad actors could dismantle the legal protections granted to deepfakes (and other forms of expressive art and parody) by claiming that they constitute copyright infringement. This is a particularly acute problem when such complaints are processed by automated social media moderation systems. For example, YouTube’s Content ID protocol struggles to distinguish between “fair use” and illegitimate cases. Threats of legal action could be used to intimidate creators who cannot afford costly court fights. To adjudicate all these cases adequately, the major platforms would need to hire and train many more moderators than they are currently willing to pay for.

Art and Documentary Horizons

Practitioners of documentary art are continuously pushing the boundaries of nonfiction storytelling, encouraging a more media-literate and socially engaged public. Work using synthetic media increasingly extends beyond short-form videos into larger projects. Bill Posters and Daniel Howe’s Spectre installation features deepfakes of celebrities, politicians, and tech entrepreneurs boasting about how they manipulate users by harvesting their data. These absurdist performances draw viewers’ attention to the ways that techno-utopian celebrations of a networked world mask pernicious forms of exploitation. As Posters and Howe explain, the project enables “audiences to experience the inner workings of the Digital Influence Industry” and “feel what is at stake when the data taken from us in countless everyday actions is used in unexpected and potentially dangerous ways.”

Bill Posters and Daniel Howe’s Spectre installation features deepfakes of celebrities, politicians, and tech entrepreneurs boasting about how they manipulate social media users. Among those featured are Kim Kardashian, Mark Zuckerberg, and Freddie Mercury.

Stephanie Lepp’s Deep Reckonings features synthetic re-creations of men who have abused power, shown grappling with their own words and actions. Vignettes range from U.S. Supreme Court Justice Brett Kavanaugh at a press conference to Alex Jones in an interview with Joe Rogan. Their confessions seem to be a complete inversion of their public lives, casting them in an unexpectedly moral light. These videos could be interpreted as conjuring fantasies of remorse and reconciliation, but they might also be seen as placing the real-world personas of these men in sharp relief, creating a more truthful account of their egregious words and actions. Lepp views Deep Reckonings as a way to use “our synthetic selves to elicit our better angels.” Viewing a synthetic version of oneself addressing personal struggles, Lepp argues, could be a means for deepfakes to elicit self-betterment and personal growth. This is part of a broader wave of “Deepfakes for Good,” including projects focused on mental health, education, and social justice.

The art installation In Event of Moon Disaster presents a counterfactual history of the 1969 Apollo 11 moon landing, encouraging viewers to think critically about how we construct narratives of the past and how we understand our current information ecosystem. Directed by Francesca Panetta and Halsey Burgund in collaboration with the MIT Center for Advanced Virtuality, the installation features a period American living room in which a fabricated TV news report plays on loop. At the heart of the report is a deepfaked Richard Nixon reading the contingency speech his administration had prepared in case the Apollo 11 space mission failed and the astronauts became stranded on the moon.

Complete with a “reveal” component of the installation and an accompanying educational website, In Event of Moon Disaster serves as both revelation and warning. According to Panetta and Burgund, the project was designed to raise awareness of “how new technologies can obfuscate the truth around us” but also to demonstrate how the same technologies can be used constructively.

Left: Richard Nixon reading a contingency speech after the Apollo 11 space mission “failed.” Right: Museum installation features a period American living room in which a fabricated TV news report plays Nixon’s contingency speech on loop. According to the artists, the project was designed to raise awareness of “how new technologies can obfuscate the truth around us.”

That charge is also taken up by the feature-length documentary Welcome to Chechnya. Director David France and VFX supervisor Ryan Laney created digital faces to shield twenty-three persecuted members of the Chechen LGBTQ+ community. France and Laney found queer activists in NYC to “lend their faces” to the project as an activist gesture. The activists were photographed with a nine-camera setup that captured their faces from every possible angle; their features were then matched to the film’s subjects through a deep-learning process and refined with meticulous effects work. France and Laney originally considered other methods, such as blurring the subjects’ faces or casting them in shadow, but opted for this bespoke form of faceswapping, which protects identities while preserving the subjects’ humanity and allowing them to express themselves more fully to viewers. A slight halo surrounds their heads, signaling that their faces have indeed been altered, but the effect doesn’t distract from the action. Here, the deepfakery (a term that France himself bristles at) serves a practical, narrative function along with a higher ethical purpose.
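
A disclosure halo of this general kind can be approximated with basic image processing: softly blur a ring around the altered face region so the modification announces itself without obscuring the performance. The sketch below uses OpenCV with placeholder coordinates and file names; the film’s actual effect was bespoke VFX work, not this code.

```python
# Approximate a disclosure "halo": blur a soft ring around a face region
# so viewers can see the area has been altered. Values are placeholders.
import cv2
import numpy as np

frame = cv2.imread("frame.png")
x, y, w, h = 200, 120, 180, 220  # face bounding box (placeholder)
center = (x + w // 2, y + h // 2)

# Build a ring mask: fill a large ellipse, carve out a smaller one,
# then feather the edges so the halo fades smoothly.
mask = np.zeros(frame.shape[:2], np.uint8)
cv2.ellipse(mask, center, (w // 2 + 20, h // 2 + 20), 0, 0, 360, 255, -1)
cv2.ellipse(mask, center, (w // 2 - 10, h // 2 - 10), 0, 0, 360, 0, -1)
mask = cv2.GaussianBlur(mask, (31, 31), 0).astype(np.float32)[..., None] / 255.0

# Blend a blurred copy of the frame into the ring region only.
blurred = cv2.GaussianBlur(frame, (25, 25), 0)
halo = (mask * blurred + (1.0 - mask) * frame).astype(np.uint8)
cv2.imwrite("frame_halo.png", halo)
```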

In the film Welcome to Chechnya, deepfakes are used to protect the identities of persecuted LGBTQ+ Chechens.

Framing the Conversation: Labeling and Disclosure

Signaling explicitly to viewers that they are watching a deepfake can be a way of ensuring transparency. In more top-down, legacy news publications and broadcasting, it was easy enough to frame material with a caption or a parenthetical note, a label above a headline, or a program host’s wry introduction. Online, and especially on social media, such transformations and parodies can more easily reach a viewer as decontextualized fragments. The dilemma is that explicit labels (watermarks, pre-roll warnings, etc.) might neutralize the satire’s impact, while subtler markers (or none at all) might lead to misinterpretation by viewers or platform moderators. The protocols of such “semantic signposting” are far from clear. And there is no one-size-fits-all solution; as illustrated by the debate over the use of a synthetic voice in Roadrunner, a recent documentary about the late chef and television host Anthony Bourdain, questions of audience expectations and genre conventions vary even within the documentary space.

Who is responsible for providing context for deepfakes, and what might be an appropriate marker? The websites for Deep Reckonings and Spectre frame their projects as works of art, but their respective creators, Lepp and Posters, have different perspectives on labeling, which they shared in a Deepfakery video episode. Lepp added watermarks along with introductory and concluding disclaimers, believing that they emphasize “the power of the medium, that you can know it is fake and it will still influence or move you.” By contrast, Posters prefers not to use prefatory text, believing that it undermines the videos’ rhetorical power. He insists that the realism of the performances, along with the way they might challenge gatekeepers to assess and categorize them, is part of the project’s point.
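
As a concrete illustration of the explicit-labeling approach Lepp describes, the short sketch below stamps a visible disclaimer onto a single frame using the Pillow imaging library. The file names, banner size, and wording are placeholder assumptions; a real release would apply this per frame and pair it with introductory and concluding cards.

```python
# Stamp a visible "synthetic media" banner onto a frame with Pillow.
from PIL import Image, ImageDraw, ImageFont

frame = Image.open("deepfake_frame.png").convert("RGBA")
overlay = Image.new("RGBA", frame.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)

label = "SYNTHETIC MEDIA: this video is a deepfake"
font = ImageFont.load_default()

# Semi-transparent banner along the bottom edge so the label is
# legible without obscuring the image.
banner_h = 28
draw.rectangle([(0, frame.height - banner_h), (frame.width, frame.height)],
               fill=(0, 0, 0, 160))
draw.text((10, frame.height - banner_h + 7), label, font=font,
          fill=(255, 255, 255, 230))

labeled = Image.alpha_composite(frame, overlay).convert("RGB")
labeled.save("deepfake_frame_labeled.png")
```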

For documentary, disclosure poses an important ethical issue. The slight halo or “afterblur” that France created around the subjects’ heads in Welcome to Chechnya continuously reminds viewers that the faces have been altered. In Roadrunner, director Morgan Neville put no indicators around the synthetic audio of the deceased culinary adventurer “reading aloud” a despairing email to friend and artist David Choe. Neville claimed to have gotten consent from Bourdain’s estate via his literary executor and widow, Ottavia Busia; however, Busia took to Twitter to deny having given it. Many understood Neville’s comments about the scenes in a New Yorker interview (“we can have a documentary ethics panel about it later”) to be flippant and dismissive.

Given the ease with which deepfakes can be made, altered, and shared, social media companies need to take seriously how they’re managed. A nuanced and interpretive approach to moderation would assess the implications of different forms of sound and image fabrication. Too light a touch could confuse viewers or lead to the spread of online misinformation. Too aggressive an approach could result in deepfake art being unable to find a platform.

A series of articles by First Draft and The Partnership on AI asserts the need for more conscientious labeling, but more research is needed on the potential risks and benefits. Legal and policy guardrails can also impact these processes. Countries that already heavily police their public sphere, such as China, will likely respond to deepfakes in the same sweeping and punitive fashion as they do other forms of sociopolitical critique. In the United States, social media companies’ hands-off approach to the content they host has been shaped by Section 230 of the Communications Decency Act, which states: “No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.” Given the outsized roles these platforms now play in how media and information circulate, many watchdogs, critics, and politicians have argued that this framework needs to be revisited.

Specific pieces of legislation on the books involving synthetic media include California’s AB 730, which bans materially deceptive deepfakes of political candidates within 60 days of an election, unless they come with an explicit disclaimer or are clearly works of satire or parody. Policies of this kind strive to ensure campaign equity and protect the democratic process.

See the rest of the report here: JUST JOKING! Deepfakes, Satire and the Politics of Synthetic Media.

For more news, discourse, and resources on immersive and emerging forms of nonfiction media, sign up for our monthly newsletter.

Immerse is an initiative of the MIT Open DocLab and Dot Connector Studio, and receives funding from Just Films | Ford Foundation and the MacArthur Foundation. The Gotham Film & Media Institute is our fiscal sponsor. Learn more here. We are committed to exploring and showcasing emerging nonfiction projects that push the boundaries of media and tackle issues of social justice — and rely on friends like you to sustain ourselves and grow. Join us by making a gift today.


Katerina Cizek is Artistic Director of Co-Creation Studio at MIT Open Documentary Lab.