What is a Deepfake?
A deepfake is synthetic media generated or altered by deep learning models so that a real person appears to say or do something they never actually said or did. The term originated on Reddit in 2017 and now covers a wide range of techniques, from face-swapping in video to cloned voices generated from a few seconds of reference audio.
What once required a research lab and a GPU cluster can now be produced in minutes with consumer apps. That shift has turned deepfakes from a curiosity into a mainstream Trust & Safety problem.
How deepfakes are made
Most deepfakes are built on one of three model families. Generative adversarial networks (GANs) pit a generator against a discriminator: the generator learns to produce photorealistic faces while the discriminator learns to spot them, and each improves by competing against the other. Autoencoder-based face swappers train a shared encoder with one decoder per identity, then decode one person's frames with the other person's decoder to swap faces frame by frame. Diffusion models, the same family behind Stable Diffusion and Midjourney, are now the state of the art for image and short-clip generation.
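To make the adversarial setup concrete, here is a minimal training-step sketch in PyTorch. The fully connected architecture, image size, and loss setup are illustrative simplifications, not any production system; real face generators use far larger convolutional or diffusion-based models.

```python
import torch
import torch.nn as nn

# Generator: maps a random latent vector to a flattened 64x64 grayscale image.
class Generator(nn.Module):
    def __init__(self, latent_dim: int = 100, img_dim: int = 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Discriminator: scores a flattened image as real (near 1) or fake (near 0).
class Discriminator(nn.Module):
    def __init__(self, img_dim: int = 64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def train_step(gen, disc, real_imgs, g_opt, d_opt, latent_dim: int = 100):
    """One adversarial round: the discriminator learns to separate real
    from generated images, then the generator learns to fool it."""
    loss_fn = nn.BCELoss()
    batch = real_imgs.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator update: penalize mistakes on both real and fake batches.
    fake_imgs = gen(torch.randn(batch, latent_dim)).detach()
    d_loss = loss_fn(disc(real_imgs), real_labels) + loss_fn(disc(fake_imgs), fake_labels)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: reward fakes the discriminator labels as real.
    g_loss = loss_fn(disc(gen(torch.randn(batch, latent_dim))), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

The competition is the whole trick: as the discriminator gets better at catching fakes, the generator is forced to produce more convincing ones, which is also why detectors trained on last year's generator outputs tend to decay.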
On the audio side, neural voice cloning systems can replicate a target speaker's voice from as little as three seconds of clean audio.
The tooling keeps getting cheaper, faster, and more open, which means detection has to keep pace with a moving adversary.
Where deepfakes cause harm
The largest documented harm category is non-consensual intimate imagery (NCII): deepfake nudes and sexual content generated from photos of real people, overwhelmingly women, without their consent. Multiple research reports have found that the vast majority of deepfake videos on the open web fall into this category.
A second major category is impersonation fraud. Cloned executive voices are used to authorize fraudulent wire transfers, and face-swapped video calls are used to bypass identity verification on financial platforms. Deepfakes are also used for political disinformation, revenge harassment, and sextortion of minors.
Detection approaches
Detecting deepfakes is an arms race. Early techniques relied on physiological cues like irregular blinking, inconsistent lighting on skin, or artifacts around hairlines, but modern generators have largely closed those gaps. Current detection blends classifier ensembles trained on large corpora of known fakes, provenance signals such as C2PA content credentials, watermarking schemes like Google's SynthID, and reverse-image search across known generator outputs. No single technique is robust on its own, which is why platforms typically layer multiple detectors and fall back on human review for high-stakes decisions.
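As an illustration of that layering, here is a minimal Python sketch of how a platform might fuse detector signals and fall back to human review. The signal names, weights, and thresholds are hypothetical, not values from any real detection stack.

```python
from dataclasses import dataclass

@dataclass
class DetectionSignals:
    """Hypothetical per-item signals; each detector runs independently."""
    classifier_score: float      # ensemble probability the media is synthetic
    has_c2pa_credentials: bool   # valid provenance metadata attached
    watermark_detected: bool     # e.g. a SynthID-style watermark hit

def route_media(signals: DetectionSignals,
                auto_flag_threshold: float = 0.95,
                review_threshold: float = 0.60) -> str:
    """Fuse layered signals into one of three moderation outcomes."""
    score = signals.classifier_score
    if signals.watermark_detected:
        # A detected generator watermark is near-conclusive evidence.
        score = max(score, 0.99)
    if signals.has_c2pa_credentials:
        # Verified provenance credentials lower, but do not erase, suspicion.
        score *= 0.5

    if score >= auto_flag_threshold:
        return "auto_flag"       # high confidence: act without waiting
    if score >= review_threshold:
        return "human_review"    # uncertain: escalate for a human decision
    return "allow"

# Usage: a borderline classifier score with no provenance goes to review.
print(route_media(DetectionSignals(0.72, False, False)))  # -> "human_review"
```

In practice each signal would come from its own service, and the thresholds would be tuned against labeled review outcomes rather than set by hand.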
The regulatory response
Regulators have started to catch up.
The US TAKE IT DOWN Act, signed into law in 2025, requires platforms to remove non-consensual intimate imagery, including AI-generated deepfake NCII, within 48 hours of a valid report. The EU AI Act requires disclosure when content has been artificially generated or manipulated. Several US states have passed their own deepfake laws covering elections and intimate imagery.
For platforms, the practical takeaway is that deepfake handling is no longer optional. Detection, reporting flows, and rapid takedown are table stakes.
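As a sketch of what a rapid-takedown flow implies operationally, the hypothetical Python below tracks the 48-hour clock that starts when a valid NCII report is received; the class and field names are illustrative, not a reference implementation.

```python
from datetime import datetime, timedelta, timezone

# Illustrative 48-hour window from the TAKE IT DOWN Act's removal requirement.
TAKEDOWN_WINDOW = timedelta(hours=48)

class NCIIReport:
    """Hypothetical report record; the clock starts at a valid report."""
    def __init__(self, content_id: str, reported_at: datetime):
        self.content_id = content_id
        self.reported_at = reported_at
        self.deadline = reported_at + TAKEDOWN_WINDOW
        self.removed_at = None  # set when the content is taken down

    def time_remaining(self, now: datetime) -> timedelta:
        return self.deadline - now

    def is_overdue(self, now: datetime) -> bool:
        return self.removed_at is None and now > self.deadline

# Usage: a report filed now must be actioned within 48 hours.
now = datetime.now(timezone.utc)
report = NCIIReport("media_123", reported_at=now)
print(report.time_remaining(now))  # -> 2 days, 0:00:00 on the clock
```

Whatever the implementation, the operational point is the same: the deadline is measured from the report, not from when a reviewer first opens it, so queues need to surface expiring reports first.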
