What is Content Moderation?

Last reviewed by Moderation API

Content moderation is the set of policies, people, and technology a platform uses to decide what user-submitted content stays up, what comes down, and what gets escalated for a closer look. It is what separates a usable social product from an unmanageable dump. Over the last decade it has gone from a back-office function to a regulated, board-level concern for any company that hosts third-party content.

Why platforms moderate at all

Any site that accepts user submissions has the same problem. Social networks, marketplaces, dating apps, game chat, review sites: a small fraction of users will post things that are illegal, abusive, deceptive, or wildly off-topic. Left alone, those posts drive other users away, create legal exposure, and scare off advertisers.

Moderation protects users from child sexual abuse material (CSAM), terrorist content, hate speech, harassment, spam, scams, non-consensual intimate imagery, self-harm content, and fraud. In 2023, the National Center for Missing and Exploited Children received more than 36 million CyberTipline reports, most of them forwarded by platform moderation systems.

Types of moderation

Moderation is usually categorized by when and how review happens.

  • Pre-moderation reviews content before it is published. It is the safest model but adds latency and does not scale well above a certain volume.
  • Post-moderation publishes immediately and reviews after the fact, either proactively through automated scanning or reactively through user reports.
  • Reactive moderation depends on user flags. It is common on smaller communities and forums where the volume is manageable.
  • Proactive moderation uses classifiers, hash matching, and behavioral signals to surface harmful content before anyone reports it. Meta's Community Standards Enforcement Report states that over 95% of the hate speech Facebook removes is detected this way.
  • Distributed moderation leans on community voting, as on Reddit or Stack Overflow.
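The practical difference between pre- and post-moderation comes down to when content becomes visible relative to review. A minimal sketch of that timing difference (all names here are illustrative, not from any real platform's code):

```python
from enum import Enum

class Mode(Enum):
    PRE = "pre"    # hold content until a reviewer approves it
    POST = "post"  # publish immediately, review after the fact

def submit(content: str, mode: Mode, review_queue: list, feed: list) -> None:
    """Hypothetical submission handler showing the two timing models."""
    if mode is Mode.PRE:
        review_queue.append(content)   # nothing is visible until approved
    else:
        feed.append(content)           # visible to users immediately
        review_queue.append(content)   # still scanned/reviewed afterwards
```

Reactive and proactive moderation then differ only in what feeds the review queue: user reports versus automated scanning.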

Humans and machines, together

Modern moderation is almost always a hybrid operation.

Automated systems handle the overwhelming majority of decisions: hash databases like PhotoDNA for known CSAM, perceptual hashing for video, NLP classifiers, computer vision models, and more recently large language models that can weigh context. Human reviewers, typically organized in trust and safety teams or outsourced to business process outsourcing (BPO) firms, handle the ambiguous edge cases, policy appeals, and training-data labeling that keep the models honest. LLM-based moderation has largely displaced older keyword and bag-of-words systems because it can reason about context, sarcasm, and multilingual content that previously required a person. Platforms like Moderation API expose these classifiers through a single API so product teams can ship safety features without building the ML stack in-house.
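A common way to wire humans and machines together is confidence-based routing: the classifier's score decides whether content is removed automatically, queued for a human, or left alone. A minimal sketch with illustrative thresholds (real platforms tune these per policy area, and none of these values come from the products named above):

```python
def route(violation_score: float,
          auto_remove_at: float = 0.95,
          human_review_at: float = 0.60) -> str:
    """Route content by classifier confidence.

    violation_score is the model's estimated probability that the
    content violates policy. Thresholds here are hypothetical.
    """
    if violation_score >= auto_remove_at:
        return "auto_remove"    # high confidence: act without a human
    if violation_score >= human_review_at:
        return "human_review"   # ambiguous: send to the review queue
    return "allow"              # low risk: publish or leave up
```

Lowering `human_review_at` catches more harm but grows the human queue; raising `auto_remove_at` reduces wrongful removals but slows response. That trade-off is the core tuning problem of hybrid moderation.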

The scale problem

Scale is what makes moderation genuinely hard. YouTube users upload more than 500 hours of video per minute. Meta processes billions of posts per day. Even a 99.9% accurate classifier generates millions of mistakes at that volume, which is why appeals, transparency reports, and human-in-the-loop review are not optional.
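The arithmetic behind that claim is simple but sobering. Assuming an illustrative three billion decisions per day (not a figure from any platform's disclosures):

```python
decisions_per_day = 3_000_000_000   # illustrative volume, chosen for round numbers
accuracy = 0.999                    # a 99.9% accurate classifier

mistakes_per_day = decisions_per_day * (1 - accuracy)
print(f"{mistakes_per_day:,.0f} wrong decisions per day")
# → roughly 3,000,000 per day, before appeals or human review correct any of them
```

Even small accuracy gains are worth enormous engineering effort at this volume, and no realistic accuracy eliminates the need for an appeals path.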

Regulation and the rise of the T&S function

From a compliance standpoint, moderation is no longer a discretionary activity.

The EU Digital Services Act (DSA), which became fully applicable in February 2024, imposes transparency, risk assessment, and notice-and-action obligations on online platforms, with heightened duties for Very Large Online Platforms (VLOPs) with more than 45 million EU users. The UK Online Safety Act, enacted in 2023 and enforced by Ofcom, imposes a duty of care and carries fines of up to 10% of global turnover. Germany's NetzDG, Australia's Online Safety Act, Singapore's Online Safety Act, and Ireland's Online Safety and Media Regulation Act add overlapping obligations.

The Trust and Safety Professional Association (TSPA) has become the main industry body for practitioners working through all of it.

What effective programs have in common

Good moderation programs tend to share a few things:

  • community guidelines that are public and specific enough to be actionable
  • layered detection that combines hashes, classifiers, and human review
  • a working appeals path
  • regular transparency reporting
  • dedicated escalation paths for CSAM and terrorist content
  • a policy team that updates the rules as new harms appear

None of this is glamorous work, and most of it is invisible to the users it protects.
