
What is Hate Speech?

Last reviewed by Moderation API

Hate speech sits at the intersection of free expression, public safety, and platform liability, and it is one of the most contested categories in content moderation. Unlike CSAM (child sexual abuse material) or terrorist content, hate speech has no single global definition, no universal legal threshold, and no classifier that can adjudicate it without context. Platforms still have to make billions of enforcement decisions about it every year, which is why hate speech is the category that stress-tests every moderation program.

Legal definitions vary sharply by region

The United States has no federal law against hate speech. Under the First Amendment and precedents like Brandenburg v. Ohio (1969) and Matal v. Tam (2017), speech that expresses hatred is constitutionally protected unless it crosses into a true threat, incitement to imminent lawless action, or targeted harassment.

Europe goes in the opposite direction.

The EU Framework Decision on Combating Racism and Xenophobia (2008/913/JHA) requires member states to criminalize public incitement to violence or hatred based on race, color, religion, descent, or national or ethnic origin. Germany's NetzDG, in force since October 2017, requires large social networks to remove "manifestly unlawful" hate content within 24 hours of notification. The UK Online Safety Act 2023 treats hate offenses as priority illegal content that platforms must proactively mitigate. Sections 318 and 319 of Canada's Criminal Code criminalize advocacy of genocide and public incitement of hatred. At the international level, Article 20 of the ICCPR requires states to prohibit advocacy of national, racial, or religious hatred that constitutes incitement to discrimination, hostility, or violence.

The practical result is that a post that is lawful in Texas can be illegal in Berlin.

How platforms define it

Because the legal picture is inconsistent, major platforms write their own policies and apply them globally. Meta's Hateful Conduct policy prohibits direct attacks against people based on protected characteristics including race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity, and serious disease. YouTube's Hate Speech Policy bans content promoting violence or hatred against individuals or groups based on similar attributes. TikTok's Community Guidelines prohibit hateful behavior and hateful ideologies, including content tied to known hate groups. Across these policies, coverage typically includes slurs, dehumanizing comparisons, calls for exclusion or segregation, and Holocaust or genocide denial.
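
One reason these policies converge on a similar structure is that a violation is really a pair: an attack type directed at a protected characteristic. A minimal sketch of that idea, using an illustrative taxonomy (not any platform's actual rule set) and omitting the real-world severity tiers, exceptions, and locale rules:

```python
from dataclasses import dataclass

# Illustrative taxonomy only: real policies add severity tiers, carve-outs
# for reclaimed terms and counter-speech, and market-specific slur lists.
PROTECTED_ATTRIBUTES = {
    "race", "ethnicity", "national_origin", "disability", "religion",
    "caste", "sexual_orientation", "sex", "gender_identity", "serious_disease",
}
ATTACK_TYPES = {"slur", "dehumanizing_comparison", "call_for_exclusion", "genocide_denial"}

@dataclass
class Finding:
    attack_type: str          # what kind of attack was detected
    targeted_attribute: str   # which group characteristic it targets

def violates_policy(finding: Finding) -> bool:
    """Hypothetical check: a violation requires both dimensions to match."""
    return (finding.attack_type in ATTACK_TYPES
            and finding.targeted_attribute in PROTECTED_ATTRIBUTES)

# An attack on a non-protected attribute (e.g. a profession) is not hate
# speech under this model, however abusive it may otherwise be.
print(violates_policy(Finding("dehumanizing_comparison", "religion")))    # True
print(violates_policy(Finding("dehumanizing_comparison", "profession")))  # False
```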

Detection is a hard problem

Automated hate speech detection is one of the harder applied NLP tasks for several reasons:

  • Context dependence: the same word can be a slur, a reclaimed identity term, an academic quotation, or counter-speech depending on who is speaking and to whom.
  • Dog whistles and coded language: extremist communities constantly invent new terms, numeric codes, and emoji substitutions to get around keyword filters (the sketch after this list shows how easily surface-level matching is evaded).
  • Multilingual and code-mixed content: most historical training data is English, but hate speech circulates in every language, often with mixed scripts in a single post.
  • Memes and images: much of today's hateful content circulates as image macros, stickers, and short video, which means detection now requires multimodal models, not just text classifiers.
  • Sarcasm and counter-speech: a post that quotes hate speech in order to condemn it can look nearly identical to the original at the token level.
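
The first two failure modes are easy to demonstrate: a keyword filter matches surface forms, so trivial character substitutions slip past it while quoted or condemning uses get flagged. A toy illustration (the denylist token is a placeholder, not a real slur):

```python
import re

DENYLIST = {"badword"}  # placeholder token standing in for a real slur

def keyword_flag(text: str) -> bool:
    # Naive surface matching: lowercase, split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(tok in DENYLIST for tok in tokens)

print(keyword_flag("you are a badword"))                        # True: caught
print(keyword_flag("you are a b@dw0rd"))                        # False: leetspeak evasion slips through
print(keyword_flag('calling someone "badword" is never okay'))  # True: counter-speech falsely flagged
```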

Modern systems combine transformer-based text classifiers, image and OCR models, user-behavior signals, and LLM reviewers that can reason about context. Services like Moderation API give product teams access to multilingual classifiers without having to train their own.
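
In practice, most product teams consume these classifiers over HTTP rather than hosting models themselves. The sketch below shows the general shape of such a call; the endpoint URL, field names, and response schema are placeholders for illustration, not Moderation API's (or any vendor's) actual interface:

```python
import json
import urllib.request

ENDPOINT = "https://api.example.com/v1/moderate"  # placeholder URL
API_KEY = "YOUR_API_KEY"

def moderate(text: str) -> dict:
    """Send text to a hosted moderation endpoint and return its verdict."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # auth scheme varies by vendor
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Assumed response shape, for illustration only:
# {"hate": {"score": 0.97, "flagged": true}, "language": "de", ...}
verdict = moderate("some user-generated comment")
```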

What the transparency reports show

Meta's Community Standards Enforcement Report shows hate speech prevalence on Facebook falling from roughly 10–11 views per 10,000 content views in late 2020 to around 1–2 per 10,000 in recent quarters, with proactive detection above 95%. YouTube has removed millions of videos for hateful or abusive content since updating its dedicated policy in June 2019. TikTok's quarterly enforcement reports disclose removal volumes by category and region, as required by the EU DSA.
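
Prevalence here is a views-based metric rather than a content count: the share of all content views that land on violating content, scaled per 10,000. A worked example with invented numbers (Meta publishes the resulting ratio, not these raw counts):

```python
# Invented inputs for illustration only.
violating_views = 2_400_000
total_views = 12_000_000_000

prevalence_per_10k = violating_views / total_views * 10_000
print(f"{prevalence_per_10k:.1f} violating views per 10,000 views")  # 2.0
```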

Removal, reduction, or counter-speech

Researchers, civil society groups, and regulators still disagree on the right response.

Outright removal is fast, but it raises censorship concerns and can push speech to less moderated platforms. Reduction strategies demote content in recommendations without deleting it, which lowers reach while sidestepping the removal debate. Counter-speech and digital literacy programs, championed by groups like the Dangerous Speech Project, aim to undermine hateful narratives instead of erasing them.

Most large platforms end up running a layered version of all three: remove the worst, reduce the borderline, and invest in user controls and media literacy for the rest.
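
In pipeline terms, that layering is a score-to-action mapping with two thresholds. A minimal sketch, assuming a single hate score in [0, 1] and invented threshold values; production systems add per-policy thresholds, human review queues, and strike tracking:

```python
REMOVE_THRESHOLD = 0.95  # high-precision region: take down outright
REDUCE_THRESHOLD = 0.70  # borderline region: keep up but demote

def enforcement_action(hate_score: float) -> str:
    """Map a classifier score to one layer of the remove/reduce/allow response."""
    if hate_score >= REMOVE_THRESHOLD:
        return "remove"   # the worst: delete, possibly strike the account
    if hate_score >= REDUCE_THRESHOLD:
        return "reduce"   # borderline: stop recommending, limit reach
    return "allow"        # the rest: leave up, rely on user controls

for score in (0.99, 0.80, 0.30):
    print(score, "->", enforcement_action(score))
```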
