Toxicity
Last reviewed by Moderation API
In content moderation, toxicity describes language that is harmful, abusive, or disruptive to a community. It typically includes hate speech, harassment, threats, and personal attacks. Toxicity classifiers score text on this dimension, commonly as a numeric value that is compared against a threshold to decide whether the text should be flagged.
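
To make the idea of scoring concrete, here is a minimal sketch in Python. It is not how any particular moderation service works: the lexicon, the weights, the `score_toxicity` function, and the 0.5 threshold are all hypothetical and chosen only for illustration. Production classifiers are typically trained models rather than keyword lists.

```python
from dataclasses import dataclass

# Hypothetical term weights, for illustration only. A real classifier
# would usually be a trained model, not a hand-written keyword list.
TOY_LEXICON = {
    "idiot": 0.6,
    "stupid": 0.5,
    "hate": 0.7,
    "kill": 0.9,
}


@dataclass
class ToxicityResult:
    score: float   # 0.0 (no toxic term found) up to 1.0
    flagged: bool  # True when score meets or exceeds the threshold


def score_toxicity(text: str, threshold: float = 0.5) -> ToxicityResult:
    """Score text by the highest-weighted lexicon term it contains."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    # Use the maximum term weight; empty input scores 0.0.
    score = max((TOY_LEXICON.get(w, 0.0) for w in words), default=0.0)
    return ToxicityResult(score=score, flagged=score >= threshold)


if __name__ == "__main__":
    for sample in ["Have a nice day", "You are an idiot!"]:
        print(sample, "->", score_toxicity(sample))
```

The threshold is the main design choice in a setup like this: lowering it flags more text at the cost of more false positives, while raising it does the opposite.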
