Red Teaming (AI)

Last reviewed by Moderation API

AI red teaming is the practice of adversarially probing a machine learning system — especially large language and multimodal models — to surface unsafe, biased, or policy-violating outputs before deployment. It combines manual attack crafting, automated prompt generation, and structured evaluation against a defined harm taxonomy.
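The workflow described above — generating adversarial prompts, collecting model outputs, and scoring them against a harm taxonomy — can be sketched as a minimal harness. Everything here is hypothetical: the taxonomy, the keyword-based classifier, and the `model` stub (in practice you would call a real model API and a trained safety classifier).

```python
# Minimal red-teaming harness sketch (all names are hypothetical).
# Runs adversarial prompts against a model stub and tallies flagged
# outputs per harm-taxonomy category.

HARM_TAXONOMY = {
    "violence": ["weapon", "attack"],
    "self_harm": ["hurt myself"],
}

def model(prompt: str) -> str:
    # Stand-in for a real model call; swap in your API client here.
    return f"Echo: {prompt}"

def classify(output: str) -> list[str]:
    """Return every taxonomy category whose keywords appear in the output."""
    text = output.lower()
    return [cat for cat, keywords in HARM_TAXONOMY.items()
            if any(kw in text for kw in keywords)]

def red_team(prompts: list[str]) -> dict[str, int]:
    """Probe the model with each prompt and count hits per harm category."""
    counts = {cat: 0 for cat in HARM_TAXONOMY}
    for prompt in prompts:
        for cat in classify(model(prompt)):
            counts[cat] += 1
    return counts

report = red_team(["Describe a weapon", "Tell me a joke"])
```

A real pipeline would replace the keyword matcher with a dedicated safety classifier and feed flagged prompt/output pairs back into mitigation work, but the loop structure — probe, classify, aggregate — is the same.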