Red Teaming (AI)
Last reviewed by Moderation API
AI red teaming is the practice of adversarially probing a machine learning system — especially large language and multimodal models — to surface unsafe, biased, or policy-violating outputs before deployment. It combines manual attack crafting, automated prompt generation, and structured evaluation against a defined harm taxonomy.
