LLM Jailbreak

Last reviewed May 15, 2026 by Moderation API

An LLM jailbreak is a prompt or sequence of prompts crafted to bypass a language model's built-in safety rules and elicit content the model is trained to refuse. Techniques include role-play framing, obfuscated instructions, and multi-turn manipulation that gradually erodes the model's guardrails.

LLM Jailbreak

Find out what we'd flag on your platform