← Concept library

Safety & Alignment

Jailbreaks and Refusal Robustness

How attackers reliably bypass model refusal training, why post-hoc filters are necessary but never sufficient, and how AILuminate measures what remains.

intermediate · 9 min read · Premium

This concept is for Pro members.

Unlock the full library, study plans, the AI mentor, and daily emails.

See plans