Safety & Alignment

Constitutional AI and RLAIF

How Anthropic replaced human harmlessness labels with a written constitution and a critique-and-revise loop, and why this makes alignment auditable.

advanced · 9 min read
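
A minimal sketch of the two stages named in the summary, under stated assumptions. `generate` stands for any text-in, text-out model call. `CONSTITUTION`, `critique_and_revise`, `ai_preference`, and `toy_model` are hypothetical names invented for illustration. The principles are paraphrased examples, not Anthropic's actual constitution, and the prompt templates are simplified. In the first stage, a model critiques its own draft against a sampled principle and then revises it, and the revisions become supervised fine-tuning data. In the second (RLAIF) stage, a feedback model picks whichever response better satisfies a principle, and those AI labels train the preference model in place of human harmlessness labels.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: any text-in, text-out language-model call.
Generate = Callable[[str], str]

# Illustrative principles only; not the text of Anthropic's constitution.
CONSTITUTION: List[str] = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response is evasive or less helpful than it could be.",
]


def critique_and_revise(prompt: str, response: str, generate: Generate,
                        rounds: int = 2, seed: int = 0) -> str:
    """Supervised stage: critique against a sampled principle, then revise."""
    rng = random.Random(seed)
    for _ in range(rounds):
        principle = rng.choice(CONSTITUTION)
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique request: {principle}\nCritique:"
        )
        # The revision replaces the response and is critiqued again next round.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Revision request: Rewrite the response to address the critique.\n"
            "Revision:"
        )
    return response  # (prompt, revised response) pairs become fine-tuning data


def ai_preference(prompt: str, a: str, b: str, principle: str,
                  generate: Generate) -> Tuple[str, str]:
    """RL stage: a feedback model judges which option better follows a principle.

    Returns (chosen, rejected). Parsing the text answer is a simplification.
    """
    verdict = generate(
        f"Prompt: {prompt}\n{principle}\n"
        f"Option (A): {a}\nOption (B): {b}\nThe better option is:"
    )
    prefers_a = verdict.strip().lstrip("(").upper().startswith("A")
    return (a, b) if prefers_a else (b, a)


def toy_model(p: str) -> str:
    # Hypothetical stub standing in for a real model.
    if p.endswith("Critique:"):
        return "The response gives unsafe detail."
    if p.endswith("The better option is:"):
        return "(B)"
    return "I can't help with that, but here is a safer alternative."


revised = critique_and_revise("How do I pick a lock?", "Step 1: ...", toy_model)
chosen, rejected = ai_preference("How do I pick a lock?", "Step 1: ...", revised,
                                 "Which response is less harmful?", toy_model)
print(chosen == revised)  # True
```

Parsing the feedback model's text answer keeps the sketch short; the Constitutional AI paper instead derives soft preference labels from the model's probabilities for each option.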