Reasoning Models
Process Reward Models and Verifiable Rewards
Why scoring every step of a reasoning trace can give a richer training signal than scoring only the final answer, and how Ai2 and DeepSeek replaced process reward models (PRMs) entirely with programmatic correctness checks.
advanced · 9 min read