
Foundations

Model-Based Reinforcement Learning

Model-based RL learns an explicit dynamics model of the environment and uses it for planning or synthetic data generation, accepting some model bias in exchange for large gains in sample efficiency.

advanced · 8 min read · Premium

AlphaGo Zero played around 5 million self-play games on its way to superhuman Go. Where an accurate dynamics model can be learned, a well-tuned model-based agent can often reach a comparable policy with far fewer real environment interactions, sometimes by orders of magnitude, because it spends most of its compute on a learned simulator rather than the real world. That asymmetry is the central argument for model-based reinforcement learning (MBRL).

The fundamental split: model-free vs. model-based

In model-free RL, the agent learns a value function or policy entirely from samples of real experience. Each gradient step consumes data collected by actually executing actions. The environment is a black box: you take action a in state s, observe the reward r and next state s', and move on. Nothing about the environment's dynamics is modeled.
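
As a rough illustration of that loop, here is a minimal sketch of tabular Q-learning, one common model-free method. The five-state chain environment, the step function, and all hyperparameters are hypothetical, chosen only to keep the example self-contained. Note that every update is paid for with one real call to step, and the agent never builds a picture of how states follow one another.

```python
import random
from collections import defaultdict

def step(state, action):
    """Hypothetical black-box environment: a 5-state chain (states 0..4).
    Action 1 moves right, action 0 moves left; reaching state 4 pays 1 and ends the episode."""
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # Q(s, a), initialised to 0
    real_steps = 0           # number of real environment interactions consumed
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.choice((0, 1))
            else:
                best = max(q[(s, 0)], q[(s, 1)])
                a = rng.choice([act for act in (0, 1) if q[(s, act)] == best])
            # Query the black box and observe one transition (s, a, r, s').
            s_next, r, done = step(s, a)
            real_steps += 1
            # Model-free update: uses only the sampled transition, no dynamics model.
            target = r if done else r + gamma * max(q[(s_next, 0)], q[(s_next, 1)])
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s_next
    return q, real_steps

if __name__ == "__main__":
    q, real_steps = q_learning()
    print("real environment steps used:", real_steps)
    print("greedy action per state:", [max((0, 1), key=lambda act: q[(s, act)]) for s in range(4)])
```

The real_steps counter is the quantity to watch: in this sketch it grows with every single update, which is exactly the cost that a learned model is meant to reduce.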
