Imitation Learning and Diffusion Policies
Why cloning a demonstrator's actions drifts into unseen states, and how generative action models such as diffusion policies, together with action chunking, control the drift.
Teach a robot by showing it. Collect a few hundred demonstrations of a human driving a robot arm through a task, train a network to map each observed state to the action the human took, and you have a policy. This is behaviour cloning, and it is the most natural idea in robot learning. It is also the idea with the sharpest hidden failure. A policy trained this way does not degrade gracefully as it gets slightly worse; it fails catastrophically, because a small error moves the robot into a state the human never visited, and there the policy has no idea what to do. Ross, Gordon and Bagnell made this precise in 2010, and the fix is not more data in the naive sense; it is a different way of modelling what a demonstration even is.
Behaviour cloning and why the errors compound
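Behaviour cloning treats control as plain supervised learning. Given demonstration pairs (state, action), fit a policy pi(a | s) that assigns high probability to the expert's action in each demonstrated state. Training is standard: minimise a supervised loss, such as negative log-likelihood or mean squared error, over the demonstration set. The trouble is that supervised learning assumes the test inputs are drawn from the same distribution as the training inputs, and in sequential control that assumption is false the moment the policy starts acting, because the states the policy sees from then on are produced by its own earlier actions rather than by the expert's.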
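A minimal sketch can make the mismatch concrete. Everything in it is an assumption chosen for illustration, not something taken from a real robot stack: a one-dimensional unstable toy plant (`step`), a linear stabilising expert (`expert`), and a 1-nearest-neighbour lookup (`clone`) standing in for the trained network. The lookup memorises the demonstrations, so its training loss is zero, which makes it easy to see that low training loss says nothing about states outside the demonstrations.

```python
# Toy 1-D illustration of behaviour cloning under distribution shift.
# The plant, the expert gain and the nearest-neighbour "policy" are all
# hypothetical stand-ins chosen for clarity.

def step(s, a):
    """Unstable toy plant: with zero action the state doubles each step."""
    return 2.0 * s + a


def expert(s):
    """Stabilising expert: the closed loop becomes s' = 0.5 * s."""
    return -1.5 * s


def collect_demos(starts, horizon):
    """Roll out the expert and record (state, action) pairs."""
    data = []
    for s in starts:
        for _ in range(horizon):
            a = expert(s)
            data.append((s, a))
            s = step(s, a)
    return data


def clone(data):
    """Behaviour cloning with a 1-nearest-neighbour regressor.

    The returned policy replays the action of the closest demonstrated
    state, so it reproduces every training pair exactly.
    """
    def policy(s):
        _, a = min(data, key=lambda pair: abs(pair[0] - s))
        return a
    return policy


def rollout(policy, s, horizon):
    """Run a policy in closed loop and return the final state."""
    for _ in range(horizon):
        s = step(s, policy(s))
    return s


starts = [i / 10 for i in range(-10, 11)]  # demonstrations only cover [-1, 1]
demos = collect_demos(starts, horizon=20)
policy = clone(demos)

# Mean squared error on the demonstration set (the supervised objective).
train_loss = sum((policy(s) - a) ** 2 for s, a in demos) / len(demos)

in_dist = rollout(policy, 0.8, 20)     # start on a demonstrated state
off_dist = rollout(policy, 1.6, 20)    # start just outside the demonstrations
expert_off = rollout(expert, 1.6, 20)  # the expert from that same start

print(f"training loss           : {train_loss:.3g}")
print(f"clone, start 0.8, final : {in_dist:.3g}")
print(f"clone, start 1.6, final : {off_dist:.3g}")
print(f"expert, start 1.6, final: {expert_off:.3g}")
```

Started on a demonstrated state, the clone tracks the expert and settles near zero. Started at 1.6, outside anything the expert visited, it keeps replaying the action recorded at the nearest demonstrated state (1.0); that action is too weak for the toy plant, so the state runs away, while the expert recovers from the same start. A real network would extrapolate differently from a lookup table, but the structural point is the same: the supervised loss is only measured on states the expert visited.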