Model Collapse from Recursive Training
When a model is trained repeatedly on its own outputs, the tails of the learned distribution erode: the model progressively forgets rare but important knowledge and eventually produces impoverished, homogenised text.
By the time GPT-4-class models had been deployed publicly, a substantial fraction of new text appearing on the internet was already generated by models trained on earlier internet text. Feed that new web corpus back into the next training run, and you have a closed loop. Shumailov et al. (2023) named the resulting degradation model collapse: "use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear."
The word "irreversible" is doing real work there. This is not a temporary calibration error you can anneal away; it is a structural loss of distributional knowledge that compounds with each generation.
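To see why the loss compounds rather than averaging out, here is a minimal toy sketch. It is not the experimental setup from Shumailov et al.: it assumes a "model" that is nothing more than a categorical distribution over a hypothetical 200-token vocabulary, refit by simple counting on a finite sample drawn from the previous generation. The function names, vocabulary size, sample size, and seed are all illustrative choices.

```python
import random
from collections import Counter


def zipf_distribution(vocab_size):
    """Zipf-like starting distribution: a few common tokens, a long rare tail."""
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def refit_on_own_samples(probs, sample_size, rng):
    """Sample a finite corpus from the current model, then re-estimate
    token probabilities by counting (maximum likelihood)."""
    tokens = rng.choices(range(len(probs)), weights=probs, k=sample_size)
    counts = Counter(tokens)
    return [counts[t] / sample_size for t in range(len(probs))]


def simulate(vocab_size=200, sample_size=500, generations=10, seed=0):
    """Return the number of tokens with non-zero probability at each generation."""
    rng = random.Random(seed)
    probs = zipf_distribution(vocab_size)
    support_sizes = [sum(p > 0 for p in probs)]
    for _ in range(generations):
        probs = refit_on_own_samples(probs, sample_size, rng)
        support_sizes.append(sum(p > 0 for p in probs))
    return support_sizes


if __name__ == "__main__":
    for generation, size in enumerate(simulate()):
        print(f"generation {generation}: {size} tokens still reachable")
```

In this toy, a token that happens to receive zero counts in one generation gets probability zero, so no later generation can ever sample it again. The support can only shrink, which is the sense in which the loss is irreversible here. Real language models are not count-based estimators, so treat this as intuition for the mechanism, not as a prediction of how quickly any particular model degrades.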