Context Windows and Long-Context Models
Why a model advertised at a million tokens can still lose the fact in the middle, and what actually sets the limit: memory, compute, position, and attention itself.
advanced · 10 min read · Premium
The advertised context length is a marketing number and an engineering ceiling; the usable context is something smaller and harder to measure. A model can accept a million tokens, keep its perplexity stable across all of them, and still fail to answer a question whose evidence sat in the middle of the input. Two failure surfaces hide behind one number. The first is whether the model can represent and process a sequence that long at all. The second is whether it actually attends to the right part of it. Confusing the two is the most common mistake teams make when they buy "long context" and wonder why retrieval still beats it.
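One rough way to measure the second failure surface is a depth probe: plant a single fact at different positions in a long filler input and check whether the model can still answer a question about it. The sketch below only builds the probe inputs. The function names `build_probe` and `probe_grid`, the filler text, and the depth grid are illustrative assumptions, not part of any library, and the model call itself is left out.

```python
def build_probe(filler_sentences, needle, depth):
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end).

    Returns the assembled prompt text and the sentence index of the needle.
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(sentences), index


def probe_grid(filler_sentences, needle, depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Build one prompt per depth so answers can be compared across positions."""
    return {d: build_probe(filler_sentences, needle, d)[0] for d in depths}
```

Sending each prompt in the grid to the model with the same question, and scoring the answers by depth, gives a crude picture of usable context: a model that answers correctly at the start and end but fails at 0.5 is showing the mid-input loss described above, even if it accepts the full input without error.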
What sets the ceiling
Three pressures bound how long a window you can train and serve.
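Assuming memory and compute are two of those pressures, as the subtitle suggests, their growth with window length can be roughed out with back-of-envelope arithmetic. The sketch below assumes a standard transformer with full self-attention and a key/value cache; the function names and every configuration number are hypothetical and do not describe any particular model.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Key/value cache size for one sequence.

    Each layer stores two tensors (keys and values), each holding
    seq_len * n_kv_heads * head_dim values. Grows linearly in seq_len.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def attention_score_entries(seq_len, n_layers, n_heads):
    """Number of attention score entries under full self-attention.

    Each head in each layer scores every position against every other,
    so the count grows with the square of seq_len.
    """
    return n_layers * n_heads * seq_len * seq_len


# Hypothetical configuration, chosen only to make the scaling visible.
config = dict(n_layers=32, n_kv_heads=8, head_dim=128)
for tokens in (8_000, 128_000, 1_000_000):
    gigabytes = kv_cache_bytes(tokens, **config) / 1e9
    print(f"{tokens:>9} tokens: about {gigabytes:.1f} GB of KV cache")
```

The point of the sketch is the shape of the curves rather than the exact figures: doubling the window doubles the cache but quadruples the score entries, which is why serving cost and training cost hit their limits at different lengths.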