Inference Optimisation

Structured Generation and Constrained Decoding

How masking the logits at each decode step to only tokens a schema or grammar allows guarantees syntactically valid output, and where that guarantee stops.

intermediate · 8 min read

Ask a model for JSON and some fraction of the time you get JSON wrapped in an apology, a trailing comma, a code fence, or a hallucinated field. Retrying and regex-scrubbing the output is the folk remedy, and it is a losing game at scale. Constrained decoding removes the problem at the source. At every decode step, before sampling, you set the logit of every token that would break the required structure to negative infinity, so its probability after the softmax is exactly zero. The model can only sample from what is still legal. If the grammar says the next character must be } or a digit, every token in a 128k-token vocabulary that cannot legally continue from there is masked out and cannot be chosen, no matter how confident the model was about emitting prose. The output is valid by construction, not by luck.
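The masking step itself is small. The sketch below uses plain Python lists standing in for a real logits tensor, and assumes that some other component (the grammar) has already supplied the set of legal token ids for this step. The function names and the toy vocabulary in the usage note are hypothetical, not any particular library's API.

```python
import math
import random

def mask_logits(logits, allowed_ids):
    """Set every token id not in allowed_ids to -inf; legal logits are unchanged."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Numerically stable softmax; -inf entries get probability exactly 0.0."""
    m = max(logits)  # finite as long as at least one token is still allowed
    exps = [math.exp(x - m) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

def constrained_sample(logits, allowed_ids, rng=random):
    """Sample one token id from the masked distribution."""
    if not allowed_ids:
        raise ValueError("no legal token: the constraint has reached a dead end")
    probs = softmax(mask_logits(logits, allowed_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a toy vocabulary `["Sure", "!", "{", "}", "7"]` and logits that strongly favour `"Sure"`, restricting the step to ids 3 and 4 (`}` and `7`) means every sample is one of those two tokens: the model's preference for prose is simply unreachable.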

Constraints as a state machine over the vocabulary

The mechanism starts by turning the desired shape into an automaton. A regular expression, or a JSON schema with a fixed, non-recursive set of fields and types (which describes a regular language), compiles to a finite-state machine (FSM): a set of states plus a transition function that says, for each state, which characters advance it and to which next state. ^\d{3}-\d{4}$ matches exactly eight characters, so it becomes a chain of nine states in a line (a start state plus one after each character); a JSON object schema becomes a larger FSM that walks {, a quoted key, :, a typed value, then either , or }.
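A hand-built sketch of the ^\d{3}-\d{4}$ case makes the "state machine over the vocabulary" idea concrete. It assumes a hypothetical encoding of the pattern (`PATTERN`, where `d` means any digit), a toy vocabulary, and a hypothetical `"<eos>"` end token; it is not how any specific library compiles regexes. The key step is `token_mask`: because vocabulary tokens can span several characters (`"-12"`, `"555"`), a token is legal in a state only if walking all of its characters through the FSM never reaches the dead state.

```python
import string

DIGITS = frozenset(string.digits)
PATTERN = "ddd-dddd"  # hypothetical encoding of ^\d{3}-\d{4}$: 'd' = any digit, '-' = literal
ACCEPT = len(PATTERN)  # states 0..8: state i means "i characters matched"; 8 accepts
EOS = "<eos>"          # hypothetical end-of-sequence token string

def step(state, ch):
    """Transition function; returns None (the dead state) on an illegal character."""
    if state is None or state >= ACCEPT:
        return None  # nothing may follow a complete match, because of the $ anchor
    expected = PATTERN[state]
    ok = ch in DIGITS if expected == "d" else ch == expected
    return state + 1 if ok else None

def allowed_chars(state):
    """Characters that advance `state` to a live next state."""
    return {c for c in string.printable if step(state, c) is not None}

def walk(state, text):
    """Feed a whole multi-character token through the FSM."""
    for ch in text:
        state = step(state, ch)
        if state is None:
            return None
    return state

def token_mask(state, vocab):
    """Ids of tokens that are legal from `state`; EOS only once the pattern is complete."""
    ids = []
    for i, tok in enumerate(vocab):
        if tok == EOS:
            if state == ACCEPT:
                ids.append(i)
        elif tok and walk(state, tok) is not None:
            ids.append(i)
    return ids
```

With the toy vocabulary `["555", "-", "-12", "12", "1234", "a", "5", "<eos>", " 5"]`, state 0 allows `"555"`, `"12"` and `"5"`, while state 3 allows only `"-"` and `"-12"`. Re-walking the whole vocabulary at every step would be slow for a real 128k-token vocabulary, so a practical implementation would more likely precompute the legal token set for each FSM state once, ahead of decoding.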
