Structured Output Coercion

You asked for JSON and got a friendly paragraph, a fenced code block, and then the JSON with a trailing comment. Your parser threw. This is the daily reality of getting machine-readable output from a language model through prompting: the model is a next-token predictor that has seen a million JSON blobs wrapped in markdown chatter, and left to its own devices it reproduces the chatter too. Prompt coercion is the set of techniques that raise the parse-success rate from "usually" to "almost always" without touching the decoder. It is cheap, portable across providers, and works with any chat endpoint. What it never does is guarantee the output validates. That distinction is the whole point of this concept, and the reason constrained decoding exists as the harder-guarantee alternative.

Show the schema and a filled example

The single highest-leverage move is to stop describing the format in prose and start showing it. A model completes patterns; give it the pattern to complete. Put the target schema in the prompt, then a fully populated example instance right next to it:

Return a JSON object with this shape:
{"sentiment": "positive|negative|neutral", "score": <float 0-1>, "spans": [<string>]}

Example for the input "the food was cold but staff were kind":
{"sentiment": "neutral", "score": 0.55, "spans": ["food was cold", "staff were kind"]}

The filled example does more work than the schema. It pins the enum spelling, shows how a float should look (0.55, not "0.55" or 55%), and demonstrates the array element type. Anthropic's prompting guidance is explicit that examples are one of the most effective levers for shaping output, and that showing the exact desired format beats explaining it. One example fixes the obvious ambiguities; two or three cover the edge cases (empty arrays, the neutral case, the missing-field case) that a single example leaves undefined.

Fence the fields with tags or delimiters

When you need more than one field, or a field that itself contains free text, delimiters stop the fields from bleeding into each other. XML-style tags are the workhorse here because they are unambiguous to both the model and your extractor:

<answer>
  <reasoning>...the model's working...</reasoning>
  <result>{"decision": "approve", "limit": 5000}</result>
</answer>

Anthropic's docs recommend XML tags precisely for this: to separate the parts of a prompt and to structure the parts of the output so you can reliably pull out the piece you want. The extraction is then a regex or an XML parse for <result>...</result> rather than a hope that the JSON was the only brace-delimited thing in the response. Tags also give you a clean home for a reasoning field, which matters more than it looks (see the failure section). Plain delimiters (triple backticks, ---, sentinel strings like ===JSON===) work for simpler cases, but tags nest and self-document in a way raw delimiters do not.

Prefill the assistant turn

The most direct lever is to write the first few characters of the model's own response for it. Chat APIs let you seed the start of the assistant turn; whatever you put there, the model continues from. Seed it with { and the model is now completing a JSON object, not deciding whether to greet you first:

User:      Extract the fields as JSON.
Assistant: {

Because the response now begins mid-object, the model cannot emit "Sure, here's the JSON:" preamble; there is no room for it. The same trick with <result> forces a tagged response, and prefilling [ forces a JSON array. The cost is that the opening token is now yours, not the model's, so your parser must prepend the { you seeded before parsing. Prefilling is a prompt-only technique and still offers no guarantee about what comes after the brace, but it eliminates the single most common failure (prose around the JSON) almost entirely.

Use the tool-call channel as the structured surface

Every major API already has a mechanism that emits a JSON object conforming to a schema you supply: function/tool calling. You can exploit it even when you have no real tool to run. Define a single "tool" whose parameters are your desired output schema, force the model to call it, and read the arguments as your structured result. This routes the output through the provider's structured channel rather than the free-text channel, and the arguments arrive already parsed. OpenAI's Structured Outputs takes this further and will constrain generation to your JSON Schema at the decoder, which is no longer pure prompt coercion; it is the constrained-decoding path wearing a function-calling coat, and it is worth reaching for when the provider offers it. Note the boundary: a schema-declared tool call is prompt-side coercion unless the provider actually enforces the schema during decoding.

Retry and repair on parse failure

Even with all of the above, some fraction of responses will not parse. The pragmatic backstop is a loop: attempt to parse; on failure, send the broken output back with the parser error and ask for a correction. A repair prompt ("that was not valid JSON, the error was Unexpected token } at position 214, return only the corrected object") succeeds far more often than a blind retry because the model can see what it broke. Keep the loop capped at two or three attempts; a response that fails to parse three times usually signals a prompt problem, not a fluke, and burning tokens on a fourth attempt rarely helps.

When it falls down

Prompt coercion has no hard guarantee, and every failure mode below follows from that one fact. Constrained decoding closes most of them by construction; prompting only makes them rare.

Silent schema drift. The output parses as valid JSON but the fields are wrong: an enum the schema never listed, a string where you wanted a number, a renamed key. Nothing throws. The fix is validating against a real schema (Pydantic, JSON Schema) after parsing, not just checking that json.loads succeeded.
Escaping and nesting errors. Free text inside a JSON string field is where it breaks: an unescaped quote or newline, a stray backslash, a code snippet with its own braces. The deeper the nesting, the higher the rate. Fencing the free-text field in its own tag outside the JSON, or asking for it base64-encoded, sidesteps the escaping entirely.
Extra prose around the JSON. Preamble ("Here is the JSON:") or a trailing explanation. Prefilling the opening brace and instructing "return only the object, no other text" are the two levers; combining them is close to reliable.
Rigid formats can suppress reasoning quality. Forcing the model to emit only a terse structured object, with no room to think, measurably degrades answer quality on tasks that need working. The model spends its capacity satisfying the format instead of solving the problem. The fix is to give reasoning a place to live: add a reasoning field before the answer fields, or ask the model to reason in prose first and then emit the structured block, so the format constraint applies only after the thinking is done.

The honest summary: coercion buys you a high parse rate cheaply and portably, but the guarantee is statistical, not structural. When a malformed field means a failed transaction rather than a retried request, move the guarantee into the decoder.