Agents & Tool Use
Prompt Engineering and Structured Output Patterns
Explicit criteria, few-shot prompting, tool_use with JSON schemas, validation-retry loops, and the Message Batches API.
intermediate · 9 min read
Effective prompts use explicit criteria, targeted examples, and structured output enforcement to achieve reliable, consistent results.
Explicit Criteria Beat Vague Instructions
"Be conservative" and "only report high-confidence findings" do not improve precision. Instead, define specifically what to report and what to skip: "Report: bugs, security issues. Skip: minor style, local patterns." Define severity levels with concrete code examples.
High false positive rates in one category undermine developer trust across all categories. Temporarily disable noisy categories while improving their prompts.
Few-Shot Prompting
Few-shot examples are the most effective technique for consistent, formatted output. Use 2-4 targeted examples that:
- Show reasoning for why one action was chosen over plausible alternatives
- Demonstrate the desired output format
- Handle ambiguous scenarios that instructions alone produce inconsistent results for
- Show extraction from varied document structures
Examples enable the model to generalize to novel patterns, not just match pre-specified cases.
Structured Output via tool_use
Define extraction tools with JSON schemas and use the tool_use response to get guaranteed schema-compliant output. This eliminates JSON syntax errors entirely.
tool_choice controls behavior: "auto" lets the model choose, "any" forces a tool call (any tool), forced selection ({"type": "tool", "name": "..."}) forces a specific tool.
Design nullable fields for information that may not exist in source documents - this prevents the model from fabricating values to satisfy required fields. Add "unclear" enum values and "other" + detail string patterns for extensible categorization.
Validation-Retry Loops
When extraction validation fails, retry with the original document, the failed extraction, and specific validation errors. But recognize when retries won't help: if the information simply isn't in the source document, no amount of retrying will extract it.
Message Batches API
50% cost savings with up to 24-hour processing and no latency SLA. Use for overnight reports, weekly audits, nightly test generation. Never use for blocking pre-merge checks. Correlate request/response pairs with custom_id.