Claude Certified Architect - Foundations: The Complete Exam Preparation Guide
June 03, 2026 · 120 min read
The Claude Certified Architect - Foundations (CCAF) certification validates that you can make informed decisions about trade-offs when implementing real-world solutions with Claude. It tests foundational knowledge across four core technologies: Claude Agent SDK, the Claude API, Claude Code, and Model Context Protocol (MCP).
This guide is your one-stop exam preparation resource. Every domain, every task statement, every concept is covered here with visual diagrams, worked examples, and practice questions. Check off items as you master them, and your progress is saved automatically in your browser.
Exam Overview
Before diving into the domains, understand what you are signing up for.
flowchart LR
subgraph Format["Exam Format"]
Q["Multiple Choice"]
SC["Scenario-Based"]
P["Pass/Fail"]
end
subgraph Scoring["Scoring"]
S1["Scaled: 100-1000"]
S2["Pass: 720+"]
S3["No penalty for guessing"]
end
subgraph Structure["Structure"]
ST1["4 scenarios per exam"]
ST2["Drawn from 6 total"]
ST3["5 content domains"]
end
Format --> Scoring --> Structure
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class Q,SC,P,S1,S2,S3,ST1,ST2,ST3 blue
Key exam facts:
- All questions are multiple choice with one correct answer and three distractors
- Scored on a scale of 100-1,000 with a minimum passing score of 720
- Unanswered questions are scored as incorrect - no penalty for guessing, so always answer
- Questions are scenario-based, grounded in realistic production contexts
- 4 random scenarios are presented from a pool of 6
Target Candidate Profile
The ideal candidate is a solution architect with 6+ months of hands-on experience building with Claude APIs, Agent SDK, Claude Code, and MCP. You should have practical experience with:
- Building agentic applications with the Claude Agent SDK
- Configuring Claude Code for team workflows
- Designing MCP tool and resource interfaces
- Engineering prompts for reliable structured output
- Managing context windows across long interactions
- Integrating Claude into CI/CD pipelines
- Making escalation and reliability decisions
Domain Weightings
pie title CCAF Exam Domain Weightings
"D1: Agentic Architecture (27%)" : 27
"D2: Tool Design & MCP (18%)" : 18
"D3: Claude Code Config (20%)" : 20
"D4: Prompt Engineering (20%)" : 20
"D5: Context & Reliability (15%)" : 15
| Domain | Weight | Focus Areas |
|---|---|---|
| D1: Agentic Architecture & Orchestration | 27% | Agentic loops, multi-agent systems, hooks, session management |
| D2: Tool Design & MCP Integration | 18% | Tool descriptions, error responses, MCP servers, built-in tools |
| D3: Claude Code Configuration & Workflows | 20% | CLAUDE.md hierarchy, skills, rules, plan mode, CI/CD |
| D4: Prompt Engineering & Structured Output | 20% | Few-shot, JSON schemas, batch processing, multi-pass review |
| D5: Context Management & Reliability | 15% | Context preservation, escalation, error propagation, provenance |
The Six Exam Scenarios
Every exam question lives inside one of these realistic production scenarios. You will see 4 of 6 on your exam.
flowchart TB
subgraph S1["Scenario 1: Customer Support Agent"]
S1D["D1 + D2 + D5"]
S1T["Agent SDK, MCP tools,<br/>escalation, refund logic"]
end
subgraph S2["Scenario 2: Code Generation"]
S2D["D3 + D5"]
S2T["Claude Code, CLAUDE.md,<br/>slash commands, plan mode"]
end
subgraph S3["Scenario 3: Multi-Agent Research"]
S3D["D1 + D2 + D5"]
S3T["Coordinator-subagent,<br/>context passing, synthesis"]
end
subgraph S4["Scenario 4: Developer Productivity"]
S4D["D2 + D3 + D1"]
S4T["Built-in tools, MCP servers,<br/>codebase exploration"]
end
subgraph S5["Scenario 5: CI/CD Integration"]
S5D["D3 + D4"]
S5T["-p flag, JSON output,<br/>batch API, code review"]
end
subgraph S6["Scenario 6: Structured Extraction"]
S6D["D4 + D5"]
S6T["JSON schemas, tool_use,<br/>validation loops, confidence"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
class S1D,S2D,S3D,S4D,S5D,S6D purple
class S1T,S2T,S3T,S4T,S5T,S6T blue
Domain 1: Agentic Architecture & Orchestration (27%)
This is the highest-weighted domain on the exam. It covers the design, implementation, and management of agentic systems built with the Claude Agent SDK.
1.1 The Agentic Loop Lifecycle
The agentic loop is the fundamental control structure for autonomous Claude agents. Understanding how it works is critical.
flowchart TD
START["Send request to Claude<br/>(system + user + history)"] --> RESPONSE["Receive response"]
RESPONSE --> CHECK{"Inspect<br/>stop_reason"}
CHECK -->|"tool_use"| EXEC["Execute requested tool(s)"]
EXEC --> APPEND["Append tool results<br/>to conversation history"]
APPEND --> START
CHECK -->|"end_turn"| DONE["Present final response<br/>to user"]
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class START,APPEND blue
class EXEC amber
class DONE emerald
class RESPONSE,CHECK blue
The Loop Works Like This:
- Send a request to Claude with the system prompt, user message, and conversation history
- Inspect the
stop_reasonin the response:
-"tool_use"- Claude wants to call a tool. Execute it, append the result to history, and loop back
-"end_turn"- Claude is done. Present the final response to the user - Append tool results to the conversation history so Claude can reason about new information
- Repeat until
stop_reasonis"end_turn"
Critical Anti-Patterns to Avoid:
| Anti-Pattern | Why It Fails |
|---|---|
| Parsing natural language signals for loop termination | Unreliable - the model may phrase things differently each time |
| Setting arbitrary iteration caps as the primary stop mechanism | Cuts off legitimate multi-step reasoning; use stop_reason instead |
| Checking for assistant text content as completion indicator | The model can include text alongside tool calls |
Exam Tip: When the exam asks about loop termination, the answer is almost always
stop_reason. The model-driven approach (checkingstop_reason) is preferred over any heuristic-based approach.
1.2 Multi-Agent Orchestration: Hub-and-Spoke
Multi-agent systems use a hub-and-spoke (coordinator-subagent) architecture. The coordinator is the brain; subagents are specialists.
flowchart TB
USER["User Query"] --> COORD["Coordinator Agent"]
COORD -->|"Task tool"| SA1["Search Subagent"]
COORD -->|"Task tool"| SA2["Analysis Subagent"]
COORD -->|"Task tool"| SA3["Synthesis Subagent"]
SA1 -->|"Results"| COORD
SA2 -->|"Results"| COORD
SA3 -->|"Results"| COORD
COORD --> OUTPUT["Final Report"]
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class COORD purple
class SA1,SA2,SA3 blue
class USER,OUTPUT emerald
Key Principles:
- Isolated context: Subagents do NOT automatically inherit the coordinator's conversation history. Context must be explicitly passed in the prompt.
- Coordinator responsibilities: Task decomposition, delegation, result aggregation, deciding which subagents to invoke based on query complexity
- All communication routes through the coordinator: This provides observability, consistent error handling, and controlled information flow
- Dynamic selection: The coordinator should analyze query requirements and dynamically select which subagents to invoke, not always route through the full pipeline
Common Pitfall - Overly Narrow Decomposition:
If the coordinator decomposes "impact of AI on creative industries" into only "AI in digital art," "AI in graphic design," and "AI in photography," it misses music, writing, and film entirely. The problem is the coordinator's decomposition, not the subagents' execution.
1.3 Subagent Invocation and Context Passing
The Task Tool:
- The Task tool is the mechanism for spawning subagents
- The coordinator's allowedTools must include "Task" to invoke subagents
- Each subagent is defined with an AgentDefinition including description, system prompt, and tool restrictions
Context Passing Rules:
flowchart LR
subgraph Wrong["WRONG: Implicit Inheritance"]
C1["Coordinator"] -->|"Spawn"| S1["Subagent"]
S1 -.->|"No access to<br/>parent context"| X["Missing Data"]
end
subgraph Right["RIGHT: Explicit Passing"]
C2["Coordinator"] -->|"Prompt includes:<br/>findings, URLs,<br/>metadata"| S2["Subagent"]
S2 -->|"Has everything<br/>it needs"| Y["Complete Output"]
end
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class X red
class Y green
- Always include complete findings from prior agents directly in the subagent's prompt
- Use structured data formats to separate content from metadata (source URLs, document names, page numbers)
- Parallel spawning: Emit multiple Task tool calls in a single coordinator response rather than across separate turns
- Design coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions
Fork-Based Session Management:
- fork_session creates independent branches from a shared analysis baseline
- Useful for exploring divergent approaches (comparing two testing strategies from a shared codebase analysis)
1.4 Multi-Step Workflows and Enforcement Patterns
This is one of the most frequently tested concepts. Know the difference between:
flowchart TB
subgraph PROG["Programmatic Enforcement"]
direction TB
P1["Hooks / prerequisite gates"]
P2["Deterministic: 100% compliance"]
P3["Use when errors have<br/>financial/safety consequences"]
end
subgraph PROMPT["Prompt-Based Guidance"]
direction TB
PR1["System prompt instructions"]
PR2["Probabilistic: <100% compliance"]
PR3["Use for soft preferences,<br/>output formatting"]
end
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class P1,P2,P3 green
class PR1,PR2,PR3 amber
When to use programmatic enforcement:
- Identity verification before financial operations (block process_refund until get_customer returns a verified ID)
- Any business rule where non-compliance has real-world consequences
- Tool ordering that must be guaranteed, not hoped for
Structured Handoff Protocols:
When escalating to a human agent, compile structured summaries including:
- Customer ID
- Root cause analysis
- Refund amount
- Recommended action
The human agent receiving the escalation lacks access to the conversation transcript, so everything must be in the summary.
1.5 Agent SDK Hooks
Hooks intercept and transform data at deterministic points in the agent lifecycle.
sequenceDiagram
participant Agent as Claude Agent
participant Hook as PreToolUse Hook
participant Tool as MCP Tool
participant Post as PostToolUse Hook
Agent->>Hook: Tool call request
Hook->>Hook: Check compliance<br/>(e.g., refund < $500?)
alt Policy violation
Hook-->>Agent: Block + redirect<br/>to escalation
else Allowed
Hook->>Tool: Execute tool
Tool->>Post: Raw result
Post->>Post: Normalize data<br/>(timestamps, formats)
Post->>Agent: Clean, normalized result
end
Two Key Hook Patterns:
-
PostToolUse Hooks - Transform tool results before the model processes them:
- Normalize heterogeneous data formats (Unix timestamps to ISO 8601, numeric status codes to labels)
- Trim verbose outputs to relevant fields -
Tool Call Interception Hooks - Enforce business rules before tool execution:
- Block refunds exceeding a dollar threshold
- Redirect policy-violating actions to human escalation
Exam Tip: When the question is about "guaranteed compliance" or "deterministic enforcement," the answer is always hooks, never prompt instructions.
1.6 Task Decomposition Strategies
| Strategy | When to Use | Example |
|---|---|---|
| Prompt Chaining (fixed sequential) | Predictable multi-aspect reviews | Analyze each file individually, then cross-file integration pass |
| Dynamic Decomposition (adaptive) | Open-ended investigation tasks | "Add comprehensive tests to a legacy codebase" - map structure first, then prioritize |
Prompt chaining breaks work into sequential steps. Good for code reviews: analyze each file for local issues, then run a cross-file integration pass.
Dynamic decomposition generates subtasks based on what is discovered at each step. Good for open-ended tasks: first map the codebase structure, identify high-impact areas, then create a prioritized plan that adapts as dependencies are discovered.
1.7 Session State, Resumption, and Forking
flowchart TB
subgraph Resume["Session Resumption"]
R1["--resume session-name"]
R2["Continues prior conversation"]
R3["Use when prior context<br/>is mostly valid"]
end
subgraph Fork["Session Forking"]
F1["fork_session"]
F2["Independent branches from<br/>shared baseline"]
F3["Use for comparing<br/>divergent approaches"]
end
subgraph Fresh["Fresh Start"]
FR1["New session + summary"]
FR2["Inject structured summary"]
FR3["Use when prior tool<br/>results are stale"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
class R1,R2,R3 blue
class F1,F2,F3 purple
class FR1,FR2,FR3 teal
Key Decisions:
- Resume (--resume session-name): When prior context is mostly valid, you just need to continue
- Fork (fork_session): When you want to explore two approaches from a shared analysis baseline
- Fresh start with summary: When prior tool results are stale (code has changed since last session)
- When resuming, inform the agent about specific file changes for targeted re-analysis rather than full re-exploration
Domain 1 Practice Questions
Question 1: Production data shows that in 12% of cases, your agent skips get_customer and calls lookup_order using only the customer's stated name. What change most effectively addresses this?
A) Add a programmatic prerequisite that blocks lookup_order and process_refund calls until get_customer has returned a verified customer ID.
B) Enhance the system prompt to state that customer verification is mandatory.
C) Add few-shot examples showing the agent always calling get_customer first.
D) Implement a routing classifier that enables only appropriate tools per request type.
Answer: A - When a specific tool sequence is required for critical business logic (verifying identity before processing refunds), programmatic enforcement provides deterministic guarantees that prompt-based approaches (B, C) cannot. Option D addresses tool availability, not ordering.
Question 2: Your multi-agent research system covers only visual arts when asked about "AI on creative industries." Coordinator logs show it decomposed the topic into only "AI in digital art," "AI in graphic design," and "AI in photography." What is the root cause?
A) The synthesis agent lacks instructions for identifying coverage gaps.
B) The coordinator agent's task decomposition is too narrow, resulting in subagent assignments that don't cover all relevant domains.
C) The web search agent's queries are not comprehensive enough.
D) The document analysis agent is filtering out non-visual creative industries.
Answer: B - The coordinator's logs directly reveal it decomposed "creative industries" into only visual arts subtasks. The subagents executed correctly within their assigned scope - the problem is what they were assigned.
Question 3: The synthesis agent frequently needs to verify claims, causing 2-3 round trips per task via the coordinator. 85% of verifications are simple fact-checks. What's the most effective approach?
A) Give the synthesis agent a scoped verify_fact tool for simple lookups, while complex verifications continue delegating through the coordinator.
B) Have the synthesis agent batch all verification needs to the end.
C) Give the synthesis agent access to all web search tools.
D) Have the web search agent proactively cache extra context.
Answer: A - This applies the principle of least privilege: the synthesis agent gets only what it needs for the 85% common case. Option B creates blocking dependencies. Option C over-provisions (violating separation of concerns). Option D relies on speculative caching.
Question 4: Your agentic loop uses a counter that terminates after 5 iterations regardless of task completion. Users report tasks being cut off mid-way. What should you change?
A) Increase the iteration cap to 15.
B) Replace the iteration cap with stop_reason-based termination that continues when stop_reason is "tool_use" and terminates when it's "end_turn".
C) Add a prompt instruction telling the model to finish within 5 iterations.
D) Use max_tokens to control when the model stops.
Answer: B - The agentic loop should be driven by stop_reason, not arbitrary iteration caps. The model signals completion via "end_turn" - trust that signal instead of guessing how many iterations are enough.
Question 5: Your coordinator spawns subagents sequentially (one at a time). Research on "quantum computing applications" takes 3 minutes as each subagent waits for the previous one to complete. How can you reduce latency?
A) Give each subagent a larger context window.
B) Use a single agent instead of multiple subagents.
C) Have the coordinator emit multiple Task tool calls in a single response to spawn subagents in parallel.
D) Cache results from previous research runs.
Answer: C - Parallel subagent execution is achieved by emitting multiple Task tool calls in a single coordinator response. This is a fundamental multi-agent optimization technique.
Question 6: A customer says "I need a refund for order #4521 AND I want to change my subscription plan." How should the agent handle this?
A) Process the refund first, then address the subscription change in a follow-up.
B) Decompose the request into distinct items, investigate each using shared context, then synthesize a unified resolution.
C) Ask the customer which issue they'd like to address first.
D) Escalate to a human agent because multi-concern requests are too complex.
Answer: B - Multi-concern requests should be decomposed, investigated in parallel using shared context, and synthesized into a unified response. This is the standard pattern for handling complex customer requests.
Domain 1 Worked Example: Building a Customer Support Agent
Let's walk through a complete customer support agent design that covers most Domain 1 concepts.
Scenario: You are building an agent to handle returns, billing disputes, and account issues. The agent has four MCP tools: get_customer, lookup_order, process_refund, and escalate_to_human. The target is 80%+ first-contact resolution.
Step 1 - Design the Agentic Loop:
sequenceDiagram
participant C as Customer
participant A as Agent
participant T as Tools
participant H as Human Agent
C->>A: "I need a refund for<br/>order #4521"
A->>T: get_customer(email)
T-->>A: {id: 123, verified: true}
A->>T: lookup_order(order_id: 4521)
T-->>A: {status: "delivered",<br/>amount: $127.50, ...}
A->>A: Check: amount < $500?
A->>T: process_refund(order: 4521,<br/>amount: 127.50)
T-->>A: {status: "processed"}
A->>C: "Refund of $127.50<br/>processed for order #4521"
Step 2 - Add Programmatic Prerequisites:
The agent must call get_customer before lookup_order or process_refund. Implement a hook that checks whether a verified customer ID exists before allowing downstream tools:
lookup_orderis blocked untilget_customerhas returned a verified customer IDprocess_refundis blocked until bothget_customerANDlookup_orderhave completed- This eliminates the 12% of cases where the agent skips verification
Step 3 - Add Compliance Hooks:
A tool call interception hook blocks refunds above $500 and redirects to escalate_to_human with a structured summary:
- Customer ID
- Order details
- Refund amount (above threshold)
- Recommended action: "Human review required for refund > $500"
Step 4 - Handle Multi-Concern Requests:
When a customer says "I need a refund for order #4521 AND I want to change my subscription," the agent:
1. Decomposes the request into two distinct items
2. Investigates each using shared customer context (verified once, used for both)
3. Synthesizes a unified resolution addressing both issues
Step 5 - Define Escalation Criteria:
Add explicit criteria to the system prompt with few-shot examples:
- Escalate immediately: Customer explicitly says "let me talk to a human"
- Escalate: Policy gaps (competitor price matching when policy only covers own-site)
- Resolve autonomously: Standard returns with photo evidence, simple billing questions
- Acknowledge + offer resolution: Frustrated customer with straightforward issue (escalate only if they reiterate preference for human)
This single worked example touches Task Statements 1.1 (agentic loop), 1.4 (enforcement), 1.5 (hooks), and connects to Domain 2 (tool design) and Domain 5 (escalation).
Domain 1 Key Patterns Summary
flowchart TB
subgraph Patterns["Domain 1 Pattern Catalog"]
direction TB
P1["Agentic Loop<br/>stop_reason-based control"]
P2["Hub-and-Spoke<br/>Coordinator + subagents"]
P3["Parallel Spawning<br/>Multiple Task calls in one turn"]
P4["Programmatic Enforcement<br/>Hooks > prompts for compliance"]
P5["Iterative Refinement<br/>Coordinator re-delegates until sufficient"]
P6["Session Forking<br/>Divergent exploration from shared baseline"]
P7["Structured Handoffs<br/>Complete context for human agents"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class P1,P2,P3,P4,P5,P6,P7 blue
Domain 2: Tool Design & MCP Integration (18%)
This domain covers how you design tool interfaces, handle errors, distribute tools across agents, and integrate MCP servers.
2.1 Effective Tool Interface Design
Tool descriptions are the primary mechanism LLMs use for tool selection. This is perhaps the most practically important concept in this domain.
flowchart TB
subgraph Bad["BAD: Minimal Descriptions"]
B1["get_customer:<br/>'Retrieves customer info'"]
B2["lookup_order:<br/>'Retrieves order details'"]
B3["Result: Model confuses<br/>the two tools"]
end
subgraph Good["GOOD: Detailed Descriptions"]
G1["get_customer:<br/>'Look up a customer by email or<br/>phone. Returns name, account status,<br/>subscription tier. Use for identity<br/>verification before operations.'"]
G2["lookup_order:<br/>'Retrieve order by order number<br/>(format: #NNNNN). Returns items,<br/>status, shipping. Use when user<br/>asks about a specific order.'"]
G3["Result: Reliable<br/>tool selection"]
end
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class B1,B2,B3 red
class G1,G2,G3 green
Best Practices for Tool Descriptions:
- Include input formats: What types of identifiers does the tool accept?
- Include example queries: "Use when the user asks about order status"
- Include edge cases: "Returns null if the customer has no active orders"
- Include boundaries: "Do NOT use this for subscription queries; use get_subscription instead"
When Tools Have Overlapping Functions:
- Rename to eliminate ambiguity (e.g., analyze_content becomes extract_web_results)
- Split generic tools into purpose-specific ones (e.g., analyze_document becomes extract_data_points, summarize_content, and verify_claim_against_source)
Exam Tip: Watch for questions where tool descriptions are "minimal" or "near-identical." The first fix is always to expand descriptions, not to add routing classifiers or consolidate tools.
2.2 Structured Error Responses
MCP tools must return structured error metadata, not generic failure messages.
flowchart TB
ERROR["Tool Error Occurs"] --> CLASSIFY{"Error Category?"}
CLASSIFY -->|"Transient"| TRANS["Timeout, service unavailable<br/>isRetryable: true"]
CLASSIFY -->|"Validation"| VAL["Invalid input format<br/>isRetryable: false"]
CLASSIFY -->|"Business"| BUS["Policy violation<br/>isRetryable: false<br/>+ customer-friendly message"]
CLASSIFY -->|"Permission"| PERM["Insufficient access<br/>isRetryable: false"]
TRANS --> AGENT["Agent decides:<br/>retry with backoff"]
VAL --> AGENT2["Agent decides:<br/>fix input and retry"]
BUS --> AGENT3["Agent decides:<br/>explain to user"]
PERM --> AGENT4["Agent decides:<br/>escalate or inform"]
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class TRANS,VAL,BUS,PERM amber
class AGENT,AGENT2,AGENT3,AGENT4 blue
The Structured Error Response Pattern:
Every error should include:
- isError: true (MCP flag)
- errorCategory: transient, validation, business, permission
- isRetryable: boolean
- description: Human-readable explanation
Critical Distinction - Access Failures vs Empty Results:
- Access failure: The database was unreachable (error - needs retry decision)
- Valid empty result: The query ran successfully but found no matches (success - not an error)
Returning empty results as success when the tool actually failed prevents any recovery and risks incomplete outputs.
2.3 Tool Distribution Across Agents
The Key Principle: Giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability. Each agent should only have tools relevant to its role.
flowchart TB
subgraph Search["Search Agent Tools"]
T1["web_search"]
T2["fetch_url"]
T3["search_academic"]
end
subgraph Analysis["Analysis Agent Tools"]
T4["extract_data"]
T5["summarize_content"]
T6["compare_sources"]
end
subgraph Synthesis["Synthesis Agent Tools"]
T7["compile_report"]
T8["verify_fact"]
T9["format_citations"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
class T1,T2,T3 blue
class T4,T5,T6 purple
class T7,T8,T9 teal
tool_choice Configuration:
| Setting | Behavior | Use Case |
|---|---|---|
"auto" |
Model may return text instead of calling a tool | Default behavior - model decides |
"any" |
Model must call a tool but can choose which | Guarantee structured output when multiple schemas exist |
{"type": "tool", "name": "..."} |
Model must call a specific named tool | Force a specific extraction to run first |
Exam Tip:
tool_choice: "any"guarantees the model calls a tool. Forced selection guarantees it calls a specific tool. Know when to use each.
2.4 MCP Server Integration
flowchart LR
subgraph Project["Project Scope: .mcp.json"]
P1["Shared team tooling"]
P2["Version-controlled"]
P3["Uses env var expansion:<br/>${GITHUB_TOKEN}"]
end
subgraph User["User Scope: ~/.claude.json"]
U1["Personal/experimental"]
U2["NOT shared via VCS"]
U3["Individual credentials"]
end
subgraph Both["Available Simultaneously"]
B1["All tools discovered<br/>at connection time"]
end
Project --> Both
User --> Both
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class P1,P2,P3 blue
class U1,U2,U3 purple
class B1 emerald
MCP Resources vs MCP Tools:
- Resources: Expose content catalogs (issue summaries, documentation hierarchies, database schemas) - reduce exploratory tool calls
- Tools: Expose actions (search, create, update, delete)
Best Practices:
- Use community MCP servers for standard integrations (Jira, GitHub) rather than building custom ones
- Reserve custom servers for team-specific workflows
- Enhance tool descriptions to prevent the agent from preferring built-in tools (like Grep) over more capable MCP tools
- Configure shared servers in .mcp.json with ${ENV_VAR} expansion for auth tokens
2.5 Built-in Tools
| Tool | Purpose | When to Use |
|---|---|---|
| Grep | Content search | Searching file contents for patterns (function names, error messages, imports) |
| Glob | File path pattern matching | Finding files by name or extension (e.g., **/*.test.tsx) |
| Read | Load full file contents | When you need to examine a complete file |
| Write | Create or overwrite files | Creating new files or complete rewrites |
| Edit | Targeted modifications | Changing specific sections using unique text matching |
| Bash | Shell commands | Running builds, tests, git operations |
When Edit Fails: If Edit cannot find unique anchor text, fall back to Read + Write.
Incremental Codebase Understanding Pattern:
1. Start with Grep to find entry points
2. Use Read to follow imports and trace flows
3. Build understanding incrementally, rather than reading all files upfront
Domain 2 Practice Questions
Question 1: Your agent frequently calls get_customer when users ask about orders. Both tools have minimal descriptions and accept similar identifier formats. What's the most effective first step?
A) Add few-shot examples demonstrating correct tool selection.
B) Expand each tool's description to include input formats, example queries, edge cases, and boundaries explaining when to use it versus similar tools.
C) Implement a routing layer that pre-selects tools based on keywords.
D) Consolidate both tools into a single lookup_entity tool.
Answer: B - Tool descriptions are the primary mechanism LLMs use for tool selection. Expanding them is the lowest-effort, highest-leverage fix. Few-shot examples (A) add token overhead without fixing the root cause. A routing layer (C) is over-engineered. Consolidating (D) requires more effort than expanding descriptions.
Question 2: The web search subagent times out. How should failure information flow back to the coordinator?
A) Return structured error context including failure type, attempted query, partial results, and alternative approaches.
B) Implement automatic retry with exponential backoff, returning "search unavailable" after retries exhaust.
C) Catch the timeout and return an empty result set marked as successful.
D) Propagate the timeout exception to a top-level handler that terminates the workflow.
Answer: A - Structured error context gives the coordinator the information needed for intelligent recovery. Option B hides context behind a generic status. Option C suppresses the error (anti-pattern). Option D terminates unnecessarily.
Question 3: Your synthesis agent has access to 18 tools including web search, database queries, and report formatting. It frequently misuses the web search tools when it should be synthesizing. What should you do?
A) Restrict the synthesis agent's tool set to only synthesis-relevant tools (compile_report, verify_fact, format_citations), routing complex lookups through the coordinator.
B) Add few-shot examples showing the synthesis agent using only synthesis tools.
C) Increase the synthesis agent's context window to handle more tools.
D) Add a pre-processing step that removes irrelevant tools from the response.
Answer: A - The principle is scoped tool access. Giving agents tools outside their specialization leads to misuse. Restrict each agent to its relevant tools (4-5 is ideal).
Question 4: A new team member's Claude Code instance doesn't have access to the project's Jira MCP server, even though the team lead configured it. Where should the configuration be?
A) In ~/.claude.json on each developer's machine.
B) In .mcp.json at the project root, committed to version control.
C) In CLAUDE.md under a tools section.
D) In a .claude/config.json file.
Answer: B - Project-level MCP servers go in .mcp.json (version-controlled, shared). ~/.claude.json (A) is user-level and not shared. CLAUDE.md (C) is for instructions, not server configuration.
Question 5: You want to force the model to call extract_metadata before any enrichment tools. How do you configure tool_choice?
A) tool_choice: "auto" with a system prompt instruction.
B) tool_choice: "any" to guarantee a tool call.
C) tool_choice: {"type": "tool", "name": "extract_metadata"} to force the specific tool, then process subsequent steps in follow-up turns.
D) List extract_metadata first in the tools array.
Answer: C - Forced tool selection ensures a specific tool is called. "auto" (A) lets the model choose freely. "any" (B) guarantees a tool call but not which one. Tool ordering in the array (D) doesn't guarantee selection order.
Domain 2 Worked Example: Designing a Research Agent's Tool Suite
Let's design the tool distribution for a multi-agent research system to illustrate all Domain 2 concepts.
The System: A coordinator delegates to three subagents: Search, Analysis, and Synthesis. Each needs carefully scoped tools.
Step 1 - Tool Design with Differentiated Descriptions:
Instead of a generic analyze_content tool shared by all agents, split into three purpose-specific tools:
| Tool | Agent | Description |
|---|---|---|
web_search |
Search | "Search the web for articles, papers, and news. Input: query string (1-5 keywords). Output: list of URLs with titles and snippets. Use for broad discovery of sources. Does NOT fetch page content - use fetch_url for that." |
fetch_url |
Search | "Fetch and extract text content from a specific URL. Input: URL string. Returns cleaned text with metadata (title, author, date, word count). Use after web_search to get full content." |
extract_data_points |
Analysis | "Extract specific data points (statistics, dates, names, claims) from a provided text block. Input: text + list of fields to extract. Returns structured JSON with extracted values and confidence. Use for systematic fact extraction from documents." |
summarize_section |
Analysis | "Summarize a section of text into 2-3 key points. Input: text (up to 5000 tokens). Returns bullet-point summary preserving source attribution. Use for condensing verbose source material." |
verify_fact |
Synthesis | "Quick lookup to verify a specific factual claim (date, name, statistic). Input: claim string. Returns: verified/unverified/conflicting with source. Use for 85% of simple verifications without round-tripping through coordinator." |
compile_report |
Synthesis | "Compile findings into a structured report with sections, citations, and methodology notes. Input: structured findings list with sources. Returns formatted report markdown." |
Notice how each tool's description includes: input format, output format, when to use it, and boundaries.
Step 2 - Error Response Design:
flowchart TB
subgraph WebSearch["web_search Error Handling"]
WS1["Timeout → isRetryable: true<br/>errorCategory: transient<br/>'Search service timed out after 10s'"]
WS2["Invalid query → isRetryable: false<br/>errorCategory: validation<br/>'Query must be 1-5 keywords'"]
WS3["No results → NOT an error<br/>Valid empty result set<br/>isError: false, results: []"]
end
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class WS1,WS2 amber
class WS3 green
The critical distinction: a search that returns zero results is a successful query with no matches (not an error). A search that fails because the service is down is an access failure (an error requiring recovery).
Step 3 - MCP Server Configuration:
The team's shared Jira integration goes in .mcp.json with credential expansion:
{
"jira": {
"command": "jira-mcp-server",
"env": {
"JIRA_URL": "${JIRA_URL}",
"JIRA_TOKEN": "${JIRA_TOKEN}"
}
}
}
A developer's experimental paper-search MCP server goes in ~/.claude.json (personal, not shared).
Step 4 - Tool Choice Configuration:
For the extraction phase: force extract_data_points first using tool_choice: {"type": "tool", "name": "extract_data_points"}. After extraction completes, switch to tool_choice: "auto" for enrichment steps.
For the synthesis phase: use tool_choice: "any" to guarantee structured output when multiple report formats exist (the model picks the right format tool but must call one).
Domain 2 Additional Practice Questions
Question 6: Your search agent returns "search unavailable" after a timeout, but the coordinator has no information about what was searched or whether partial results exist. How should you improve the error response?
A) Add automatic retry with exponential backoff.
B) Return structured error context including: errorCategory: "transient", the attempted query, any partial results from before the timeout, and suggested alternative approaches (narrower query terms).
C) Have the search agent cache results so timeouts don't lose data.
D) Increase the timeout threshold.
Answer: B - Structured error context enables intelligent coordinator recovery. "Search unavailable" hides all the information the coordinator needs to decide whether to retry, modify the query, or proceed with partial results.
Question 7: You want to find all callers of a specific function across your codebase. Which built-in tool should you use?
A) Glob - to find files matching a naming pattern.
B) Grep - to search file contents for the function name pattern.
C) Read - to load and examine each file.
D) Bash - to run a shell search command.
Answer: B - Grep is for content search (searching file contents for patterns like function names, error messages, or import statements). Glob is for file path pattern matching. Reading all files (C) is inefficient. Bash search commands (D) should use the built-in Grep tool.
Question 8: Your agent has a community MCP server for Jira AND a custom Jira-like server your team built. The agent sometimes calls the wrong one. What's the best approach?
A) Use the community MCP server for standard Jira operations, removing the custom server. Reserve custom servers for team-specific workflows that the community server doesn't support.
B) Rename both servers to have completely different names.
C) Add a routing layer that intercepts Jira-related calls.
D) Give each agent access to only one of the two servers.
Answer: A - Prefer existing community MCP servers for standard integrations, reserving custom servers for team-specific workflows. Having two overlapping servers creates ambiguity. Remove the redundant custom server if the community one covers your needs.
Domain 3: Claude Code Configuration & Workflows (20%)
This domain covers the practical configuration and customization of Claude Code for individual developers and teams.
3.1 CLAUDE.md Configuration Hierarchy
flowchart TB
subgraph User["User Level: ~/.claude/CLAUDE.md"]
U1["Personal preferences"]
U2["NOT shared via VCS"]
U3["Applies only to this user"]
end
subgraph Project["Project Level: .claude/CLAUDE.md or root CLAUDE.md"]
P1["Universal coding standards"]
P2["Shared via version control"]
P3["Applies to ALL team members"]
end
subgraph Directory["Directory Level: subdirectory CLAUDE.md"]
D1["Package-specific rules"]
D2["Applies to that directory"]
D3["Overrides project-level for scope"]
end
User --> Project --> Directory
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
class U1,U2,U3 purple
class P1,P2,P3 blue
class D1,D2,D3 teal
Key Configuration Options:
| Level | Location | Shared via VCS? | Use Case |
|---|---|---|---|
| User | ~/.claude/CLAUDE.md |
No | Personal preferences, output format |
| Project | .claude/CLAUDE.md or root CLAUDE.md |
Yes | Team coding standards, testing conventions |
| Directory | Subdirectory CLAUDE.md |
Yes | Package-specific rules |
Modular Organization:
- @import syntax for referencing external files to keep CLAUDE.md lean
- .claude/rules/ directory for topic-specific rule files (alternative to monolithic CLAUDE.md)
- /memory command to verify which memory files are loaded
Common Exam Pitfall: A new team member not receiving instructions because they're in
~/.claude/CLAUDE.md(user-level) instead of.claude/CLAUDE.md(project-level). User-level is personal and not version-controlled.
3.2 Custom Slash Commands and Skills
flowchart LR
subgraph Commands["Slash Commands"]
C1[".claude/commands/<br/>Project-scoped, shared"]
C2["~/.claude/commands/<br/>User-scoped, personal"]
end
subgraph Skills["Skills"]
S1[".claude/skills/<br/>with SKILL.md frontmatter"]
S2["context: fork<br/>Isolated sub-agent context"]
S3["allowed-tools<br/>Restrict tool access"]
S4["argument-hint<br/>Prompt for parameters"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
class C1,C2 blue
class S1,S2,S3,S4 purple
Commands vs Skills:
- Commands (.claude/commands/): Simple, always available, for team-wide workflows
- Skills (.claude/skills/): Richer, with frontmatter config, for task-specific workflows
Skill Frontmatter Options:
- context: fork - Run in isolated sub-agent context (prevents verbose output from polluting main conversation)
- allowed-tools - Restrict which tools the skill can use (e.g., limit to file reads to prevent destructive actions)
- argument-hint - Prompt developers for required parameters when they invoke without arguments
When to Use What:
- Skills for on-demand, task-specific workflows (codebase analysis, brainstorming)
- CLAUDE.md for always-loaded universal standards
3.3 Path-Specific Rules
Path-specific rules let you apply conventions conditionally based on which files are being edited.
How It Works:
Create files in .claude/rules/ with YAML frontmatter:
---
paths: ["**/*.test.tsx"]
---
All test files must use React Testing Library.
Never use enzyme or shallow rendering.
Use data-testid attributes for component queries.
Why This Matters:
- Rules load only when editing matching files, reducing irrelevant context and token usage
- Glob patterns work across directories - perfect for test files spread throughout a codebase
- Superior to directory-level CLAUDE.md when conventions span multiple directories
| Approach | Best For |
|---|---|
.claude/rules/ with globs |
Conventions by file type (test files, API files, Terraform) regardless of location |
| Directory CLAUDE.md | Conventions specific to one package or directory |
| Root CLAUDE.md | Universal standards that always apply |
3.4 Plan Mode vs Direct Execution
flowchart TB
TASK["Assess Task"] --> COMPLEX{"Complex?<br/>Multiple valid approaches?<br/>Multi-file changes?<br/>Architectural decisions?"}
COMPLEX -->|"Yes"| PLAN["PLAN MODE<br/>Explore, design, then execute"]
COMPLEX -->|"No"| DIRECT["DIRECT EXECUTION<br/>Single-file, clear scope"]
PLAN --> EXPLORE["Use Explore subagent<br/>for verbose discovery"]
PLAN --> IMPLEMENT["Switch to direct execution<br/>for implementation"]
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class PLAN purple
class DIRECT emerald
class EXPLORE,IMPLEMENT blue
| Use Plan Mode | Use Direct Execution |
|---|---|
| Microservice restructuring | Single-file bug fix with clear stack trace |
| Library migration affecting 45+ files | Adding a date validation conditional |
| Choosing between integration approaches | Simple refactoring with known scope |
| Multi-file architectural changes | Well-understood, bounded changes |
The Explore Subagent:
Use the Explore subagent to isolate verbose discovery output from the main conversation context. It returns summaries, preserving your context window for actual implementation.
3.5 Iterative Refinement Techniques
Four key techniques for getting better results from Claude Code:
1. Concrete Input/Output Examples
When prose descriptions produce inconsistent results, provide 2-3 concrete examples showing the exact transformation you want.
2. Test-Driven Iteration
Write test suites first covering expected behavior, edge cases, and performance requirements. Then iterate by sharing test failures.
3. The Interview Pattern
Have Claude ask questions to surface considerations you may not have anticipated before implementing. Great for unfamiliar domains.
4. Sequential vs Parallel Issue Resolution
- Single message when fixes interact with each other (interacting problems)
- Sequential iteration when problems are independent
3.6 CI/CD Integration
flowchart LR
subgraph CI["CI Pipeline"]
PR["PR Created"] --> RUN["claude -p 'Review this PR'"]
RUN --> JSON["--output-format json<br/>--json-schema schema.json"]
JSON --> POST["Post findings as<br/>inline PR comments"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class PR,RUN,JSON,POST blue
Critical CLI Flags for CI:
| Flag | Purpose |
|---|---|
-p / --print |
Non-interactive mode - processes prompt, outputs result, exits. Required for CI. |
--output-format json |
Machine-parseable structured output |
--json-schema |
Enforce a specific output schema |
Key Concepts:
- Session context isolation: A Claude session that generated code is less effective at reviewing its own changes. Use an independent review instance.
- Include prior review findings when re-running reviews after new commits to avoid duplicate comments
- CLAUDE.md provides project context (testing standards, fixture conventions) to CI-invoked Claude Code
- Provide existing test files in context so test generation avoids duplicating covered scenarios
Domain 3 Practice Questions
Question 1: You want to create a /review slash command available to every developer who clones the repo. Where should you create it?
A) In the .claude/commands/ directory in the project repository.
B) In ~/.claude/commands/ in each developer's home directory.
C) In the CLAUDE.md file at the project root.
D) In a .claude/config.json file with a commands array.
Answer: A - Project-scoped commands in .claude/commands/ are version-controlled and automatically available to all developers. User-level ~/.claude/commands/ (B) is personal. CLAUDE.md (C) is for instructions. config.json (D) doesn't support command definitions.
Question 2: You've been assigned to restructure a monolithic app into microservices, involving changes across dozens of files. Which approach?
A) Enter plan mode to explore the codebase, understand dependencies, and design an implementation approach before making changes.
B) Start with direct execution, letting implementation reveal natural service boundaries.
C) Use direct execution with comprehensive upfront instructions.
D) Begin in direct execution and switch to plan mode only if you encounter unexpected complexity.
Answer: A - Plan mode is designed for complex tasks with large-scale changes, multiple valid approaches, and architectural decisions. Direct execution (B) risks costly rework. Upfront instructions (C) assume knowledge you don't have. Waiting for complexity (D) ignores that complexity is already stated in the requirements.
Question 3: Test files are spread throughout the codebase (e.g., Button.test.tsx next to Button.tsx). You want all tests to follow the same conventions. What's the most maintainable approach?
A) Create rule files in .claude/rules/ with YAML frontmatter specifying glob patterns (e.g., paths: ["/.test.tsx"]).*
B) Consolidate all conventions in root CLAUDE.md under headers.
C) Create skills in .claude/skills/ for each code type.
D) Place a separate CLAUDE.md in each subdirectory.
Answer: A - Glob patterns in .claude/rules/ apply conventions by file path regardless of directory location. Essential for test files spread throughout the codebase. CLAUDE.md headers (B) rely on inference. Skills (C) require manual invocation. Subdirectory CLAUDE.md (D) can't handle files in many directories.
Question 4: Your CI pipeline script runs `claude "Analyze this PR"` but the job hangs indefinitely. What's the fix?
A) Add the -p flag: claude -p "Analyze this pull request for security issues"
B) Set the environment variable CLAUDE_HEADLESS=true.
C) Redirect stdin from /dev/null.
D) Add the --batch flag.
Answer: A - The -p (or --print) flag is the documented way to run Claude Code non-interactively. It processes the prompt, outputs the result, and exits. Options B, C, and D reference non-existent features or Unix workarounds that don't properly address Claude Code's syntax.
Question 5: A team lead's personal CLAUDE.md contains critical testing conventions, but new team members aren't getting them. What's the configuration issue?
A) The new team members need to run /memory to load the conventions.
B) The conventions are in the team lead's user-level ~/.claude/CLAUDE.md, which is not shared via version control. They should be moved to .claude/CLAUDE.md at the project root.
C) The new team members need to clone the repo again to pick up CLAUDE.md changes.
D) The CLAUDE.md file needs to be imported using @import syntax.
Answer: B - ~/.claude/CLAUDE.md is user-level and not version-controlled. Moving to .claude/CLAUDE.md (project-level) makes it available to everyone via VCS.
Question 6: You want a codebase analysis skill that produces very verbose output. How do you prevent it from polluting the main conversation?
A) Add allowed-tools: [Read, Grep, Glob] to the SKILL.md frontmatter.
B) Add context: fork to the SKILL.md frontmatter so it runs in an isolated sub-agent context.
C) Create the skill as a user-level personal variant.
D) Add a max-tokens limit in the skill configuration.
Answer: B - context: fork runs the skill in isolated sub-agent context, preventing verbose output from polluting the main conversation. allowed-tools (A) restricts which tools are available, not context isolation.
Domain 3 Worked Example: Setting Up a Monorepo with Claude Code
Let's configure Claude Code for a monorepo with frontend (React), backend (Python API), infrastructure (Terraform), and tests spread throughout.
Step 1 - Root CLAUDE.md (Universal Standards):
The root CLAUDE.md contains team-wide conventions that always apply:
- Code review checklist
- Commit message format
- Branch naming conventions
- Security requirements (no secrets in code, no raw SQL)
Step 2 - Path-Specific Rules in .claude/rules/:
Create four rule files with glob patterns:
flowchart TB
subgraph Rules[".claude/rules/ Directory"]
R1["testing.md<br/>paths: ['**/*.test.*', '**/*.spec.*']<br/>React Testing Library, no enzyme,<br/>data-testid attributes"]
R2["api-conventions.md<br/>paths: ['src/api/**/*', 'src/services/**/*']<br/>async/await, specific error handling,<br/>input validation patterns"]
R3["terraform.md<br/>paths: ['terraform/**/*', '**/*.tf']<br/>module structure, naming conventions,<br/>state management rules"]
R4["react-components.md<br/>paths: ['src/components/**/*.tsx']<br/>Functional style with hooks,<br/>prop types, accessibility"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class R1,R2,R3,R4 blue
Why glob patterns instead of directory CLAUDE.md files? Because test files (*.test.tsx) are next to the components they test, spread across dozens of directories. A glob pattern **/*.test.* catches them all regardless of location.
Step 3 - Custom Skills:
Create a codebase analysis skill with context: fork so its verbose output stays isolated:
In .claude/skills/analyze-codebase/SKILL.md:
---
context: fork
allowed-tools: [Read, Grep, Glob]
argument-hint: "Which area? (api, frontend, terraform, all)"
---
Analyze the specified area of the codebase. Report: architecture overview, dependency map, test coverage gaps, and potential issues. Output a structured summary.
The context: fork prevents 2000 lines of discovery from polluting the main conversation. allowed-tools restricts to read-only operations. argument-hint prompts the developer for which area to analyze.
Step 4 - MCP Server Integration:
.mcp.json (project-scoped, committed to VCS):
{
"github": {
"command": "github-mcp-server",
"env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
}
}
~/.claude.json (personal, NOT committed):
{
"paper-search": {
"command": "arxiv-mcp-server"
}
}
Step 5 - Plan Mode Decision:
| Task | Mode | Why |
|---|---|---|
| Fix null pointer in user service | Direct execution | Single file, clear stack trace |
| Migrate from Express to Fastify | Plan mode | 45+ files, multiple valid approaches |
| Add date validation to signup form | Direct execution | One function, obvious implementation |
| Restructure API into microservices | Plan mode | Architectural decisions, dependency analysis |
Domain 3 Additional Practice Questions
Question 7: You use the interview pattern before implementing a caching layer. Claude asks about cache invalidation strategies, failure modes, and consistency requirements - surfacing considerations you hadn't thought about. Which iterative refinement technique is this?
A) Test-driven iteration
B) Concrete input/output examples
C) The interview pattern - having Claude ask questions to surface considerations you may not have anticipated before implementing
D) Sequential issue resolution
Answer: C - The interview pattern is specifically designed for unfamiliar domains where the developer may not know all the considerations. Claude asks probing questions to ensure the design accounts for edge cases before implementation begins.
Question 8: Your CI pipeline re-runs code review after new commits, but it keeps re-reporting issues from the previous review that were already addressed. How do you fix this?
A) Clear the review context between runs.
B) Include prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues.
C) Use a different model for each review iteration.
D) Compare the new review output against the previous one programmatically.
Answer: B - Including prior findings in context with instructions to focus on new/unaddressed issues prevents duplicate comments. This is the standard pattern for iterative CI review.
Question 9: Natural language descriptions of a code transformation keep producing inconsistent results. Sometimes Claude uppercases variables, sometimes it camelCases them. What's the most effective technique?
A) Provide 2-3 concrete input/output examples showing the exact transformation you want.
B) Write more detailed prose instructions specifying the exact naming convention.
C) Add a post-processing step to enforce naming conventions.
D) Use a linter to catch inconsistencies.
Answer: A - When prose descriptions produce inconsistent results, concrete input/output examples are the most effective technique. They show the exact transformation unambiguously, eliminating interpretation differences.
Domain 4: Prompt Engineering & Structured Output (20%)
This domain tests your ability to design precise prompts, use few-shot examples, enforce structured output via tool_use, and design batch processing strategies.
4.1 Explicit Criteria Over Vague Instructions
flowchart LR
subgraph Vague["VAGUE (Ineffective)"]
V1["'Be conservative'"]
V2["'Only report high-confidence findings'"]
V3["'Check that comments are accurate'"]
end
subgraph Specific["SPECIFIC (Effective)"]
S1["'Flag comments only when<br/>claimed behavior contradicts<br/>actual code behavior'"]
S2["'Report: bugs, security issues<br/>Skip: minor style, local patterns'"]
S3["'Severity: critical = data loss,<br/>high = wrong output,<br/>medium = perf regression'"]
end
Vague -->|"Replace with"| Specific
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class V1,V2,V3 red
class S1,S2,S3 green
Key Principles:
- Define what to report and what to skip using categorical criteria, not confidence-based filtering
- High false positive rates destroy developer trust across all categories, not just the noisy ones
- Temporarily disable high false-positive categories to restore trust while improving those prompts
- Define severity levels with concrete code examples for each level
4.2 Few-Shot Prompting
Few-shot examples are the most effective technique for achieving consistently formatted, actionable output.
When to use few-shot examples:
- When detailed instructions alone produce inconsistent results
- For ambiguous-case handling (tool selection, edge cases)
- For demonstrating desired output format (location, issue, severity, suggested fix)
- For reducing hallucination in extraction tasks
- For distinguishing acceptable patterns from genuine issues (reducing false positives)
Best Practices:
- Use 2-4 targeted examples focusing on ambiguous scenarios
- Show reasoning for why one action was chosen over plausible alternatives
- Include examples demonstrating correct handling of varied document structures
- Examples enable the model to generalize to novel patterns, not just match pre-specified cases
4.3 Structured Output via Tool Use
flowchart TB
subgraph Define["Define Extraction Tool"]
D1["Name: extract_invoice"]
D2["Input schema: JSON Schema"]
D3["Fields: required + optional"]
end
subgraph Call["API Call"]
C1["tool_choice controls behavior"]
C2["Model calls the tool"]
C3["Response in tool_use block"]
end
subgraph Validate["Validation Layer"]
V1["Schema: syntax guaranteed"]
V2["Semantic: NOT guaranteed"]
V3["line_items may not sum<br/>to total (semantic error)"]
end
Define --> Call --> Validate
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class D1,D2,D3,C1,C2,C3 blue
class V1,V2,V3 amber
Critical Distinction:
- Strict JSON schemas via tool_use eliminate SYNTAX errors (missing braces, trailing commas)
- They do NOT prevent SEMANTIC errors (wrong values, items that don't sum correctly)
Schema Design Best Practices:
| Pattern | Purpose |
|---|---|
| Make fields optional/nullable when source may lack the data | Prevents model from fabricating values to satisfy required fields |
Add "unclear" enum value |
For ambiguous cases where the answer isn't clear |
Add "other" + detail string field |
For extensible categorization beyond predefined enums |
| Include format normalization rules in prompts | Handle inconsistent source formatting |
4.4 Validation, Retry, and Feedback Loops
flowchart TB
EXTRACT["Extract data from document"] --> VALIDATE{"Validation<br/>passes?"}
VALIDATE -->|"Yes"| SUCCESS["Accept extraction"]
VALIDATE -->|"No"| ANALYZE{"Is information<br/>in the document?"}
ANALYZE -->|"Yes: format/structural error"| RETRY["Retry with:<br/>1. Original document<br/>2. Failed extraction<br/>3. Specific validation errors"]
ANALYZE -->|"No: info absent from source"| FAIL["Mark as unavailable<br/>(retries won't help)"]
RETRY --> VALIDATE
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
class SUCCESS emerald
class RETRY amber
class FAIL red
When Retries Work: Format mismatches, structural output errors, field placement errors
When Retries DON'T Work: Information exists only in an external document not provided
Self-Correction Validation Patterns:
- Extract calculated_total alongside stated_total to flag discrepancies
- Add conflict_detected booleans for inconsistent source data
- Include detected_pattern fields to enable analysis of false positive patterns when developers dismiss findings
4.5 Batch Processing Strategies
flowchart LR
subgraph Batch["Message Batches API"]
B1["50% cost savings"]
B2["Up to 24-hour processing"]
B3["No latency SLA"]
B4["No multi-turn tool calling"]
B5["custom_id for correlation"]
end
subgraph Good["GOOD Use Cases"]
G1["Overnight technical debt reports"]
G2["Weekly audit analysis"]
G3["Nightly test generation"]
end
subgraph Bad["BAD Use Cases"]
BD1["Blocking pre-merge checks"]
BD2["Real-time code review"]
BD3["Interactive feedback"]
end
Batch --> Good
Batch -.->|"NOT suitable"| Bad
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
class B1,B2,B3,B4,B5 blue
class G1,G2,G3 green
class BD1,BD2,BD3 red
Key Facts for the Exam:
- 50% cost savings but processing up to 24 hours with no guaranteed latency SLA
- Cannot do multi-turn tool calling within a single batch request
- Use custom_id to correlate batch request/response pairs
- Failure handling: Resubmit only failed documents (identified by custom_id), possibly with modifications (chunking oversized documents)
- Prompt refinement tip: Test on a sample set before batch-processing large volumes
4.6 Multi-Instance and Multi-Pass Review
flowchart TB
subgraph Self["Self-Review (Weak)"]
SR1["Same session retains<br/>reasoning context"]
SR2["Less likely to question<br/>its own decisions"]
SR3["Even extended thinking<br/>doesn't fully help"]
end
subgraph Independent["Independent Review (Strong)"]
IR1["Separate Claude instance"]
IR2["No prior reasoning context"]
IR3["More effective at<br/>catching subtle issues"]
end
subgraph MultiPass["Multi-Pass (Best for Large PRs)"]
MP1["Per-file local analysis"]
MP2["Cross-file integration pass"]
MP3["Avoids attention dilution<br/>and contradictions"]
end
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class SR1,SR2,SR3 red
class IR1,IR2,IR3 green
class MP1,MP2,MP3 blue
Why Self-Review Fails:
The model retains its reasoning context from generation, making it less likely to question its own decisions in the same session.
The Multi-Pass Pattern for Large PRs:
1. Per-file local analysis: Analyze each file individually for local issues
2. Cross-file integration pass: Examine cross-file data flow, consistency
3. This avoids attention dilution (superficial coverage of some files), contradictory findings (flagging a pattern in one file while approving it in another), and missed obvious bugs
Domain 4 Practice Questions
Question 1: Your automated code review reports have a 40% false positive rate on "misleading comments" but high accuracy on bugs and security. Developers are starting to ignore ALL findings. What should you do?
A) Add "be more conservative" to the system prompt.
B) Require the model to output confidence scores and filter below 80%.
C) Temporarily disable the "misleading comments" category while improving its prompt criteria, preserving developer trust in accurate categories.
D) Reduce the number of few-shot examples to make the model less aggressive.
Answer: C - High false positive rates in one category undermine trust across all categories. Temporarily disabling the noisy category preserves trust in accurate ones while you improve the prompt for that specific category.
Question 2: Your team wants to reduce API costs. You have: (1) blocking pre-merge checks, and (2) overnight technical debt reports. Your manager proposes switching both to the Message Batches API. How should you evaluate this?
A) Use batch processing for technical debt reports only; keep real-time calls for pre-merge checks.
B) Switch both to batch processing with status polling.
C) Keep real-time calls for both to avoid batch ordering issues.
D) Switch both to batch with timeout fallback.
Answer: A - The Batch API has processing up to 24 hours with no latency SLA. Suitable for overnight reports. Unsuitable for blocking pre-merge checks where developers wait.
Question 3: A PR modifies 14 files. Single-pass review produces inconsistent results: detailed feedback for some files, superficial for others, obvious bugs missed, and contradictory feedback. How should you restructure?
A) Split into focused passes: analyze each file individually for local issues, then run a separate integration pass examining cross-file data flow.
B) Require developers to split large PRs into 3-4 file submissions.
C) Switch to a higher-tier model with a larger context window.
D) Run three independent review passes and only flag issues appearing in at least two.
Answer: A - Splitting reviews into focused passes addresses the root cause: attention dilution. Per-file analysis ensures consistent depth; a separate integration pass catches cross-file issues. Larger context windows (C) don't solve attention quality. Consensus filtering (D) would suppress intermittently caught real bugs.
Question 4: Your extraction pipeline sometimes returns fabricated values for fields when the source document doesn't contain that information. How do you prevent this?
A) Add a post-processing step that validates all values against external databases.
B) Increase the model temperature to reduce deterministic fabrication.
C) Design schema fields as optional (nullable) when source documents may not contain the information, so the model can return null instead of fabricating values.
D) Add "do not hallucinate" to the system prompt.
Answer: C - Making fields nullable gives the model a valid way to say "this information isn't in the document." Required fields pressure the model to fabricate values to satisfy the schema.
Question 5: Your extraction validation fails because dates are in inconsistent formats (MM/DD/YYYY vs DD-MM-YYYY vs "January 5th, 2024"). Retrying produces the same errors. What approach should you take?
A) Define a regex-based post-processor to normalize all date formats after extraction.
B) Include format normalization rules in prompts alongside strict output schemas, with few-shot examples showing correct extraction from varied formats.
C) Pre-process all documents to a standard date format before extraction.
D) Add multiple date fields for each format variant.
Answer: B - Format normalization rules in prompts alongside output schemas handle inconsistent source formatting at extraction time. Few-shot examples showing various date formats ensure the model generalizes. Post-processing (A) shifts the problem. Pre-processing (C) requires format detection. Multiple fields (D) creates schema bloat.
Domain 4 Worked Example: Building a Data Extraction Pipeline
Let's design a complete structured data extraction pipeline for invoices, illustrating every Domain 4 concept.
Step 1 - Define the Extraction Tool with JSON Schema:
flowchart TB
subgraph Schema["JSON Schema Design"]
direction TB
S1["vendor_name: required string"]
S2["invoice_date: required string (ISO format)"]
S3["due_date: optional string<br/>(nullable - may not be present)"]
S4["line_items: required array"]
S5["stated_total: required number"]
S6["calculated_total: required number<br/>(sum of line items for validation)"]
S7["currency: enum ['USD','EUR','GBP','other']"]
S8["currency_detail: optional string<br/>(when currency is 'other')"]
S9["confidence_notes: optional string<br/>(model explains any uncertainty)"]
S10["conflict_detected: boolean<br/>(true if stated != calculated)"]
end
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class S1,S2,S4,S5,S6,S10 blue
class S3,S7,S8,S9 amber
Notice the schema design patterns:
- due_date is nullable because not all invoices have one - preventing fabrication
- currency uses an enum with "other" + detail string for extensible categorization
- calculated_total alongside stated_total enables semantic self-correction
- conflict_detected flags discrepancies automatically
Step 2 - Add Few-Shot Examples:
Include 3 examples showing extraction from different invoice formats:
1. A standard corporate invoice with clear line items
2. A handwritten receipt with informal amounts ("about $50")
3. A multi-page invoice with the total on page 3 but items on pages 1-2
Each example shows the expected output, including how to handle ambiguity (use confidence_notes to explain) and informal measurements (normalize to standard units).
Step 3 - Implement Validation-Retry Loop:
flowchart TB
DOC["Invoice Document"] --> EXTRACT["Claude: extract_invoice tool<br/>tool_choice: forced"]
EXTRACT --> VALIDATE{"Validate<br/>extraction"}
VALIDATE -->|"Schema valid +<br/>totals match"| ACCEPT["Accept extraction"]
VALIDATE -->|"Totals mismatch"| RETRY1["Retry with:<br/>document + extraction +<br/>'Line items sum to X but<br/>stated total is Y. Re-extract<br/>checking for missed items.'"]
VALIDATE -->|"Required field<br/>missing"| CHECK{"Is info in<br/>the document?"}
CHECK -->|"Yes"| RETRY2["Retry with specific<br/>field-level error message"]
CHECK -->|"No"| MARK["Mark field as null<br/>+ log for human review"]
RETRY1 --> VALIDATE
RETRY2 --> VALIDATE
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
class ACCEPT green
class RETRY1,RETRY2 amber
class MARK red
The key insight: append specific validation errors to the retry prompt. "Line items sum to $1,175 but stated total is $1,250. Re-extract checking for missed items or tax/shipping." This guided retry is far more effective than a generic "try again."
Step 4 - Design Batch Strategy:
Process 10,000 invoices monthly. Strategy:
1. Test on a sample of 50 first to calibrate prompts and measure extraction quality
2. Submit monthly batch via Message Batches API (50% cost savings)
3. Handle failures by custom_id: resubmit only failed invoices with modifications (chunk multi-page invoices that exceeded context limits)
4. SLA calculation: 4-hour submission windows to guarantee 28-hour SLA with 24-hour batch processing
Step 5 - Human Review Routing:
flowchart LR
subgraph Auto["Automated (No Review)"]
A1["High confidence across all fields"]
A2["No conflicts detected"]
A3["Known document type"]
end
subgraph Sample["Stratified Random Sample"]
S1["Random 5% of automated"]
S2["Detect novel error patterns"]
S3["Measure ongoing error rates"]
end
subgraph Review["Human Review Queue"]
R1["Low confidence on any field"]
R2["Conflict detected (stated != calc)"]
R3["Unknown document type"]
R4["Ambiguous/contradictory source"]
end
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class A1,A2,A3 green
class S1,S2,S3 blue
class R1,R2,R3,R4 amber
Before automating any extractions, analyze accuracy by document type AND by field to verify consistent performance across all segments. Never trust aggregate metrics (97% overall may hide 60% accuracy on medical invoices).
Domain 4 Additional Practice Questions
Question 6: Your extraction tool uses "other" as a catch-all enum value but doesn't capture what the "other" actually is. Downstream systems can't process these records. How should you redesign the schema?
A) Add an "other_detail" string field that is required when the enum value is "other", capturing the specific value that didn't match predefined categories.
B) Add more enum values to cover all possible cases.
C) Remove the "other" option and force the model to choose from existing enums.
D) Post-process "other" values with a separate classification model.
Answer: A - The "other" + detail string pattern is the standard approach for extensible categorization. It captures novel values while keeping the enum clean. Adding more enums (B) is a losing battle. Removing "other" (C) forces fabrication. Post-processing (D) adds unnecessary complexity.
Question 7: You need to extract data from both invoices and contracts, but the document type is unknown at submission time. How should you configure tool_choice?
A) tool_choice: "auto" so the model decides which format to use.
B) tool_choice: "any" to guarantee structured output while letting the model pick the appropriate extraction schema (extract_invoice or extract_contract).
C) Force extract_invoice first, then try extract_contract if it fails.
D) Use a separate classifier to determine document type before extraction.
Answer: B - tool_choice: "any" guarantees the model calls a tool (producing structured output) while letting it choose the appropriate schema based on document content. "auto" (A) risks returning text instead of calling a tool. Sequential forcing (C) wastes API calls. A separate classifier (D) adds latency.
Question 8: Your code review system generates detailed findings but developers consistently dismiss findings about "unused imports" while accepting findings about "null pointer risks." The dismissal pattern is making it hard to improve the system. What should you implement?
A) Stop reporting unused imports entirely.
B) Add confidence scores to each finding.
C) Add a detected_pattern field to structured findings to track which code constructs trigger findings, enabling systematic analysis of dismissal patterns to improve prompts for those categories.
D) Weight findings based on historical acceptance rates.
Answer: C - The detected_pattern field enables systematic analysis of why developers dismiss certain findings. This data drives targeted prompt improvements for specific categories. Simply removing the category (A) loses valid findings. Confidence scores (B) don't explain dismissals.
Domain 5: Context Management & Reliability (15%)
This domain covers how you preserve critical information across long interactions, handle escalation, manage errors in multi-agent systems, and maintain information provenance.
5.1 Conversation Context Preservation
flowchart TB
subgraph Risks["Context Risks"]
R1["Progressive summarization<br/>loses specific values:<br/>dates, amounts, percentages"]
R2["Lost-in-the-middle:<br/>models miss findings<br/>from middle sections"]
R3["Tool result accumulation:<br/>40+ fields when only<br/>5 are relevant"]
end
subgraph Solutions["Mitigation Strategies"]
S1["Extract transactional facts<br/>into persistent 'case facts'<br/>block outside summaries"]
S2["Place key findings at<br/>beginning of input, use<br/>explicit section headers"]
S3["Trim verbose tool outputs<br/>to only relevant fields<br/>before they accumulate"]
end
R1 --> S1
R2 --> S2
R3 --> S3
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class R1,R2,R3 red
class S1,S2,S3 green
Three Critical Risks:
-
Progressive Summarization condenses specific values (amounts, dates, order numbers) into vague summaries. Solution: Extract transactional facts into a persistent "case facts" block included in each prompt, outside summarized history.
-
Lost in the Middle effect means models reliably process information at the beginning and end of long inputs but may miss findings from middle sections. Solution: Place key findings summaries at the beginning and use explicit section headers.
-
Tool Result Accumulation consumes tokens disproportionately (e.g., order lookup returns 40+ fields when only 5 matter). Solution: Trim verbose outputs to only relevant fields before they accumulate.
For multi-agent systems:
- Require subagents to include metadata (dates, source locations, methodological context) in structured outputs
- Modify upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains
5.2 Escalation and Ambiguity Resolution
flowchart TB
REQUEST["Customer Request"] --> CHECK{"Escalation Needed?"}
CHECK -->|"Customer explicitly<br/>asks for human"| IMMEDIATE["Escalate immediately<br/>without attempting investigation"]
CHECK -->|"Policy gap or<br/>exception needed"| ESCALATE["Escalate: policy<br/>is ambiguous or silent"]
CHECK -->|"Straightforward issue<br/>within capability"| RESOLVE["Acknowledge frustration<br/>+ offer resolution"]
RESOLVE --> REITERATE{"Customer reiterates<br/>human preference?"}
REITERATE -->|"Yes"| IMMEDIATE
REITERATE -->|"No"| DONE["Resolve autonomously"]
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class IMMEDIATE,ESCALATE red
class RESOLVE amber
class DONE green
Appropriate Escalation Triggers:
- Customer explicitly requests a human agent (honor immediately)
- Policy exceptions or gaps (policy is ambiguous or silent on the request)
- Inability to make meaningful progress
What Does NOT Work as Escalation Triggers:
- Sentiment-based escalation (sentiment doesn't correlate with case complexity)
- Self-reported confidence scores (poorly calibrated)
- Complexity heuristics (the model doesn't accurately self-assess)
Multiple Customer Matches:
When a lookup returns multiple matches, ask for additional identifiers rather than selecting based on heuristics.
5.3 Error Propagation in Multi-Agent Systems
Error Propagation Design Principles:
| Pattern | Description |
|---|---|
| Structured error context | Include failure type, attempted query, partial results, alternative approaches |
| Local recovery first | Subagents implement local retry for transient failures |
| Propagate only unresolvable errors | Include what was attempted and partial results |
| Never suppress errors | Returning empty results as success prevents recovery |
| Never terminate on single failure | Killing the entire workflow wastes completed work |
Anti-Patterns:
flowchart LR
subgraph Anti["Anti-Patterns"]
A1["Generic 'search unavailable'<br/>hides valuable context"]
A2["Empty results as success<br/>prevents recovery"]
A3["Terminate entire workflow<br/>on single failure"]
end
subgraph Correct["Correct Patterns"]
C1["Structured context enables<br/>intelligent coordinator decisions"]
C2["Access failure clearly<br/>distinguished from empty result"]
C3["Partial results + coverage<br/>annotations in synthesis"]
end
Anti -->|"Replace with"| Correct
classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class A1,A2,A3 red
class C1,C2,C3 green
5.4 Large Codebase Exploration
Signs of Context Degradation:
- Model starts giving inconsistent answers
- References "typical patterns" rather than specific classes discovered earlier
- Loses track of findings from earlier in the session
Mitigation Strategies:
| Strategy | When to Use |
|---|---|
| Scratchpad files | Persist key findings across context boundaries; reference in subsequent questions |
| Subagent delegation | Isolate verbose exploration; main agent coordinates high-level understanding |
| Summarize between phases | Inject summaries into next phase's initial context |
| Crash recovery manifests | Each agent exports state to a known location; coordinator loads on resume |
/compact |
Reduce context usage during extended exploration sessions |
5.5 Human Review Workflows and Confidence Calibration
The Danger of Aggregate Metrics:
97% overall accuracy may mask 60% accuracy on a specific document type or field. Always segment.
Confidence Calibration Pipeline:
flowchart LR
subgraph Extract["Extraction"]
E1["Model outputs<br/>field-level confidence"]
end
subgraph Calibrate["Calibration"]
C1["Labeled validation set"]
C2["Set review thresholds"]
end
subgraph Route["Routing"]
R1["High confidence:<br/>stratified random sample"]
R2["Low confidence:<br/>human review queue"]
R3["Ambiguous source:<br/>human review queue"]
end
Extract --> Calibrate --> Route
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class E1,C1,C2,R1,R2,R3 blue
Key Practices:
- Stratified random sampling of high-confidence extractions for ongoing error rate measurement
- Analyze accuracy by document type AND field before reducing human review
- Route extractions with low model confidence or ambiguous/contradictory sources to human review
5.6 Information Provenance and Multi-Source Synthesis
Source attribution gets lost during summarization when findings are compressed without preserving claim-source mappings.
Provenance Preservation Rules:
- Structured claim-source mappings: Require subagents to output source URLs, document names, relevant excerpts that downstream agents must preserve through synthesis
- Temporal data: Require publication/collection dates to prevent temporal differences from being misinterpreted as contradictions
- Conflicting statistics: Annotate conflicts with source attribution rather than arbitrarily selecting one value
- Content-type-appropriate rendering: Financial data as tables, news as prose, technical findings as structured lists
Report Structure:
- Distinguish well-established findings from contested ones
- Preserve original source characterizations and methodological context
- Include coverage annotations indicating which topic areas have gaps due to unavailable sources
Domain 5 Practice Questions
Question 1: Your support agent handles a customer with 3 issues. After resolving the first, the agent responds to the third issue using details from the first (wrong order number, wrong amount). What's the root cause and fix?
A) The agent needs a larger context window.
B) Transactional facts are being lost in progressive summarization. Extract each issue's structured data (order ID, amount, status) into a separate persistent context layer.
C) The agent should handle one issue at a time.
D) Add few-shot examples showing multi-issue handling.
Answer: B - Progressive summarization condenses specific values into vague summaries, causing cross-contamination between issues. A persistent structured context layer keeps each issue's data distinct.
Question 2: A frustrated customer says "Just let me talk to a human!" while asking about a standard refund for a damaged item. The agent can handle this easily. What should it do?
A) Process the refund immediately since it's a straightforward case.
B) Explain that the issue can be resolved faster without waiting for a human.
C) Honor the customer's explicit request for a human agent immediately without first attempting investigation.
D) Ask why the customer wants a human to determine if escalation is truly needed.
Answer: C - When a customer explicitly requests a human agent, honor that request immediately. This is a firm rule. Even if the issue is straightforward, the customer's explicit preference takes priority.
Question 3: Your research system combines findings from multiple sources. The final report says "AI adoption is at 35%" without indicating which source. One source said 35% (2022 survey, US only), another said 47% (2024 survey, global). What went wrong?
A) The synthesis agent needs better summarization instructions.
B) Source attribution was lost during synthesis. Require subagents to output structured claim-source mappings including publication dates, and require the synthesis agent to preserve and merge these mappings, annotating conflicts rather than selecting one value.
C) The search agent should only return the most recent data.
D) Add a deduplication step to remove conflicting statistics.
Answer: B - The synthesis lost provenance (source, date, scope). The correct approach preserves claim-source mappings through synthesis and annotates conflicts with both values and their sources, including temporal context.
Question 4: During a long codebase exploration session, Claude starts referencing "typical patterns" instead of specific classes it discovered 30 turns ago. What should you do?
A) Restart the conversation and begin fresh.
B) Increase the model's context window.
C) Have agents maintain scratchpad files recording key findings, and reference them for subsequent questions. Use /compact to reduce context when it fills with verbose discovery output.
D) Use a larger model with better long-context performance.
Answer: C - Context degradation in extended sessions is addressed with scratchpad files (persist findings across context boundaries) and /compact (reduce context usage). Restarting (A) loses all progress. Context window size (B, D) doesn't solve attention quality degradation.
Question 5: Your extraction system achieves 97% overall accuracy. But when you start automating high-confidence extractions, error rates spike for medical documents. What went wrong?
A) The 97% aggregate accuracy masked poor performance on specific document types. You should have analyzed accuracy by document type and field before automating, using stratified random sampling for ongoing error detection.
B) The confidence scores need recalibration with a larger validation set.
C) Medical documents should be excluded from automated extraction entirely.
D) The model needs fine-tuning on medical documents.
Answer: A - Aggregate accuracy metrics mask poor performance on specific segments. Always stratify by document type and field, and use stratified random sampling of high-confidence extractions for ongoing error monitoring.
Domain 5 Worked Example: Multi-Source Research with Provenance
Let's trace how context and provenance flow through a research system to illustrate every Domain 5 concept.
Scenario: A multi-agent system researches "global AI adoption rates." Two sources provide conflicting data: Source A says 35% (US, 2022) and Source B says 47% (global, 2024).
Step 1 - Structured Subagent Output:
Each subagent must return structured claim-source mappings, not just text summaries:
flowchart TB
subgraph SearchOutput["Search Agent Output"]
SO1["claim: 'AI adoption at 35%'<br/>source: 'McKinsey 2022 Report'<br/>url: 'mckinsey.com/...'<br/>scope: 'US enterprises'<br/>date: '2022-11-15'"]
SO2["claim: 'AI adoption at 47%'<br/>source: 'Gartner 2024 Survey'<br/>url: 'gartner.com/...'<br/>scope: 'Global organizations'<br/>date: '2024-03-20'"]
end
subgraph AnalysisOutput["Analysis Agent Output"]
AO1["conflict_detected: true<br/>type: 'different_scope_and_time'<br/>resolution_note: 'Values differ<br/>due to geographic scope (US vs<br/>global) and temporal gap (2 years)'"]
end
subgraph SynthesisInput["What Synthesis Receives"]
SI1["Both claims with full attribution<br/>Conflict annotation<br/>Resolution guidance"]
end
SearchOutput --> AnalysisOutput --> SynthesisInput
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class SO1,SO2 blue
class AO1 amber
class SI1 green
Step 2 - Context Preservation:
The coordinator maintains a "case facts" block that persists across summarization:
Case Facts Block (always included in every prompt):
- Research topic: "Global AI adoption rates"
- Source A: McKinsey 2022, 35%, US enterprises
- Source B: Gartner 2024, 47%, global organizations
- Known conflict: scope and temporal differences
- Coverage gaps: No data on Asia-Pacific, no sector breakdown
This block sits outside the summarized conversation history. Even if the conversation gets summarized, these facts persist verbatim.
Step 3 - Synthesis with Provenance:
The final report must:
- Present BOTH values with full source attribution (not pick one)
- Distinguish well-established findings from contested ones
- Note the scope difference (US vs global) and temporal gap
- Include coverage annotations ("No data was available for Asia-Pacific adoption rates")
- Render financial data as tables, news as prose, technical findings as structured lists
Wrong output: "AI adoption is at 35%." (Lost source, lost conflict)
Right output: "AI adoption rates vary by scope and timeframe: McKinsey's 2022 US enterprise survey found 35% adoption, while Gartner's 2024 global survey found 47%. The difference likely reflects both geographic scope (US vs global) and the two-year gap between studies."
Step 4 - Error Handling:
If the academic paper search subagent times out:
flowchart LR
subgraph Error["Error Context Returned"]
E1["failure_type: 'timeout'"]
E2["attempted_query: 'AI adoption<br/>rates peer-reviewed 2023-2024'"]
E3["partial_results: [2 of 5<br/>papers retrieved]"]
E4["alternatives: ['narrow to<br/>specific sector', 'try<br/>Google Scholar instead']"]
end
subgraph Recovery["Coordinator Decision"]
R1["Option A: Retry with narrower query"]
R2["Option B: Proceed with partial results"]
R3["Option C: Annotate report with<br/>'academic literature partially reviewed'"]
end
Error --> Recovery
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
class E1,E2,E3,E4 amber
class R1,R2,R3 blue
The coordinator chooses option C: proceed with partial results and annotate the report's coverage gaps. The synthesis output includes: "Note: Academic literature was partially reviewed due to search service limitations. Two of five targeted papers were retrieved. Findings from peer-reviewed sources may be incomplete."
Domain 5 Additional Practice Questions
Question 6: Your support agent handles a customer with two issues. For issue #1, it correctly identifies order #4521 ($127.50). But when addressing issue #2, it mentions "$127.50" in context where the second order is $89.99. What specific pattern caused this?
A) The agent confused the two order numbers.
B) Progressive summarization compressed the two issues' details into a shared context, causing the amount from issue #1 to contaminate issue #2. The fix is a separate structured data layer for each issue.
C) The model has a recency bias toward larger numbers.
D) The tool returned incorrect data for the second order.
Answer: B - This is a classic progressive summarization failure. Specific values (amounts, order numbers) from one issue bleed into another when compressed into a shared summary. A separate context layer per issue prevents cross-contamination.
Question 7: Your agent uses a lookup_customer tool that sometimes returns 3 customers with similar names. The agent picks the one with the most recent activity. This leads to operating on the wrong customer 8% of the time. What should it do instead?
A) Add a confidence score and only proceed when confidence is above 90%.
B) Ask the customer for additional identifiers (email, phone, last 4 of account number) to disambiguate, rather than selecting based on heuristics.
C) Return all three customers and let the agent try each one.
D) Use a fuzzy matching algorithm to improve the initial match.
Answer: B - When tool results return multiple matches, the agent should ask for additional identifiers. Heuristic selection (most recent activity) creates a systematic error rate. Only the customer knows which account is theirs.
Question 8: Your coordinator agent crashes mid-research. When restarted, it has no memory of what three subagents already discovered. How should you design crash recovery?
A) Store all context in the coordinator's conversation history.
B) Have subagents write results to a shared database.
C) Design structured state persistence: each agent exports its state (findings, partial results, progress) to a known location. The coordinator loads a manifest on resume and injects the recovered state into agent prompts.
D) Use session resumption to continue from the crash point.
Answer: C - Structured agent state exports with a manifest enable the coordinator to recover from crashes. Each agent saves its state to a known location. On resume, the coordinator loads the manifest and injects state into agent prompts. Session resumption (D) may have stale or missing tool results.
Question 9: Your synthesis agent receives findings from 5 subagents, but the final report only includes findings from subagents 1, 2, and 5. Subagents 3 and 4 each contributed valid findings. What's the most likely cause?
A) Subagents 3 and 4 returned errors that were silently suppressed.
B) The "lost-in-the-middle" effect caused findings from the middle of the aggregated input to be missed. The fix is to place key findings summaries at the beginning of the input and use explicit section headers.
C) The synthesis agent has a token limit that truncated the input.
D) Subagents 3 and 4 returned findings in a different format.
Answer: B - The lost-in-the-middle effect means models reliably process information at the beginning and end of long inputs but may miss middle sections. Placing summaries at the start and using explicit section headers mitigates this.
Technologies and Concepts Reference
This section provides a quick-reference summary of every technology and concept that may appear on the exam.
Claude Agent SDK
mindmap
root((Agent SDK))
Agent Definitions
System prompts
Tool restrictions
allowedTools config
Agentic Loops
stop_reason handling
tool_use vs end_turn
Conversation history
Hooks
PostToolUse
Tool call interception
Data normalization
Subagent Spawning
Task tool
Parallel execution
Context passing
| Concept | Key Detail |
|---|---|
stop_reason: "tool_use" |
Continue the loop, execute the requested tool |
stop_reason: "end_turn" |
Terminate the loop, present the response |
PostToolUse hook |
Transform tool results before model processes them |
| Tool call interception | Block policy-violating actions before execution |
allowedTools |
Must include "Task" for coordinator to spawn subagents |
AgentDefinition |
Config for each subagent: description, system prompt, tools |
Model Context Protocol (MCP)
| Concept | Key Detail |
|---|---|
| MCP servers | Provide tools and resources to agents |
| MCP tools | Actions: search, create, update, delete |
| MCP resources | Content catalogs: issue summaries, schemas, doc hierarchies |
isError flag |
Signal tool failure back to the agent |
.mcp.json |
Project-level server config (shared, version-controlled) |
~/.claude.json |
User-level server config (personal, not shared) |
${ENV_VAR} expansion |
Credential management without committing secrets |
Claude Code
| Concept | Key Detail |
|---|---|
| CLAUDE.md hierarchy | User (~/.claude/) < Project (.claude/ or root) < Directory |
.claude/rules/ |
Topic-specific rules with YAML frontmatter glob patterns |
.claude/commands/ |
Project-scoped slash commands (shared via VCS) |
~/.claude/commands/ |
User-scoped personal commands |
.claude/skills/ |
Skills with SKILL.md frontmatter |
context: fork |
Run skill in isolated sub-agent context |
allowed-tools |
Restrict tool access during skill execution |
argument-hint |
Prompt for required parameters |
| Plan mode | Complex tasks, architectural decisions, multi-file changes |
| Direct execution | Simple, well-scoped changes |
| Explore subagent | Isolate verbose discovery output |
/memory |
Verify loaded memory files |
/compact |
Reduce context usage in extended sessions |
--resume |
Continue a named session |
fork_session |
Branch from shared analysis baseline |
Claude Code CLI (for CI/CD)
| Flag | Purpose |
|---|---|
-p / --print |
Non-interactive mode (required for CI) |
--output-format json |
Machine-parseable structured output |
--json-schema |
Enforce specific output schema |
Claude API
| Concept | Key Detail |
|---|---|
tool_use with JSON schemas |
Guaranteed schema-compliant structured output |
tool_choice: "auto" |
Model may return text instead of calling a tool |
tool_choice: "any" |
Model must call a tool (any available tool) |
| Forced tool selection | {"type": "tool", "name": "..."} - specific tool required |
stop_reason values |
"tool_use", "end_turn" |
max_tokens |
Limit response length |
| System prompts | Provide instructions and context |
Message Batches API
| Concept | Key Detail |
|---|---|
| Cost savings | 50% reduction |
| Processing window | Up to 24 hours |
| Latency SLA | None guaranteed |
custom_id |
Correlate request/response pairs |
| Multi-turn tool calling | NOT supported within a single batch request |
| Best for | Non-blocking, latency-tolerant workloads |
JSON Schema Patterns
| Pattern | Purpose |
|---|---|
| Required vs optional fields | Prevent fabrication of missing data |
| Nullable fields | Allow model to return null for absent information |
| Enum types | Constrain categorical values |
"other" + detail string |
Extensible categorization |
"unclear" enum value |
Handle ambiguous cases |
| Strict mode | Eliminates syntax errors (not semantic errors) |
Exam Scenarios Deep Dive
Scenario 1: Customer Support Resolution Agent
Setup: You're building a customer support agent using the Claude Agent SDK. It handles returns, billing disputes, and account issues with MCP tools: get_customer, lookup_order, process_refund, escalate_to_human. Target: 80%+ first-contact resolution.
What to Know:
- Programmatic prerequisites for tool ordering (verify customer before refund)
- Structured error responses from MCP tools
- Escalation criteria (explicit customer request, policy gaps, inability to progress)
- Multi-concern request decomposition
- Handoff summaries for human agents
- Hook-based compliance enforcement (block refunds above threshold)
Scenario 2: Code Generation with Claude Code
Setup: Using Claude Code for code generation, refactoring, debugging, documentation. Custom slash commands, CLAUDE.md, plan mode decisions.
What to Know:
- CLAUDE.md configuration hierarchy
- .claude/commands/ vs ~/.claude/commands/
- .claude/rules/ with glob patterns
- Skills with context: fork and allowed-tools
- Plan mode vs direct execution decision criteria
- Iterative refinement techniques
Scenario 3: Multi-Agent Research System
Setup: Coordinator delegates to search, analysis, synthesis, and report subagents. Produces comprehensive cited reports.
What to Know:
- Hub-and-spoke coordinator architecture
- Task decomposition pitfalls (overly narrow coverage)
- Parallel subagent execution
- Context passing (explicit, structured, with metadata)
- Error propagation (structured context, not generic messages)
- Source attribution preservation through synthesis
Scenario 4: Developer Productivity
Setup: Agent helps engineers explore codebases, understand legacy systems, generate boilerplate. Uses built-in tools and MCP servers.
What to Know:
- Built-in tool selection (Grep vs Glob vs Read vs Edit vs Bash)
- Incremental codebase understanding pattern
- MCP server integration (project vs user scope)
- Tool description quality for reliable selection
- Scoped tool access per agent role
Scenario 5: Claude Code for CI
Setup: Automated code reviews, test generation, PR feedback in CI/CD pipeline.
What to Know:
- -p flag for non-interactive mode
- --output-format json and --json-schema for structured CI output
- Session context isolation (independent reviewer vs self-review)
- Batch API appropriateness (overnight reports vs blocking checks)
- CLAUDE.md for providing test standards to CI
Scenario 6: Structured Data Extraction
Setup: Extract information from unstructured documents, validate with JSON schemas, handle edge cases.
What to Know:
- tool_use with JSON schemas for structured output
- tool_choice configuration options
- Schema design (nullable fields, enums with "other" + detail)
- Validation-retry loops (when retries help vs when they don't)
- Batch processing strategies
- Human review workflows and confidence calibration
Preparation Exercises
Exercise 1: Build a Multi-Tool Agent with Escalation Logic
Objective: Practice agentic loops, tool integration, structured errors, and escalation.
Steps:
1. Define 3-4 MCP tools with detailed, differentiated descriptions. Include two with similar functionality requiring careful description.
2. Implement an agentic loop checking stop_reason for "tool_use" vs "end_turn".
3. Add structured error responses: errorCategory, isRetryable, human-readable descriptions.
4. Implement a hook that intercepts tool calls to enforce a business rule.
5. Test with multi-concern messages.
Domains covered: D1, D2, D5
Exercise 2: Configure Claude Code for a Team
Objective: Practice CLAUDE.md hierarchy, custom commands, path rules, MCP integration.
Steps:
1. Create project-level CLAUDE.md with universal standards.
2. Create .claude/rules/ files with glob patterns for different code areas.
3. Create a skill with context: fork and allowed-tools.
4. Configure MCP server in .mcp.json with env var expansion.
5. Test plan mode vs direct execution on tasks of varying complexity.
Domains covered: D3, D2
Exercise 3: Build a Structured Data Extraction Pipeline
Objective: Practice JSON schemas, tool_use, validation-retry, batch processing.
Steps:
1. Define extraction tool with required/optional/nullable fields and enum patterns.
2. Implement validation-retry loop with error feedback.
3. Add few-shot examples for varied document formats.
4. Design batch processing with Message Batches API, handle failures by custom_id.
5. Implement human review routing with field-level confidence scores.
Domains covered: D4, D5
Exercise 4: Design a Multi-Agent Research Pipeline
Objective: Practice subagent orchestration, context passing, error propagation, provenance.
Steps:
1. Build coordinator with at least two subagents, allowedTools including "Task".
2. Implement parallel subagent execution.
3. Design structured output separating content from metadata.
4. Simulate subagent timeout and verify structured error propagation.
5. Test with conflicting source data and verify provenance preservation.
Domains covered: D1, D2, D5
Comprehensive Question Bank
This section contains additional practice questions across all domains to help you prepare thoroughly.
Cross-Domain Questions
Question 1 (D1+D2): Your coordinator agent always routes every query through all four subagents (search, analysis, synthesis, report), even for simple factual questions that only need a quick search. How should you fix this?
A) Design the coordinator to analyze query requirements and dynamically select which subagents to invoke rather than always routing through the full pipeline.
B) Add a complexity classifier that categorizes queries before the coordinator sees them.
C) Give the coordinator a "quick_answer" tool for simple queries.
D) Reduce the number of subagents to two.
Answer: A - The coordinator should dynamically select which subagents to invoke based on query complexity. This is a core principle of coordinator design - not every query needs the full pipeline.
Question 2 (D3+D4): Your CI pipeline runs Claude Code to generate tests, but it keeps suggesting tests that duplicate existing coverage. What's the most effective fix?
A) Add "do not duplicate existing tests" to the prompt.
B) Post-process generated tests to filter duplicates.
C) Provide existing test files in context so test generation avoids suggesting scenarios already covered by the test suite, and document testing standards and available fixtures in CLAUDE.md.
D) Use a separate model to filter duplicate test suggestions.
Answer: C - Providing existing tests in context lets Claude see what's covered. CLAUDE.md documentation of testing standards and fixtures improves quality. Simply saying "don't duplicate" (A) is vague. Post-processing (B) wastes API calls.
Question 3 (D1+D5): Your agent uses a scratchpad file but still loses track of findings after 50+ turns. The scratchpad has grown to 200 lines of unstructured notes. What should you change?
A) Increase the context window.
B) Summarize key findings from one exploration phase before spawning subagents for the next phase, injecting structured summaries into initial context rather than relying on a single growing scratchpad.
C) Use /compact more frequently.
D) Start a new session every 25 turns.
Answer: B - Structured phase summaries injected into new context is more effective than an ever-growing unstructured scratchpad. The pattern: summarize findings, inject into next phase's context, delegate to subagents for the next phase.
Question 4 (D2+D5): Your MCP tool returns 40+ fields per order lookup, but the agent only needs 5 fields for the current task. Context is filling up with irrelevant data. What should you do?
A) Ask the backend team to create a slim API endpoint.
B) Use a larger context model.
C) Implement a PostToolUse hook that trims verbose tool outputs to only relevant fields before they accumulate in context.
D) Add a system prompt instruction to ignore irrelevant fields.
Answer: C - A PostToolUse hook deterministically trims tool output before the model processes it, preventing context bloat. This is exactly what hooks are designed for - transforming tool results before they enter the conversation.
Question 5 (D4+D5): Your extraction pipeline extracts "total: $1,250" but the line items sum to $1,175. The JSON schema validation passes because both are valid numbers. How do you catch this?
A) Add strict mode to the JSON schema.
B) Implement a post-processing calculator.
C) Design a self-correction validation flow: extract "calculated_total" alongside "stated_total" to flag discrepancies, adding a "conflict_detected" boolean for inconsistent source data.
D) Add a "verify totals" instruction to the prompt.
Answer: C - Schema validation catches syntax errors, not semantic errors. Having the model extract both calculated and stated totals with a conflict flag enables automatic detection of semantic mismatches.
Question 6 (D1+D3): You want to compare two different refactoring approaches before committing to one. You've already done significant codebase analysis. What's the best approach?
A) Start two separate sessions, re-doing the codebase analysis in each.
B) Use fork_session to create two independent branches from the shared analysis baseline to explore each approach.
C) Try approach A first, then undo all changes and try approach B.
D) Ask Claude to describe both approaches without implementing either.
Answer: B - fork_session creates independent branches from a shared analysis baseline. This avoids re-doing the analysis (A) and avoids the risk of incomplete undo (C) while exploring both approaches with full implementation.
Question 7 (D2): You have two MCP tools: analyze_content ("Analyzes content") and analyze_document ("Analyzes documents"). The agent frequently picks the wrong one. After expanding descriptions, some misrouting still occurs because the system prompt says "always analyze content from web sources first." What's the issue?
A) The tool names are too similar and should be renamed.
B) The descriptions need more examples.
C) The system prompt contains keyword-sensitive instructions ("analyze content") that create an unintended association with the analyze_content tool, overriding well-written descriptions. Review and rephrase the system prompt.
D) Add tool_choice forced selection.
Answer: C - System prompt wording can create unintended tool associations through keywords. "Always analyze content from web sources" biases toward the analyze_content tool. Rephrasing to avoid tool-name keywords fixes this.
Question 8 (D4): Your extraction tool handles invoices well but produces empty results for contracts. Adding "extract data from contracts" to the instructions doesn't help. What should you try?
A) Fine-tune the model on contract examples.
B) Create a separate tool for contracts.
C) Add 2-4 few-shot examples showing correct extraction from contracts with their specific format variations (inline clauses vs separate schedules, varied terminology).
D) Increase temperature to encourage more diverse outputs.
Answer: C - Few-shot examples demonstrating extraction from varied document structures (contracts vs invoices) is the most effective technique for this class of problem. The model needs to see what successful contract extraction looks like.
Question 9 (D3): A developer runs a codebase analysis skill that floods the conversation with 2000 lines of discovery output. Subsequent questions get confused responses. What configuration prevents this?
A) Add context: fork to the skill's SKILL.md frontmatter so it runs in an isolated sub-agent context, returning only the summary to the main conversation.
B) Add a max-output-length parameter to the skill.
C) Use /compact after running the skill.
D) Break the skill into smaller sub-skills.
Answer: A - context: fork isolates the skill in a sub-agent context. The verbose discovery output stays in the fork; only the summary returns to the main conversation.
Question 10 (D5): Your multi-agent system produces a report where Finding #3 says "According to a 2023 study..." but there's no source citation. You trace it back to the synthesis agent, which received the finding with full attribution from the analysis agent. What happened?
A) The analysis agent's attribution format is inconsistent.
B) The synthesis agent compressed findings during summarization without preserving claim-source mappings. Require the synthesis agent to preserve and merge structured claim-source mappings when combining findings.
C) The report template strips citations.
D) The source URL was broken.
Answer: B - Source attribution is lost during summarization when findings are compressed without preserving claim-source mappings. The fix is requiring the synthesis agent to preserve mappings through the synthesis process.
Out-of-Scope Topics (What NOT to Study)
These topics will NOT appear on the exam. Save your study time:
- Fine-tuning Claude models or training custom models
- Claude API authentication, billing, or account management
- Detailed implementation of specific programming languages or frameworks
- Deploying or hosting MCP servers (infrastructure, networking, containers)
- Claude's internal architecture, training process, or model weights
- Constitutional AI, RLHF, or safety training methodologies
- Embedding models or vector database implementation details
- Computer use (browser automation, desktop interaction)
- Vision/image analysis capabilities
- Streaming API implementation or server-sent events
- Rate limiting, quotas, or API pricing calculations
- OAuth, API key rotation, or authentication protocol details
- Specific cloud provider configurations (AWS, GCP, Azure)
- Performance benchmarking or model comparison metrics
- Prompt caching implementation details (beyond knowing it exists)
- Token counting algorithms or tokenization specifics
Exam Day Strategy
Reading the Question
- Read the scenario context carefully - It tells you which systems exist and what tools are available
- Identify the root cause in the question stem - Many questions describe a problem; find what's actually wrong before looking at answers
- Eliminate "over-engineered" options - If a simpler fix addresses the root cause, the complex option is wrong
- Watch for the "first step" qualifier - When asked for the "most effective first step," pick the lowest-effort, highest-leverage fix
Common Distractor Patterns
| Distractor Type | Example | Why It's Wrong |
|---|---|---|
| Over-engineered | "Deploy a classifier model" when prompt optimization hasn't been tried | Jumps to infrastructure before simpler fixes |
| Solves wrong problem | "Sentiment analysis" when the issue is case complexity | Addresses a correlated but incorrect variable |
| Probabilistic where deterministic needed | "Enhance system prompt" when financial operations need guaranteed ordering | Prompt instructions have non-zero failure rate |
| Non-existent feature | CLAUDE_HEADLESS=true environment variable |
References features that don't exist in Claude Code |
Time Management
- Don't spend too long on any single question - guess and move on
- No penalty for guessing, so never leave a question blank
- Mark difficult questions for review if the exam interface allows it
Final Pre-Exam Checklist
Quick Reference Cheat Sheet
Decision Matrix: Which Approach to Use?
flowchart TB
Q1{"Need guaranteed<br/>tool ordering?"}
Q1 -->|"Yes"| HOOK["Use programmatic<br/>hooks/prerequisites"]
Q1 -->|"No"| Q2{"Need consistent<br/>output format?"}
Q2 -->|"Yes"| FEW["Use few-shot<br/>examples"]
Q2 -->|"No"| Q3{"Need structured<br/>data output?"}
Q3 -->|"Yes"| TOOL["Use tool_use<br/>with JSON schema"]
Q3 -->|"No"| Q4{"Complex multi-file<br/>task?"}
Q4 -->|"Yes"| PLAN["Use plan mode"]
Q4 -->|"No"| DIRECT["Use direct execution"]
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
class HOOK,FEW,TOOL,PLAN,DIRECT green
class Q1,Q2,Q3,Q4 blue
The "First Fix" Priority
When the exam asks for the "most effective" or "first step," default to this priority:
- Fix descriptions (tool descriptions, prompt criteria) - Lowest effort, highest leverage
- Add few-shot examples - Address ambiguity and inconsistency
- Add programmatic enforcement (hooks, prerequisites) - When compliance must be guaranteed
- Restructure architecture (split tools, add subagents) - When the design is fundamentally wrong
Key Numbers to Remember
| Metric | Value |
|---|---|
| Passing score | 720 / 1000 |
| Scenarios per exam | 4 (from pool of 6) |
| Ideal tools per agent | 4-5 (not 18) |
| Batch API cost savings | 50% |
| Batch API processing window | Up to 24 hours |
| Target experience level | 6+ months with Claude |
| Few-shot examples | 2-4 targeted examples |
| Highest-weighted domain | D1: Agentic Architecture (27%) |
You've reached the end of this comprehensive exam prep guide. Check your progress bar at the top - aim for 100% before sitting for the exam. Remember: the CCAF tests practical judgment about trade-offs, not memorization. Build real agents, configure real Claude Code projects, and extract real structured data to internalize these patterns.
Good luck on your certification.