← Blog

Claude Certified Architect - Foundations: The Complete Exam Preparation Guide

June 03, 2026 · 120 min read

Your Study Progress 0%
0 / 0 completed Reset progress

The Claude Certified Architect - Foundations (CCAF) certification validates that you can make informed decisions about trade-offs when implementing real-world solutions with Claude. It tests foundational knowledge across four core technologies: Claude Agent SDK, the Claude API, Claude Code, and Model Context Protocol (MCP).

This guide is your one-stop exam preparation resource. Every domain, every task statement, every concept is covered here with visual diagrams, worked examples, and practice questions. Check off items as you master them, and your progress is saved automatically in your browser.

Exam Overview

Before diving into the domains, understand what you are signing up for.

flowchart LR
    subgraph Format["Exam Format"]
        Q["Multiple Choice"]
        SC["Scenario-Based"]
        P["Pass/Fail"]
    end
    subgraph Scoring["Scoring"]
        S1["Scaled: 100-1000"]
        S2["Pass: 720+"]
        S3["No penalty for guessing"]
    end
    subgraph Structure["Structure"]
        ST1["4 scenarios per exam"]
        ST2["Drawn from 6 total"]
        ST3["5 content domains"]
    end
    Format --> Scoring --> Structure
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class Q,SC,P,S1,S2,S3,ST1,ST2,ST3 blue

Key exam facts:

  • All questions are multiple choice with one correct answer and three distractors
  • Scored on a scale of 100-1,000 with a minimum passing score of 720
  • Unanswered questions are scored as incorrect - no penalty for guessing, so always answer
  • Questions are scenario-based, grounded in realistic production contexts
  • 4 random scenarios are presented from a pool of 6

Target Candidate Profile

The ideal candidate is a solution architect with 6+ months of hands-on experience building with Claude APIs, Agent SDK, Claude Code, and MCP. You should have practical experience with:

  • Building agentic applications with the Claude Agent SDK
  • Configuring Claude Code for team workflows
  • Designing MCP tool and resource interfaces
  • Engineering prompts for reliable structured output
  • Managing context windows across long interactions
  • Integrating Claude into CI/CD pipelines
  • Making escalation and reliability decisions

Domain Weightings

pie title CCAF Exam Domain Weightings
    "D1: Agentic Architecture (27%)" : 27
    "D2: Tool Design & MCP (18%)" : 18
    "D3: Claude Code Config (20%)" : 20
    "D4: Prompt Engineering (20%)" : 20
    "D5: Context & Reliability (15%)" : 15
Domain Weight Focus Areas
D1: Agentic Architecture & Orchestration 27% Agentic loops, multi-agent systems, hooks, session management
D2: Tool Design & MCP Integration 18% Tool descriptions, error responses, MCP servers, built-in tools
D3: Claude Code Configuration & Workflows 20% CLAUDE.md hierarchy, skills, rules, plan mode, CI/CD
D4: Prompt Engineering & Structured Output 20% Few-shot, JSON schemas, batch processing, multi-pass review
D5: Context Management & Reliability 15% Context preservation, escalation, error propagation, provenance

The Six Exam Scenarios

Every exam question lives inside one of these realistic production scenarios. You will see 4 of 6 on your exam.

flowchart TB
    subgraph S1["Scenario 1: Customer Support Agent"]
        S1D["D1 + D2 + D5"]
        S1T["Agent SDK, MCP tools,<br/>escalation, refund logic"]
    end
    subgraph S2["Scenario 2: Code Generation"]
        S2D["D3 + D5"]
        S2T["Claude Code, CLAUDE.md,<br/>slash commands, plan mode"]
    end
    subgraph S3["Scenario 3: Multi-Agent Research"]
        S3D["D1 + D2 + D5"]
        S3T["Coordinator-subagent,<br/>context passing, synthesis"]
    end
    subgraph S4["Scenario 4: Developer Productivity"]
        S4D["D2 + D3 + D1"]
        S4T["Built-in tools, MCP servers,<br/>codebase exploration"]
    end
    subgraph S5["Scenario 5: CI/CD Integration"]
        S5D["D3 + D4"]
        S5T["-p flag, JSON output,<br/>batch API, code review"]
    end
    subgraph S6["Scenario 6: Structured Extraction"]
        S6D["D4 + D5"]
        S6T["JSON schemas, tool_use,<br/>validation loops, confidence"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    class S1D,S2D,S3D,S4D,S5D,S6D purple
    class S1T,S2T,S3T,S4T,S5T,S6T blue

Domain 1: Agentic Architecture & Orchestration (27%)

This is the highest-weighted domain on the exam. It covers the design, implementation, and management of agentic systems built with the Claude Agent SDK.

1.1 The Agentic Loop Lifecycle

The agentic loop is the fundamental control structure for autonomous Claude agents. Understanding how it works is critical.

flowchart TD
    START["Send request to Claude<br/>(system + user + history)"] --> RESPONSE["Receive response"]
    RESPONSE --> CHECK{"Inspect<br/>stop_reason"}
    CHECK -->|"tool_use"| EXEC["Execute requested tool(s)"]
    EXEC --> APPEND["Append tool results<br/>to conversation history"]
    APPEND --> START
    CHECK -->|"end_turn"| DONE["Present final response<br/>to user"]

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class START,APPEND blue
    class EXEC amber
    class DONE emerald
    class RESPONSE,CHECK blue

The Loop Works Like This:

  1. Send a request to Claude with the system prompt, user message, and conversation history
  2. Inspect the stop_reason in the response:
    - "tool_use" - Claude wants to call a tool. Execute it, append the result to history, and loop back
    - "end_turn" - Claude is done. Present the final response to the user
  3. Append tool results to the conversation history so Claude can reason about new information
  4. Repeat until stop_reason is "end_turn"

Critical Anti-Patterns to Avoid:

Anti-Pattern Why It Fails
Parsing natural language signals for loop termination Unreliable - the model may phrase things differently each time
Setting arbitrary iteration caps as the primary stop mechanism Cuts off legitimate multi-step reasoning; use stop_reason instead
Checking for assistant text content as completion indicator The model can include text alongside tool calls

Exam Tip: When the exam asks about loop termination, the answer is almost always stop_reason. The model-driven approach (checking stop_reason) is preferred over any heuristic-based approach.

1.2 Multi-Agent Orchestration: Hub-and-Spoke

Multi-agent systems use a hub-and-spoke (coordinator-subagent) architecture. The coordinator is the brain; subagents are specialists.

flowchart TB
    USER["User Query"] --> COORD["Coordinator Agent"]
    COORD -->|"Task tool"| SA1["Search Subagent"]
    COORD -->|"Task tool"| SA2["Analysis Subagent"]
    COORD -->|"Task tool"| SA3["Synthesis Subagent"]
    SA1 -->|"Results"| COORD
    SA2 -->|"Results"| COORD
    SA3 -->|"Results"| COORD
    COORD --> OUTPUT["Final Report"]

    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class COORD purple
    class SA1,SA2,SA3 blue
    class USER,OUTPUT emerald

Key Principles:

  • Isolated context: Subagents do NOT automatically inherit the coordinator's conversation history. Context must be explicitly passed in the prompt.
  • Coordinator responsibilities: Task decomposition, delegation, result aggregation, deciding which subagents to invoke based on query complexity
  • All communication routes through the coordinator: This provides observability, consistent error handling, and controlled information flow
  • Dynamic selection: The coordinator should analyze query requirements and dynamically select which subagents to invoke, not always route through the full pipeline

Common Pitfall - Overly Narrow Decomposition:

If the coordinator decomposes "impact of AI on creative industries" into only "AI in digital art," "AI in graphic design," and "AI in photography," it misses music, writing, and film entirely. The problem is the coordinator's decomposition, not the subagents' execution.

1.3 Subagent Invocation and Context Passing

The Task Tool:
- The Task tool is the mechanism for spawning subagents
- The coordinator's allowedTools must include "Task" to invoke subagents
- Each subagent is defined with an AgentDefinition including description, system prompt, and tool restrictions

Context Passing Rules:

flowchart LR
    subgraph Wrong["WRONG: Implicit Inheritance"]
        C1["Coordinator"] -->|"Spawn"| S1["Subagent"]
        S1 -.->|"No access to<br/>parent context"| X["Missing Data"]
    end
    subgraph Right["RIGHT: Explicit Passing"]
        C2["Coordinator"] -->|"Prompt includes:<br/>findings, URLs,<br/>metadata"| S2["Subagent"]
        S2 -->|"Has everything<br/>it needs"| Y["Complete Output"]
    end
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class X red
    class Y green
  • Always include complete findings from prior agents directly in the subagent's prompt
  • Use structured data formats to separate content from metadata (source URLs, document names, page numbers)
  • Parallel spawning: Emit multiple Task tool calls in a single coordinator response rather than across separate turns
  • Design coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions

Fork-Based Session Management:
- fork_session creates independent branches from a shared analysis baseline
- Useful for exploring divergent approaches (comparing two testing strategies from a shared codebase analysis)

1.4 Multi-Step Workflows and Enforcement Patterns

This is one of the most frequently tested concepts. Know the difference between:

flowchart TB
    subgraph PROG["Programmatic Enforcement"]
        direction TB
        P1["Hooks / prerequisite gates"]
        P2["Deterministic: 100% compliance"]
        P3["Use when errors have<br/>financial/safety consequences"]
    end
    subgraph PROMPT["Prompt-Based Guidance"]
        direction TB
        PR1["System prompt instructions"]
        PR2["Probabilistic: <100% compliance"]
        PR3["Use for soft preferences,<br/>output formatting"]
    end
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class P1,P2,P3 green
    class PR1,PR2,PR3 amber

When to use programmatic enforcement:
- Identity verification before financial operations (block process_refund until get_customer returns a verified ID)
- Any business rule where non-compliance has real-world consequences
- Tool ordering that must be guaranteed, not hoped for

Structured Handoff Protocols:
When escalating to a human agent, compile structured summaries including:
- Customer ID
- Root cause analysis
- Refund amount
- Recommended action

The human agent receiving the escalation lacks access to the conversation transcript, so everything must be in the summary.

1.5 Agent SDK Hooks

Hooks intercept and transform data at deterministic points in the agent lifecycle.

sequenceDiagram
    participant Agent as Claude Agent
    participant Hook as PreToolUse Hook
    participant Tool as MCP Tool
    participant Post as PostToolUse Hook

    Agent->>Hook: Tool call request
    Hook->>Hook: Check compliance<br/>(e.g., refund < $500?)
    alt Policy violation
        Hook-->>Agent: Block + redirect<br/>to escalation
    else Allowed
        Hook->>Tool: Execute tool
        Tool->>Post: Raw result
        Post->>Post: Normalize data<br/>(timestamps, formats)
        Post->>Agent: Clean, normalized result
    end

Two Key Hook Patterns:

  1. PostToolUse Hooks - Transform tool results before the model processes them:
    - Normalize heterogeneous data formats (Unix timestamps to ISO 8601, numeric status codes to labels)
    - Trim verbose outputs to relevant fields

  2. Tool Call Interception Hooks - Enforce business rules before tool execution:
    - Block refunds exceeding a dollar threshold
    - Redirect policy-violating actions to human escalation

Exam Tip: When the question is about "guaranteed compliance" or "deterministic enforcement," the answer is always hooks, never prompt instructions.

1.6 Task Decomposition Strategies

Strategy When to Use Example
Prompt Chaining (fixed sequential) Predictable multi-aspect reviews Analyze each file individually, then cross-file integration pass
Dynamic Decomposition (adaptive) Open-ended investigation tasks "Add comprehensive tests to a legacy codebase" - map structure first, then prioritize

Prompt chaining breaks work into sequential steps. Good for code reviews: analyze each file for local issues, then run a cross-file integration pass.

Dynamic decomposition generates subtasks based on what is discovered at each step. Good for open-ended tasks: first map the codebase structure, identify high-impact areas, then create a prioritized plan that adapts as dependencies are discovered.

1.7 Session State, Resumption, and Forking

flowchart TB
    subgraph Resume["Session Resumption"]
        R1["--resume session-name"]
        R2["Continues prior conversation"]
        R3["Use when prior context<br/>is mostly valid"]
    end
    subgraph Fork["Session Forking"]
        F1["fork_session"]
        F2["Independent branches from<br/>shared baseline"]
        F3["Use for comparing<br/>divergent approaches"]
    end
    subgraph Fresh["Fresh Start"]
        FR1["New session + summary"]
        FR2["Inject structured summary"]
        FR3["Use when prior tool<br/>results are stale"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    class R1,R2,R3 blue
    class F1,F2,F3 purple
    class FR1,FR2,FR3 teal

Key Decisions:
- Resume (--resume session-name): When prior context is mostly valid, you just need to continue
- Fork (fork_session): When you want to explore two approaches from a shared analysis baseline
- Fresh start with summary: When prior tool results are stale (code has changed since last session)
- When resuming, inform the agent about specific file changes for targeted re-analysis rather than full re-exploration

Domain 1 Practice Questions

Question 1: Production data shows that in 12% of cases, your agent skips get_customer and calls lookup_order using only the customer's stated name. What change most effectively addresses this?

A) Add a programmatic prerequisite that blocks lookup_order and process_refund calls until get_customer has returned a verified customer ID.

B) Enhance the system prompt to state that customer verification is mandatory.

C) Add few-shot examples showing the agent always calling get_customer first.

D) Implement a routing classifier that enables only appropriate tools per request type.

Answer: A - When a specific tool sequence is required for critical business logic (verifying identity before processing refunds), programmatic enforcement provides deterministic guarantees that prompt-based approaches (B, C) cannot. Option D addresses tool availability, not ordering.

Question 2: Your multi-agent research system covers only visual arts when asked about "AI on creative industries." Coordinator logs show it decomposed the topic into only "AI in digital art," "AI in graphic design," and "AI in photography." What is the root cause?

A) The synthesis agent lacks instructions for identifying coverage gaps.

B) The coordinator agent's task decomposition is too narrow, resulting in subagent assignments that don't cover all relevant domains.

C) The web search agent's queries are not comprehensive enough.

D) The document analysis agent is filtering out non-visual creative industries.

Answer: B - The coordinator's logs directly reveal it decomposed "creative industries" into only visual arts subtasks. The subagents executed correctly within their assigned scope - the problem is what they were assigned.

Question 3: The synthesis agent frequently needs to verify claims, causing 2-3 round trips per task via the coordinator. 85% of verifications are simple fact-checks. What's the most effective approach?

A) Give the synthesis agent a scoped verify_fact tool for simple lookups, while complex verifications continue delegating through the coordinator.

B) Have the synthesis agent batch all verification needs to the end.

C) Give the synthesis agent access to all web search tools.

D) Have the web search agent proactively cache extra context.

Answer: A - This applies the principle of least privilege: the synthesis agent gets only what it needs for the 85% common case. Option B creates blocking dependencies. Option C over-provisions (violating separation of concerns). Option D relies on speculative caching.

Question 4: Your agentic loop uses a counter that terminates after 5 iterations regardless of task completion. Users report tasks being cut off mid-way. What should you change?

A) Increase the iteration cap to 15.

B) Replace the iteration cap with stop_reason-based termination that continues when stop_reason is "tool_use" and terminates when it's "end_turn".

C) Add a prompt instruction telling the model to finish within 5 iterations.

D) Use max_tokens to control when the model stops.

Answer: B - The agentic loop should be driven by stop_reason, not arbitrary iteration caps. The model signals completion via "end_turn" - trust that signal instead of guessing how many iterations are enough.

Question 5: Your coordinator spawns subagents sequentially (one at a time). Research on "quantum computing applications" takes 3 minutes as each subagent waits for the previous one to complete. How can you reduce latency?

A) Give each subagent a larger context window.

B) Use a single agent instead of multiple subagents.

C) Have the coordinator emit multiple Task tool calls in a single response to spawn subagents in parallel.

D) Cache results from previous research runs.

Answer: C - Parallel subagent execution is achieved by emitting multiple Task tool calls in a single coordinator response. This is a fundamental multi-agent optimization technique.

Question 6: A customer says "I need a refund for order #4521 AND I want to change my subscription plan." How should the agent handle this?

A) Process the refund first, then address the subscription change in a follow-up.

B) Decompose the request into distinct items, investigate each using shared context, then synthesize a unified resolution.

C) Ask the customer which issue they'd like to address first.

D) Escalate to a human agent because multi-concern requests are too complex.

Answer: B - Multi-concern requests should be decomposed, investigated in parallel using shared context, and synthesized into a unified response. This is the standard pattern for handling complex customer requests.

Domain 1 Worked Example: Building a Customer Support Agent

Let's walk through a complete customer support agent design that covers most Domain 1 concepts.

Scenario: You are building an agent to handle returns, billing disputes, and account issues. The agent has four MCP tools: get_customer, lookup_order, process_refund, and escalate_to_human. The target is 80%+ first-contact resolution.

Step 1 - Design the Agentic Loop:

sequenceDiagram
    participant C as Customer
    participant A as Agent
    participant T as Tools
    participant H as Human Agent

    C->>A: "I need a refund for<br/>order #4521"
    A->>T: get_customer(email)
    T-->>A: {id: 123, verified: true}
    A->>T: lookup_order(order_id: 4521)
    T-->>A: {status: "delivered",<br/>amount: $127.50, ...}
    A->>A: Check: amount < $500?
    A->>T: process_refund(order: 4521,<br/>amount: 127.50)
    T-->>A: {status: "processed"}
    A->>C: "Refund of $127.50<br/>processed for order #4521"

Step 2 - Add Programmatic Prerequisites:

The agent must call get_customer before lookup_order or process_refund. Implement a hook that checks whether a verified customer ID exists before allowing downstream tools:

  • lookup_order is blocked until get_customer has returned a verified customer ID
  • process_refund is blocked until both get_customer AND lookup_order have completed
  • This eliminates the 12% of cases where the agent skips verification

Step 3 - Add Compliance Hooks:

A tool call interception hook blocks refunds above $500 and redirects to escalate_to_human with a structured summary:
- Customer ID
- Order details
- Refund amount (above threshold)
- Recommended action: "Human review required for refund > $500"

Step 4 - Handle Multi-Concern Requests:

When a customer says "I need a refund for order #4521 AND I want to change my subscription," the agent:
1. Decomposes the request into two distinct items
2. Investigates each using shared customer context (verified once, used for both)
3. Synthesizes a unified resolution addressing both issues

Step 5 - Define Escalation Criteria:

Add explicit criteria to the system prompt with few-shot examples:
- Escalate immediately: Customer explicitly says "let me talk to a human"
- Escalate: Policy gaps (competitor price matching when policy only covers own-site)
- Resolve autonomously: Standard returns with photo evidence, simple billing questions
- Acknowledge + offer resolution: Frustrated customer with straightforward issue (escalate only if they reiterate preference for human)

This single worked example touches Task Statements 1.1 (agentic loop), 1.4 (enforcement), 1.5 (hooks), and connects to Domain 2 (tool design) and Domain 5 (escalation).

Domain 1 Key Patterns Summary

flowchart TB
    subgraph Patterns["Domain 1 Pattern Catalog"]
        direction TB
        P1["Agentic Loop<br/>stop_reason-based control"]
        P2["Hub-and-Spoke<br/>Coordinator + subagents"]
        P3["Parallel Spawning<br/>Multiple Task calls in one turn"]
        P4["Programmatic Enforcement<br/>Hooks > prompts for compliance"]
        P5["Iterative Refinement<br/>Coordinator re-delegates until sufficient"]
        P6["Session Forking<br/>Divergent exploration from shared baseline"]
        P7["Structured Handoffs<br/>Complete context for human agents"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class P1,P2,P3,P4,P5,P6,P7 blue

Domain 2: Tool Design & MCP Integration (18%)

This domain covers how you design tool interfaces, handle errors, distribute tools across agents, and integrate MCP servers.

2.1 Effective Tool Interface Design

Tool descriptions are the primary mechanism LLMs use for tool selection. This is perhaps the most practically important concept in this domain.

flowchart TB
    subgraph Bad["BAD: Minimal Descriptions"]
        B1["get_customer:<br/>'Retrieves customer info'"]
        B2["lookup_order:<br/>'Retrieves order details'"]
        B3["Result: Model confuses<br/>the two tools"]
    end
    subgraph Good["GOOD: Detailed Descriptions"]
        G1["get_customer:<br/>'Look up a customer by email or<br/>phone. Returns name, account status,<br/>subscription tier. Use for identity<br/>verification before operations.'"]
        G2["lookup_order:<br/>'Retrieve order by order number<br/>(format: #NNNNN). Returns items,<br/>status, shipping. Use when user<br/>asks about a specific order.'"]
        G3["Result: Reliable<br/>tool selection"]
    end
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class B1,B2,B3 red
    class G1,G2,G3 green

Best Practices for Tool Descriptions:

  1. Include input formats: What types of identifiers does the tool accept?
  2. Include example queries: "Use when the user asks about order status"
  3. Include edge cases: "Returns null if the customer has no active orders"
  4. Include boundaries: "Do NOT use this for subscription queries; use get_subscription instead"

When Tools Have Overlapping Functions:
- Rename to eliminate ambiguity (e.g., analyze_content becomes extract_web_results)
- Split generic tools into purpose-specific ones (e.g., analyze_document becomes extract_data_points, summarize_content, and verify_claim_against_source)

Exam Tip: Watch for questions where tool descriptions are "minimal" or "near-identical." The first fix is always to expand descriptions, not to add routing classifiers or consolidate tools.

2.2 Structured Error Responses

MCP tools must return structured error metadata, not generic failure messages.

flowchart TB
    ERROR["Tool Error Occurs"] --> CLASSIFY{"Error Category?"}
    CLASSIFY -->|"Transient"| TRANS["Timeout, service unavailable<br/>isRetryable: true"]
    CLASSIFY -->|"Validation"| VAL["Invalid input format<br/>isRetryable: false"]
    CLASSIFY -->|"Business"| BUS["Policy violation<br/>isRetryable: false<br/>+ customer-friendly message"]
    CLASSIFY -->|"Permission"| PERM["Insufficient access<br/>isRetryable: false"]
    TRANS --> AGENT["Agent decides:<br/>retry with backoff"]
    VAL --> AGENT2["Agent decides:<br/>fix input and retry"]
    BUS --> AGENT3["Agent decides:<br/>explain to user"]
    PERM --> AGENT4["Agent decides:<br/>escalate or inform"]

    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class TRANS,VAL,BUS,PERM amber
    class AGENT,AGENT2,AGENT3,AGENT4 blue

The Structured Error Response Pattern:

Every error should include:
- isError: true (MCP flag)
- errorCategory: transient, validation, business, permission
- isRetryable: boolean
- description: Human-readable explanation

Critical Distinction - Access Failures vs Empty Results:
- Access failure: The database was unreachable (error - needs retry decision)
- Valid empty result: The query ran successfully but found no matches (success - not an error)

Returning empty results as success when the tool actually failed prevents any recovery and risks incomplete outputs.

2.3 Tool Distribution Across Agents

The Key Principle: Giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability. Each agent should only have tools relevant to its role.

flowchart TB
    subgraph Search["Search Agent Tools"]
        T1["web_search"]
        T2["fetch_url"]
        T3["search_academic"]
    end
    subgraph Analysis["Analysis Agent Tools"]
        T4["extract_data"]
        T5["summarize_content"]
        T6["compare_sources"]
    end
    subgraph Synthesis["Synthesis Agent Tools"]
        T7["compile_report"]
        T8["verify_fact"]
        T9["format_citations"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    class T1,T2,T3 blue
    class T4,T5,T6 purple
    class T7,T8,T9 teal

tool_choice Configuration:

Setting Behavior Use Case
"auto" Model may return text instead of calling a tool Default behavior - model decides
"any" Model must call a tool but can choose which Guarantee structured output when multiple schemas exist
{"type": "tool", "name": "..."} Model must call a specific named tool Force a specific extraction to run first

Exam Tip: tool_choice: "any" guarantees the model calls a tool. Forced selection guarantees it calls a specific tool. Know when to use each.

2.4 MCP Server Integration

flowchart LR
    subgraph Project["Project Scope: .mcp.json"]
        P1["Shared team tooling"]
        P2["Version-controlled"]
        P3["Uses env var expansion:<br/>${GITHUB_TOKEN}"]
    end
    subgraph User["User Scope: ~/.claude.json"]
        U1["Personal/experimental"]
        U2["NOT shared via VCS"]
        U3["Individual credentials"]
    end
    subgraph Both["Available Simultaneously"]
        B1["All tools discovered<br/>at connection time"]
    end
    Project --> Both
    User --> Both
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class P1,P2,P3 blue
    class U1,U2,U3 purple
    class B1 emerald

MCP Resources vs MCP Tools:
- Resources: Expose content catalogs (issue summaries, documentation hierarchies, database schemas) - reduce exploratory tool calls
- Tools: Expose actions (search, create, update, delete)

Best Practices:
- Use community MCP servers for standard integrations (Jira, GitHub) rather than building custom ones
- Reserve custom servers for team-specific workflows
- Enhance tool descriptions to prevent the agent from preferring built-in tools (like Grep) over more capable MCP tools
- Configure shared servers in .mcp.json with ${ENV_VAR} expansion for auth tokens

2.5 Built-in Tools

Tool Purpose When to Use
Grep Content search Searching file contents for patterns (function names, error messages, imports)
Glob File path pattern matching Finding files by name or extension (e.g., **/*.test.tsx)
Read Load full file contents When you need to examine a complete file
Write Create or overwrite files Creating new files or complete rewrites
Edit Targeted modifications Changing specific sections using unique text matching
Bash Shell commands Running builds, tests, git operations

When Edit Fails: If Edit cannot find unique anchor text, fall back to Read + Write.

Incremental Codebase Understanding Pattern:
1. Start with Grep to find entry points
2. Use Read to follow imports and trace flows
3. Build understanding incrementally, rather than reading all files upfront

Domain 2 Practice Questions

Question 1: Your agent frequently calls get_customer when users ask about orders. Both tools have minimal descriptions and accept similar identifier formats. What's the most effective first step?

A) Add few-shot examples demonstrating correct tool selection.

B) Expand each tool's description to include input formats, example queries, edge cases, and boundaries explaining when to use it versus similar tools.

C) Implement a routing layer that pre-selects tools based on keywords.

D) Consolidate both tools into a single lookup_entity tool.

Answer: B - Tool descriptions are the primary mechanism LLMs use for tool selection. Expanding them is the lowest-effort, highest-leverage fix. Few-shot examples (A) add token overhead without fixing the root cause. A routing layer (C) is over-engineered. Consolidating (D) requires more effort than expanding descriptions.

Question 2: The web search subagent times out. How should failure information flow back to the coordinator?

A) Return structured error context including failure type, attempted query, partial results, and alternative approaches.

B) Implement automatic retry with exponential backoff, returning "search unavailable" after retries exhaust.

C) Catch the timeout and return an empty result set marked as successful.

D) Propagate the timeout exception to a top-level handler that terminates the workflow.

Answer: A - Structured error context gives the coordinator the information needed for intelligent recovery. Option B hides context behind a generic status. Option C suppresses the error (anti-pattern). Option D terminates unnecessarily.

Question 3: Your synthesis agent has access to 18 tools including web search, database queries, and report formatting. It frequently misuses the web search tools when it should be synthesizing. What should you do?

A) Restrict the synthesis agent's tool set to only synthesis-relevant tools (compile_report, verify_fact, format_citations), routing complex lookups through the coordinator.

B) Add few-shot examples showing the synthesis agent using only synthesis tools.

C) Increase the synthesis agent's context window to handle more tools.

D) Add a pre-processing step that removes irrelevant tools from the response.

Answer: A - The principle is scoped tool access. Giving agents tools outside their specialization leads to misuse. Restrict each agent to its relevant tools (4-5 is ideal).

Question 4: A new team member's Claude Code instance doesn't have access to the project's Jira MCP server, even though the team lead configured it. Where should the configuration be?

A) In ~/.claude.json on each developer's machine.

B) In .mcp.json at the project root, committed to version control.

C) In CLAUDE.md under a tools section.

D) In a .claude/config.json file.

Answer: B - Project-level MCP servers go in .mcp.json (version-controlled, shared). ~/.claude.json (A) is user-level and not shared. CLAUDE.md (C) is for instructions, not server configuration.

Question 5: You want to force the model to call extract_metadata before any enrichment tools. How do you configure tool_choice?

A) tool_choice: "auto" with a system prompt instruction.

B) tool_choice: "any" to guarantee a tool call.

C) tool_choice: {"type": "tool", "name": "extract_metadata"} to force the specific tool, then process subsequent steps in follow-up turns.

D) List extract_metadata first in the tools array.

Answer: C - Forced tool selection ensures a specific tool is called. "auto" (A) lets the model choose freely. "any" (B) guarantees a tool call but not which one. Tool ordering in the array (D) doesn't guarantee selection order.

Domain 2 Worked Example: Designing a Research Agent's Tool Suite

Let's design the tool distribution for a multi-agent research system to illustrate all Domain 2 concepts.

The System: A coordinator delegates to three subagents: Search, Analysis, and Synthesis. Each needs carefully scoped tools.

Step 1 - Tool Design with Differentiated Descriptions:

Instead of a generic analyze_content tool shared by all agents, split into three purpose-specific tools:

Tool Agent Description
web_search Search "Search the web for articles, papers, and news. Input: query string (1-5 keywords). Output: list of URLs with titles and snippets. Use for broad discovery of sources. Does NOT fetch page content - use fetch_url for that."
fetch_url Search "Fetch and extract text content from a specific URL. Input: URL string. Returns cleaned text with metadata (title, author, date, word count). Use after web_search to get full content."
extract_data_points Analysis "Extract specific data points (statistics, dates, names, claims) from a provided text block. Input: text + list of fields to extract. Returns structured JSON with extracted values and confidence. Use for systematic fact extraction from documents."
summarize_section Analysis "Summarize a section of text into 2-3 key points. Input: text (up to 5000 tokens). Returns bullet-point summary preserving source attribution. Use for condensing verbose source material."
verify_fact Synthesis "Quick lookup to verify a specific factual claim (date, name, statistic). Input: claim string. Returns: verified/unverified/conflicting with source. Use for 85% of simple verifications without round-tripping through coordinator."
compile_report Synthesis "Compile findings into a structured report with sections, citations, and methodology notes. Input: structured findings list with sources. Returns formatted report markdown."

Notice how each tool's description includes: input format, output format, when to use it, and boundaries.

Step 2 - Error Response Design:

flowchart TB
    subgraph WebSearch["web_search Error Handling"]
        WS1["Timeout → isRetryable: true<br/>errorCategory: transient<br/>'Search service timed out after 10s'"]
        WS2["Invalid query → isRetryable: false<br/>errorCategory: validation<br/>'Query must be 1-5 keywords'"]
        WS3["No results → NOT an error<br/>Valid empty result set<br/>isError: false, results: []"]
    end
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class WS1,WS2 amber
    class WS3 green

The critical distinction: a search that returns zero results is a successful query with no matches (not an error). A search that fails because the service is down is an access failure (an error requiring recovery).

Step 3 - MCP Server Configuration:

The team's shared Jira integration goes in .mcp.json with credential expansion:

{
  "jira": {
    "command": "jira-mcp-server",
    "env": {
      "JIRA_URL": "${JIRA_URL}",
      "JIRA_TOKEN": "${JIRA_TOKEN}"
    }
  }
}

A developer's experimental paper-search MCP server goes in ~/.claude.json (personal, not shared).

Step 4 - Tool Choice Configuration:

For the extraction phase: force extract_data_points first using tool_choice: {"type": "tool", "name": "extract_data_points"}. After extraction completes, switch to tool_choice: "auto" for enrichment steps.

For the synthesis phase: use tool_choice: "any" to guarantee structured output when multiple report formats exist (the model picks the right format tool but must call one).

Domain 2 Additional Practice Questions

Question 6: Your search agent returns "search unavailable" after a timeout, but the coordinator has no information about what was searched or whether partial results exist. How should you improve the error response?

A) Add automatic retry with exponential backoff.

B) Return structured error context including: errorCategory: "transient", the attempted query, any partial results from before the timeout, and suggested alternative approaches (narrower query terms).

C) Have the search agent cache results so timeouts don't lose data.

D) Increase the timeout threshold.

Answer: B - Structured error context enables intelligent coordinator recovery. "Search unavailable" hides all the information the coordinator needs to decide whether to retry, modify the query, or proceed with partial results.

Question 7: You want to find all callers of a specific function across your codebase. Which built-in tool should you use?

A) Glob - to find files matching a naming pattern.

B) Grep - to search file contents for the function name pattern.

C) Read - to load and examine each file.

D) Bash - to run a shell search command.

Answer: B - Grep is for content search (searching file contents for patterns like function names, error messages, or import statements). Glob is for file path pattern matching. Reading all files (C) is inefficient. Bash search commands (D) should use the built-in Grep tool.

Question 8: Your agent has a community MCP server for Jira AND a custom Jira-like server your team built. The agent sometimes calls the wrong one. What's the best approach?

A) Use the community MCP server for standard Jira operations, removing the custom server. Reserve custom servers for team-specific workflows that the community server doesn't support.

B) Rename both servers to have completely different names.

C) Add a routing layer that intercepts Jira-related calls.

D) Give each agent access to only one of the two servers.

Answer: A - Prefer existing community MCP servers for standard integrations, reserving custom servers for team-specific workflows. Having two overlapping servers creates ambiguity. Remove the redundant custom server if the community one covers your needs.


Domain 3: Claude Code Configuration & Workflows (20%)

This domain covers the practical configuration and customization of Claude Code for individual developers and teams.

3.1 CLAUDE.md Configuration Hierarchy

flowchart TB
    subgraph User["User Level: ~/.claude/CLAUDE.md"]
        U1["Personal preferences"]
        U2["NOT shared via VCS"]
        U3["Applies only to this user"]
    end
    subgraph Project["Project Level: .claude/CLAUDE.md or root CLAUDE.md"]
        P1["Universal coding standards"]
        P2["Shared via version control"]
        P3["Applies to ALL team members"]
    end
    subgraph Directory["Directory Level: subdirectory CLAUDE.md"]
        D1["Package-specific rules"]
        D2["Applies to that directory"]
        D3["Overrides project-level for scope"]
    end
    User --> Project --> Directory

    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    class U1,U2,U3 purple
    class P1,P2,P3 blue
    class D1,D2,D3 teal

Key Configuration Options:

Level Location Shared via VCS? Use Case
User ~/.claude/CLAUDE.md No Personal preferences, output format
Project .claude/CLAUDE.md or root CLAUDE.md Yes Team coding standards, testing conventions
Directory Subdirectory CLAUDE.md Yes Package-specific rules

Modular Organization:
- @import syntax for referencing external files to keep CLAUDE.md lean
- .claude/rules/ directory for topic-specific rule files (alternative to monolithic CLAUDE.md)
- /memory command to verify which memory files are loaded

Common Exam Pitfall: A new team member not receiving instructions because they're in ~/.claude/CLAUDE.md (user-level) instead of .claude/CLAUDE.md (project-level). User-level is personal and not version-controlled.

3.2 Custom Slash Commands and Skills

flowchart LR
    subgraph Commands["Slash Commands"]
        C1[".claude/commands/<br/>Project-scoped, shared"]
        C2["~/.claude/commands/<br/>User-scoped, personal"]
    end
    subgraph Skills["Skills"]
        S1[".claude/skills/<br/>with SKILL.md frontmatter"]
        S2["context: fork<br/>Isolated sub-agent context"]
        S3["allowed-tools<br/>Restrict tool access"]
        S4["argument-hint<br/>Prompt for parameters"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    class C1,C2 blue
    class S1,S2,S3,S4 purple

Commands vs Skills:
- Commands (.claude/commands/): Simple, always available, for team-wide workflows
- Skills (.claude/skills/): Richer, with frontmatter config, for task-specific workflows

Skill Frontmatter Options:
- context: fork - Run in isolated sub-agent context (prevents verbose output from polluting main conversation)
- allowed-tools - Restrict which tools the skill can use (e.g., limit to file reads to prevent destructive actions)
- argument-hint - Prompt developers for required parameters when they invoke without arguments

When to Use What:
- Skills for on-demand, task-specific workflows (codebase analysis, brainstorming)
- CLAUDE.md for always-loaded universal standards

3.3 Path-Specific Rules

Path-specific rules let you apply conventions conditionally based on which files are being edited.

How It Works:

Create files in .claude/rules/ with YAML frontmatter:

---
paths: ["**/*.test.tsx"]
---
All test files must use React Testing Library.
Never use enzyme or shallow rendering.
Use data-testid attributes for component queries.

Why This Matters:
- Rules load only when editing matching files, reducing irrelevant context and token usage
- Glob patterns work across directories - perfect for test files spread throughout a codebase
- Superior to directory-level CLAUDE.md when conventions span multiple directories

Approach Best For
.claude/rules/ with globs Conventions by file type (test files, API files, Terraform) regardless of location
Directory CLAUDE.md Conventions specific to one package or directory
Root CLAUDE.md Universal standards that always apply

3.4 Plan Mode vs Direct Execution

flowchart TB
    TASK["Assess Task"] --> COMPLEX{"Complex?<br/>Multiple valid approaches?<br/>Multi-file changes?<br/>Architectural decisions?"}
    COMPLEX -->|"Yes"| PLAN["PLAN MODE<br/>Explore, design, then execute"]
    COMPLEX -->|"No"| DIRECT["DIRECT EXECUTION<br/>Single-file, clear scope"]
    PLAN --> EXPLORE["Use Explore subagent<br/>for verbose discovery"]
    PLAN --> IMPLEMENT["Switch to direct execution<br/>for implementation"]

    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class PLAN purple
    class DIRECT emerald
    class EXPLORE,IMPLEMENT blue
Use Plan Mode Use Direct Execution
Microservice restructuring Single-file bug fix with clear stack trace
Library migration affecting 45+ files Adding a date validation conditional
Choosing between integration approaches Simple refactoring with known scope
Multi-file architectural changes Well-understood, bounded changes

The Explore Subagent:
Use the Explore subagent to isolate verbose discovery output from the main conversation context. It returns summaries, preserving your context window for actual implementation.

3.5 Iterative Refinement Techniques

Four key techniques for getting better results from Claude Code:

1. Concrete Input/Output Examples
When prose descriptions produce inconsistent results, provide 2-3 concrete examples showing the exact transformation you want.

2. Test-Driven Iteration
Write test suites first covering expected behavior, edge cases, and performance requirements. Then iterate by sharing test failures.

3. The Interview Pattern
Have Claude ask questions to surface considerations you may not have anticipated before implementing. Great for unfamiliar domains.

4. Sequential vs Parallel Issue Resolution
- Single message when fixes interact with each other (interacting problems)
- Sequential iteration when problems are independent

3.6 CI/CD Integration

flowchart LR
    subgraph CI["CI Pipeline"]
        PR["PR Created"] --> RUN["claude -p 'Review this PR'"]
        RUN --> JSON["--output-format json<br/>--json-schema schema.json"]
        JSON --> POST["Post findings as<br/>inline PR comments"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class PR,RUN,JSON,POST blue

Critical CLI Flags for CI:

Flag Purpose
-p / --print Non-interactive mode - processes prompt, outputs result, exits. Required for CI.
--output-format json Machine-parseable structured output
--json-schema Enforce a specific output schema

Key Concepts:
- Session context isolation: A Claude session that generated code is less effective at reviewing its own changes. Use an independent review instance.
- Include prior review findings when re-running reviews after new commits to avoid duplicate comments
- CLAUDE.md provides project context (testing standards, fixture conventions) to CI-invoked Claude Code
- Provide existing test files in context so test generation avoids duplicating covered scenarios

Domain 3 Practice Questions

Question 1: You want to create a /review slash command available to every developer who clones the repo. Where should you create it?

A) In the .claude/commands/ directory in the project repository.

B) In ~/.claude/commands/ in each developer's home directory.

C) In the CLAUDE.md file at the project root.

D) In a .claude/config.json file with a commands array.

Answer: A - Project-scoped commands in .claude/commands/ are version-controlled and automatically available to all developers. User-level ~/.claude/commands/ (B) is personal. CLAUDE.md (C) is for instructions. config.json (D) doesn't support command definitions.

Question 2: You've been assigned to restructure a monolithic app into microservices, involving changes across dozens of files. Which approach?

A) Enter plan mode to explore the codebase, understand dependencies, and design an implementation approach before making changes.

B) Start with direct execution, letting implementation reveal natural service boundaries.

C) Use direct execution with comprehensive upfront instructions.

D) Begin in direct execution and switch to plan mode only if you encounter unexpected complexity.

Answer: A - Plan mode is designed for complex tasks with large-scale changes, multiple valid approaches, and architectural decisions. Direct execution (B) risks costly rework. Upfront instructions (C) assume knowledge you don't have. Waiting for complexity (D) ignores that complexity is already stated in the requirements.

Question 3: Test files are spread throughout the codebase (e.g., Button.test.tsx next to Button.tsx). You want all tests to follow the same conventions. What's the most maintainable approach?

A) Create rule files in .claude/rules/ with YAML frontmatter specifying glob patterns (e.g., paths: ["/.test.tsx"]).*

B) Consolidate all conventions in root CLAUDE.md under headers.

C) Create skills in .claude/skills/ for each code type.

D) Place a separate CLAUDE.md in each subdirectory.

Answer: A - Glob patterns in .claude/rules/ apply conventions by file path regardless of directory location. Essential for test files spread throughout the codebase. CLAUDE.md headers (B) rely on inference. Skills (C) require manual invocation. Subdirectory CLAUDE.md (D) can't handle files in many directories.

Question 4: Your CI pipeline script runs `claude "Analyze this PR"` but the job hangs indefinitely. What's the fix?

A) Add the -p flag: claude -p "Analyze this pull request for security issues"

B) Set the environment variable CLAUDE_HEADLESS=true.

C) Redirect stdin from /dev/null.

D) Add the --batch flag.

Answer: A - The -p (or --print) flag is the documented way to run Claude Code non-interactively. It processes the prompt, outputs the result, and exits. Options B, C, and D reference non-existent features or Unix workarounds that don't properly address Claude Code's syntax.

Question 5: A team lead's personal CLAUDE.md contains critical testing conventions, but new team members aren't getting them. What's the configuration issue?

A) The new team members need to run /memory to load the conventions.

B) The conventions are in the team lead's user-level ~/.claude/CLAUDE.md, which is not shared via version control. They should be moved to .claude/CLAUDE.md at the project root.

C) The new team members need to clone the repo again to pick up CLAUDE.md changes.

D) The CLAUDE.md file needs to be imported using @import syntax.

Answer: B - ~/.claude/CLAUDE.md is user-level and not version-controlled. Moving to .claude/CLAUDE.md (project-level) makes it available to everyone via VCS.

Question 6: You want a codebase analysis skill that produces very verbose output. How do you prevent it from polluting the main conversation?

A) Add allowed-tools: [Read, Grep, Glob] to the SKILL.md frontmatter.

B) Add context: fork to the SKILL.md frontmatter so it runs in an isolated sub-agent context.

C) Create the skill as a user-level personal variant.

D) Add a max-tokens limit in the skill configuration.

Answer: B - context: fork runs the skill in isolated sub-agent context, preventing verbose output from polluting the main conversation. allowed-tools (A) restricts which tools are available, not context isolation.

Domain 3 Worked Example: Setting Up a Monorepo with Claude Code

Let's configure Claude Code for a monorepo with frontend (React), backend (Python API), infrastructure (Terraform), and tests spread throughout.

Step 1 - Root CLAUDE.md (Universal Standards):

The root CLAUDE.md contains team-wide conventions that always apply:
- Code review checklist
- Commit message format
- Branch naming conventions
- Security requirements (no secrets in code, no raw SQL)

Step 2 - Path-Specific Rules in .claude/rules/:

Create four rule files with glob patterns:

flowchart TB
    subgraph Rules[".claude/rules/ Directory"]
        R1["testing.md<br/>paths: ['**/*.test.*', '**/*.spec.*']<br/>React Testing Library, no enzyme,<br/>data-testid attributes"]
        R2["api-conventions.md<br/>paths: ['src/api/**/*', 'src/services/**/*']<br/>async/await, specific error handling,<br/>input validation patterns"]
        R3["terraform.md<br/>paths: ['terraform/**/*', '**/*.tf']<br/>module structure, naming conventions,<br/>state management rules"]
        R4["react-components.md<br/>paths: ['src/components/**/*.tsx']<br/>Functional style with hooks,<br/>prop types, accessibility"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class R1,R2,R3,R4 blue

Why glob patterns instead of directory CLAUDE.md files? Because test files (*.test.tsx) are next to the components they test, spread across dozens of directories. A glob pattern **/*.test.* catches them all regardless of location.

Step 3 - Custom Skills:

Create a codebase analysis skill with context: fork so its verbose output stays isolated:

In .claude/skills/analyze-codebase/SKILL.md:

---
context: fork
allowed-tools: [Read, Grep, Glob]
argument-hint: "Which area? (api, frontend, terraform, all)"
---
Analyze the specified area of the codebase. Report: architecture overview, dependency map, test coverage gaps, and potential issues. Output a structured summary.

The context: fork prevents 2000 lines of discovery from polluting the main conversation. allowed-tools restricts to read-only operations. argument-hint prompts the developer for which area to analyze.

Step 4 - MCP Server Integration:

.mcp.json (project-scoped, committed to VCS):

{
  "github": {
    "command": "github-mcp-server",
    "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
  }
}

~/.claude.json (personal, NOT committed):

{
  "paper-search": {
    "command": "arxiv-mcp-server"
  }
}

Step 5 - Plan Mode Decision:

Task Mode Why
Fix null pointer in user service Direct execution Single file, clear stack trace
Migrate from Express to Fastify Plan mode 45+ files, multiple valid approaches
Add date validation to signup form Direct execution One function, obvious implementation
Restructure API into microservices Plan mode Architectural decisions, dependency analysis

Domain 3 Additional Practice Questions

Question 7: You use the interview pattern before implementing a caching layer. Claude asks about cache invalidation strategies, failure modes, and consistency requirements - surfacing considerations you hadn't thought about. Which iterative refinement technique is this?

A) Test-driven iteration

B) Concrete input/output examples

C) The interview pattern - having Claude ask questions to surface considerations you may not have anticipated before implementing

D) Sequential issue resolution

Answer: C - The interview pattern is specifically designed for unfamiliar domains where the developer may not know all the considerations. Claude asks probing questions to ensure the design accounts for edge cases before implementation begins.

Question 8: Your CI pipeline re-runs code review after new commits, but it keeps re-reporting issues from the previous review that were already addressed. How do you fix this?

A) Clear the review context between runs.

B) Include prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues.

C) Use a different model for each review iteration.

D) Compare the new review output against the previous one programmatically.

Answer: B - Including prior findings in context with instructions to focus on new/unaddressed issues prevents duplicate comments. This is the standard pattern for iterative CI review.

Question 9: Natural language descriptions of a code transformation keep producing inconsistent results. Sometimes Claude uppercases variables, sometimes it camelCases them. What's the most effective technique?

A) Provide 2-3 concrete input/output examples showing the exact transformation you want.

B) Write more detailed prose instructions specifying the exact naming convention.

C) Add a post-processing step to enforce naming conventions.

D) Use a linter to catch inconsistencies.

Answer: A - When prose descriptions produce inconsistent results, concrete input/output examples are the most effective technique. They show the exact transformation unambiguously, eliminating interpretation differences.


Domain 4: Prompt Engineering & Structured Output (20%)

This domain tests your ability to design precise prompts, use few-shot examples, enforce structured output via tool_use, and design batch processing strategies.

4.1 Explicit Criteria Over Vague Instructions

flowchart LR
    subgraph Vague["VAGUE (Ineffective)"]
        V1["'Be conservative'"]
        V2["'Only report high-confidence findings'"]
        V3["'Check that comments are accurate'"]
    end
    subgraph Specific["SPECIFIC (Effective)"]
        S1["'Flag comments only when<br/>claimed behavior contradicts<br/>actual code behavior'"]
        S2["'Report: bugs, security issues<br/>Skip: minor style, local patterns'"]
        S3["'Severity: critical = data loss,<br/>high = wrong output,<br/>medium = perf regression'"]
    end
    Vague -->|"Replace with"| Specific
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class V1,V2,V3 red
    class S1,S2,S3 green

Key Principles:
- Define what to report and what to skip using categorical criteria, not confidence-based filtering
- High false positive rates destroy developer trust across all categories, not just the noisy ones
- Temporarily disable high false-positive categories to restore trust while improving those prompts
- Define severity levels with concrete code examples for each level

4.2 Few-Shot Prompting

Few-shot examples are the most effective technique for achieving consistently formatted, actionable output.

When to use few-shot examples:
- When detailed instructions alone produce inconsistent results
- For ambiguous-case handling (tool selection, edge cases)
- For demonstrating desired output format (location, issue, severity, suggested fix)
- For reducing hallucination in extraction tasks
- For distinguishing acceptable patterns from genuine issues (reducing false positives)

Best Practices:
- Use 2-4 targeted examples focusing on ambiguous scenarios
- Show reasoning for why one action was chosen over plausible alternatives
- Include examples demonstrating correct handling of varied document structures
- Examples enable the model to generalize to novel patterns, not just match pre-specified cases

4.3 Structured Output via Tool Use

flowchart TB
    subgraph Define["Define Extraction Tool"]
        D1["Name: extract_invoice"]
        D2["Input schema: JSON Schema"]
        D3["Fields: required + optional"]
    end
    subgraph Call["API Call"]
        C1["tool_choice controls behavior"]
        C2["Model calls the tool"]
        C3["Response in tool_use block"]
    end
    subgraph Validate["Validation Layer"]
        V1["Schema: syntax guaranteed"]
        V2["Semantic: NOT guaranteed"]
        V3["line_items may not sum<br/>to total (semantic error)"]
    end
    Define --> Call --> Validate
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class D1,D2,D3,C1,C2,C3 blue
    class V1,V2,V3 amber

Critical Distinction:
- Strict JSON schemas via tool_use eliminate SYNTAX errors (missing braces, trailing commas)
- They do NOT prevent SEMANTIC errors (wrong values, items that don't sum correctly)

Schema Design Best Practices:

Pattern Purpose
Make fields optional/nullable when source may lack the data Prevents model from fabricating values to satisfy required fields
Add "unclear" enum value For ambiguous cases where the answer isn't clear
Add "other" + detail string field For extensible categorization beyond predefined enums
Include format normalization rules in prompts Handle inconsistent source formatting

4.4 Validation, Retry, and Feedback Loops

flowchart TB
    EXTRACT["Extract data from document"] --> VALIDATE{"Validation<br/>passes?"}
    VALIDATE -->|"Yes"| SUCCESS["Accept extraction"]
    VALIDATE -->|"No"| ANALYZE{"Is information<br/>in the document?"}
    ANALYZE -->|"Yes: format/structural error"| RETRY["Retry with:<br/>1. Original document<br/>2. Failed extraction<br/>3. Specific validation errors"]
    ANALYZE -->|"No: info absent from source"| FAIL["Mark as unavailable<br/>(retries won't help)"]
    RETRY --> VALIDATE

    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    class SUCCESS emerald
    class RETRY amber
    class FAIL red

When Retries Work: Format mismatches, structural output errors, field placement errors

When Retries DON'T Work: Information exists only in an external document not provided

Self-Correction Validation Patterns:
- Extract calculated_total alongside stated_total to flag discrepancies
- Add conflict_detected booleans for inconsistent source data
- Include detected_pattern fields to enable analysis of false positive patterns when developers dismiss findings

4.5 Batch Processing Strategies

flowchart LR
    subgraph Batch["Message Batches API"]
        B1["50% cost savings"]
        B2["Up to 24-hour processing"]
        B3["No latency SLA"]
        B4["No multi-turn tool calling"]
        B5["custom_id for correlation"]
    end
    subgraph Good["GOOD Use Cases"]
        G1["Overnight technical debt reports"]
        G2["Weekly audit analysis"]
        G3["Nightly test generation"]
    end
    subgraph Bad["BAD Use Cases"]
        BD1["Blocking pre-merge checks"]
        BD2["Real-time code review"]
        BD3["Interactive feedback"]
    end
    Batch --> Good
    Batch -.->|"NOT suitable"| Bad
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    class B1,B2,B3,B4,B5 blue
    class G1,G2,G3 green
    class BD1,BD2,BD3 red

Key Facts for the Exam:
- 50% cost savings but processing up to 24 hours with no guaranteed latency SLA
- Cannot do multi-turn tool calling within a single batch request
- Use custom_id to correlate batch request/response pairs
- Failure handling: Resubmit only failed documents (identified by custom_id), possibly with modifications (chunking oversized documents)
- Prompt refinement tip: Test on a sample set before batch-processing large volumes

4.6 Multi-Instance and Multi-Pass Review

flowchart TB
    subgraph Self["Self-Review (Weak)"]
        SR1["Same session retains<br/>reasoning context"]
        SR2["Less likely to question<br/>its own decisions"]
        SR3["Even extended thinking<br/>doesn't fully help"]
    end
    subgraph Independent["Independent Review (Strong)"]
        IR1["Separate Claude instance"]
        IR2["No prior reasoning context"]
        IR3["More effective at<br/>catching subtle issues"]
    end
    subgraph MultiPass["Multi-Pass (Best for Large PRs)"]
        MP1["Per-file local analysis"]
        MP2["Cross-file integration pass"]
        MP3["Avoids attention dilution<br/>and contradictions"]
    end
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class SR1,SR2,SR3 red
    class IR1,IR2,IR3 green
    class MP1,MP2,MP3 blue

Why Self-Review Fails:
The model retains its reasoning context from generation, making it less likely to question its own decisions in the same session.

The Multi-Pass Pattern for Large PRs:
1. Per-file local analysis: Analyze each file individually for local issues
2. Cross-file integration pass: Examine cross-file data flow, consistency
3. This avoids attention dilution (superficial coverage of some files), contradictory findings (flagging a pattern in one file while approving it in another), and missed obvious bugs

Domain 4 Practice Questions

Question 1: Your automated code review reports have a 40% false positive rate on "misleading comments" but high accuracy on bugs and security. Developers are starting to ignore ALL findings. What should you do?

A) Add "be more conservative" to the system prompt.

B) Require the model to output confidence scores and filter below 80%.

C) Temporarily disable the "misleading comments" category while improving its prompt criteria, preserving developer trust in accurate categories.

D) Reduce the number of few-shot examples to make the model less aggressive.

Answer: C - High false positive rates in one category undermine trust across all categories. Temporarily disabling the noisy category preserves trust in accurate ones while you improve the prompt for that specific category.

Question 2: Your team wants to reduce API costs. You have: (1) blocking pre-merge checks, and (2) overnight technical debt reports. Your manager proposes switching both to the Message Batches API. How should you evaluate this?

A) Use batch processing for technical debt reports only; keep real-time calls for pre-merge checks.

B) Switch both to batch processing with status polling.

C) Keep real-time calls for both to avoid batch ordering issues.

D) Switch both to batch with timeout fallback.

Answer: A - The Batch API has processing up to 24 hours with no latency SLA. Suitable for overnight reports. Unsuitable for blocking pre-merge checks where developers wait.

Question 3: A PR modifies 14 files. Single-pass review produces inconsistent results: detailed feedback for some files, superficial for others, obvious bugs missed, and contradictory feedback. How should you restructure?

A) Split into focused passes: analyze each file individually for local issues, then run a separate integration pass examining cross-file data flow.

B) Require developers to split large PRs into 3-4 file submissions.

C) Switch to a higher-tier model with a larger context window.

D) Run three independent review passes and only flag issues appearing in at least two.

Answer: A - Splitting reviews into focused passes addresses the root cause: attention dilution. Per-file analysis ensures consistent depth; a separate integration pass catches cross-file issues. Larger context windows (C) don't solve attention quality. Consensus filtering (D) would suppress intermittently caught real bugs.

Question 4: Your extraction pipeline sometimes returns fabricated values for fields when the source document doesn't contain that information. How do you prevent this?

A) Add a post-processing step that validates all values against external databases.

B) Increase the model temperature to reduce deterministic fabrication.

C) Design schema fields as optional (nullable) when source documents may not contain the information, so the model can return null instead of fabricating values.

D) Add "do not hallucinate" to the system prompt.

Answer: C - Making fields nullable gives the model a valid way to say "this information isn't in the document." Required fields pressure the model to fabricate values to satisfy the schema.

Question 5: Your extraction validation fails because dates are in inconsistent formats (MM/DD/YYYY vs DD-MM-YYYY vs "January 5th, 2024"). Retrying produces the same errors. What approach should you take?

A) Define a regex-based post-processor to normalize all date formats after extraction.

B) Include format normalization rules in prompts alongside strict output schemas, with few-shot examples showing correct extraction from varied formats.

C) Pre-process all documents to a standard date format before extraction.

D) Add multiple date fields for each format variant.

Answer: B - Format normalization rules in prompts alongside output schemas handle inconsistent source formatting at extraction time. Few-shot examples showing various date formats ensure the model generalizes. Post-processing (A) shifts the problem. Pre-processing (C) requires format detection. Multiple fields (D) creates schema bloat.

Domain 4 Worked Example: Building a Data Extraction Pipeline

Let's design a complete structured data extraction pipeline for invoices, illustrating every Domain 4 concept.

Step 1 - Define the Extraction Tool with JSON Schema:

flowchart TB
    subgraph Schema["JSON Schema Design"]
        direction TB
        S1["vendor_name: required string"]
        S2["invoice_date: required string (ISO format)"]
        S3["due_date: optional string<br/>(nullable - may not be present)"]
        S4["line_items: required array"]
        S5["stated_total: required number"]
        S6["calculated_total: required number<br/>(sum of line items for validation)"]
        S7["currency: enum ['USD','EUR','GBP','other']"]
        S8["currency_detail: optional string<br/>(when currency is 'other')"]
        S9["confidence_notes: optional string<br/>(model explains any uncertainty)"]
        S10["conflict_detected: boolean<br/>(true if stated != calculated)"]
    end
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class S1,S2,S4,S5,S6,S10 blue
    class S3,S7,S8,S9 amber

Notice the schema design patterns:
- due_date is nullable because not all invoices have one - preventing fabrication
- currency uses an enum with "other" + detail string for extensible categorization
- calculated_total alongside stated_total enables semantic self-correction
- conflict_detected flags discrepancies automatically

Step 2 - Add Few-Shot Examples:

Include 3 examples showing extraction from different invoice formats:
1. A standard corporate invoice with clear line items
2. A handwritten receipt with informal amounts ("about $50")
3. A multi-page invoice with the total on page 3 but items on pages 1-2

Each example shows the expected output, including how to handle ambiguity (use confidence_notes to explain) and informal measurements (normalize to standard units).

Step 3 - Implement Validation-Retry Loop:

flowchart TB
    DOC["Invoice Document"] --> EXTRACT["Claude: extract_invoice tool<br/>tool_choice: forced"]
    EXTRACT --> VALIDATE{"Validate<br/>extraction"}
    VALIDATE -->|"Schema valid +<br/>totals match"| ACCEPT["Accept extraction"]
    VALIDATE -->|"Totals mismatch"| RETRY1["Retry with:<br/>document + extraction +<br/>'Line items sum to X but<br/>stated total is Y. Re-extract<br/>checking for missed items.'"]
    VALIDATE -->|"Required field<br/>missing"| CHECK{"Is info in<br/>the document?"}
    CHECK -->|"Yes"| RETRY2["Retry with specific<br/>field-level error message"]
    CHECK -->|"No"| MARK["Mark field as null<br/>+ log for human review"]
    RETRY1 --> VALIDATE
    RETRY2 --> VALIDATE

    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    class ACCEPT green
    class RETRY1,RETRY2 amber
    class MARK red

The key insight: append specific validation errors to the retry prompt. "Line items sum to $1,175 but stated total is $1,250. Re-extract checking for missed items or tax/shipping." This guided retry is far more effective than a generic "try again."

Step 4 - Design Batch Strategy:

Process 10,000 invoices monthly. Strategy:
1. Test on a sample of 50 first to calibrate prompts and measure extraction quality
2. Submit monthly batch via Message Batches API (50% cost savings)
3. Handle failures by custom_id: resubmit only failed invoices with modifications (chunk multi-page invoices that exceeded context limits)
4. SLA calculation: 4-hour submission windows to guarantee 28-hour SLA with 24-hour batch processing

Step 5 - Human Review Routing:

flowchart LR
    subgraph Auto["Automated (No Review)"]
        A1["High confidence across all fields"]
        A2["No conflicts detected"]
        A3["Known document type"]
    end
    subgraph Sample["Stratified Random Sample"]
        S1["Random 5% of automated"]
        S2["Detect novel error patterns"]
        S3["Measure ongoing error rates"]
    end
    subgraph Review["Human Review Queue"]
        R1["Low confidence on any field"]
        R2["Conflict detected (stated != calc)"]
        R3["Unknown document type"]
        R4["Ambiguous/contradictory source"]
    end
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class A1,A2,A3 green
    class S1,S2,S3 blue
    class R1,R2,R3,R4 amber

Before automating any extractions, analyze accuracy by document type AND by field to verify consistent performance across all segments. Never trust aggregate metrics (97% overall may hide 60% accuracy on medical invoices).

Domain 4 Additional Practice Questions

Question 6: Your extraction tool uses "other" as a catch-all enum value but doesn't capture what the "other" actually is. Downstream systems can't process these records. How should you redesign the schema?

A) Add an "other_detail" string field that is required when the enum value is "other", capturing the specific value that didn't match predefined categories.

B) Add more enum values to cover all possible cases.

C) Remove the "other" option and force the model to choose from existing enums.

D) Post-process "other" values with a separate classification model.

Answer: A - The "other" + detail string pattern is the standard approach for extensible categorization. It captures novel values while keeping the enum clean. Adding more enums (B) is a losing battle. Removing "other" (C) forces fabrication. Post-processing (D) adds unnecessary complexity.

Question 7: You need to extract data from both invoices and contracts, but the document type is unknown at submission time. How should you configure tool_choice?

A) tool_choice: "auto" so the model decides which format to use.

B) tool_choice: "any" to guarantee structured output while letting the model pick the appropriate extraction schema (extract_invoice or extract_contract).

C) Force extract_invoice first, then try extract_contract if it fails.

D) Use a separate classifier to determine document type before extraction.

Answer: B - tool_choice: "any" guarantees the model calls a tool (producing structured output) while letting it choose the appropriate schema based on document content. "auto" (A) risks returning text instead of calling a tool. Sequential forcing (C) wastes API calls. A separate classifier (D) adds latency.

Question 8: Your code review system generates detailed findings but developers consistently dismiss findings about "unused imports" while accepting findings about "null pointer risks." The dismissal pattern is making it hard to improve the system. What should you implement?

A) Stop reporting unused imports entirely.

B) Add confidence scores to each finding.

C) Add a detected_pattern field to structured findings to track which code constructs trigger findings, enabling systematic analysis of dismissal patterns to improve prompts for those categories.

D) Weight findings based on historical acceptance rates.

Answer: C - The detected_pattern field enables systematic analysis of why developers dismiss certain findings. This data drives targeted prompt improvements for specific categories. Simply removing the category (A) loses valid findings. Confidence scores (B) don't explain dismissals.


Domain 5: Context Management & Reliability (15%)

This domain covers how you preserve critical information across long interactions, handle escalation, manage errors in multi-agent systems, and maintain information provenance.

5.1 Conversation Context Preservation

flowchart TB
    subgraph Risks["Context Risks"]
        R1["Progressive summarization<br/>loses specific values:<br/>dates, amounts, percentages"]
        R2["Lost-in-the-middle:<br/>models miss findings<br/>from middle sections"]
        R3["Tool result accumulation:<br/>40+ fields when only<br/>5 are relevant"]
    end
    subgraph Solutions["Mitigation Strategies"]
        S1["Extract transactional facts<br/>into persistent 'case facts'<br/>block outside summaries"]
        S2["Place key findings at<br/>beginning of input, use<br/>explicit section headers"]
        S3["Trim verbose tool outputs<br/>to only relevant fields<br/>before they accumulate"]
    end
    R1 --> S1
    R2 --> S2
    R3 --> S3
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class R1,R2,R3 red
    class S1,S2,S3 green

Three Critical Risks:

  1. Progressive Summarization condenses specific values (amounts, dates, order numbers) into vague summaries. Solution: Extract transactional facts into a persistent "case facts" block included in each prompt, outside summarized history.

  2. Lost in the Middle effect means models reliably process information at the beginning and end of long inputs but may miss findings from middle sections. Solution: Place key findings summaries at the beginning and use explicit section headers.

  3. Tool Result Accumulation consumes tokens disproportionately (e.g., order lookup returns 40+ fields when only 5 matter). Solution: Trim verbose outputs to only relevant fields before they accumulate.

For multi-agent systems:
- Require subagents to include metadata (dates, source locations, methodological context) in structured outputs
- Modify upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains

5.2 Escalation and Ambiguity Resolution

flowchart TB
    REQUEST["Customer Request"] --> CHECK{"Escalation Needed?"}
    CHECK -->|"Customer explicitly<br/>asks for human"| IMMEDIATE["Escalate immediately<br/>without attempting investigation"]
    CHECK -->|"Policy gap or<br/>exception needed"| ESCALATE["Escalate: policy<br/>is ambiguous or silent"]
    CHECK -->|"Straightforward issue<br/>within capability"| RESOLVE["Acknowledge frustration<br/>+ offer resolution"]
    RESOLVE --> REITERATE{"Customer reiterates<br/>human preference?"}
    REITERATE -->|"Yes"| IMMEDIATE
    REITERATE -->|"No"| DONE["Resolve autonomously"]

    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class IMMEDIATE,ESCALATE red
    class RESOLVE amber
    class DONE green

Appropriate Escalation Triggers:
- Customer explicitly requests a human agent (honor immediately)
- Policy exceptions or gaps (policy is ambiguous or silent on the request)
- Inability to make meaningful progress

What Does NOT Work as Escalation Triggers:
- Sentiment-based escalation (sentiment doesn't correlate with case complexity)
- Self-reported confidence scores (poorly calibrated)
- Complexity heuristics (the model doesn't accurately self-assess)

Multiple Customer Matches:
When a lookup returns multiple matches, ask for additional identifiers rather than selecting based on heuristics.

5.3 Error Propagation in Multi-Agent Systems

Error Propagation Design Principles:

Pattern Description
Structured error context Include failure type, attempted query, partial results, alternative approaches
Local recovery first Subagents implement local retry for transient failures
Propagate only unresolvable errors Include what was attempted and partial results
Never suppress errors Returning empty results as success prevents recovery
Never terminate on single failure Killing the entire workflow wastes completed work

Anti-Patterns:

flowchart LR
    subgraph Anti["Anti-Patterns"]
        A1["Generic 'search unavailable'<br/>hides valuable context"]
        A2["Empty results as success<br/>prevents recovery"]
        A3["Terminate entire workflow<br/>on single failure"]
    end
    subgraph Correct["Correct Patterns"]
        C1["Structured context enables<br/>intelligent coordinator decisions"]
        C2["Access failure clearly<br/>distinguished from empty result"]
        C3["Partial results + coverage<br/>annotations in synthesis"]
    end
    Anti -->|"Replace with"| Correct
    classDef red fill:#991b1b,stroke:#f87171,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class A1,A2,A3 red
    class C1,C2,C3 green

5.4 Large Codebase Exploration

Signs of Context Degradation:
- Model starts giving inconsistent answers
- References "typical patterns" rather than specific classes discovered earlier
- Loses track of findings from earlier in the session

Mitigation Strategies:

Strategy When to Use
Scratchpad files Persist key findings across context boundaries; reference in subsequent questions
Subagent delegation Isolate verbose exploration; main agent coordinates high-level understanding
Summarize between phases Inject summaries into next phase's initial context
Crash recovery manifests Each agent exports state to a known location; coordinator loads on resume
/compact Reduce context usage during extended exploration sessions

5.5 Human Review Workflows and Confidence Calibration

The Danger of Aggregate Metrics:
97% overall accuracy may mask 60% accuracy on a specific document type or field. Always segment.

Confidence Calibration Pipeline:

flowchart LR
    subgraph Extract["Extraction"]
        E1["Model outputs<br/>field-level confidence"]
    end
    subgraph Calibrate["Calibration"]
        C1["Labeled validation set"]
        C2["Set review thresholds"]
    end
    subgraph Route["Routing"]
        R1["High confidence:<br/>stratified random sample"]
        R2["Low confidence:<br/>human review queue"]
        R3["Ambiguous source:<br/>human review queue"]
    end
    Extract --> Calibrate --> Route
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class E1,C1,C2,R1,R2,R3 blue

Key Practices:
- Stratified random sampling of high-confidence extractions for ongoing error rate measurement
- Analyze accuracy by document type AND field before reducing human review
- Route extractions with low model confidence or ambiguous/contradictory sources to human review

5.6 Information Provenance and Multi-Source Synthesis

Source attribution gets lost during summarization when findings are compressed without preserving claim-source mappings.

Provenance Preservation Rules:

  1. Structured claim-source mappings: Require subagents to output source URLs, document names, relevant excerpts that downstream agents must preserve through synthesis
  2. Temporal data: Require publication/collection dates to prevent temporal differences from being misinterpreted as contradictions
  3. Conflicting statistics: Annotate conflicts with source attribution rather than arbitrarily selecting one value
  4. Content-type-appropriate rendering: Financial data as tables, news as prose, technical findings as structured lists

Report Structure:
- Distinguish well-established findings from contested ones
- Preserve original source characterizations and methodological context
- Include coverage annotations indicating which topic areas have gaps due to unavailable sources

Domain 5 Practice Questions

Question 1: Your support agent handles a customer with 3 issues. After resolving the first, the agent responds to the third issue using details from the first (wrong order number, wrong amount). What's the root cause and fix?

A) The agent needs a larger context window.

B) Transactional facts are being lost in progressive summarization. Extract each issue's structured data (order ID, amount, status) into a separate persistent context layer.

C) The agent should handle one issue at a time.

D) Add few-shot examples showing multi-issue handling.

Answer: B - Progressive summarization condenses specific values into vague summaries, causing cross-contamination between issues. A persistent structured context layer keeps each issue's data distinct.

Question 2: A frustrated customer says "Just let me talk to a human!" while asking about a standard refund for a damaged item. The agent can handle this easily. What should it do?

A) Process the refund immediately since it's a straightforward case.

B) Explain that the issue can be resolved faster without waiting for a human.

C) Honor the customer's explicit request for a human agent immediately without first attempting investigation.

D) Ask why the customer wants a human to determine if escalation is truly needed.

Answer: C - When a customer explicitly requests a human agent, honor that request immediately. This is a firm rule. Even if the issue is straightforward, the customer's explicit preference takes priority.

Question 3: Your research system combines findings from multiple sources. The final report says "AI adoption is at 35%" without indicating which source. One source said 35% (2022 survey, US only), another said 47% (2024 survey, global). What went wrong?

A) The synthesis agent needs better summarization instructions.

B) Source attribution was lost during synthesis. Require subagents to output structured claim-source mappings including publication dates, and require the synthesis agent to preserve and merge these mappings, annotating conflicts rather than selecting one value.

C) The search agent should only return the most recent data.

D) Add a deduplication step to remove conflicting statistics.

Answer: B - The synthesis lost provenance (source, date, scope). The correct approach preserves claim-source mappings through synthesis and annotates conflicts with both values and their sources, including temporal context.

Question 4: During a long codebase exploration session, Claude starts referencing "typical patterns" instead of specific classes it discovered 30 turns ago. What should you do?

A) Restart the conversation and begin fresh.

B) Increase the model's context window.

C) Have agents maintain scratchpad files recording key findings, and reference them for subsequent questions. Use /compact to reduce context when it fills with verbose discovery output.

D) Use a larger model with better long-context performance.

Answer: C - Context degradation in extended sessions is addressed with scratchpad files (persist findings across context boundaries) and /compact (reduce context usage). Restarting (A) loses all progress. Context window size (B, D) doesn't solve attention quality degradation.

Question 5: Your extraction system achieves 97% overall accuracy. But when you start automating high-confidence extractions, error rates spike for medical documents. What went wrong?

A) The 97% aggregate accuracy masked poor performance on specific document types. You should have analyzed accuracy by document type and field before automating, using stratified random sampling for ongoing error detection.

B) The confidence scores need recalibration with a larger validation set.

C) Medical documents should be excluded from automated extraction entirely.

D) The model needs fine-tuning on medical documents.

Answer: A - Aggregate accuracy metrics mask poor performance on specific segments. Always stratify by document type and field, and use stratified random sampling of high-confidence extractions for ongoing error monitoring.

Domain 5 Worked Example: Multi-Source Research with Provenance

Let's trace how context and provenance flow through a research system to illustrate every Domain 5 concept.

Scenario: A multi-agent system researches "global AI adoption rates." Two sources provide conflicting data: Source A says 35% (US, 2022) and Source B says 47% (global, 2024).

Step 1 - Structured Subagent Output:

Each subagent must return structured claim-source mappings, not just text summaries:

flowchart TB
    subgraph SearchOutput["Search Agent Output"]
        SO1["claim: 'AI adoption at 35%'<br/>source: 'McKinsey 2022 Report'<br/>url: 'mckinsey.com/...'<br/>scope: 'US enterprises'<br/>date: '2022-11-15'"]
        SO2["claim: 'AI adoption at 47%'<br/>source: 'Gartner 2024 Survey'<br/>url: 'gartner.com/...'<br/>scope: 'Global organizations'<br/>date: '2024-03-20'"]
    end
    subgraph AnalysisOutput["Analysis Agent Output"]
        AO1["conflict_detected: true<br/>type: 'different_scope_and_time'<br/>resolution_note: 'Values differ<br/>due to geographic scope (US vs<br/>global) and temporal gap (2 years)'"]
    end
    subgraph SynthesisInput["What Synthesis Receives"]
        SI1["Both claims with full attribution<br/>Conflict annotation<br/>Resolution guidance"]
    end
    SearchOutput --> AnalysisOutput --> SynthesisInput

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class SO1,SO2 blue
    class AO1 amber
    class SI1 green

Step 2 - Context Preservation:

The coordinator maintains a "case facts" block that persists across summarization:

Case Facts Block (always included in every prompt):
- Research topic: "Global AI adoption rates"
- Source A: McKinsey 2022, 35%, US enterprises
- Source B: Gartner 2024, 47%, global organizations
- Known conflict: scope and temporal differences
- Coverage gaps: No data on Asia-Pacific, no sector breakdown

This block sits outside the summarized conversation history. Even if the conversation gets summarized, these facts persist verbatim.

Step 3 - Synthesis with Provenance:

The final report must:
- Present BOTH values with full source attribution (not pick one)
- Distinguish well-established findings from contested ones
- Note the scope difference (US vs global) and temporal gap
- Include coverage annotations ("No data was available for Asia-Pacific adoption rates")
- Render financial data as tables, news as prose, technical findings as structured lists

Wrong output: "AI adoption is at 35%." (Lost source, lost conflict)

Right output: "AI adoption rates vary by scope and timeframe: McKinsey's 2022 US enterprise survey found 35% adoption, while Gartner's 2024 global survey found 47%. The difference likely reflects both geographic scope (US vs global) and the two-year gap between studies."

Step 4 - Error Handling:

If the academic paper search subagent times out:

flowchart LR
    subgraph Error["Error Context Returned"]
        E1["failure_type: 'timeout'"]
        E2["attempted_query: 'AI adoption<br/>rates peer-reviewed 2023-2024'"]
        E3["partial_results: [2 of 5<br/>papers retrieved]"]
        E4["alternatives: ['narrow to<br/>specific sector', 'try<br/>Google Scholar instead']"]
    end
    subgraph Recovery["Coordinator Decision"]
        R1["Option A: Retry with narrower query"]
        R2["Option B: Proceed with partial results"]
        R3["Option C: Annotate report with<br/>'academic literature partially reviewed'"]
    end
    Error --> Recovery
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    class E1,E2,E3,E4 amber
    class R1,R2,R3 blue

The coordinator chooses option C: proceed with partial results and annotate the report's coverage gaps. The synthesis output includes: "Note: Academic literature was partially reviewed due to search service limitations. Two of five targeted papers were retrieved. Findings from peer-reviewed sources may be incomplete."

Domain 5 Additional Practice Questions

Question 6: Your support agent handles a customer with two issues. For issue #1, it correctly identifies order #4521 ($127.50). But when addressing issue #2, it mentions "$127.50" in context where the second order is $89.99. What specific pattern caused this?

A) The agent confused the two order numbers.

B) Progressive summarization compressed the two issues' details into a shared context, causing the amount from issue #1 to contaminate issue #2. The fix is a separate structured data layer for each issue.

C) The model has a recency bias toward larger numbers.

D) The tool returned incorrect data for the second order.

Answer: B - This is a classic progressive summarization failure. Specific values (amounts, order numbers) from one issue bleed into another when compressed into a shared summary. A separate context layer per issue prevents cross-contamination.

Question 7: Your agent uses a lookup_customer tool that sometimes returns 3 customers with similar names. The agent picks the one with the most recent activity. This leads to operating on the wrong customer 8% of the time. What should it do instead?

A) Add a confidence score and only proceed when confidence is above 90%.

B) Ask the customer for additional identifiers (email, phone, last 4 of account number) to disambiguate, rather than selecting based on heuristics.

C) Return all three customers and let the agent try each one.

D) Use a fuzzy matching algorithm to improve the initial match.

Answer: B - When tool results return multiple matches, the agent should ask for additional identifiers. Heuristic selection (most recent activity) creates a systematic error rate. Only the customer knows which account is theirs.

Question 8: Your coordinator agent crashes mid-research. When restarted, it has no memory of what three subagents already discovered. How should you design crash recovery?

A) Store all context in the coordinator's conversation history.

B) Have subagents write results to a shared database.

C) Design structured state persistence: each agent exports its state (findings, partial results, progress) to a known location. The coordinator loads a manifest on resume and injects the recovered state into agent prompts.

D) Use session resumption to continue from the crash point.

Answer: C - Structured agent state exports with a manifest enable the coordinator to recover from crashes. Each agent saves its state to a known location. On resume, the coordinator loads the manifest and injects state into agent prompts. Session resumption (D) may have stale or missing tool results.

Question 9: Your synthesis agent receives findings from 5 subagents, but the final report only includes findings from subagents 1, 2, and 5. Subagents 3 and 4 each contributed valid findings. What's the most likely cause?

A) Subagents 3 and 4 returned errors that were silently suppressed.

B) The "lost-in-the-middle" effect caused findings from the middle of the aggregated input to be missed. The fix is to place key findings summaries at the beginning of the input and use explicit section headers.

C) The synthesis agent has a token limit that truncated the input.

D) Subagents 3 and 4 returned findings in a different format.

Answer: B - The lost-in-the-middle effect means models reliably process information at the beginning and end of long inputs but may miss middle sections. Placing summaries at the start and using explicit section headers mitigates this.


Technologies and Concepts Reference

This section provides a quick-reference summary of every technology and concept that may appear on the exam.

Claude Agent SDK

mindmap
  root((Agent SDK))
    Agent Definitions
      System prompts
      Tool restrictions
      allowedTools config
    Agentic Loops
      stop_reason handling
      tool_use vs end_turn
      Conversation history
    Hooks
      PostToolUse
      Tool call interception
      Data normalization
    Subagent Spawning
      Task tool
      Parallel execution
      Context passing
Concept Key Detail
stop_reason: "tool_use" Continue the loop, execute the requested tool
stop_reason: "end_turn" Terminate the loop, present the response
PostToolUse hook Transform tool results before model processes them
Tool call interception Block policy-violating actions before execution
allowedTools Must include "Task" for coordinator to spawn subagents
AgentDefinition Config for each subagent: description, system prompt, tools

Model Context Protocol (MCP)

Concept Key Detail
MCP servers Provide tools and resources to agents
MCP tools Actions: search, create, update, delete
MCP resources Content catalogs: issue summaries, schemas, doc hierarchies
isError flag Signal tool failure back to the agent
.mcp.json Project-level server config (shared, version-controlled)
~/.claude.json User-level server config (personal, not shared)
${ENV_VAR} expansion Credential management without committing secrets

Claude Code

Concept Key Detail
CLAUDE.md hierarchy User (~/.claude/) < Project (.claude/ or root) < Directory
.claude/rules/ Topic-specific rules with YAML frontmatter glob patterns
.claude/commands/ Project-scoped slash commands (shared via VCS)
~/.claude/commands/ User-scoped personal commands
.claude/skills/ Skills with SKILL.md frontmatter
context: fork Run skill in isolated sub-agent context
allowed-tools Restrict tool access during skill execution
argument-hint Prompt for required parameters
Plan mode Complex tasks, architectural decisions, multi-file changes
Direct execution Simple, well-scoped changes
Explore subagent Isolate verbose discovery output
/memory Verify loaded memory files
/compact Reduce context usage in extended sessions
--resume Continue a named session
fork_session Branch from shared analysis baseline

Claude Code CLI (for CI/CD)

Flag Purpose
-p / --print Non-interactive mode (required for CI)
--output-format json Machine-parseable structured output
--json-schema Enforce specific output schema

Claude API

Concept Key Detail
tool_use with JSON schemas Guaranteed schema-compliant structured output
tool_choice: "auto" Model may return text instead of calling a tool
tool_choice: "any" Model must call a tool (any available tool)
Forced tool selection {"type": "tool", "name": "..."} - specific tool required
stop_reason values "tool_use", "end_turn"
max_tokens Limit response length
System prompts Provide instructions and context

Message Batches API

Concept Key Detail
Cost savings 50% reduction
Processing window Up to 24 hours
Latency SLA None guaranteed
custom_id Correlate request/response pairs
Multi-turn tool calling NOT supported within a single batch request
Best for Non-blocking, latency-tolerant workloads

JSON Schema Patterns

Pattern Purpose
Required vs optional fields Prevent fabrication of missing data
Nullable fields Allow model to return null for absent information
Enum types Constrain categorical values
"other" + detail string Extensible categorization
"unclear" enum value Handle ambiguous cases
Strict mode Eliminates syntax errors (not semantic errors)

Exam Scenarios Deep Dive

Scenario 1: Customer Support Resolution Agent

Setup: You're building a customer support agent using the Claude Agent SDK. It handles returns, billing disputes, and account issues with MCP tools: get_customer, lookup_order, process_refund, escalate_to_human. Target: 80%+ first-contact resolution.

What to Know:
- Programmatic prerequisites for tool ordering (verify customer before refund)
- Structured error responses from MCP tools
- Escalation criteria (explicit customer request, policy gaps, inability to progress)
- Multi-concern request decomposition
- Handoff summaries for human agents
- Hook-based compliance enforcement (block refunds above threshold)

Scenario 2: Code Generation with Claude Code

Setup: Using Claude Code for code generation, refactoring, debugging, documentation. Custom slash commands, CLAUDE.md, plan mode decisions.

What to Know:
- CLAUDE.md configuration hierarchy
- .claude/commands/ vs ~/.claude/commands/
- .claude/rules/ with glob patterns
- Skills with context: fork and allowed-tools
- Plan mode vs direct execution decision criteria
- Iterative refinement techniques

Scenario 3: Multi-Agent Research System

Setup: Coordinator delegates to search, analysis, synthesis, and report subagents. Produces comprehensive cited reports.

What to Know:
- Hub-and-spoke coordinator architecture
- Task decomposition pitfalls (overly narrow coverage)
- Parallel subagent execution
- Context passing (explicit, structured, with metadata)
- Error propagation (structured context, not generic messages)
- Source attribution preservation through synthesis

Scenario 4: Developer Productivity

Setup: Agent helps engineers explore codebases, understand legacy systems, generate boilerplate. Uses built-in tools and MCP servers.

What to Know:
- Built-in tool selection (Grep vs Glob vs Read vs Edit vs Bash)
- Incremental codebase understanding pattern
- MCP server integration (project vs user scope)
- Tool description quality for reliable selection
- Scoped tool access per agent role

Scenario 5: Claude Code for CI

Setup: Automated code reviews, test generation, PR feedback in CI/CD pipeline.

What to Know:
- -p flag for non-interactive mode
- --output-format json and --json-schema for structured CI output
- Session context isolation (independent reviewer vs self-review)
- Batch API appropriateness (overnight reports vs blocking checks)
- CLAUDE.md for providing test standards to CI

Scenario 6: Structured Data Extraction

Setup: Extract information from unstructured documents, validate with JSON schemas, handle edge cases.

What to Know:
- tool_use with JSON schemas for structured output
- tool_choice configuration options
- Schema design (nullable fields, enums with "other" + detail)
- Validation-retry loops (when retries help vs when they don't)
- Batch processing strategies
- Human review workflows and confidence calibration


Preparation Exercises

Exercise 1: Build a Multi-Tool Agent with Escalation Logic

Objective: Practice agentic loops, tool integration, structured errors, and escalation.

Steps:
1. Define 3-4 MCP tools with detailed, differentiated descriptions. Include two with similar functionality requiring careful description.
2. Implement an agentic loop checking stop_reason for "tool_use" vs "end_turn".
3. Add structured error responses: errorCategory, isRetryable, human-readable descriptions.
4. Implement a hook that intercepts tool calls to enforce a business rule.
5. Test with multi-concern messages.

Domains covered: D1, D2, D5

Exercise 2: Configure Claude Code for a Team

Objective: Practice CLAUDE.md hierarchy, custom commands, path rules, MCP integration.

Steps:
1. Create project-level CLAUDE.md with universal standards.
2. Create .claude/rules/ files with glob patterns for different code areas.
3. Create a skill with context: fork and allowed-tools.
4. Configure MCP server in .mcp.json with env var expansion.
5. Test plan mode vs direct execution on tasks of varying complexity.

Domains covered: D3, D2

Exercise 3: Build a Structured Data Extraction Pipeline

Objective: Practice JSON schemas, tool_use, validation-retry, batch processing.

Steps:
1. Define extraction tool with required/optional/nullable fields and enum patterns.
2. Implement validation-retry loop with error feedback.
3. Add few-shot examples for varied document formats.
4. Design batch processing with Message Batches API, handle failures by custom_id.
5. Implement human review routing with field-level confidence scores.

Domains covered: D4, D5

Exercise 4: Design a Multi-Agent Research Pipeline

Objective: Practice subagent orchestration, context passing, error propagation, provenance.

Steps:
1. Build coordinator with at least two subagents, allowedTools including "Task".
2. Implement parallel subagent execution.
3. Design structured output separating content from metadata.
4. Simulate subagent timeout and verify structured error propagation.
5. Test with conflicting source data and verify provenance preservation.

Domains covered: D1, D2, D5


Comprehensive Question Bank

This section contains additional practice questions across all domains to help you prepare thoroughly.

Cross-Domain Questions

Question 1 (D1+D2): Your coordinator agent always routes every query through all four subagents (search, analysis, synthesis, report), even for simple factual questions that only need a quick search. How should you fix this?

A) Design the coordinator to analyze query requirements and dynamically select which subagents to invoke rather than always routing through the full pipeline.

B) Add a complexity classifier that categorizes queries before the coordinator sees them.

C) Give the coordinator a "quick_answer" tool for simple queries.

D) Reduce the number of subagents to two.

Answer: A - The coordinator should dynamically select which subagents to invoke based on query complexity. This is a core principle of coordinator design - not every query needs the full pipeline.

Question 2 (D3+D4): Your CI pipeline runs Claude Code to generate tests, but it keeps suggesting tests that duplicate existing coverage. What's the most effective fix?

A) Add "do not duplicate existing tests" to the prompt.

B) Post-process generated tests to filter duplicates.

C) Provide existing test files in context so test generation avoids suggesting scenarios already covered by the test suite, and document testing standards and available fixtures in CLAUDE.md.

D) Use a separate model to filter duplicate test suggestions.

Answer: C - Providing existing tests in context lets Claude see what's covered. CLAUDE.md documentation of testing standards and fixtures improves quality. Simply saying "don't duplicate" (A) is vague. Post-processing (B) wastes API calls.

Question 3 (D1+D5): Your agent uses a scratchpad file but still loses track of findings after 50+ turns. The scratchpad has grown to 200 lines of unstructured notes. What should you change?

A) Increase the context window.

B) Summarize key findings from one exploration phase before spawning subagents for the next phase, injecting structured summaries into initial context rather than relying on a single growing scratchpad.

C) Use /compact more frequently.

D) Start a new session every 25 turns.

Answer: B - Structured phase summaries injected into new context is more effective than an ever-growing unstructured scratchpad. The pattern: summarize findings, inject into next phase's context, delegate to subagents for the next phase.

Question 4 (D2+D5): Your MCP tool returns 40+ fields per order lookup, but the agent only needs 5 fields for the current task. Context is filling up with irrelevant data. What should you do?

A) Ask the backend team to create a slim API endpoint.

B) Use a larger context model.

C) Implement a PostToolUse hook that trims verbose tool outputs to only relevant fields before they accumulate in context.

D) Add a system prompt instruction to ignore irrelevant fields.

Answer: C - A PostToolUse hook deterministically trims tool output before the model processes it, preventing context bloat. This is exactly what hooks are designed for - transforming tool results before they enter the conversation.

Question 5 (D4+D5): Your extraction pipeline extracts "total: $1,250" but the line items sum to $1,175. The JSON schema validation passes because both are valid numbers. How do you catch this?

A) Add strict mode to the JSON schema.

B) Implement a post-processing calculator.

C) Design a self-correction validation flow: extract "calculated_total" alongside "stated_total" to flag discrepancies, adding a "conflict_detected" boolean for inconsistent source data.

D) Add a "verify totals" instruction to the prompt.

Answer: C - Schema validation catches syntax errors, not semantic errors. Having the model extract both calculated and stated totals with a conflict flag enables automatic detection of semantic mismatches.

Question 6 (D1+D3): You want to compare two different refactoring approaches before committing to one. You've already done significant codebase analysis. What's the best approach?

A) Start two separate sessions, re-doing the codebase analysis in each.

B) Use fork_session to create two independent branches from the shared analysis baseline to explore each approach.

C) Try approach A first, then undo all changes and try approach B.

D) Ask Claude to describe both approaches without implementing either.

Answer: B - fork_session creates independent branches from a shared analysis baseline. This avoids re-doing the analysis (A) and avoids the risk of incomplete undo (C) while exploring both approaches with full implementation.

Question 7 (D2): You have two MCP tools: analyze_content ("Analyzes content") and analyze_document ("Analyzes documents"). The agent frequently picks the wrong one. After expanding descriptions, some misrouting still occurs because the system prompt says "always analyze content from web sources first." What's the issue?

A) The tool names are too similar and should be renamed.

B) The descriptions need more examples.

C) The system prompt contains keyword-sensitive instructions ("analyze content") that create an unintended association with the analyze_content tool, overriding well-written descriptions. Review and rephrase the system prompt.

D) Add tool_choice forced selection.

Answer: C - System prompt wording can create unintended tool associations through keywords. "Always analyze content from web sources" biases toward the analyze_content tool. Rephrasing to avoid tool-name keywords fixes this.

Question 8 (D4): Your extraction tool handles invoices well but produces empty results for contracts. Adding "extract data from contracts" to the instructions doesn't help. What should you try?

A) Fine-tune the model on contract examples.

B) Create a separate tool for contracts.

C) Add 2-4 few-shot examples showing correct extraction from contracts with their specific format variations (inline clauses vs separate schedules, varied terminology).

D) Increase temperature to encourage more diverse outputs.

Answer: C - Few-shot examples demonstrating extraction from varied document structures (contracts vs invoices) is the most effective technique for this class of problem. The model needs to see what successful contract extraction looks like.

Question 9 (D3): A developer runs a codebase analysis skill that floods the conversation with 2000 lines of discovery output. Subsequent questions get confused responses. What configuration prevents this?

A) Add context: fork to the skill's SKILL.md frontmatter so it runs in an isolated sub-agent context, returning only the summary to the main conversation.

B) Add a max-output-length parameter to the skill.

C) Use /compact after running the skill.

D) Break the skill into smaller sub-skills.

Answer: A - context: fork isolates the skill in a sub-agent context. The verbose discovery output stays in the fork; only the summary returns to the main conversation.

Question 10 (D5): Your multi-agent system produces a report where Finding #3 says "According to a 2023 study..." but there's no source citation. You trace it back to the synthesis agent, which received the finding with full attribution from the analysis agent. What happened?

A) The analysis agent's attribution format is inconsistent.

B) The synthesis agent compressed findings during summarization without preserving claim-source mappings. Require the synthesis agent to preserve and merge structured claim-source mappings when combining findings.

C) The report template strips citations.

D) The source URL was broken.

Answer: B - Source attribution is lost during summarization when findings are compressed without preserving claim-source mappings. The fix is requiring the synthesis agent to preserve mappings through the synthesis process.


Out-of-Scope Topics (What NOT to Study)

These topics will NOT appear on the exam. Save your study time:

  • Fine-tuning Claude models or training custom models
  • Claude API authentication, billing, or account management
  • Detailed implementation of specific programming languages or frameworks
  • Deploying or hosting MCP servers (infrastructure, networking, containers)
  • Claude's internal architecture, training process, or model weights
  • Constitutional AI, RLHF, or safety training methodologies
  • Embedding models or vector database implementation details
  • Computer use (browser automation, desktop interaction)
  • Vision/image analysis capabilities
  • Streaming API implementation or server-sent events
  • Rate limiting, quotas, or API pricing calculations
  • OAuth, API key rotation, or authentication protocol details
  • Specific cloud provider configurations (AWS, GCP, Azure)
  • Performance benchmarking or model comparison metrics
  • Prompt caching implementation details (beyond knowing it exists)
  • Token counting algorithms or tokenization specifics

Exam Day Strategy

Reading the Question

  1. Read the scenario context carefully - It tells you which systems exist and what tools are available
  2. Identify the root cause in the question stem - Many questions describe a problem; find what's actually wrong before looking at answers
  3. Eliminate "over-engineered" options - If a simpler fix addresses the root cause, the complex option is wrong
  4. Watch for the "first step" qualifier - When asked for the "most effective first step," pick the lowest-effort, highest-leverage fix

Common Distractor Patterns

Distractor Type Example Why It's Wrong
Over-engineered "Deploy a classifier model" when prompt optimization hasn't been tried Jumps to infrastructure before simpler fixes
Solves wrong problem "Sentiment analysis" when the issue is case complexity Addresses a correlated but incorrect variable
Probabilistic where deterministic needed "Enhance system prompt" when financial operations need guaranteed ordering Prompt instructions have non-zero failure rate
Non-existent feature CLAUDE_HEADLESS=true environment variable References features that don't exist in Claude Code

Time Management

  • Don't spend too long on any single question - guess and move on
  • No penalty for guessing, so never leave a question blank
  • Mark difficult questions for review if the exam interface allows it

Final Pre-Exam Checklist


Quick Reference Cheat Sheet

Decision Matrix: Which Approach to Use?

flowchart TB
    Q1{"Need guaranteed<br/>tool ordering?"}
    Q1 -->|"Yes"| HOOK["Use programmatic<br/>hooks/prerequisites"]
    Q1 -->|"No"| Q2{"Need consistent<br/>output format?"}
    Q2 -->|"Yes"| FEW["Use few-shot<br/>examples"]
    Q2 -->|"No"| Q3{"Need structured<br/>data output?"}
    Q3 -->|"Yes"| TOOL["Use tool_use<br/>with JSON schema"]
    Q3 -->|"No"| Q4{"Complex multi-file<br/>task?"}
    Q4 -->|"Yes"| PLAN["Use plan mode"]
    Q4 -->|"No"| DIRECT["Use direct execution"]

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef green fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    class HOOK,FEW,TOOL,PLAN,DIRECT green
    class Q1,Q2,Q3,Q4 blue

The "First Fix" Priority

When the exam asks for the "most effective" or "first step," default to this priority:

  1. Fix descriptions (tool descriptions, prompt criteria) - Lowest effort, highest leverage
  2. Add few-shot examples - Address ambiguity and inconsistency
  3. Add programmatic enforcement (hooks, prerequisites) - When compliance must be guaranteed
  4. Restructure architecture (split tools, add subagents) - When the design is fundamentally wrong

Key Numbers to Remember

Metric Value
Passing score 720 / 1000
Scenarios per exam 4 (from pool of 6)
Ideal tools per agent 4-5 (not 18)
Batch API cost savings 50%
Batch API processing window Up to 24 hours
Target experience level 6+ months with Claude
Few-shot examples 2-4 targeted examples
Highest-weighted domain D1: Agentic Architecture (27%)

You've reached the end of this comprehensive exam prep guide. Check your progress bar at the top - aim for 100% before sitting for the exam. Remember: the CCAF tests practical judgment about trade-offs, not memorization. Build real agents, configure real Claude Code projects, and extract real structured data to internalize these patterns.

Good luck on your certification.

Sign in to save and react.
Share Copied