The Anthropic Platform Stack: An Architect's Guide to Building on Claude

In January 2024, Anthropic was a model company. You called the Messages API, got a completion, and handled everything else yourself. Eighteen months later, enterprise contracts account for roughly 80% of Anthropic's revenue, the Model Context Protocol has 97 million monthly SDK downloads, and Claude Code holds the highest publicly reported score on SWE-bench Verified. The company ships an integration protocol, a managed agent runtime, a compliance API with 28 security vendor connections, and department-specific automation plugins.

If you are still thinking of Anthropic as "the company that makes Claude," you are working with an incomplete map.

Why this matters: The platform capabilities that surround a foundation model now determine more architectural decisions than the model itself. Choosing Claude is no longer just a model selection; it is a platform commitment with implications for integration patterns, governance posture, agent runtime, and cost structure. Architects who evaluate only the inference API miss the components that will shape 80% of their implementation work.

TL;DR

Anthropic offers three model tiers (Opus, Sonnet, Haiku) spanning a 5x cost range, all sharing a common API surface with tool use, structured output, and streaming.
The Model Context Protocol (MCP) has become the de facto standard for connecting AI agents to enterprise systems, with 41% of surveyed software organizations running MCP servers in production.
Prompt caching reduces input token costs by 90% and latency by up to 85%, making it the single highest-leverage optimization for production deployments.
Claude Code (80.8% SWE-bench Verified) and Claude Cowork represent two distinct agent surfaces: one for developers, one for business users, both extensible through the same MCP layer.
The Claude Platform on AWS (launched May 2026) allows enterprises to run the full Anthropic stack using IAM credentials within their own security perimeter.
Managed Agents operate in self-hosted sandboxes, keeping tool execution inside customer infrastructure while Anthropic handles orchestration.
The Compliance API and 28 security integrations span DLP, SASE, SIEM, identity management, and eDiscovery, designed for regulated environments with SOC 2 Type 2, ISO 27001, and HIPAA certifications.
Enterprise AI architecture is converging on a pattern where models are interchangeable, but integration layers and governance controls are not.

At a Glance

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart TB
    subgraph Models["Foundation Models"]
        O["Opus"] --- So["Sonnet"] --- H["Haiku"]
    end

    subgraph APIs["API Layer"]
        M["Messages API"] --- B["Batch API"] --- S["Streaming"]
    end

    subgraph Agent["Agent Capabilities"]
        TU["Tool Use"] --- CE["Code Execution"] --- FA["Files API"] --- CU["Computer Use"]
    end

    subgraph Integration["Integration Layer"]
        MCP["MCP Protocol"] --- MC["MCP Connectors"] --- PL["Plugins"]
    end

    subgraph Products["Product Surfaces"]
        CC["Claude Code"] --- CW["Cowork"] --- CI["Claude.ai"]
    end

    subgraph Gov["Governance"]
        SSO["SSO/SAML"] --- COMP["Compliance API"] --- AUD["Audit Controls"]
    end

    Models --> APIs
    APIs --> Agent
    Agent --> Integration
    Integration --> Products
    Gov --- APIs
    Gov --- Products

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef rose fill:#be123c,stroke:#fb7185,stroke-width:1px,color:#fff
    classDef slate fill:#334155,stroke:#64748b,stroke-width:1px,color:#e2e8f0
    class O,So,H blue
    class M,B,S purple
    class TU,CE,FA,CU teal
    class MCP,MC,PL amber
    class CC,CW,CI emerald
    class SSO,COMP,AUD rose

This is the platform as it exists in mid-2026. Six layers, each with distinct architectural implications. The rest of this article works through them from bottom to top, then maps the deployment patterns that connect them.

Before the Platform

The history of Anthropic's product strategy can be read as three distinct eras, each defined by what the company shipped alongside the model.

%%{init: {'theme': 'base', 'themeVariables': {'cScale0': '#1e40af', 'cScale1': '#6d28d9', 'cScale2': '#b45309', 'cScale3': '#be123c', 'cScale4': '#047857', 'cScale5': '#0e7490', 'cScaleLabel0': '#e2e8f0', 'cScaleLabel1': '#e2e8f0', 'cScaleLabel2': '#e2e8f0', 'cScaleLabel3': '#e2e8f0', 'cScaleLabel4': '#e2e8f0', 'cScaleLabel5': '#e2e8f0', 'textColor': '#e2e8f0', 'lineColor': '#94a3b8', 'fontSize': '16px'}}}%%
timeline
    title Anthropic Platform Evolution
    2023 : Claude 1, 2 launched
         : Messages API only
         : No tool use
    2024 : Claude 3 family ships
         : Tool use, vision added
         : MCP announced Nov 2024
    2025 : Prompt caching, Batch API
         : Claude Code public beta
         : MCP donated to Linux Foundation
    2026 : Platform on AWS
         : Managed Agents, Cowork plugins
         : Compliance API, 28 security integrations
         : 1M-token context windows

In 2023, Anthropic was a pure inference provider. You sent text, received text. The company differentiated on safety and instruction-following quality, but the API surface was minimal. Integration work fell entirely on the customer.

The inflection came in late 2024. Three capabilities shipped in rapid succession: tool use (function calling), vision, and the Model Context Protocol. Tool use transformed Claude from a text generator into something that could act on external systems. MCP proposed a standard for how those connections should be structured. The combination turned "build a chatbot" into "build an agent," but the infrastructure to support agents at enterprise scale did not yet exist.

2025 filled the gaps. Prompt caching made long-context agent workflows economically viable (90% reduction in cached input costs). The Batch API introduced asynchronous processing at half the standard price. Claude Code entered public beta, demonstrating what a fully autonomous developer agent could look like. And in December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation, signaling that the protocol would be governed as an open standard rather than a proprietary advantage.

By mid-2026, the platform includes model hosting, managed agent runtimes, self-hosted execution sandboxes, a compliance API, 28 security vendor integrations, and deployment directly into AWS infrastructure via IAM. The architectural decision is no longer "which model" but "how much of Anthropic's platform do we adopt, and where do we draw the boundary?"

[IMAGE: Timeline infographic showing Anthropic's evolution from inference-only API (2023) to full platform (2026), with key capability additions annotated at each stage, styled as a horizontal swim-lane diagram]

How the Stack Actually Works

The Model Layer: Three Tiers, One API

Every Claude model shares the same Messages API contract. The same tool definitions, system prompts, and structured output schemas work across all three tiers. This uniformity is an architectural feature, not an accident: it lets teams route requests to different models based on complexity, cost, or latency requirements without changing application code.

Opus (currently 4.7/4.8) targets problems requiring deep reasoning, sustained attention across long documents, and complex multi-step workflows. It supports a 1M-token context window and can produce up to 128K tokens in a single response. At $5/$25 per million input/output tokens, it is the premium tier.

Sonnet (4.6) occupies the middle ground that handles most production workloads. Its 1M-token context window matches Opus, with output capped at 64K tokens. At $3/$15 per million tokens, it delivers roughly 85-90% of Opus-level quality at 60% of the cost. Sonnet 4.6 was the first Sonnet generation preferred over the previous Opus in coding evaluations, a notable milestone.

Haiku (4.5) is the high-throughput, low-cost tier at $1/$5 per million tokens, with a 200K-token context window. It handles classification, entity extraction, content moderation, simple summarization, and routing decisions where speed matters more than depth.

[IMAGE: Decision matrix diagram showing model tier selection criteria across four axes: reasoning depth, latency sensitivity, cost per query, and context length requirement, with shaded regions indicating optimal tier for each combination]

The API Surface: Messages, Batch, and Streaming

The Messages API is the primary inference endpoint. A single request can include system prompts, multi-turn conversation history, tool definitions, images, and documents. The response can be a direct completion, a tool-use request (asking the application to execute a function and return the result), or a structured JSON output conforming to a provided schema.

The Batch API accepts up to thousands of requests as a single submission and processes them asynchronously within 24 hours at a flat 50% discount on all token prices. Batch processing combines with prompt caching: a pipeline that processes 1,000 documents against the same system prompt can cache that prompt once and process each document at the cached rate, stacking a 90% input cost reduction on top of the 50% batch discount. In practice, this can reduce per-document costs by 95% compared to naive real-time processing.

Streaming delivers tokens as they are generated, using server-sent events. For agent workflows, streaming provides progress visibility; for user-facing applications, it eliminates the perception of latency that comes from waiting for a complete response.

[IMAGE: Cost waterfall diagram showing how a $100 baseline API spend reduces through model tier selection, prompt caching, and batch processing, with final effective cost annotated at each stage]

Context Engineering: The Skill That Replaced Prompt Engineering

The shift from 4K-token context windows in 2022 to 1M-token windows in 2026 changed the fundamental constraint. The bottleneck is no longer "how do I fit my information into the context?" but "how do I structure a large context so the model uses it effectively?"

Context engineering encompasses several techniques:

Prompt caching is the highest-leverage optimization. When the first portion of a request (system prompt, tool definitions, reference documents) is identical across calls, Anthropic caches the key-value pairs from that prefix. Subsequent requests that share the prefix pay only 10% of the standard input cost for the cached portion, with time-to-first-token reduced by 50-85%. The cache has a 5-minute TTL for standard caching and a 1-hour option at 2x the write cost. In production workloads with consistent traffic, cache hit rates of 80-95% are typical for stable system prompts.

Structured context placement matters because retrieval quality degrades in the middle of very long contexts (a phenomenon documented across multiple model families). Placing the most critical information at the beginning and end of the context window, with supporting material in the middle, produces measurably better results.

Tool definitions as context is an underappreciated pattern. Each tool definition consumes tokens from the context budget. A system with 50 tools, each with detailed parameter schemas and descriptions, may consume 10,000-15,000 tokens before any user content appears. Architects must budget for this and consider dynamic tool loading (providing only the tools relevant to the current task) for tool-heavy applications.

%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#1e40af', 'actorTextColor': '#fff', 'actorBorder': '#3b82f6', 'signalColor': '#94a3b8', 'signalTextColor': '#e2e8f0', 'labelBoxBkgColor': '#1e293b', 'labelBoxBorderColor': '#334155', 'labelTextColor': '#e2e8f0', 'loopTextColor': '#e2e8f0', 'noteBkgColor': '#1e293b', 'noteTextColor': '#e2e8f0', 'noteBorderColor': '#475569', 'activationBorderColor': '#3b82f6', 'activationBkgColor': '#1e3a5f', 'fontSize': '16px'}}}%%
sequenceDiagram
    participant App as Application
    participant Cache as Prompt Cache
    participant Claude as Claude API

    App->>Claude: Request 1: System prompt + tools + query
    Note over Cache: Cache WRITE<br/>1.25x input cost
    Claude-->>App: Response 1

    App->>Claude: Request 2: Same prefix + new query
    Note over Cache: Cache HIT<br/>0.1x input cost
    Claude-->>App: Response 2 (faster TTFT)

    App->>Claude: Request 3: Same prefix + new query
    Note over Cache: Cache HIT<br/>0.1x input cost
    Claude-->>App: Response 3 (faster TTFT)

    Note over App,Claude: 3 requests: paid full price once,<br/>cached rate twice

Agent Capabilities: Tool Use, Code Execution, Files, Computer Use

Anthropic's agent infrastructure has four primitives that compose into complex workflows:

Tool use (function calling) allows Claude to declare that it needs to execute an external function, specifying the function name and arguments as structured JSON. The application executes the function and returns the result. Claude can chain multiple tool calls within a single turn, enabling multi-step workflows where each step's output informs the next. This is the foundation for every agent pattern.

Code execution provides a sandboxed environment where Claude can write and run Python code to perform calculations, data analysis, and transformations. For architect-style use cases (capacity planning, cost modeling, data validation), this eliminates the round-trip latency of external tool calls for computational work.

The Files API enables persistent file storage across sessions. Uploaded files receive stable identifiers that can be referenced in subsequent requests without re-uploading. For RAG architectures, this means reference documents can be uploaded once and reused across thousands of queries, combining with prompt caching for substantial cost reduction.

Computer use gives Claude the ability to interact with graphical interfaces by reading screenshots and generating mouse/keyboard actions. The model achieved 94% accuracy on insurance industry benchmarks. Unlike traditional RPA, which breaks when interfaces change, computer use adapts to visual variations because it reasons about what it sees rather than following coordinate-based scripts. Enterprise adoption is early but the architectural implication is significant: any system with a UI becomes automatable without an API.

[IMAGE: Four-quadrant diagram of agent primitives: tool use (structured API calls), code execution (computation), files (persistent state), computer use (visual interaction), with example use cases in each quadrant and arrows showing common composition patterns]

MCP: The Integration Protocol

The Model Context Protocol deserves separate treatment because its impact extends beyond Anthropic's own platform. MCP standardizes how AI agents discover and connect to external tools and data sources. The analogy that has stuck is "USB-C for AI applications": a universal connector that replaces custom integrations.

An MCP server exposes tools (functions the agent can call), resources (data the agent can read), and prompts (pre-built interaction patterns). An MCP client (Claude, or any compatible agent framework) discovers available servers, reads their capabilities, and uses them as part of its workflow. The protocol handles capability negotiation, context sharing, and transport (both stdio for local servers and HTTP for remote ones).

The adoption numbers tell the strategic story. As of mid-2026: 97 million monthly SDK downloads, over 9,400 public servers, and native support from Anthropic, OpenAI, Google DeepMind, and Microsoft. Stacklok's 2026 software survey found 41% of surveyed organizations running MCP servers in limited or broad production. Forrester predicts 30% of enterprise application vendors will launch their own MCP servers in 2026. Anthropic donated MCP to the Agentic AI Foundation (under the Linux Foundation) in December 2025, removing the single-vendor governance concern.

For architects, MCP changes the integration calculus. Instead of building custom connectors between your AI agent and each enterprise system (Jira, Confluence, Slack, Salesforce, internal databases), you deploy or adopt MCP servers for each system and connect them through a standard protocol. The agent discovers available tools at runtime, reducing the coupling between the AI layer and the systems it touches.

[IMAGE: Comparison diagram showing traditional point-to-point AI integration (N x M custom connectors, spaghetti topology) versus MCP-based integration (N agents to 1 protocol to M servers, hub topology), with connection counts annotated]

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart LR
    subgraph Agents["Agent Layer"]
        CC["Claude Code"]
        CW["Cowork"]
        Custom["Custom Agent"]
    end

    subgraph MCP_Layer["MCP Protocol"]
        Discovery["Tool Discovery"]
        Transport["Transport Layer"]
    end

    subgraph Servers["MCP Servers"]
        GH["GitHub"]
        Jira["Jira"]
        Slack["Slack"]
        DB["Database"]
        CRM["CRM"]
        Internal["Internal APIs"]
    end

    CC --> Discovery
    CW --> Discovery
    Custom --> Discovery
    Discovery --> Transport
    Transport --> GH
    Transport --> Jira
    Transport --> Slack
    Transport --> DB
    Transport --> CRM
    Transport --> Internal

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    class CC,CW,Custom blue
    class Discovery,Transport purple
    class GH,Jira,Slack,DB,CRM,Internal teal

Product Surfaces: Code and Cowork

Anthropic has bifurcated its agent products along the developer/business-user divide.

Claude Code is the developer-facing agent. It reads entire repositories, understands cross-file dependencies, plans multi-step changes, executes them, runs tests, iterates on failures, and commits results. It scored 80.8% on SWE-bench Verified (the highest publicly reported result) and has accumulated over 101,000 GitHub stars. Claude Code supports plan mode (review before execution), multi-agent workflows with parallel sub-agents, and extensibility through MCP servers and custom hooks. For many engineering organizations, it is the primary touchpoint with the Anthropic platform.

Claude Cowork targets business users who need agent capabilities without writing code. Anthropic open-sourced 11 plugins covering finance, legal, marketing, and customer support workflows. In February 2026, the company added private plugin marketplaces, 12 new MCP connectors, cross-application workflows (Excel to PowerPoint, for example), and admin controls for plugin governance. Cowork sits between Claude.ai (general-purpose chat) and Claude Code (developer tooling), filling the enterprise productivity gap.

Both products connect to the same MCP layer. An MCP server that exposes your internal knowledge base is accessible from Claude Code, Cowork, and custom-built applications simultaneously.

[IMAGE: Product positioning map showing Claude.ai, Cowork, and Claude Code along two axes: technical expertise required (x-axis) and workflow complexity supported (y-axis), with example use cases positioned in each quadrant]

Governance: The Enterprise Gate

Enterprise AI adoption stalls on governance more often than on capability. Anthropic's governance layer addresses this with several components:

Identity management supports SAML 2.0 and OIDC SSO, with role-based access controls and integration with governance platforms like SailPoint for periodic access reviews.

The Compliance API provides programmatic access to conversation content and activity event logs. IT and security teams can pipe this data into their existing monitoring infrastructure.

28 security integrations span six categories: data loss prevention (DLP), secure access service edge (SASE), SIEM, identity management, eDiscovery, and AI observability. These are not theoretical; they ship as production integrations with named vendors.

Data retention controls range from configurable retention periods to Zero Data Retention (ZDR), where no inputs or outputs persist after response delivery. Encryption uses TLS 1.2+ in transit and AES-256 at rest.

Certifications include SOC 2 Type 2, ISO 27001, and HIPAA compliance. Bring Your Own Key (BYOK) support entered beta in H1 2026.

The Claude Platform on AWS (launched May 2026) represents the deployment model many enterprises have waited for: the full Anthropic API, Managed Agents, and MCP connectors, all accessible via AWS IAM credentials within the customer's existing security perimeter.

[IMAGE: Enterprise governance architecture showing the Compliance API feeding into existing security tooling (SIEM, DLP, eDiscovery), with data flow arrows and retention policy decision points annotated]

Seeing It in Motion

Two deployment patterns illustrate how the stack components compose in practice.

Pattern: Enterprise Knowledge Agent

%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#1e40af', 'actorTextColor': '#fff', 'actorBorder': '#3b82f6', 'signalColor': '#94a3b8', 'signalTextColor': '#e2e8f0', 'labelBoxBkgColor': '#1e293b', 'labelBoxBorderColor': '#334155', 'labelTextColor': '#e2e8f0', 'loopTextColor': '#e2e8f0', 'noteBkgColor': '#1e293b', 'noteTextColor': '#e2e8f0', 'noteBorderColor': '#475569', 'activationBorderColor': '#3b82f6', 'activationBkgColor': '#1e3a5f', 'fontSize': '16px'}}}%%
sequenceDiagram
    participant User
    participant Cowork as Claude Cowork
    participant MCP as MCP Layer
    participant KB as Knowledge Base
    participant Jira as Jira Server
    participant Compliance as Compliance API

    User->>Cowork: "Summarize Q2 incidents"
    Cowork->>MCP: Discover available tools
    MCP-->>Cowork: KB search, Jira query available
    Cowork->>KB: Search incident reports
    KB-->>Cowork: 23 documents found
    Cowork->>Jira: Query P1/P2 tickets Q2
    Jira-->>Cowork: 47 tickets returned
    Cowork-->>User: Structured summary with links
    Note over Compliance: Activity logged via<br/>Compliance API

Pattern: Developer CI/CD Agent

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart TD
    Issue["GitHub Issue"] --> CC["Claude Code reads issue"]
    CC --> Plan["Plan mode: review approach"]
    Plan --> Edit["Multi-file code changes"]
    Edit --> Test["Run test suite"]
    Test -->|Pass| PR["Create pull request"]
    Test -->|Fail| Fix["Iterate on failures"]
    Fix --> Test
    PR --> Review["Human review"]
    Review --> Merge["Merge and deploy"]

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
    classDef rose fill:#be123c,stroke:#fb7185,stroke-width:1px,color:#fff
    class Issue blue
    class CC,Plan,Edit purple
    class Test,Fix rose
    class PR,Review teal
    class Merge emerald

By the Numbers

The economics of the Anthropic platform have shifted substantially as cost optimization features matured. The table below captures the current state as of mid-2026.

Metric	Value	Source
Opus 4.7 pricing	$5 / $25 per M input/output tokens	Anthropic pricing page
Sonnet 4.6 pricing	$3 / $15 per M input/output tokens	Anthropic pricing page
Haiku 4.5 pricing	$1 / $5 per M input/output tokens	Anthropic pricing page
Context window (Opus/Sonnet)	1,000,000 tokens	Anthropic docs
Context window (Haiku)	200,000 tokens	Anthropic docs
Max output (Opus, sync)	128K tokens	Anthropic docs
Max output (Opus, batch)	300K tokens	Anthropic docs
Prompt cache hit cost	0.1x base input price	Anthropic docs
Prompt cache latency reduction	50-85% TTFT reduction	Anthropic docs
Batch API discount	50% off all token prices	Anthropic docs
Combined savings ceiling	Up to 95% with cache + batch	Calculated
MCP monthly SDK downloads	97 million	Industry reports, mid-2026
MCP public servers	9,400+	Industry reports, mid-2026
Claude Code SWE-bench score	80.8% (Verified)	Anthropic
Enterprise security integrations	28 vendors	Anthropic, May 2026
Computer use accuracy	94% (insurance benchmarks)	Anthropic benchmarks
Orgs with MCP in production	41% of surveyed software orgs	Stacklok 2026 survey

The cost optimization potential deserves emphasis. A naive implementation calling Sonnet synchronously at full price for every request operates at a fundamentally different cost point than one that routes simple requests to Haiku, caches stable context, and batch-processes offline work. The difference can be 10-20x on the same workload.

[IMAGE: Stacked bar chart comparing effective per-query cost across four optimization levels: no optimization, model routing only, model routing plus caching, and model routing plus caching plus batch, using a realistic enterprise workload mix of 60% simple / 30% moderate / 10% complex queries]

A Concrete Example

Consider a financial services firm building an internal research agent that analysts use to investigate companies. The agent needs to pull data from an internal knowledge base, query a CRM, search public filings, and produce structured summaries.

Step 1: Model and cost architecture. The architect routes queries through a classifier. Simple lookups ("What is Company X's latest revenue?") go to Haiku at $1/$5. Synthesis queries requiring cross-source reasoning ("Compare Company X's margin trajectory against its three closest competitors over the past 8 quarters") go to Sonnet at $3/$15. The system prompt (2,000 tokens) and tool definitions (12 tools, approximately 8,000 tokens) are identical across all requests and cached.

Step 2: Integration via MCP. Three MCP servers are deployed: one wrapping the internal knowledge base (Confluence pages and research reports), one connecting to Salesforce, and one proxying a public filings API. The agent discovers these at runtime through MCP's tool discovery mechanism.

Step 3: A request flows through the system. An analyst asks: "Summarize Acme Corp's Q1 performance and flag any risk factors mentioned in our internal notes."

The classifier routes to Sonnet (cross-source synthesis required).
The cached system prompt and tool definitions load at 0.1x cost (approximately 1,000 tokens billed instead of 10,000).
Claude calls the knowledge base MCP server, retrieves 3 relevant internal reports.
Claude calls the filings MCP server, retrieves the Q1 10-Q summary.
Claude synthesizes a structured response with citations.
The Compliance API logs the interaction for the firm's regulatory record.

Step 4: Cost accounting. The request consumed approximately 45,000 input tokens (10,000 cached at 0.1x, 35,000 fresh from retrieved documents) and 2,000 output tokens. Effective input cost: (10,000 x 0.1 + 35,000) x $3/1M = $0.108. Output cost: 2,000 x $15/1M = $0.030. Total: approximately $0.14 per analyst query. At 500 queries per day, the monthly API cost runs around $2,100, well within budget for a tool replacing hours of manual research per query.

[IMAGE: Annotated request flow diagram for this financial research agent, showing token counts at each stage, cost calculations at decision points, and the MCP server connections to enterprise systems]

Where It Breaks

Vendor lock-in through the integration layer. MCP is an open standard, but Anthropic's managed agent runtime, Compliance API, and Cowork plugin ecosystem are proprietary. Organizations that build deeply on Managed Agents and Cowork create switching costs that extend well beyond model substitution. The model layer is increasingly commodity; the integration and governance layers are not.

Context window is not comprehension. A 1M-token context window can hold 500 pages of text, but retrieval quality still degrades with context length, particularly for information buried in the middle of large documents. Architects should not treat the context window as a replacement for retrieval; it is a complement. RAG with selective context loading consistently outperforms "dump everything into the window" approaches for knowledge-intensive tasks.

MCP security surface. Each MCP server is a new attack surface. A compromised server can feed malicious tool results to the agent, potentially triggering unintended actions. The protocol's 2026 roadmap includes OAuth 2.1 and audit trails, but as of mid-2026, authentication and authorization for MCP servers remain the deployer's responsibility.

Computer use is slow and expensive. Each screenshot-action cycle requires a vision-capable model call. Complex workflows involving dozens of UI interactions accumulate significant latency and token costs. Computer use is architecturally elegant for automating systems without APIs, but it should not be the first choice when an API exists.

Batch API latency. The 24-hour processing window is fine for overnight pipelines but unusable for anything interactive. There is no priority tier between real-time and batch; architects who need "slightly delayed but cheaper" must build their own queuing layer.

Prompt cache eviction. The 5-minute TTL means sporadic traffic patterns (fewer than one request per 5 minutes with the same prefix) will see cache misses on most requests, paying the 1.25x write cost without the 0.1x read benefit. Workloads need consistent request volume to realize caching savings.

Alternative Designs

Platform	Strengths	Weaknesses	Best when
Anthropic (Claude)	Deepest agent infrastructure; MCP ecosystem; strong governance/compliance; 1M context	Higher per-token cost than some alternatives; batch-only async tier; platform lock-in risk through Cowork/Managed Agents	Building agent-heavy enterprise systems with regulatory requirements
OpenAI (GPT)	Largest developer ecosystem; broadest model range (GPT, o-series, image, audio); Assistants API with built-in RAG	Governance and compliance tools less mature; function calling predates MCP (now adopting it); pricing volatile	Consumer-facing products; multi-modal applications requiring image/audio generation
Google (Gemini)	Native integration with Google Cloud and Workspace; 2M-token context window (Gemini 1.5 Pro); Vertex AI MLOps	Weaker agentic tooling; enterprise governance less proven outside GCP shops; smaller third-party ecosystem	Google Cloud-native organizations; extremely long-context use cases
AWS Bedrock (multi-model)	Model-agnostic; runs Claude, Llama, Mistral behind one API; deep AWS integration; Guardrails for governance	Managed service layer adds latency and cost; feature availability lags native APIs; less control over model-specific capabilities	Multi-model strategies; organizations committed to AWS

The comparison is deliberately platform-level, not model-level. Model benchmarks shift quarterly; platform capabilities compound over years.

[IMAGE: Radar chart comparing the four platforms across six axes: agent infrastructure maturity, governance depth, ecosystem size, cost optimization levers, context window capacity, and multi-modal breadth]

How It Is Used in Practice

Block (formerly Square) reported 50-75% time savings on common engineering tasks after deploying MCP-compatible Goose agents connected to internal development tools. The deployment uses MCP servers wrapping internal APIs, with Claude Code as the primary developer interface.

Financial services firms are the heaviest adopters of the Compliance API and governance features. The combination of SOC 2 Type 2, HIPAA compliance, Zero Data Retention, and 28 security integrations addresses regulatory requirements that previously blocked AI adoption in banking and insurance.

The Claude Platform on AWS deployment model (May 2026) resolves a common enterprise objection: data leaving the organization's cloud boundary. With IAM-credential access to the full Anthropic stack, organizations can run agents, MCP connectors, and Managed Agents within their AWS accounts while Anthropic handles orchestration.

Managed Agents with self-hosted sandboxes represent a production pattern for organizations that want Anthropic's orchestration (context management, error recovery, tool routing) but need tool execution on their own infrastructure. The agent loop runs on Anthropic; the tool calls execute in sandboxes hosted on Cloudflare, Daytona, Modal, or Vercel within the customer's environment.

[IMAGE: Deployment topology diagram showing the split between Anthropic-hosted orchestration and customer-hosted execution sandbox, with data flow arrows and trust boundaries clearly marked]

Insights Worth Remembering

The model is the least sticky part of the stack. Organizations can swap models with a configuration change. They cannot swap integration protocols, governance infrastructure, or agent runtimes without re-architecture. This asymmetry is the real lock-in vector.
MCP's value is not what it connects to today, but that it exists as a standard. The 9,400 servers and 97 million downloads create a network effect. Each new MCP server makes every MCP-compatible agent more capable without any change to the agent itself.
Context engineering has replaced prompt engineering as the high-leverage skill. Writing a good prompt is table stakes. Designing a system that manages what enters the context window, when, and in what order is what separates a $0.14 query from a $1.40 one.
Prompt caching is not an optimization; it is an architectural pattern. Systems should be designed from the start to maximize prefix stability. This means structuring system prompts, tool definitions, and reference context as a stable prefix, with only the user query and retrieved documents varying per request.
The Cowork/Code split reflects a real organizational divide. Developers and business users have fundamentally different interaction models with AI agents. A single product surface cannot serve both well. The shared MCP layer is what keeps them from becoming silos.
Computer use is a strategic capability, not a primary integration method. It exists to automate the systems that have no API. Using it on systems that do have APIs wastes tokens and adds fragility. Treat it as the fallback, not the default.
Enterprise AI adoption is gated by governance, not capability. The 28 security integrations and Compliance API exist because the model was already capable enough; the barrier was giving security teams the visibility and control they require.
Batch plus cache is the enterprise cost structure. The combination of 50% batch discount and 90% cache reduction on stable prefixes creates a cost floor that makes high-volume AI pipelines economically viable in ways that per-request pricing does not.

Open Questions

Will MCP's governance keep pace with its adoption? The protocol is now under the Linux Foundation, but the 2026 roadmap for OAuth 2.1 support, audit trails, and gateway behavior is still in progress. Enterprise deployments are running ahead of the protocol's security maturity in some areas. Whether the governance features land before a high-profile MCP server compromise shapes enterprise trust is genuinely uncertain.

How does the Managed Agent boundary evolve? Today, Anthropic handles orchestration while customers host execution sandboxes. The likely direction is more customer control over orchestration, but how much and how soon is unclear. Organizations building on Managed Agents should plan for the interface to shift.

Can Anthropic maintain the model-as-commodity position? The current strategy treats models as one layer in a larger platform. If a competitor ships a model with dramatically superior capabilities (a genuine reasoning leap, not incremental benchmark gains), the platform advantages could become secondary. The counter-argument: the 80% enterprise revenue share suggests customers are buying the platform, not just the model.

What happens when MCP servers proliferate beyond curation? At 9,400+ public servers, quality and security vary widely. The ecosystem needs something analogous to package registries with verified publishers, vulnerability scanning, and deprecation signals. This infrastructure does not yet exist.

Will computer use economics improve enough for high-volume use? Current costs make it viable for low-frequency, high-value automation (insurance processing, compliance checks) but prohibitive for high-volume workflows. Architectural improvements to reduce the number of screenshot-action cycles per task would expand the addressable use cases substantially.

Sources and Further Reading