The Anthropic Platform Stack: An Architect's Guide to Building on Claude
June 02, 2026 · 27 min read
In January 2024, Anthropic was a model company. You called the Messages API, got a completion, and handled everything else yourself. Eighteen months later, enterprise contracts account for roughly 80% of Anthropic's revenue, the Model Context Protocol has 97 million monthly SDK downloads, and Claude Code holds the highest publicly reported score on SWE-bench Verified. The company ships an integration protocol, a managed agent runtime, a compliance API with 28 security vendor connections, and department-specific automation plugins.
If you are still thinking of Anthropic as "the company that makes Claude," you are working with an incomplete map.
Why this matters: The platform capabilities that surround a foundation model now determine more architectural decisions than the model itself. Choosing Claude is no longer just a model selection; it is a platform commitment with implications for integration patterns, governance posture, agent runtime, and cost structure. Architects who evaluate only the inference API miss the components that will shape 80% of their implementation work.
TL;DR
- Anthropic offers three model tiers (Opus, Sonnet, Haiku) spanning a 5x cost range, all sharing a common API surface with tool use, structured output, and streaming.
- The Model Context Protocol (MCP) has become the de facto standard for connecting AI agents to enterprise systems, with 41% of surveyed software organizations running MCP servers in production.
- Prompt caching reduces input token costs by 90% and latency by up to 85%, making it the single highest-leverage optimization for production deployments.
- Claude Code (80.8% SWE-bench Verified) and Claude Cowork represent two distinct agent surfaces: one for developers, one for business users, both extensible through the same MCP layer.
- The Claude Platform on AWS (launched May 2026) allows enterprises to run the full Anthropic stack using IAM credentials within their own security perimeter.
- Managed Agents operate in self-hosted sandboxes, keeping tool execution inside customer infrastructure while Anthropic handles orchestration.
- The Compliance API and 28 security integrations span DLP, SASE, SIEM, identity management, and eDiscovery, designed for regulated environments with SOC 2 Type 2, ISO 27001, and HIPAA certifications.
- Enterprise AI architecture is converging on a pattern where models are interchangeable, but integration layers and governance controls are not.
At a Glance
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart TB
subgraph Models["Foundation Models"]
O["Opus"] --- So["Sonnet"] --- H["Haiku"]
end
subgraph APIs["API Layer"]
M["Messages API"] --- B["Batch API"] --- S["Streaming"]
end
subgraph Agent["Agent Capabilities"]
TU["Tool Use"] --- CE["Code Execution"] --- FA["Files API"] --- CU["Computer Use"]
end
subgraph Integration["Integration Layer"]
MCP["MCP Protocol"] --- MC["MCP Connectors"] --- PL["Plugins"]
end
subgraph Products["Product Surfaces"]
CC["Claude Code"] --- CW["Cowork"] --- CI["Claude.ai"]
end
subgraph Gov["Governance"]
SSO["SSO/SAML"] --- COMP["Compliance API"] --- AUD["Audit Controls"]
end
Models --> APIs
APIs --> Agent
Agent --> Integration
Integration --> Products
Gov --- APIs
Gov --- Products
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef rose fill:#be123c,stroke:#fb7185,stroke-width:1px,color:#fff
classDef slate fill:#334155,stroke:#64748b,stroke-width:1px,color:#e2e8f0
class O,So,H blue
class M,B,S purple
class TU,CE,FA,CU teal
class MCP,MC,PL amber
class CC,CW,CI emerald
class SSO,COMP,AUD rose
This is the platform as it exists in mid-2026. Six layers, each with distinct architectural implications. The rest of this article works through them from bottom to top, then maps the deployment patterns that connect them.
Before the Platform
The history of Anthropic's product strategy can be read as three distinct eras, each defined by what the company shipped alongside the model.
%%{init: {'theme': 'base', 'themeVariables': {'cScale0': '#1e40af', 'cScale1': '#6d28d9', 'cScale2': '#b45309', 'cScale3': '#be123c', 'cScale4': '#047857', 'cScale5': '#0e7490', 'cScaleLabel0': '#e2e8f0', 'cScaleLabel1': '#e2e8f0', 'cScaleLabel2': '#e2e8f0', 'cScaleLabel3': '#e2e8f0', 'cScaleLabel4': '#e2e8f0', 'cScaleLabel5': '#e2e8f0', 'textColor': '#e2e8f0', 'lineColor': '#94a3b8', 'fontSize': '16px'}}}%%
timeline
title Anthropic Platform Evolution
2023 : Claude 1, 2 launched
: Messages API only
: No tool use
2024 : Claude 3 family ships
: Tool use, vision added
: MCP announced Nov 2024
2025 : Prompt caching, Batch API
: Claude Code public beta
: MCP donated to Linux Foundation
2026 : Platform on AWS
: Managed Agents, Cowork plugins
: Compliance API, 28 security integrations
: 1M-token context windows
In 2023, Anthropic was a pure inference provider. You sent text, received text. The company differentiated on safety and instruction-following quality, but the API surface was minimal. Integration work fell entirely on the customer.
The inflection came in late 2024. Three capabilities shipped in rapid succession: tool use (function calling), vision, and the Model Context Protocol. Tool use transformed Claude from a text generator into something that could act on external systems. MCP proposed a standard for how those connections should be structured. The combination turned "build a chatbot" into "build an agent," but the infrastructure to support agents at enterprise scale did not yet exist.
2025 filled the gaps. Prompt caching made long-context agent workflows economically viable (90% reduction in cached input costs). The Batch API introduced asynchronous processing at half the standard price. Claude Code entered public beta, demonstrating what a fully autonomous developer agent could look like. And in December 2025, Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation, signaling that the protocol would be governed as an open standard rather than a proprietary advantage.
By mid-2026, the platform includes model hosting, managed agent runtimes, self-hosted execution sandboxes, a compliance API, 28 security vendor integrations, and deployment directly into AWS infrastructure via IAM. The architectural decision is no longer "which model" but "how much of Anthropic's platform do we adopt, and where do we draw the boundary?"
[IMAGE: Timeline infographic showing Anthropic's evolution from inference-only API (2023) to full platform (2026), with key capability additions annotated at each stage, styled as a horizontal swim-lane diagram]
How the Stack Actually Works
The Model Layer: Three Tiers, One API
Every Claude model shares the same Messages API contract. The same tool definitions, system prompts, and structured output schemas work across all three tiers. This uniformity is an architectural feature, not an accident: it lets teams route requests to different models based on complexity, cost, or latency requirements without changing application code.
Opus (currently 4.7/4.8) targets problems requiring deep reasoning, sustained attention across long documents, and complex multi-step workflows. It supports a 1M-token context window and can produce up to 128K tokens in a single response. At $5/$25 per million input/output tokens, it is the premium tier.
Sonnet (4.6) occupies the middle ground that handles most production workloads. Its 1M-token context window matches Opus, with output capped at 64K tokens. At $3/$15 per million tokens, it delivers roughly 85-90% of Opus-level quality at 60% of the cost. Sonnet 4.6 was the first Sonnet generation preferred over the previous Opus in coding evaluations, a notable milestone.
Haiku (4.5) is the high-throughput, low-cost tier at $1/$5 per million tokens, with a 200K-token context window. It handles classification, entity extraction, content moderation, simple summarization, and routing decisions where speed matters more than depth.
[IMAGE: Decision matrix diagram showing model tier selection criteria across four axes: reasoning depth, latency sensitivity, cost per query, and context length requirement, with shaded regions indicating optimal tier for each combination]
The API Surface: Messages, Batch, and Streaming
The Messages API is the primary inference endpoint. A single request can include system prompts, multi-turn conversation history, tool definitions, images, and documents. The response can be a direct completion, a tool-use request (asking the application to execute a function and return the result), or a structured JSON output conforming to a provided schema.
The Batch API accepts up to thousands of requests as a single submission and processes them asynchronously within 24 hours at a flat 50% discount on all token prices. Batch processing combines with prompt caching: a pipeline that processes 1,000 documents against the same system prompt can cache that prompt once and process each document at the cached rate, stacking a 90% input cost reduction on top of the 50% batch discount. In practice, this can reduce per-document costs by 95% compared to naive real-time processing.
Streaming delivers tokens as they are generated, using server-sent events. For agent workflows, streaming provides progress visibility; for user-facing applications, it eliminates the perception of latency that comes from waiting for a complete response.
[IMAGE: Cost waterfall diagram showing how a $100 baseline API spend reduces through model tier selection, prompt caching, and batch processing, with final effective cost annotated at each stage]
Context Engineering: The Skill That Replaced Prompt Engineering
The shift from 4K-token context windows in 2022 to 1M-token windows in 2026 changed the fundamental constraint. The bottleneck is no longer "how do I fit my information into the context?" but "how do I structure a large context so the model uses it effectively?"
Context engineering encompasses several techniques:
Prompt caching is the highest-leverage optimization. When the first portion of a request (system prompt, tool definitions, reference documents) is identical across calls, Anthropic caches the key-value pairs from that prefix. Subsequent requests that share the prefix pay only 10% of the standard input cost for the cached portion, with time-to-first-token reduced by 50-85%. The cache has a 5-minute TTL for standard caching and a 1-hour option at 2x the write cost. In production workloads with consistent traffic, cache hit rates of 80-95% are typical for stable system prompts.
Structured context placement matters because retrieval quality degrades in the middle of very long contexts (a phenomenon documented across multiple model families). Placing the most critical information at the beginning and end of the context window, with supporting material in the middle, produces measurably better results.
Tool definitions as context is an underappreciated pattern. Each tool definition consumes tokens from the context budget. A system with 50 tools, each with detailed parameter schemas and descriptions, may consume 10,000-15,000 tokens before any user content appears. Architects must budget for this and consider dynamic tool loading (providing only the tools relevant to the current task) for tool-heavy applications.
%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#1e40af', 'actorTextColor': '#fff', 'actorBorder': '#3b82f6', 'signalColor': '#94a3b8', 'signalTextColor': '#e2e8f0', 'labelBoxBkgColor': '#1e293b', 'labelBoxBorderColor': '#334155', 'labelTextColor': '#e2e8f0', 'loopTextColor': '#e2e8f0', 'noteBkgColor': '#1e293b', 'noteTextColor': '#e2e8f0', 'noteBorderColor': '#475569', 'activationBorderColor': '#3b82f6', 'activationBkgColor': '#1e3a5f', 'fontSize': '16px'}}}%%
sequenceDiagram
participant App as Application
participant Cache as Prompt Cache
participant Claude as Claude API
App->>Claude: Request 1: System prompt + tools + query
Note over Cache: Cache WRITE<br/>1.25x input cost
Claude-->>App: Response 1
App->>Claude: Request 2: Same prefix + new query
Note over Cache: Cache HIT<br/>0.1x input cost
Claude-->>App: Response 2 (faster TTFT)
App->>Claude: Request 3: Same prefix + new query
Note over Cache: Cache HIT<br/>0.1x input cost
Claude-->>App: Response 3 (faster TTFT)
Note over App,Claude: 3 requests: paid full price once,<br/>cached rate twice
Agent Capabilities: Tool Use, Code Execution, Files, Computer Use
Anthropic's agent infrastructure has four primitives that compose into complex workflows:
Tool use (function calling) allows Claude to declare that it needs to execute an external function, specifying the function name and arguments as structured JSON. The application executes the function and returns the result. Claude can chain multiple tool calls within a single turn, enabling multi-step workflows where each step's output informs the next. This is the foundation for every agent pattern.
Code execution provides a sandboxed environment where Claude can write and run Python code to perform calculations, data analysis, and transformations. For architect-style use cases (capacity planning, cost modeling, data validation), this eliminates the round-trip latency of external tool calls for computational work.
The Files API enables persistent file storage across sessions. Uploaded files receive stable identifiers that can be referenced in subsequent requests without re-uploading. For RAG architectures, this means reference documents can be uploaded once and reused across thousands of queries, combining with prompt caching for substantial cost reduction.
Computer use gives Claude the ability to interact with graphical interfaces by reading screenshots and generating mouse/keyboard actions. The model achieved 94% accuracy on insurance industry benchmarks. Unlike traditional RPA, which breaks when interfaces change, computer use adapts to visual variations because it reasons about what it sees rather than following coordinate-based scripts. Enterprise adoption is early but the architectural implication is significant: any system with a UI becomes automatable without an API.
[IMAGE: Four-quadrant diagram of agent primitives: tool use (structured API calls), code execution (computation), files (persistent state), computer use (visual interaction), with example use cases in each quadrant and arrows showing common composition patterns]
MCP: The Integration Protocol
The Model Context Protocol deserves separate treatment because its impact extends beyond Anthropic's own platform. MCP standardizes how AI agents discover and connect to external tools and data sources. The analogy that has stuck is "USB-C for AI applications": a universal connector that replaces custom integrations.
An MCP server exposes tools (functions the agent can call), resources (data the agent can read), and prompts (pre-built interaction patterns). An MCP client (Claude, or any compatible agent framework) discovers available servers, reads their capabilities, and uses them as part of its workflow. The protocol handles capability negotiation, context sharing, and transport (both stdio for local servers and HTTP for remote ones).
The adoption numbers tell the strategic story. As of mid-2026: 97 million monthly SDK downloads, over 9,400 public servers, and native support from Anthropic, OpenAI, Google DeepMind, and Microsoft. Stacklok's 2026 software survey found 41% of surveyed organizations running MCP servers in limited or broad production. Forrester predicts 30% of enterprise application vendors will launch their own MCP servers in 2026. Anthropic donated MCP to the Agentic AI Foundation (under the Linux Foundation) in December 2025, removing the single-vendor governance concern.
For architects, MCP changes the integration calculus. Instead of building custom connectors between your AI agent and each enterprise system (Jira, Confluence, Slack, Salesforce, internal databases), you deploy or adopt MCP servers for each system and connect them through a standard protocol. The agent discovers available tools at runtime, reducing the coupling between the AI layer and the systems it touches.
[IMAGE: Comparison diagram showing traditional point-to-point AI integration (N x M custom connectors, spaghetti topology) versus MCP-based integration (N agents to 1 protocol to M servers, hub topology), with connection counts annotated]
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart LR
subgraph Agents["Agent Layer"]
CC["Claude Code"]
CW["Cowork"]
Custom["Custom Agent"]
end
subgraph MCP_Layer["MCP Protocol"]
Discovery["Tool Discovery"]
Transport["Transport Layer"]
end
subgraph Servers["MCP Servers"]
GH["GitHub"]
Jira["Jira"]
Slack["Slack"]
DB["Database"]
CRM["CRM"]
Internal["Internal APIs"]
end
CC --> Discovery
CW --> Discovery
Custom --> Discovery
Discovery --> Transport
Transport --> GH
Transport --> Jira
Transport --> Slack
Transport --> DB
Transport --> CRM
Transport --> Internal
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
class CC,CW,Custom blue
class Discovery,Transport purple
class GH,Jira,Slack,DB,CRM,Internal teal
Product Surfaces: Code and Cowork
Anthropic has bifurcated its agent products along the developer/business-user divide.
Claude Code is the developer-facing agent. It reads entire repositories, understands cross-file dependencies, plans multi-step changes, executes them, runs tests, iterates on failures, and commits results. It scored 80.8% on SWE-bench Verified (the highest publicly reported result) and has accumulated over 101,000 GitHub stars. Claude Code supports plan mode (review before execution), multi-agent workflows with parallel sub-agents, and extensibility through MCP servers and custom hooks. For many engineering organizations, it is the primary touchpoint with the Anthropic platform.
Claude Cowork targets business users who need agent capabilities without writing code. Anthropic open-sourced 11 plugins covering finance, legal, marketing, and customer support workflows. In February 2026, the company added private plugin marketplaces, 12 new MCP connectors, cross-application workflows (Excel to PowerPoint, for example), and admin controls for plugin governance. Cowork sits between Claude.ai (general-purpose chat) and Claude Code (developer tooling), filling the enterprise productivity gap.
Both products connect to the same MCP layer. An MCP server that exposes your internal knowledge base is accessible from Claude Code, Cowork, and custom-built applications simultaneously.
[IMAGE: Product positioning map showing Claude.ai, Cowork, and Claude Code along two axes: technical expertise required (x-axis) and workflow complexity supported (y-axis), with example use cases positioned in each quadrant]
Governance: The Enterprise Gate
Enterprise AI adoption stalls on governance more often than on capability. Anthropic's governance layer addresses this with several components:
Identity management supports SAML 2.0 and OIDC SSO, with role-based access controls and integration with governance platforms like SailPoint for periodic access reviews.
The Compliance API provides programmatic access to conversation content and activity event logs. IT and security teams can pipe this data into their existing monitoring infrastructure.
28 security integrations span six categories: data loss prevention (DLP), secure access service edge (SASE), SIEM, identity management, eDiscovery, and AI observability. These are not theoretical; they ship as production integrations with named vendors.
Data retention controls range from configurable retention periods to Zero Data Retention (ZDR), where no inputs or outputs persist after response delivery. Encryption uses TLS 1.2+ in transit and AES-256 at rest.
Certifications include SOC 2 Type 2, ISO 27001, and HIPAA compliance. Bring Your Own Key (BYOK) support entered beta in H1 2026.
The Claude Platform on AWS (launched May 2026) represents the deployment model many enterprises have waited for: the full Anthropic API, Managed Agents, and MCP connectors, all accessible via AWS IAM credentials within the customer's existing security perimeter.
[IMAGE: Enterprise governance architecture showing the Compliance API feeding into existing security tooling (SIEM, DLP, eDiscovery), with data flow arrows and retention policy decision points annotated]
Seeing It in Motion
Two deployment patterns illustrate how the stack components compose in practice.
Pattern: Enterprise Knowledge Agent
%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#1e40af', 'actorTextColor': '#fff', 'actorBorder': '#3b82f6', 'signalColor': '#94a3b8', 'signalTextColor': '#e2e8f0', 'labelBoxBkgColor': '#1e293b', 'labelBoxBorderColor': '#334155', 'labelTextColor': '#e2e8f0', 'loopTextColor': '#e2e8f0', 'noteBkgColor': '#1e293b', 'noteTextColor': '#e2e8f0', 'noteBorderColor': '#475569', 'activationBorderColor': '#3b82f6', 'activationBkgColor': '#1e3a5f', 'fontSize': '16px'}}}%%
sequenceDiagram
participant User
participant Cowork as Claude Cowork
participant MCP as MCP Layer
participant KB as Knowledge Base
participant Jira as Jira Server
participant Compliance as Compliance API
User->>Cowork: "Summarize Q2 incidents"
Cowork->>MCP: Discover available tools
MCP-->>Cowork: KB search, Jira query available
Cowork->>KB: Search incident reports
KB-->>Cowork: 23 documents found
Cowork->>Jira: Query P1/P2 tickets Q2
Jira-->>Cowork: 47 tickets returned
Cowork-->>User: Structured summary with links
Note over Compliance: Activity logged via<br/>Compliance API
Pattern: Developer CI/CD Agent
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart TD
Issue["GitHub Issue"] --> CC["Claude Code reads issue"]
CC --> Plan["Plan mode: review approach"]
Plan --> Edit["Multi-file code changes"]
Edit --> Test["Run test suite"]
Test -->|Pass| PR["Create pull request"]
Test -->|Fail| Fix["Iterate on failures"]
Fix --> Test
PR --> Review["Human review"]
Review --> Merge["Merge and deploy"]
classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff
classDef rose fill:#be123c,stroke:#fb7185,stroke-width:1px,color:#fff
class Issue blue
class CC,Plan,Edit purple
class Test,Fix rose
class PR,Review teal
class Merge emerald
By the Numbers
The economics of the Anthropic platform have shifted substantially as cost optimization features matured. The table below captures the current state as of mid-2026.
| Metric | Value | Source |
|---|---|---|
| Opus 4.7 pricing | $5 / $25 per M input/output tokens | Anthropic pricing page |
| Sonnet 4.6 pricing | $3 / $15 per M input/output tokens | Anthropic pricing page |
| Haiku 4.5 pricing | $1 / $5 per M input/output tokens | Anthropic pricing page |
| Context window (Opus/Sonnet) | 1,000,000 tokens | Anthropic docs |
| Context window (Haiku) | 200,000 tokens | Anthropic docs |
| Max output (Opus, sync) | 128K tokens | Anthropic docs |
| Max output (Opus, batch) | 300K tokens | Anthropic docs |
| Prompt cache hit cost | 0.1x base input price | Anthropic docs |
| Prompt cache latency reduction | 50-85% TTFT reduction | Anthropic docs |
| Batch API discount | 50% off all token prices | Anthropic docs |
| Combined savings ceiling | Up to 95% with cache + batch | Calculated |
| MCP monthly SDK downloads | 97 million | Industry reports, mid-2026 |
| MCP public servers | 9,400+ | Industry reports, mid-2026 |
| Claude Code SWE-bench score | 80.8% (Verified) | Anthropic |
| Enterprise security integrations | 28 vendors | Anthropic, May 2026 |
| Computer use accuracy | 94% (insurance benchmarks) | Anthropic benchmarks |
| Orgs with MCP in production | 41% of surveyed software orgs | Stacklok 2026 survey |
The cost optimization potential deserves emphasis. A naive implementation calling Sonnet synchronously at full price for every request operates at a fundamentally different cost point than one that routes simple requests to Haiku, caches stable context, and batch-processes offline work. The difference can be 10-20x on the same workload.
[IMAGE: Stacked bar chart comparing effective per-query cost across four optimization levels: no optimization, model routing only, model routing plus caching, and model routing plus caching plus batch, using a realistic enterprise workload mix of 60% simple / 30% moderate / 10% complex queries]
A Concrete Example
Consider a financial services firm building an internal research agent that analysts use to investigate companies. The agent needs to pull data from an internal knowledge base, query a CRM, search public filings, and produce structured summaries.
Step 1: Model and cost architecture. The architect routes queries through a classifier. Simple lookups ("What is Company X's latest revenue?") go to Haiku at $1/$5. Synthesis queries requiring cross-source reasoning ("Compare Company X's margin trajectory against its three closest competitors over the past 8 quarters") go to Sonnet at $3/$15. The system prompt (2,000 tokens) and tool definitions (12 tools, approximately 8,000 tokens) are identical across all requests and cached.
Step 2: Integration via MCP. Three MCP servers are deployed: one wrapping the internal knowledge base (Confluence pages and research reports), one connecting to Salesforce, and one proxying a public filings API. The agent discovers these at runtime through MCP's tool discovery mechanism.
Step 3: A request flows through the system. An analyst asks: "Summarize Acme Corp's Q1 performance and flag any risk factors mentioned in our internal notes."
- The classifier routes to Sonnet (cross-source synthesis required).
- The cached system prompt and tool definitions load at 0.1x cost (approximately 1,000 tokens billed instead of 10,000).
- Claude calls the knowledge base MCP server, retrieves 3 relevant internal reports.
- Claude calls the filings MCP server, retrieves the Q1 10-Q summary.
- Claude synthesizes a structured response with citations.
- The Compliance API logs the interaction for the firm's regulatory record.
Step 4: Cost accounting. The request consumed approximately 45,000 input tokens (10,000 cached at 0.1x, 35,000 fresh from retrieved documents) and 2,000 output tokens. Effective input cost: (10,000 x 0.1 + 35,000) x $3/1M = $0.108. Output cost: 2,000 x $15/1M = $0.030. Total: approximately $0.14 per analyst query. At 500 queries per day, the monthly API cost runs around $2,100, well within budget for a tool replacing hours of manual research per query.
[IMAGE: Annotated request flow diagram for this financial research agent, showing token counts at each stage, cost calculations at decision points, and the MCP server connections to enterprise systems]
Where It Breaks
Vendor lock-in through the integration layer. MCP is an open standard, but Anthropic's managed agent runtime, Compliance API, and Cowork plugin ecosystem are proprietary. Organizations that build deeply on Managed Agents and Cowork create switching costs that extend well beyond model substitution. The model layer is increasingly commodity; the integration and governance layers are not.
Context window is not comprehension. A 1M-token context window can hold 500 pages of text, but retrieval quality still degrades with context length, particularly for information buried in the middle of large documents. Architects should not treat the context window as a replacement for retrieval; it is a complement. RAG with selective context loading consistently outperforms "dump everything into the window" approaches for knowledge-intensive tasks.
MCP security surface. Each MCP server is a new attack surface. A compromised server can feed malicious tool results to the agent, potentially triggering unintended actions. The protocol's 2026 roadmap includes OAuth 2.1 and audit trails, but as of mid-2026, authentication and authorization for MCP servers remain the deployer's responsibility.
Computer use is slow and expensive. Each screenshot-action cycle requires a vision-capable model call. Complex workflows involving dozens of UI interactions accumulate significant latency and token costs. Computer use is architecturally elegant for automating systems without APIs, but it should not be the first choice when an API exists.
Batch API latency. The 24-hour processing window is fine for overnight pipelines but unusable for anything interactive. There is no priority tier between real-time and batch; architects who need "slightly delayed but cheaper" must build their own queuing layer.
Prompt cache eviction. The 5-minute TTL means sporadic traffic patterns (fewer than one request per 5 minutes with the same prefix) will see cache misses on most requests, paying the 1.25x write cost without the 0.1x read benefit. Workloads need consistent request volume to realize caching savings.
Alternative Designs
| Platform | Strengths | Weaknesses | Best when |
|---|---|---|---|
| Anthropic (Claude) | Deepest agent infrastructure; MCP ecosystem; strong governance/compliance; 1M context | Higher per-token cost than some alternatives; batch-only async tier; platform lock-in risk through Cowork/Managed Agents | Building agent-heavy enterprise systems with regulatory requirements |
| OpenAI (GPT) | Largest developer ecosystem; broadest model range (GPT, o-series, image, audio); Assistants API with built-in RAG | Governance and compliance tools less mature; function calling predates MCP (now adopting it); pricing volatile | Consumer-facing products; multi-modal applications requiring image/audio generation |
| Google (Gemini) | Native integration with Google Cloud and Workspace; 2M-token context window (Gemini 1.5 Pro); Vertex AI MLOps | Weaker agentic tooling; enterprise governance less proven outside GCP shops; smaller third-party ecosystem | Google Cloud-native organizations; extremely long-context use cases |
| AWS Bedrock (multi-model) | Model-agnostic; runs Claude, Llama, Mistral behind one API; deep AWS integration; Guardrails for governance | Managed service layer adds latency and cost; feature availability lags native APIs; less control over model-specific capabilities | Multi-model strategies; organizations committed to AWS |
The comparison is deliberately platform-level, not model-level. Model benchmarks shift quarterly; platform capabilities compound over years.
[IMAGE: Radar chart comparing the four platforms across six axes: agent infrastructure maturity, governance depth, ecosystem size, cost optimization levers, context window capacity, and multi-modal breadth]
How It Is Used in Practice
Block (formerly Square) reported 50-75% time savings on common engineering tasks after deploying MCP-compatible Goose agents connected to internal development tools. The deployment uses MCP servers wrapping internal APIs, with Claude Code as the primary developer interface.
Financial services firms are the heaviest adopters of the Compliance API and governance features. The combination of SOC 2 Type 2, HIPAA compliance, Zero Data Retention, and 28 security integrations addresses regulatory requirements that previously blocked AI adoption in banking and insurance.
The Claude Platform on AWS deployment model (May 2026) resolves a common enterprise objection: data leaving the organization's cloud boundary. With IAM-credential access to the full Anthropic stack, organizations can run agents, MCP connectors, and Managed Agents within their AWS accounts while Anthropic handles orchestration.
Managed Agents with self-hosted sandboxes represent a production pattern for organizations that want Anthropic's orchestration (context management, error recovery, tool routing) but need tool execution on their own infrastructure. The agent loop runs on Anthropic; the tool calls execute in sandboxes hosted on Cloudflare, Daytona, Modal, or Vercel within the customer's environment.
[IMAGE: Deployment topology diagram showing the split between Anthropic-hosted orchestration and customer-hosted execution sandbox, with data flow arrows and trust boundaries clearly marked]
Insights Worth Remembering
-
The model is the least sticky part of the stack. Organizations can swap models with a configuration change. They cannot swap integration protocols, governance infrastructure, or agent runtimes without re-architecture. This asymmetry is the real lock-in vector.
-
MCP's value is not what it connects to today, but that it exists as a standard. The 9,400 servers and 97 million downloads create a network effect. Each new MCP server makes every MCP-compatible agent more capable without any change to the agent itself.
-
Context engineering has replaced prompt engineering as the high-leverage skill. Writing a good prompt is table stakes. Designing a system that manages what enters the context window, when, and in what order is what separates a $0.14 query from a $1.40 one.
-
Prompt caching is not an optimization; it is an architectural pattern. Systems should be designed from the start to maximize prefix stability. This means structuring system prompts, tool definitions, and reference context as a stable prefix, with only the user query and retrieved documents varying per request.
-
The Cowork/Code split reflects a real organizational divide. Developers and business users have fundamentally different interaction models with AI agents. A single product surface cannot serve both well. The shared MCP layer is what keeps them from becoming silos.
-
Computer use is a strategic capability, not a primary integration method. It exists to automate the systems that have no API. Using it on systems that do have APIs wastes tokens and adds fragility. Treat it as the fallback, not the default.
-
Enterprise AI adoption is gated by governance, not capability. The 28 security integrations and Compliance API exist because the model was already capable enough; the barrier was giving security teams the visibility and control they require.
-
Batch plus cache is the enterprise cost structure. The combination of 50% batch discount and 90% cache reduction on stable prefixes creates a cost floor that makes high-volume AI pipelines economically viable in ways that per-request pricing does not.
Open Questions
Will MCP's governance keep pace with its adoption? The protocol is now under the Linux Foundation, but the 2026 roadmap for OAuth 2.1 support, audit trails, and gateway behavior is still in progress. Enterprise deployments are running ahead of the protocol's security maturity in some areas. Whether the governance features land before a high-profile MCP server compromise shapes enterprise trust is genuinely uncertain.
How does the Managed Agent boundary evolve? Today, Anthropic handles orchestration while customers host execution sandboxes. The likely direction is more customer control over orchestration, but how much and how soon is unclear. Organizations building on Managed Agents should plan for the interface to shift.
Can Anthropic maintain the model-as-commodity position? The current strategy treats models as one layer in a larger platform. If a competitor ships a model with dramatically superior capabilities (a genuine reasoning leap, not incremental benchmark gains), the platform advantages could become secondary. The counter-argument: the 80% enterprise revenue share suggests customers are buying the platform, not just the model.
What happens when MCP servers proliferate beyond curation? At 9,400+ public servers, quality and security vary widely. The ecosystem needs something analogous to package registries with verified publishers, vulnerability scanning, and deprecation signals. This infrastructure does not yet exist.
Will computer use economics improve enough for high-volume use? Current costs make it viable for low-frequency, high-value automation (insurance processing, compliance checks) but prohibitive for high-volume workflows. Architectural improvements to reduce the number of screenshot-action cycles per task would expand the addressable use cases substantially.
Sources and Further Reading
Primary Sources
- Anthropic, "Claude Enterprise," anthropic.com - Enterprise product overview and governance features.
- Anthropic, "Prompt Caching," Claude API Docs - Technical documentation on caching mechanics, TTL, and pricing.
- Anthropic, "New Capabilities for Building Agents on the Anthropic API," anthropic.com, 2025 - Code execution, Files API, and MCP connector announcements.
- Anthropic, "Introducing the Model Context Protocol," anthropic.com, November 2024 - Original MCP announcement.
- Anthropic, "Models Overview," Claude API Docs - Model specifications, context windows, and capabilities.
- Anthropic, "Pricing," Claude API Docs - Current token pricing across all model tiers.
Industry Analysis and Adoption Data
- SecurityWeek, "Anthropic Expands Claude's Enterprise Security Governance With 28 New Integrations," May 2026 - Coverage of the Compliance API and security integration launch.
- TechCrunch, "Anthropic launches new push for enterprise agents with plug-ins," February 2026 - Cowork enterprise plugin expansion.
- TechCrunch, "Anthropic brings agentic plug-ins to Cowork," January 2026 - Initial Cowork plugin launch.
- InfoQ, "Anthropic Launches Claude Platform on AWS," May 2026 - AWS platform deployment coverage.
- Caylent, "Claude Platform on AWS: An Architecture Decision Guide," 2026 - AWS-specific architecture guidance.
- Digital Applied, "MCP Adoption Statistics 2026" - MCP adoption metrics and survey data.
Research
- Phan et al., 2025, "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems," arXiv:2604.14228 - Academic analysis of Claude Code's agent architecture.
- Anthropic, "2026 Agentic Coding Trends Report" - Industry survey on agentic coding adoption.
Additional Resources
- WorkOS, "Everything Your Team Needs to Know About MCP in 2026" - Practical MCP implementation guide.
- MCP Wikipedia entry - Protocol history and standards governance.
- Anthropic knowledge-work-plugins repository, GitHub - Open-source Cowork plugin implementations.