← Blog

Microsoft Azure AI Foundry: The Enterprise AI Development Platform

June 02, 2026 · 24 min read

In November 2023, Microsoft launched Azure AI Studio. One year later, at Ignite 2024, it became Azure AI Foundry. One year after that, at Ignite 2025, it became Microsoft Foundry. Three names in three years. The rebranding velocity is unusual even by Microsoft standards, and it obscures something worth paying attention to: each rename accompanied a genuine architectural expansion. What started as a model playground is now a platform-as-a-service that unifies model hosting, agent orchestration, evaluation pipelines, content safety, and enterprise governance under a single Azure resource provider.

The platform serves over 80,000 enterprises, including 80 percent of Fortune 500 companies (Microsoft Foundry product page). It hosts 1,900+ models from OpenAI, Anthropic, Meta, Mistral, DeepSeek, xAI, Cohere, and Hugging Face. It provides a managed agent runtime, a prompt flow visual builder, built-in evaluators for groundedness and coherence, and a local SDK that runs models on-device with zero cloud dependency.

Why this matters: Enterprise AI platform selection has become one of the most consequential infrastructure decisions of 2026. The choice between Foundry, AWS Bedrock, and Google Vertex AI determines not just which models you can access, but how you govern them, how you evaluate them, and how tightly they integrate with your existing identity and compliance stack. Understanding Foundry's architecture, not just its marketing, is the prerequisite.

TL;DR

  • Microsoft Foundry (formerly Azure AI Foundry, formerly Azure AI Studio) is a unified PaaS for building, deploying, evaluating, and governing AI applications on Azure.
  • The resource hierarchy follows a Foundry Resource (governance) > Project (development isolation) model, with RBAC scoped at both levels.
  • The model catalog offers 1,900+ models with two deployment paths: serverless (pay-per-token) and managed compute (hourly VM billing). Serverless further splits into Global, Data Zone, and Regional tiers for data residency.
  • Foundry Agent Service supports three agent types: prompt agents (no-code), workflow agents (multi-step YAML/visual), and hosted agents (custom code in containers on isolated Micro VMs).
  • Built-in evaluators cover groundedness, coherence, relevance, fluency, and safety metrics; the azure-ai-evaluation SDK runs them programmatically.
  • Foundry Tools (the rebrand of Azure AI Services) consolidates Speech, Vision, Language, Translator, Content Understanding, and Document Intelligence under the Foundry umbrella.
  • Foundry Local runs models entirely on-device (GPU, NPU, or CPU) with SDKs for Python, JavaScript, C#, and Rust, and an OpenAI-compatible API.
  • The platform is free to explore; costs accrue at the deployment level, with GPT-4o at $2.50/$10.00 per million input/output tokens (Global Standard).

At a Glance

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart LR
    subgraph Develop["Development"]
        MC["Model Catalog<br/>1900+ models"]
        PF["Prompt Flow"]
        AS["Agent Service"]
        FL["Foundry Local"]
    end
    subgraph Evaluate["Evaluation"]
        BI["Built-in Evaluators"]
        CE["Custom Evaluators"]
        TR["Tracing + Observability"]
    end
    subgraph Deploy["Deployment"]
        SV["Serverless API"]
        MN["Managed Compute"]
        BA["Batch Processing"]
    end
    subgraph Govern["Governance"]
        CS["Content Safety"]
        CP["Control Plane"]
        RB["RBAC + Entra ID"]
    end
    MC --> PF
    MC --> AS
    PF --> BI
    AS --> BI
    BI --> SV
    CE --> SV
    BI --> MN
    SV --> CS
    MN --> CS
    CS --> CP
    CP --> RB

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff

    class MC,PF,AS,FL blue
    class BI,CE,TR purple
    class SV,MN,BA teal
    class CS,CP,RB emerald

[IMAGE: Screenshot of the Microsoft Foundry portal showing the model catalog with filter sidebar, model cards for GPT-5, Claude, and Llama, and the deployment type selector]

Before Foundry

Enterprise AI development on Azure evolved through a series of disconnected services, each solving one piece of the puzzle but leaving integration to the customer.

Azure Cognitive Services (launched 2016) offered pre-built APIs for vision, speech, and language. Azure Machine Learning (expanded significantly in 2019) provided training infrastructure and MLOps pipelines. Azure OpenAI Service (preview late 2022, GA January 2023) gave enterprises access to GPT models behind Azure's compliance boundary. Each service had its own portal, its own SDK, its own resource model, and its own networking story. Building an application that combined a GPT model with content safety, speech-to-text, and a retrieval pipeline meant stitching together four separate Azure resources, each with distinct RBAC scopes and networking configurations.

%%{init: {'theme': 'base', 'themeVariables': {'cScale0': '#1e40af', 'cScale1': '#6d28d9', 'cScale2': '#b45309', 'cScale3': '#be123c', 'cScale4': '#047857', 'cScale5': '#0e7490', 'cScale6': '#1e40af', 'cScaleLabel0': '#e2e8f0', 'cScaleLabel1': '#e2e8f0', 'cScaleLabel2': '#e2e8f0', 'cScaleLabel3': '#e2e8f0', 'cScaleLabel4': '#e2e8f0', 'cScaleLabel5': '#e2e8f0', 'cScaleLabel6': '#e2e8f0', 'textColor': '#e2e8f0', 'lineColor': '#94a3b8', 'fontSize': '16px'}}}%%
timeline
    title Evolution of Microsoft's AI Platform
    2016 : Azure Cognitive Services launches (Vision, Speech, Language APIs)
    2019 : Azure Machine Learning gets designer and MLOps pipelines
    2022 : Azure OpenAI Service enters preview (GPT-3.5 behind Azure compliance)
    2023-Nov : Azure AI Studio launches, first unified portal
    2024-May : Azure AI Studio reaches GA
    2024-Nov : Rebranded to Azure AI Foundry at Ignite 2024
    2025-May : Agent Service launches, Foundry Local announced at Build 2025
    2025-Nov : Rebranded to Microsoft Foundry at Ignite 2025, Foundry Tools replaces Azure AI Services

The naming churn matters because each transition was not cosmetic. Azure AI Studio (November 2023) unified the portal. Azure AI Foundry (November 2024) introduced the hub-based resource model and the model catalog's serverless deployment option. Microsoft Foundry (November 2025) dropped "Azure" from the name to signal cross-platform ambitions spanning Azure, Microsoft 365, and Fabric. It also replaced the hub + project model with a flatter Foundry Resource + Project hierarchy, introduced the Responses API for agents (replacing the Assistants API), unified five SDKs into one (azure-ai-projects 2.x), and rebranded Azure AI Services as Foundry Tools.

[IMAGE: Side-by-side comparison diagram showing the old resource model (Hub + Azure OpenAI + Azure AI Services as separate resources) versus the new model (single Foundry Resource containing projects, model deployments, and Foundry Tools)]

How Microsoft Foundry Actually Works

The Resource Hierarchy

Foundry's architecture enforces a two-level hierarchy. The Foundry Resource is the top-level Azure resource (provider: Microsoft.CognitiveServices/accounts, kind: AIServices). It owns governance: networking configuration, model deployments, security settings, and connections to external Azure services like Storage, Key Vault, and Azure AI Search. A Project is a child resource (Microsoft.CognitiveServices/accounts/projects) that provides development isolation. Teams build agents, run evaluations, and upload files within a project scope. Multiple projects share the parent resource's model deployments and connections without duplicate IT setup.

This separation is deliberate. IT administrators configure networking, encryption, and RBAC at the resource level once. Development teams operate within project boundaries. Control plane actions (creating deployments, configuring networking) are distinct from data plane actions (building agents, running evaluations, uploading files) in the RBAC model.

Connected resources (Azure Storage, Azure Key Vault, Azure AI Search) remain independent Azure resources with their own governance boundaries. Foundry references them through connections but does not absorb their networking or access policies. You manage those separately.

The Model Catalog

The catalog organizes its 1,900+ models into two categories:

Models sold by Azure (Azure Direct models) are hosted and sold by Microsoft under Microsoft Product Terms. These include the GPT family (GPT-5, GPT-4.1, GPT-4o), Phi-4 (Microsoft's small language model), and select models from partners like Mistral and Cohere. They come with Microsoft support, enterprise SLAs, and deep Azure integration. Some support fungible provisioned throughput, meaning you can flexibly allocate quota across any of these models.

Models from partners and community constitute the majority of the catalog. Anthropic's Claude family, Meta's Llama models, Hugging Face's open-source collection, and models from DeepSeek, xAI, NVIDIA, and others. These are supported by their providers, with Microsoft managing the hosting infrastructure for serverless deployments.

[IMAGE: Table visualization showing the model catalog categories with representative models: Azure Direct (GPT-5, GPT-4.1, Phi-4) vs Partner/Community (Claude, Llama, Mistral, DeepSeek-R1), with deployment options and pricing model for each]

Deployment Types

Two fundamental deployment paths exist, each with sub-variants:

Serverless deployments provision an API endpoint. Microsoft manages the infrastructure. Billing is per-token. Within serverless, nine deployment types control data residency and pricing:

Deployment Type Data Processing Billing
Global Standard Cross-region, Azure-managed Pay-per-token
Global Provisioned Cross-region, Azure-managed Hourly reserved capacity
Global Batch Cross-region, Azure-managed Batch token pricing
Data Zone Standard Within US or EU boundary Pay-per-token
Data Zone Provisioned Within US or EU boundary Hourly reserved capacity
Standard Single region Pay-per-token
Regional Provisioned Single region Hourly reserved capacity

The distinction matters for compliance. A European bank required to keep data within the EU uses Data Zone Standard. A startup optimizing for cost uses Global Standard. A high-throughput production system with predictable volume uses Provisioned.

Managed compute deploys model weights to dedicated VMs in your subscription. You get the full model (weights, container runtime) running on your hardware quota. Billing is per VM-hour, regardless of query volume. This path supports the Hugging Face collection and fine-tuned models that need custom runtimes.

Foundry Agent Service

The Agent Service is where Foundry's ambitions become most visible. It is a fully managed platform for building AI agents that go beyond text generation to take autonomous actions.

Three agent types serve different complexity levels:

Prompt agents require zero code. Define instructions, select a model, attach tools, and the service handles orchestration. Build one in the portal in minutes.

Workflow agents (preview) orchestrate multi-step sequences or coordinate multiple agents. Define them visually in the portal or in YAML through VS Code. They support branching logic, human-in-the-loop steps, and group-chat patterns.

Hosted agents (preview) are containers you build with any framework (Agent Framework, LangGraph, your own code) and deploy on isolated Micro VMs that scale independently. You own the orchestration logic; Foundry manages the runtime.

The tool ecosystem is substantial. Over 1,400 tools are available through public and private catalogs, including built-in tools (web search, file search, code interpreter, memory), MCP server integrations (including Azure DevOps in preview), and custom functions via Azure Functions. Authentication supports key-based access, Entra managed identity, OAuth On-Behalf-Of passthrough, and unauthenticated access.

Each agent can carry a dedicated Microsoft Entra identity, enabling scoped access to resources without shared credentials. Agents can run within Azure Virtual Networks for network isolation. The Foundry Control Plane (public preview) provides centralized governance across all agents in a subscription, with guardrails covering task adherence, sensitive data detection, groundedness, and prompt injection mitigation.

[IMAGE: Architecture diagram showing the three agent types (Prompt, Workflow, Hosted) with their respective hosting models, tool connections, and the Control Plane governance layer wrapping around all three]

Evaluation and Observability

Foundry ships built-in evaluators in the azure-ai-evaluation SDK:

from azure.ai.evaluation import evaluate, GroundednessEvaluator

evaluator = GroundednessEvaluator()
result = evaluator(
    response="The capital of France is Paris.",
    context="France is a country in Western Europe. Its capital is Paris."
)

The evaluator categories include:

  • Quality metrics: Coherence, fluency, relevance, similarity
  • RAG-specific metrics: Groundedness (is the response supported by the retrieved context?), Groundedness Pro (strict adherence check via Content Safety), retrieval relevance
  • Safety metrics: Hate/unfairness, violence, sexual content, self-harm, protected material detection, prompt injection detection
  • Agent-specific metrics: Tool call accuracy, task completion, intent resolution

Tracing integrates with Application Insights. Every model call, tool invocation, and agent decision is logged, giving teams an end-to-end view of agent behavior in production. Continuous evaluation monitors deployed agents against quality and safety baselines, firing alerts when metrics drift.

Foundry Tools (Formerly Azure AI Services)

The November 2025 rebrand consolidated Azure Cognitive Services / Azure AI Services under the Foundry Tools name. The capabilities remain identical, but they now share the Microsoft.CognitiveServices provider namespace with Foundry, meaning consistent RBAC, networking, and policy behavior.

The toolkit includes: Azure Speech (recognition, text-to-speech, translation), Azure Vision (image analysis, OCR, object detection), Azure Language (CLU, summarization, NER, health data analysis), Azure Translator, Content Understanding (GA with bring-your-own-model support), and Document Intelligence.

Foundry Local

Foundry Local runs models entirely on-device. No Azure subscription required. No cloud dependency. The SDK detects hardware capabilities (GPU, NPU, CPU) and automatically selects the optimal ONNX Runtime execution provider.

Install is one line: pip install foundry-local-sdk (Python), npm install foundry-local-sdk (JavaScript), or the equivalent for C# and Rust. The API is OpenAI-compatible, meaning existing code that targets the OpenAI chat completions endpoint works with minimal changes.

Models download from the Foundry Model Catalog with automatic versioning and hardware-optimized selection. Supported models include Phi-4-mini, Qwen2.5, and other small language models optimized for on-device inference. A Windows-specific package (foundry-local-sdk-winml) integrates with Windows ML for broader hardware acceleration.

Seeing It in Motion

End-to-End Agent Development Flow

%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#1e40af', 'actorTextColor': '#fff', 'actorBorder': '#3b82f6', 'signalColor': '#94a3b8', 'signalTextColor': '#e2e8f0', 'labelBoxBkgColor': '#1e293b', 'labelBoxBorderColor': '#334155', 'labelTextColor': '#e2e8f0', 'loopTextColor': '#e2e8f0', 'noteBkgColor': '#1e293b', 'noteTextColor': '#e2e8f0', 'noteBorderColor': '#475569', 'activationBorderColor': '#3b82f6', 'activationBkgColor': '#1e3a5f', 'fontSize': '16px'}}}%%
sequenceDiagram
    participant Dev as Developer
    participant Portal as Foundry Portal
    participant Agent as Agent Service
    participant Model as GPT-4o
    participant Tools as Tool Catalog
    participant Eval as Evaluators

    Dev->>Portal: Create project
    Dev->>Portal: Select model + define instructions
    Portal->>Agent: Provision prompt agent
    Agent->>Tools: Attach file search + web search
    Dev->>Agent: Test in playground
    Agent->>Model: Send prompt + tool calls
    Model-->>Agent: Response with citations
    Agent-->>Dev: Display result
    Dev->>Eval: Run groundedness + coherence checks
    Eval-->>Dev: Scores + flagged issues
    Dev->>Agent: Publish as versioned endpoint
    Note over Agent: Auto-scaling, Entra identity, Content Safety active

Model Selection and Deployment Decision Tree

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1e40af', 'primaryTextColor': '#fff', 'primaryBorderColor': '#60a5fa', 'lineColor': '#94a3b8', 'textColor': '#e2e8f0', 'clusterBkg': '#1e293b', 'clusterBorder': '#334155', 'fontSize': '16px'}}}%%
flowchart TD
    Q1["Need to run on-device?"]
    Q2["Need custom model weights?"]
    Q3["Volume above 150M tokens/month?"]
    Q4["Data residency required?"]
    FL["Foundry Local<br/>Phi-4, Qwen2.5"]
    MC["Managed Compute<br/>Hourly VM billing"]
    PTU["Provisioned Throughput<br/>Reserved capacity"]
    DZ["Data Zone Standard<br/>US or EU boundary"]
    GS["Global Standard<br/>Pay-per-token"]

    Q1 -->|Yes| FL
    Q1 -->|No| Q2
    Q2 -->|Yes| MC
    Q2 -->|No| Q3
    Q3 -->|Yes| PTU
    Q3 -->|No| Q4
    Q4 -->|Yes| DZ
    Q4 -->|No| GS

    classDef blue fill:#1e40af,stroke:#3b82f6,stroke-width:1px,color:#fff
    classDef purple fill:#6d28d9,stroke:#a78bfa,stroke-width:1px,color:#fff
    classDef teal fill:#0e7490,stroke:#22d3ee,stroke-width:1px,color:#fff
    classDef amber fill:#b45309,stroke:#fbbf24,stroke-width:1px,color:#fff
    classDef emerald fill:#047857,stroke:#34d399,stroke-width:1px,color:#fff

    class Q1,Q2,Q3,Q4 slate
    class FL emerald
    class MC amber
    class PTU purple
    class DZ teal
    class GS blue

    classDef slate fill:#334155,stroke:#64748b,stroke-width:1px,color:#e2e8f0

[IMAGE: Pricing comparison chart showing cost curves for Global Standard vs Provisioned Throughput, with the break-even crossover at approximately 150-200M tokens/month, annotated with typical enterprise workload ranges]

By the Numbers

Model Pricing (Global Standard, Serverless API)

Model Input (per 1M tokens) Output (per 1M tokens) Notes
GPT-5 Premium tier Premium tier Most capable, complex reasoning
GPT-4o $2.50 $10.00 Production workhorse
GPT-4.1 $2.00 $8.00 Best cost-to-capability ratio
GPT-4.1 mini Lower tier Lower tier Low-latency, high-throughput
Phi-4-mini $0.07 $0.23 On-device or budget scenarios
Claude (Anthropic) Varies by model Varies by model Partner billing via Azure Marketplace
Llama (Meta) Varies by model Varies by model Open models, fine-tuning support

Source: Azure AI Foundry Models Pricing

Platform Scale

Metric Value Source
Models in catalog 1,900+ Microsoft Foundry docs
Enterprise customers 80,000+ Microsoft Foundry product page
Fortune 500 adoption 80% Microsoft Foundry product page
Tools in agent catalog 1,400+ Agent Service overview
SDK languages 4 (Python, C#, JS/TS, Java) Microsoft Foundry docs
Foundry Local languages 4 (Python, JS, C#, Rust) Foundry Local docs

Cost Break-Even Analysis

For GPT-4o specifically, the break-even between pay-per-token and provisioned throughput sits at approximately 150 to 200 million tokens per month. Below that threshold, serverless pay-as-you-go is cheaper. Above it, provisioned throughput units (PTUs) deliver better economics for consistent, high-volume production workloads. The effective blended rate for a typical 60/40 input-to-output split on GPT-4o is roughly $5.50 per million tokens (Azure AI Foundry Pricing Guide, 2025).

[IMAGE: Bar chart comparing monthly costs at 50M, 100M, 200M, and 500M token volumes for Global Standard vs Provisioned Throughput, with the crossover point highlighted]

A Concrete Example

Consider an enterprise building a customer support agent using Foundry. The agent needs to answer questions about company policies, search through knowledge bases, and escalate complex issues to human operators.

Step 1: Provision the infrastructure.

Create a Foundry Resource in East US 2. Connect an Azure Storage account for document uploads and an Azure AI Search instance for the knowledge base. Set up a project called "support-agent."

Step 2: Build the agent.

from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project = AIProjectClient(
    endpoint="https://contoso-ai.ai.azure.com/api/projects/support-agent",
    credential=DefaultAzureCredential(),
)

# Create a prompt agent with tools
agent = project.agents.create(
    model="gpt-4o",
    instructions="""You are a customer support agent for Contoso.
    Answer questions using the knowledge base. If you cannot find
    an answer, escalate to a human operator. Always cite your sources.""",
    tools=[
        {"type": "file_search"},   # Searches uploaded policy documents
        {"type": "web_search"},     # Falls back to web for general queries
    ],
)

Step 3: Index documents.

Upload the company policy PDFs and FAQ documents to the project's file storage. The file search tool automatically indexes them for retrieval.

Step 4: Test in the playground.

Send test queries through the Foundry portal's agent playground: "What is your return policy for electronics?" The agent retrieves the relevant policy section, generates a response with citations, and the content safety filters screen the output.

Step 5: Evaluate.

from azure.ai.evaluation import evaluate, GroundednessEvaluator, RelevanceEvaluator

results = evaluate(
    data="test_queries.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(),
        "relevance": RelevanceEvaluator(),
    },
    target=agent_function,
)
# Results: groundedness 0.92, relevance 0.89

The groundedness score of 0.92 means 92% of responses are fully supported by the retrieved context. The team investigates the 8% gap, finds edge cases where the knowledge base lacks coverage, and adds documents.

Step 6: Deploy and publish.

Publish the agent as a versioned endpoint. Enable continuous evaluation to monitor groundedness and relevance daily. Connect the agent to Microsoft Teams via the publishing pipeline so support staff can invoke it directly in their workflow.

[IMAGE: Dashboard screenshot showing the agent's evaluation results over time, with groundedness and relevance scores plotted as time series, alongside token consumption and response latency metrics]

Where It Breaks

The naming and migration tax is real. Three names in three years means three rounds of documentation updates, SDK migrations, and portal navigation changes. The shift from the Assistants API to the Responses API, from hub-based projects to Foundry projects, and from five SDKs to one is substantial. Teams that built on Azure AI Studio in 2024 face a genuine migration effort to the current Foundry resource model.

Portal instability during transitions. The "classic" and "new" Foundry portals coexist, with a toggle banner to switch between them. Some features (managed compute deployments, Hugging Face model hosting) still require the classic portal. This is confusing for teams that do not track which features live where.

Regional availability gaps. Not all models, deployment types, and features are available in all Azure regions. Agent Service, evaluations, and specific model families have region-specific availability. A team that provisions in a region for data residency reasons may find that their preferred model or agent feature is not available there.

Vendor lock-in through integration depth. Foundry's strongest selling point (deep integration with Entra ID, Microsoft 365, Teams, SharePoint, and Microsoft Graph) is also its lock-in mechanism. Publishing agents to Teams and M365 Copilot is trivial on Foundry and requires significant custom work on competing platforms. Organizations that invest deeply in Foundry's agent publishing pipeline tie their agent distribution to the Microsoft ecosystem.

Content safety is not optional by default. Azure AI Content Safety filters run inline with model requests and cannot be fully disabled for all deployment types. For some use cases (security research, red-teaming, medical contexts), the default filtering can interfere. Fine-grained control exists but requires per-deployment configuration.

Pricing complexity. Nine serverless deployment types, managed compute VM tiers, Foundry Tools consumption, Azure AI Search costs, storage costs, and partner model marketplace billing create a pricing surface area that is genuinely difficult to forecast. The Azure Pricing Calculator helps, but organizations routinely underestimate total costs because they forget the connected resources.

Alternative Designs

Platform Strengths Weaknesses Best When
Microsoft Foundry Deepest M365/Entra integration; exclusive Azure OpenAI access (GPT-4o, GPT-5); managed agent runtime; 1,900+ model catalog; Foundry Local for on-device Three renames in three years; regional gaps; pricing complexity; classic/new portal split Microsoft-aligned enterprise; Teams/M365 agent distribution; Azure compliance boundary needed
AWS Bedrock Broadest model catalog (40+ FMs from 8+ providers); AgentCore for managed agent hosting; Guardrails API; strong S3/Lambda integration No visual flow builder matching Prompt Flow; agent tooling newer than Foundry's; less enterprise identity integration AWS-native organizations; multi-provider model access priority; serverless-first architectures
Google Vertex AI Best Jupyter/notebook experience; Gemini models exclusive to GCP; strong ML training pipeline integration; Vertex AI Search Smallest third-party model catalog of the three; enterprise identity story weaker outside Google Workspace Data science-heavy teams; GCP-native workloads; Gemini-first strategies
Self-hosted (vLLM, Ollama) Full control; no vendor lock-in; open-source flexibility; no per-token fees Operational burden for scaling, monitoring, safety; no managed agent runtime; no built-in evaluation Cost-sensitive at extreme scale; regulatory requirements forbidding cloud inference; custom model serving needs

The differentiator for Foundry is not model access alone (Bedrock arguably matches or exceeds it in breadth). The differentiator is the integration depth: Entra identity flowing through to agents, publishing directly to Teams and M365 Copilot, Azure Policy applying uniformly across Foundry and connected resources, and the Foundry Control Plane providing centralized governance. For Microsoft-aligned enterprises, this integration eliminates weeks of custom plumbing. For non-Microsoft shops, it is irrelevant.

[IMAGE: Radar chart comparing Microsoft Foundry, AWS Bedrock, Google Vertex AI, and self-hosted across six axes: model catalog breadth, enterprise identity integration, agent tooling maturity, evaluation framework, pricing flexibility, and on-device capability]

How It Is Used in Practice

Accenture deployed 75+ use cases across industries on Foundry, with 16 in production, reducing AI application build time by 50 percent (Microsoft Customer Story, 2025).

H&R Block built a tax question answering system using Azure AI Foundry and Azure OpenAI Service. The system answers filers' questions while maintaining safeguards for accuracy, a domain where hallucination has direct financial consequences for users.

UBS created a Legal AI Assistant (LAIA) that searches across 26 million documents in multiple languages to find related clauses, using Foundry's model deployments and Azure AI Search for retrieval.

Air India deployed a Foundry Agent Service-powered virtual assistant for customer service, transforming their interaction flow from static decision trees to dynamic, model-driven conversations.

ASOS built an AI-powered virtual stylist using Foundry, enabling personalized fashion recommendations through conversational interaction with customers.

The common pattern across these deployments: organizations start with a simple model deployment (often GPT-4o), layer on retrieval (Azure AI Search), add evaluation (groundedness checking), and then graduate to agents when they need autonomous tool use. The progression from chat endpoint to evaluated RAG application to autonomous agent typically takes 3 to 6 months.

[IMAGE: Maturity curve diagram showing the typical enterprise progression: Phase 1 (Model API, 1-2 months), Phase 2 (RAG + Evaluation, 2-4 months), Phase 3 (Agents + Governance, 4-6 months), Phase 4 (Multi-agent orchestration, 6-12 months)]

Insights Worth Remembering

  1. The platform itself is free. Costs accrue only at the deployment and consumption level. This means a team can explore the catalog, build in the portal, and test in playgrounds without spending anything until they deploy.

  2. Dropping "Azure" from the name was the most significant signal. Microsoft Foundry spans Azure, Microsoft 365, and Fabric. It is positioning itself as a company-wide AI platform, not an Azure service. Watch for deeper integration with Dynamics 365, Power Platform, and Windows.

  3. The Responses API replaced the Assistants API. This is not just a rename. The new API uses "conversations" and "items" instead of "threads" and "messages," with a fundamentally different state model. Code written against the Assistants API (agents v0.5/v1) needs rewriting for the Responses API (agents v2).

  4. Fungible provisioned throughput is a quiet cost optimization lever. Being able to share reserved capacity across GPT-4o, GPT-4.1, and other Azure Direct models means you can shift workloads without provisioning separate quotas for each model.

  5. Foundry Local's OpenAI API compatibility is strategically important. It means the same code that calls GPT-4o in the cloud can call Phi-4-mini on a laptop with a one-line endpoint change. This is Microsoft's answer to the growing demand for hybrid cloud/edge AI deployment.

  6. The evaluation story is stronger than competitors'. Built-in evaluators for groundedness, coherence, relevance, and safety, combined with continuous evaluation in production, give Foundry a more complete eval pipeline than Bedrock's Guardrails or Vertex AI's evaluation tools (though all three are converging).

  7. Agent publishing to M365 is the real moat. An agent built in Foundry can be published to Teams, BizChat, and M365 Copilot with a few clicks. No other platform offers this. For enterprises whose workforce lives in Microsoft 365, this eliminates the distribution problem entirely.

  8. The Toolbox (preview) for MCP server management hints at where the ecosystem is heading. Define tools once, version them, expose them through a single MCP-compatible endpoint, and any MCP client can consume them. This is Microsoft's play for becoming the tool registry for the agentic era.

  9. Content Understanding reaching GA with bring-your-own-model support fills a gap. Enterprises that need document parsing beyond what built-in models handle can now bring GPT family models into the Content Understanding pipeline, blurring the line between pre-built and custom AI.

  10. The three-portal problem (Azure Portal, Foundry classic, Foundry new) will persist through at least mid-2026. Teams should standardize on the new portal for all new work while accepting that some managed compute workflows still require classic.

Open Questions

Will Microsoft complete the classic-to-new portal migration without breaking existing workflows? Hub-based projects, managed compute deployments, and certain model types still require the classic portal. The migration path is documented, but the timeline for deprecating the classic experience is unclear.

How will pricing evolve for agent-intensive workloads? Agent Service charges for model calls, tool invocations, and memory operations separately. As agents become more autonomous and multi-step, the total cost per agent interaction could surprise teams that modeled costs based on simple chat endpoints. Outcome-based pricing (per resolved ticket rather than per token) is an industry trend that Microsoft has not yet adopted for Foundry.

Can Foundry Local scale beyond small language models? Today it supports models like Phi-4-mini and Qwen2.5-0.5b. Running larger models (7B+) on-device remains hardware-constrained. The question is whether ONNX Runtime optimizations and NPU acceleration will push the feasible model size boundary significantly in 2026-2027.

Will the Agent Orchestrator (preview August 2026) deliver on multi-framework coordination? The promise of orchestrating Semantic Kernel, LangChain, and vanilla REST agents under one plane is compelling, but multi-framework orchestration has historically been more difficult in practice than in architecture diagrams.

How will Microsoft handle the Anthropic relationship within Foundry? Claude models are available as partner models with separate terms and billing. As Anthropic competes more directly with OpenAI (Microsoft's primary AI partner), the governance and integration depth for Claude models within Foundry may diverge from Azure Direct models.

Sources and Further Reading

Official Documentation

Pricing and Cost Planning

Platform Evolution

Enterprise Adoption

Competitive Analysis

Technical Blogs

Sign in to save and react.
Share Copied