Microsoft Foundry Build 2026: Hosted Agents, Procedural Memory, and 11,000 Models

Microsoft's production agent platform added hosted runtimes, three memory types, and a governance layer enterprise teams have been waiting for

Saganote

Microsoft Foundry shipped its most production-focused update at Build 2026, announced June 2. Hosted agents in Foundry Agent Service move toward general availability in early July, agent memory now includes procedural learning that delivers 7-14% task success gains in benchmarks, and Foundry IQ replaces custom retrieval-augmented generation pipeline work with a single SLA-backed knowledge endpoint. For teams who have been assembling these pieces manually on AWS Bedrock or Google Vertex AI, most of that infrastructure now ships as a managed service on Azure. Developer testing is open now.

Not everything here is generally available

Hosted agents reach GA in early July 2026. Toolboxes, all three memory types, Routines, and several evaluation tools remain in public preview and may change before release.

Building | Start With | Status
Production agent runtime | Hosted agents in Foundry Agent Service | GA in July 2026
Agent with many tools | Toolboxes in Foundry | Public preview
Enterprise knowledge retrieval | Foundry IQ knowledge bases | Generally available
Real-time voice agent | Voice Live (prompt agents) | Generally available
Agent safety and evaluation | ASSERT + Agent Control Specification | Open source

Hosted Agents Reach General Availability in July, Framework-Agnostic by Design

Production agents built with Microsoft Foundry no longer require teams to manage their own runtime infrastructure. Hosted agents in Foundry Agent Service provide sandboxed sessions with dedicated compute, persistent memory, and filesystem access - and the runtime accepts agents built with Microsoft Agent Framework, GitHub Copilot SDK, LangGraph, or the Claude Agent SDK without requiring rewrites. General availability arrives in early July 2026.

Hosted agents support two protocols. The Responses API handles OpenAI-compatible stateful interactions; the Invocations protocol is schema-free, for teams that want to control both the request and response formats themselves. Routines, in public preview, extend both by letting any agent run on a schedule - overnight issue triage, daily report generation, or any recurring autonomous task that currently requires custom cron infrastructure.

Foundry Toolkit for VS Code is now generally available. Developers can create agents from templates, debug runs locally with trace visualization, connect to Toolboxes, and deploy directly to Foundry Agent Service without leaving the editor. For most teams, that is the shortest path from a working local prototype to a production deployment.

Procedural Memory Delivers 7-14% Task Success Gains in Benchmarks

Memory in Foundry Agent Service, in public preview, now includes three distinct types. Procedural memory is the new one - it helps agents learn how to complete work across sessions rather than just remembering what was said previously, and early Tau-bench results show 7-14% absolute gains in task success rate at near-baseline compute cost. For teams building agents that handle repeated workflows, that is a concrete number worth testing against their own workloads rather than taking on faith or dismissing outright.
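To make "absolute gain" concrete: an absolute gain adds percentage points to the baseline success rate instead of scaling it. The sketch below assumes a 50% baseline purely for illustration (the baseline Tau-bench score is not stated here), and the function names are hypothetical.

```python
def apply_absolute_gain(baseline_rate: float, gain_points: float) -> float:
    """Return the success rate after adding an absolute gain (percentage points)."""
    return baseline_rate + gain_points


def relative_improvement(baseline_rate: float, new_rate: float) -> float:
    """Return the improvement relative to the baseline, as a percentage."""
    return (new_rate - baseline_rate) / baseline_rate * 100.0


# ASSUMPTION: 50% is an illustrative baseline, not a reported figure.
BASELINE = 50.0

low = apply_absolute_gain(BASELINE, 7.0)
high = apply_absolute_gain(BASELINE, 14.0)

print(f"success rate: {low:.0f}% to {high:.0f}%")
print(
    f"relative improvement: {relative_improvement(BASELINE, low):.0f}% "
    f"to {relative_improvement(BASELINE, high):.0f}%"
)
```

The lower the real baseline, the larger the same absolute gain looks in relative terms, which is one reason to rerun the comparison on your own tasks.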

User memory and session memory complete the set. User memory persists facts and preferences across separate sessions - a stored preference like "user prefers metric units" carries over into the next conversation. Session memory handles context within a single thread. Neither type requires custom vector storage or retrieval logic on the developer's side; both run inside Foundry Agent Service.
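The difference between the two scopes can be pictured with a toy in-process model. This is only a sketch of the concept, not the Foundry Agent Service API: the `MemoryStore` class and its methods are hypothetical names invented for illustration.

```python
from collections import defaultdict


class MemoryStore:
    """Toy model of user-scoped vs. session-scoped memory (hypothetical)."""

    def __init__(self) -> None:
        # user_id -> {key: value}; survives across sessions
        self._user_facts = defaultdict(dict)
        # (user_id, session_id) -> list of turns; limited to one thread
        self._session_turns = defaultdict(list)

    def remember_fact(self, user_id: str, key: str, value: str) -> None:
        self._user_facts[user_id][key] = value

    def add_turn(self, user_id: str, session_id: str, text: str) -> None:
        self._session_turns[(user_id, session_id)].append(text)

    def context(self, user_id: str, session_id: str) -> dict:
        """Return what an agent would see at the start of a turn."""
        return {
            "user_facts": dict(self._user_facts[user_id]),
            "session_turns": list(self._session_turns[(user_id, session_id)]),
        }


store = MemoryStore()
store.remember_fact("u1", "units", "metric")
store.add_turn("u1", "s1", "Convert 5 miles to kilometers.")

same_session = store.context("u1", "s1")
new_session = store.context("u1", "s2")
print(same_session)
print(new_session)
```

In the new session the stored preference is still visible, while the earlier conversation turns are not.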

Voice Live, now generally available for prompt agents, adds real-time voice to any of these memory and tool combinations. Speech recognition, text-to-speech, turn detection, interruption handling, and avatars ship as one API. Teams that need full control over the orchestration stack can connect hosted agents to Voice Live directly - that path is in public preview.

Foundry IQ Replaces the Custom RAG Pipeline

Building a retrieval-augmented generation pipeline from scratch typically takes weeks: chunking, indexing, retrieval tuning, and a separate integration per data source. Foundry IQ eliminates that work. One SLA-backed endpoint handles retrieval across Work IQ, Fabric IQ, Azure SQL, File Search, and MCP sources without custom plumbing. Knowledge bases are now generally available.

Web IQ extends Foundry IQ to live data. Sub-200 millisecond web grounding with zero data retention gives agents access to live web results, news, video, and image search without Microsoft storing any query content. Web IQ currently requires limited access approval - it is not open to all Azure customers yet.

Toolboxes, the new managed tool endpoint in public preview, wires directly into Foundry IQ so agents access enterprise knowledge through one governed URL. Skills are now versioned in a project-scoped catalog. Configure the Toolbox once and every MCP client in the project points at a single endpoint, with auth, lifecycle management, and governance handled by Foundry. Tool search, in preview, helps the agent select the right tool per task rather than surfacing the entire catalog to the model at runtime - a real problem as tool catalogs grow.
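Tool search is described only at a high level, so the sketch below shows one naive way to narrow a catalog to the tools relevant to a task, using keyword overlap. It is not Foundry's implementation: the `Tool` type, the `search_tools` function, the stopword list, and the sample catalog are all hypothetical.

```python
import re
from typing import List, NamedTuple


class Tool(NamedTuple):
    name: str
    description: str


# Small illustrative stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "to", "for", "and", "against"}


def _tokens(text: str) -> set:
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(task: str, catalog: List[Tool], top_k: int = 2) -> List[Tool]:
    """Return up to top_k tools whose descriptions share words with the task."""
    task_tokens = _tokens(task)
    scored = [(len(task_tokens & _tokens(t.description)), t) for t in catalog]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in matches[:top_k]]


catalog = [
    Tool("sql_query", "Run a SQL query against the sales database"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("file_search", "Search uploaded files and documents"),
]

selected = search_tools("Search the uploaded documents for the contract", catalog)
print([tool.name for tool in selected])
```

Only the matching tools would then be passed to the model, which keeps the prompt small regardless of catalog size. A production system would more plausibly rank by embeddings than by word overlap.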

Four New MAI Models, Fireworks AI, and a Frontier Tuning Cost Benchmark

Microsoft's first-party MAI model family adds four new entries at Build, all entering public preview: MAI-Thinking-1 for chat and reasoning, MAI-Image-2.5 for photorealistic generation and image-to-image editing, MAI-Transcribe-2 for speech-to-text with speaker diarization, and MAI-Voice-2 for multilingual text-to-speech with voice cloning. Four modalities. All in the Foundry catalog under one Azure endpoint.

Fireworks AI on Foundry is now generally available, adding open-model inference with enterprise SLAs, SOC 2 readiness, and provisioned throughput unit support - no separate infrastructure contracts or GPU cluster management. For teams that want low-latency, high-throughput access to open-weight models, Fireworks on Foundry is the no-ops path.

Frontier Tuning delivers more than 10 times better cost efficiency than GPT-5.5 on specific benchmarks, including generating technical Microsoft documentation. Fine-tuning now has a developer tier with no hosting fees - experiment at standard inference cost only. With 11,000+ models in the Foundry catalog, selecting the right model has become its own problem, a counterpart to the tool-selection problem that tool search in Toolboxes targets at inference time. Apple is taking a similar multi-model approach with Siri AI in iOS 27 - letting users pick between Gemini, ChatGPT, and Claude - though at the consumer platform level rather than the developer infrastructure level.

ASSERT and ACS Give Enterprise Teams Auditable Agent Safety

Two tools, both open source today, address the governance gap that enterprise teams face when deploying production agents. ASSERT converts written policies into executable agent evaluations, generates targeted test scenarios, and surfaces safety defects before they reach production - it runs across LangChain, CrewAI, LightLLM, and the OpenAI SDK. Agent Control Specification (ACS) defines deterministic safety controls at five checkpoints in an agent lifecycle: input validation, model behavior, state management, tool execution, and output formatting.

ACS expresses controls as a portable YAML contract - versionable, auditable, and framework-agnostic. Partners at launch include Infosys, KPMG, IBM, Aviatrix, BigSpin, and CrewAI. Reference implementations cover major platforms now. The partner list skews toward compliance-heavy industries, which suggests Microsoft built ACS primarily for teams that need audit documentation, not just better agents.
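The idea of deterministic controls attached to named checkpoints can be sketched in a few lines. This is an illustration of the concept only, not the ACS schema or any reference implementation: the checkpoint identifiers are derived from the five names listed above, and `run_checkpoint`, `max_input_length`, and `tool_is_allowed` are hypothetical.

```python
from typing import Callable, Dict, List

# Identifiers derived from the five checkpoints named in the text.
CHECKPOINTS = [
    "input_validation",
    "model_behavior",
    "state_management",
    "tool_execution",
    "output_formatting",
]

# A control inspects an event and returns True when the event passes.
Control = Callable[[dict], bool]


def run_checkpoint(
    controls: Dict[str, List[Control]], checkpoint: str, event: dict
) -> bool:
    """Apply every control registered for a checkpoint; all must pass."""
    if checkpoint not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {checkpoint}")
    return all(control(event) for control in controls.get(checkpoint, []))


# Hypothetical controls for illustration.
def max_input_length(event: dict) -> bool:
    return len(event.get("text", "")) <= 2000


def tool_is_allowed(event: dict) -> bool:
    return event.get("tool") in {"file_search", "sql_query"}


controls = {
    "input_validation": [max_input_length],
    "tool_execution": [tool_is_allowed],
}

input_ok = run_checkpoint(controls, "input_validation", {"text": "hello"})
tool_ok = run_checkpoint(controls, "tool_execution", {"tool": "delete_database"})
print(input_ok, tool_ok)
```

Because each control is a pure function of the event, the same input always yields the same decision, which is what makes the result auditable.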

Guided Guardrail Setup, in public preview, runs a short questionnaire about the agent's audience, data access, and use case, then recommends controls - PII filters, jailbreak protection, task adherence - without requiring the developer to hold a security background. Rubric auto-generates evaluation criteria specific to each agent's context. Agent Optimizer, coming soon in public preview, feeds production traces into ranked improvement suggestions automatically, closing the loop from real usage to measurable changes.

With hosted agents reaching general availability in early July 2026, Microsoft is addressing a challenge that AWS Bedrock and Google Vertex AI also face: developers need more than model access, they need the surrounding infrastructure to ship production agents reliably. Microsoft's advantage may come down to whether the governance layer - ACS, ASSERT, and the audit trail that KPMG and IBM have already signed onto - holds up at enterprise scale. Whether the 7-14% procedural memory gains replicate outside benchmark conditions is the question production teams will answer in Q3.


Jun 22, 2026
