About This Catalog

This is the ninth volume in a catalog of the working vocabulary of agentic AI. The eight prior volumes covered patterns (the timing of agent runs), skills (model instructions in packaged form), tools (the function-calling primitives), events and triggers (what activates the agent), fabric (the substrate beneath orchestration), memory (state, context, and recall), human-in-the-loop (approval, observation, and interaction), and evaluation and guardrails (the governance layer). This ninth volume covers the layer that distinguishes serious 2026 agent architectures from the chatbots that came before: how multiple agents coordinate, communicate, and collaborate to accomplish what a single agent cannot.

The catalog opens with a warning. The dominant industry trend in 2025—2026 was to default to multi-agent architectures --- every product brochure features a diagram of agents passing messages to other agents --- and this default is usually wrong. Multi-agent systems pay real costs: latency multiplies, errors compound, debugging becomes harder, information loses fidelity in hand-offs between agents, token consumption climbs by an order of magnitude (Anthropic’s own research found roughly 15× token usage for multi-agent versus single-agent on comparable tasks). A single agent with good tools handles most cases better than two agents with mediocre coordination. Multi-agent is a power tool: useful when the lift genuinely justifies the cost, harmful when reached for as a default.

With that caveat established, this catalog covers the cases where multi-agent earns its cost --- tasks too large for one context window, phases that benefit from genuine parallelism, work that decomposes naturally into specialist roles, critique-and-revision loops where separation of producer from reviewer matters --- and the substrates that make multi-agent coordination work: communication protocols (MCP, A2A, ACP), frameworks (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK), coordination patterns (planner-executor, critic-and-reflection), shared-state mechanisms (scratchpad, blackboard), and the tracing infrastructure that makes multi-agent systems debuggable.

Scope

Coverage:

Communication protocols: MCP (Model Context Protocol), A2A (Agent-to-Agent), ACP / AGNTCY.
Graph-and-supervisor frameworks: LangGraph multi-agent, AutoGen / AG2.
Role-based and hand-off frameworks: CrewAI, OpenAI Agents SDK, Claude Agent SDK subagents.
Event-driven and research frameworks: LlamaIndex Workflows, Microsoft Magentic-One.
Coordination patterns as code: planner-executor, critic-and-reflection.
Shared-state patterns: shared scratchpad, blackboard.
Multi-agent observability: how Volume 7’s tracing tools extend to multi-agent systems.
Discovery and directories: AGNTCY, awesome lists.

Out of scope:

Single-agent systems --- covered by Volumes 1 (Patterns) and 3 (Tools).
General-purpose distributed systems frameworks (Akka, Erlang OTP, Ray) when not specifically used for AI agent coordination.
Message brokers and queue systems (RabbitMQ, Kafka, NATS) when used outside the multi-agent context.
Game-theoretic multi-agent research (cooperative game theory, mechanism design) that hasn’t productized into working frameworks.
Agent simulation environments (Stanford’s generative agents, ChatDev) when treated as research artifacts rather than production substrates.

How to read this catalog

Part 1 (“The Narratives”) is conceptual orientation: when (and when not) to use multi-agent, the five coordination topologies, the communication protocol landscape, the three axes of agent specialization, and the hand-off problem that pervades multi-agent design. Five diagrams sit in Part 1; everything in Part 2 is text and code.

Part 2 (“The Substrates”) is reference material organized by section. Each section opens with a short essay on what its entries have in common and how they relate to alternatives. Representative substrates appear in the Fowler-style template established by the prior eight volumes.

Part 1 — The Narratives

Five short essays frame the design space for multi-agent coordination. The reference entries in Part 2 assume the vocabulary established here.

Chapter 1. When (and When Not) to Use Multi-Agent

Start with the case against. The current industry rhetoric --- every agent product positions itself as a multi-agent platform; every architecture diagram features agents passing messages to other agents --- obscures a basic engineering fact: multi-agent systems pay real costs, and the costs only earn out when the multi-agent design genuinely solves a problem the single-agent design couldn’t. Reaching for multi-agent as a default produces systems that are slower, less reliable, harder to debug, and dramatically more expensive than their single-agent equivalents, with no quality lift to justify any of it.

When to use multi-agent — The default is single agent with good tools. Multi-agent is a power tool; reach for it when the specific situation earns its cost.

The costs are concrete. Latency multiplies: a sequential pipeline of three agents takes roughly three times as long as one agent doing the same work, assuming the work doesn’t naturally parallelize. Errors compound: each hand-off between agents is a fresh opportunity for misunderstanding; small misalignments amplify across stages. Debugging gets harder by an order of magnitude: a single trace tree (Volume 7’s observability layer) becomes a forest, and root-causing why the system did the wrong thing requires reconstructing the conversation between multiple agents rather than reading one agent’s reasoning. Token consumption explodes: Anthropic’s own multi-agent research paper found roughly 15× token usage for multi-agent designs versus single-agent on comparable research tasks --- the agents talk to each other, summarize for each other, re-establish context with each other, and the bill arrives accordingly. Information loses fidelity in every hand-off (Chapter 5 covers this in detail), which means the multi-agent system often produces lower-quality outputs than a hypothetical perfect single agent would, even when the multi-agent design seemed to promise improvement.

The benefits are real but conditional. Multi-agent earns its cost when (1) the task is genuinely too large for a single agent’s context window --- a long research project, a multi-step coding task with many files, a document that exceeds context limits; (2) phases can run in parallel for real wall-clock speedup --- four agents researching four sub-topics simultaneously rather than one agent doing them sequentially; (3) different phases need genuinely different prompts and tools --- a researcher with web-search tools versus a writer with no tool access versus a critic with structured-output requirements; (4) critique and revision benefit from separation --- a critic agent reviewing a writer agent’s output produces better results than the same model self-reviewing, because the critic’s prompt can be unambiguously adversarial without confusing the writer’s flow; (5) domain isolation has compliance value --- a finance agent and a customer-data agent with different access controls satisfies enterprise security requirements that a single agent with all permissions cannot.

The default test is uncomfortably simple: would a single agent with the right tools handle this? If yes, build the single agent. Tool calls already give the agent the effect of consulting specialists --- the agent calling a web-search tool is not architecturally different from the agent talking to a search agent, but it’s far simpler to implement, debug, and operate. The cases where multi-agent earns its cost are real and not rare, but they’re also not the default; they’re the cases where careful design specifically identifies a problem that a multi-agent structure solves. A team that defaults to multi-agent and only reaches for single-agent when forced has the relationship backward, and ships systems whose costs the team will spend the next year explaining away.

Chapter 2. The Five Coordination Topologies

If the multi-agent design has earned its place, the next question is shape. Five topologies capture nearly all the working patterns; each has characteristic strengths, characteristic failure modes, and characteristic production examples.

Hierarchical topology has a manager agent that decomposes the task, dispatches sub-tasks to worker agents, and aggregates their results. The manager owns the planning; the workers own the execution; the structure is recursive (a worker may itself be a manager of its own sub-workers). This is the topology most production multi-agent systems converge on after iteration, because it maps cleanly onto how humans organize complex work --- a project lead breaking down a project, delegating, and reviewing. The bottleneck is the manager: every step routes through it, so its latency and context budget cap the system’s. The strength is legibility: a hierarchical trace tells a coherent story; reasoning about its behavior is tractable. LangGraph’s supervisor pattern and CrewAI’s hierarchical-process mode both implement this directly.

Sequential topology is the pipeline: agent A produces output, agent B receives A’s output and produces something, agent C receives B’s output, and so on to the final stage. This is the simplest multi-agent topology and the easiest to implement; it’s also the most prone to information loss across the hand-off boundaries (Chapter 5). Sequential works when each stage genuinely transforms the input into something the next stage operates on --- extract-then-analyze-then-summarize --- and works poorly when the later stages need context the earlier stages stripped away. CrewAI’s sequential-process mode and LangGraph’s linear chains both implement this; it’s also what happens implicitly in many tool-call sequences.

Peer-network topology lets agents talk to each other freely --- any agent can address any other agent, conversations can be multi-turn and many-to-many, the structure is emergent rather than designed. This is the topology that looks most like “agents collaborating” in marketing diagrams; it’s also the topology that’s hardest to debug, the topology where conversation explosion is a real risk (agents talking endlessly to each other), and the topology where the failure mode “the agents got into a loop” shows up most. Peer networks work in research and exploratory settings where the emergent behavior is the point; in production they require firm orchestration scaffolding (termination conditions, conversation budgets, supervisor agents that step in when things go off the rails). AutoGen’s GroupChat is the canonical peer-network implementation.

Hub-and-spoke topology has a central router agent that doesn’t do work itself but routes incoming requests to specialist agents based on the request’s nature. This is similar to hierarchical but with a different center of gravity: hierarchical is plan-and-delegate, hub-and-spoke is classify-and-route. It’s the right topology for customer-support systems (route to the billing agent, the technical agent, or the returns agent based on the question), enterprise assistant systems (route to the finance agent, the HR agent, or the engineering agent based on intent), and multi-domain applications generally. OpenAI Agents SDK’s handoff mechanism implements this directly; CrewAI can be configured to work this way; the pattern is also implementable directly on top of plain tool calls with one routing agent and several callable specialist agents.

Blackboard topology is the classical AI pattern that turns out to be unexpectedly modern. All agents read from and write to a shared workspace (the blackboard); no agent directly talks to any other agent; coordination emerges from each agent looking at the current state of the blackboard and contributing when it has something to contribute. This decouples the agents --- they don’t need to know about each other --- and avoids the hand-off problem entirely (everyone sees the same world). The pattern shows up in LangGraph’s shared state model, in CrewAI’s context-sharing mechanism, in AutoGen’s group-chat shared history, and explicitly in dedicated blackboard frameworks. It’s often the right topology when agents need to coordinate but their interactions are too irregular for a fixed protocol.

Two cross-cutting observations. First, real production systems usually combine topologies: a hierarchical structure at the top level with hub-and-spoke for routing within each level, blackboard for the cases where shared state matters more than message-passing, sequential for the genuinely pipeline-shaped sub-tasks. Picking one topology and forcing all coordination through it produces awkward designs; mixing topologies thoughtfully produces good ones. Second, the topology is not the architecture: a hierarchical manager-worker design and a blackboard design can implement the same business logic; the topology choice is about the coordination shape, not about what the system does.

Chapter 3. The Communication Protocol Landscape

Beneath the framework layer is a protocol layer. When agent X needs to communicate with agent Y --- to delegate a task, to request information, to coordinate on shared work --- the message has to follow some agreed format. Through 2024 the answer was “whatever the framework uses internally,” which meant multi-agent systems were locked to a single framework. Through 2025 three open protocols emerged to bridge frameworks and enable cross-platform agent communication. As of mid-2026 the protocols overlap, compete, and are starting to converge.

Communication protocols — MCP from Anthropic, A2A from Google, ACP from the Linux Foundation. Overlapping coverage; partial convergence underway.

MCP (Model Context Protocol), introduced by Anthropic in November 2024, started as a protocol for connecting agents to tools and data sources. The original vision was infrastructure-shaped: external MCP servers expose tools, data, or context; agents consume them via JSON-RPC; agent authors don’t need to write custom integrations for every system the agent talks to. Through 2025 the protocol’s scope expanded toward agent-as-server: an MCP server can wrap a whole agent and expose it as a callable resource, effectively making MCP a multi-agent protocol by extension. The strength is the ecosystem --- hundreds of MCP servers exist for tools, databases, SaaS products, and internal systems --- and the strength compounds because new tools naturally publish as MCP servers. The trade-off is that MCP wasn’t designed for agent-to-agent semantics from the start; some patterns natural in a peer-communication protocol require workarounds in MCP.

A2A (Agent-to-Agent Protocol), introduced by Google in April 2025, is the first major protocol designed for agent-to-agent communication from day one. The model is: agents publish Agent Cards (JSON metadata describing their capabilities, endpoints, and authentication); clients discover agents by capability; tasks are delegated through a defined task-and-message lifecycle; the protocol handles streaming responses, long-running tasks, and capability negotiation. JSON over HTTP is the wire format; the model maps naturally onto the case where an agent in one organization needs to invoke an agent in another organization. The strength is purpose-built design --- A2A handles things like agent discovery and capability advertisement that MCP retrofits awkwardly. The trade-off is the ecosystem gap; MCP has years of head start and broader tool coverage.

ACP (Agent Communication Protocol) and the AGNTCY consortium represent the open-governance alternative. The technical scope substantially overlaps A2A --- same problem space, agent-to-agent communication with discovery and task delegation --- but the stewardship model is different: Linux Foundation governance with IBM, Cisco, LangChain, Galileo, and others as founding contributors, deliberately positioned against vendor-controlled protocols. AGNTCY is the broader consortium (directory services, identity, observability standards) of which ACP is the protocol component. The strength is vendor-neutrality, which matters for cross-organization deployments and procurement. The trade-off is governance overhead; consortium-driven protocols evolve more slowly than single-vendor protocols, and ACP is still consolidating its position relative to A2A.

The convergence story is already underway. Multi-protocol gateways exist that translate between MCP, A2A, and ACP; frameworks support multiple protocols; the underlying message shapes are similar enough that translation is mechanical. The mid-2026 picture is roughly: MCP for tool access (the dominant use case, with the strongest ecosystem), A2A or ACP for agent-to-agent peer communication (with the choice often determined by vendor allegiance more than technical fit). Whether the protocols consolidate to a single standard, settle into a stable two-protocol world (tool plus agent), or remain three-way fragmented through 2027 is genuinely uncertain. Building against the protocol that fits the use case and accepting that some translation work may be required later is the pragmatic posture.

Two practical recommendations. First, instrument for portability: any agent built today should expose its capabilities through whatever protocol the team is currently using, but the underlying agent logic should be protocol-agnostic. Switching from MCP to A2A is a translation layer change if the agent’s capabilities are cleanly separated from the protocol bindings; it’s a rewrite if they’re entangled. Second, don’t bet on a single protocol winning. The OpenAPI and gRPC coexistence in conventional API design is the more likely template than the one-protocol-wins HTTP outcome; expect multiple protocols to coexist with gateways translating between them.

Chapter 4. The Three Axes of Agent Specialization

If multi-agent has earned its place and the topology is chosen, the next question is what each agent does --- how to split the work across agents. Three axes capture the working patterns: role, domain, and skill. Each axis has characteristic use cases, characteristic failure modes, and characteristic frameworks. Most production designs combine all three; understanding the axes separately makes the combination thoughtful rather than accidental.

Three axes of specialization — Role splits by what kind of work; domain splits by topic area; skill splits by capability type. Production systems combine all three.

Role-based specialization splits by what kind of work the agent does. The canonical examples come from research and content production: a researcher agent that gathers and synthesizes information, a writer agent that produces the prose, a critic agent that reviews and suggests revisions, an editor agent that polishes the final output. Each role has a distinct prompt (the researcher’s system prompt is about thoroughness; the writer’s is about clarity; the critic’s is about adversarial review); each role may have distinct tools (the researcher gets web search; the writer doesn’t); the roles compose into a workflow that mirrors how human teams produce content. CrewAI is the framework most explicitly built around this pattern; AutoGen and LangGraph implement it through their respective abstractions. Role-based works well when the roles are genuinely different in their cognitive shape; it works poorly when the roles overlap so much that the specialization is theatrical rather than functional.

Domain-based specialization splits by topic area. A finance agent handles financial queries with finance-specific data sources and compliance rules; a legal agent handles legal queries with case-law databases and disclaimer requirements; a customer-service agent handles customer queries with CRM access; an engineering agent handles technical queries with code repositories and runbooks. Each agent owns a domain in the same way a department owns a function in a human organization; routing happens at the boundary (the hub-and-spoke topology from Chapter 2). The strength of domain specialization is compliance: when finance and legal advice must follow different rules, having separate agents with separate prompts and separate access controls produces auditable separation. The weakness is that domain boundaries are rarely as clean in practice as they look on the org chart; a question about a contract dispute touches both legal and finance, and the routing decision is itself a meaningful design choice.

Skill-based specialization splits by capability type. A planner agent decomposes tasks and produces structured plans; an executor agent takes a plan step and carries it out; an evaluator agent scores the executor’s outputs against the plan’s success criteria; a reflector agent watches the overall loop and produces meta-observations for the next iteration. The split is engineering-clean --- each agent has a clear input/output contract, each can be tested independently --- but conceptually distant from how non-engineers think about the work. “Which department handles this” maps to domain; “who on the team does this” maps to role; “what capability type is needed at this step” doesn’t have an obvious human analog. Skill-based shows up most in framework code and research papers (LangGraph’s planner-executor-reflector tutorial implementations, the ReAct-style cognitive loops); it shows up less in product-facing descriptions because the vocabulary is awkward.

Production systems combine the axes. A serious enterprise customer-support agent might split domain at the top level (billing, technical, returns) via hub-and-spoke routing; within each domain split role (researcher who looks up customer context, responder who drafts the reply, reviewer who checks the reply against policy); within each role apply skill (planner, executor, evaluator) as needed for the more complex tasks. The result is not a flat three-agent system but a hierarchical structure with specialization at every level. Designing this consciously --- choosing where each axis applies and where it doesn’t --- produces systems that are coherent. Letting the axes accumulate without deliberate choice produces systems where it’s unclear why any particular agent exists.

Chapter 5. The Hand-off Problem

The fundamental difficulty of multi-agent systems isn’t the topology or the protocol or the specialization. It’s that whenever agent A finishes its work and passes control to agent B, information is lost. The hand-off problem pervades every multi-agent design; addressing it well is what separates multi-agent systems that work from multi-agent systems that look impressive in diagrams but underperform single agents in practice.

Three mechanisms cause the information loss. Compression: agent A’s natural output is a paragraph of free-form text summarizing its findings; that paragraph contains a fraction of the reasoning, tool outputs, and intermediate state agent A actually accumulated. When agent B receives the summary, it receives the destination but not the journey. If agent B needs to reason about why agent A reached its conclusion --- to verify, to extend, to course-correct --- the summary has stripped exactly the information needed. Format mismatch: agent A’s output shape doesn’t match agent B’s input expectations; either agent A produces something agent B can’t parse cleanly, or agent A bends its output to match agent B’s format and loses information in the bending. Context truncation: agent B operates on agent A’s message, not on agent A’s full conversation; the surrounding context that gave agent A’s message its meaning is absent.

The naive multi-agent design ignores all three causes. Agent A produces a summary, the summary goes to agent B, agent B produces its output based on what it can extract from the summary, the next stage repeats. Each hand-off compounds the loss; by the time the system reaches its final stage, the original task has been refracted through several lenses and the output reflects the cumulative compression rather than the original work. This is why multi-agent systems frequently underperform single agents on the same task: the single agent retains all of its own context throughout; the multi-agent system loses information at every boundary.

Three mitigations address the problem at increasing levels of effectiveness. The first, structured hand-offs: replace free-form summaries with typed messages. Agent A produces a structured output with named fields --- task, context, constraints, reasoning_trace, tool_outputs, confidence, open_questions --- and agent B parses the structure to access exactly the information it needs. The structure preserves what the free-form summary loses; the trade-off is that designing the structure requires anticipating what downstream agents will need, which gets expensive as the agent graph grows.

The second mitigation, conversation passing: instead of summarizing, pass the full conversation history (or relevant slices of it) to the next agent. Agent B sees not just agent A’s conclusion but agent A’s reasoning, tool outputs, and intermediate state. This works but doesn’t scale --- the context window has limits; passing full conversations through three or four hand-offs exceeds those limits quickly --- so in practice this is the right move for shallow agent graphs and the wrong move for deep ones.

The third mitigation, and the one that scales best, is the shared workspace pattern (the blackboard topology from Chapter 2). Rather than passing information between agents, all agents read from and write to a common state object. There is no hand-off to lose information at; each agent sees the same world the previous agents saw, plus whatever the previous agents added to the workspace. The hand-off problem becomes a state-synchronization problem, which is technically easier to address --- it’s a well-understood problem in conventional distributed systems with known solutions. LangGraph’s state model and CrewAI’s context-sharing mechanism both implement this; AutoGen’s group-chat shared history is a less-disciplined variant of the same pattern. When the multi-agent design uses shared state effectively, the information-loss problem largely dissolves.

The practical recommendation: default to shared workspace designs. Use structured hand-offs only when the hand-off crosses a genuine boundary --- a boundary between organizations, a boundary between trust zones, a boundary between agents that legitimately should not share state --- and accept the engineering cost of designing the structure carefully when it must be done. Treat free-form summary hand-offs as a smell: they’re what the system devolves to when no one designed the hand-off, and the cost shows up in output quality the team has trouble explaining. The hand-off problem is the unsolved problem of multi-agent systems in the sense that there’s no protocol or framework that eliminates it entirely; it’s a solved problem in the sense that the working patterns are known. The difference between multi-agent systems that work and those that don’t is mostly whether the design took the hand-off problem seriously.

Part 2 — The Substrates

Eight sections follow. Each opens with a short essay on what its entries have in common and how they relate to alternatives. Representative substrates are presented in the same Fowler-style template used by the prior eight catalogs.

Sections at a glance

Section A --- Communication protocols
Section B --- Graph-and-supervisor frameworks
Section C --- Role-based and hand-off frameworks
Section D --- Event-driven and research frameworks
Section E --- Coordination patterns as code
Section F --- Shared-state patterns
Section G --- Multi-agent observability
Section H --- Discovery and directories

Section A — Communication protocols

MCP, A2A, and ACP --- the wire formats for agent communication

Three protocols dominate the agent-communication category as of mid-2026. MCP (Anthropic, November 2024) started as a tool-and-data protocol and expanded toward agent-as-server. A2A (Google, April 2025) was designed from day one for agent-to-agent semantics with capability discovery via Agent Cards. ACP (Linux Foundation, 2025) is the open-governance alternative with AGNTCY as the broader consortium providing directory and identity services.

The protocols overlap; multi-protocol gateways translate between them; consolidation through 2026—2027 is likely but not certain. Building against the protocol that fits the immediate use case, with the agent’s capabilities cleanly separated from the protocol bindings, is the pragmatic posture against protocol churn.

MCP — Model Context Protocol

Source: github.com/modelcontextprotocol (Anthropic; MIT)

Classification Tool-and-context protocol extended toward agent-to-agent.

Intent

Provide a JSON-RPC-based protocol for connecting LLM agents to external tools, data sources, and (increasingly) other agents, with a standardized server interface that any compatible client can consume.

Motivating Problem

Agent frameworks historically required custom integration code for every external system the agent talks to --- a different adapter for each database, each SaaS product, each internal tool. The result was that agents were locked to whichever integrations their framework happened to ship. MCP’s answer is a protocol: an external system implements an MCP server exposing tools, resources, or prompts; any MCP-compatible client (Claude Desktop, Cursor, custom agent frameworks) consumes the server. The integration shape is once, the consumption shape is everywhere.

How It Works

An MCP server is a process exposing three primitive types over JSON-RPC: Tools (callable functions the agent can invoke), Resources (data the agent can read), and Prompts (templated prompts the user can invoke). The transport is configurable --- stdio for local servers, HTTP+SSE for remote servers, WebSocket for bidirectional cases. The protocol handles capability negotiation, request/response correlation, streaming, and error propagation.

Through 2025 the agent-as-server pattern emerged: rather than exposing low-level tools, an MCP server can wrap a whole agent and expose it as a single high-level tool. A client agent calling the server effectively delegates work to the server’s agent. This makes MCP a de facto agent-to-agent protocol for cases where the parties are willing to use it that way, even though A2A is more purpose-built for the role.

The ecosystem matters. Hundreds of MCP servers exist for tools, databases, SaaS products, and internal systems. Anthropic, Cursor, Continue, Cody, Zed, Sourcegraph, and many others ship MCP support. Building against MCP means inheriting the existing ecosystem; building against A2A means a smaller but more agent-specific ecosystem.

When to Use It

Connecting agents to tools and data sources. Building integrations once and reusing them across multiple agent frameworks and clients. Cases where the underlying need is tool access more than peer agent communication.

Alternatives --- A2A for first-class agent-to-agent communication. ACP for the open-governance alternative. Custom integration code when the protocol abstractions don’t fit the use case.

Sources

github.com/modelcontextprotocol
modelcontextprotocol.io

Example artifacts

Code.

# MCP server in Python exposing a custom tool

from mcp.server import Server

from mcp.server.stdio import stdio_server

import mcp.types as types

server = Server("customer-lookup")

\@server.list_tools()

async def list_tools() -> list[types.Tool]:

return [

types.Tool(

name="lookup_customer",

description="Look up a customer by ID and return their profile and
order history",

inputSchema={

"type": "object",

"properties": {

"customer_id": {"type": "string"},

"include_orders": {"type": "boolean", "default": True},

},

"required": ["customer_id"],

},

),

]

\@server.call_tool()

async def call_tool(name: str, arguments: dict) ->
list[types.TextContent]:

if name == "lookup_customer":

profile = await fetch_customer(arguments["customer_id"])

return [types.TextContent(type="text", text=str(profile))]

raise ValueError(f"Unknown tool: {name}")

if __name__ == "__main__":

import asyncio

asyncio.run(stdio_server(server))

MCP primitives — tools, resources, and prompts

Source: modelcontextprotocol.io (specification); MCP Python and TypeScript SDKs

Classification The three MCP server primitives and who controls each.

Intent

Model an MCP server’s surface as three distinct primitives --- tools, resources, and prompts --- distinguished by who decides when each is used, so that a capability is exposed through the primitive that matches its control model rather than defaulting everything to a tool.

Motivating Problem

MCP is often reduced to “a protocol for tools,” but a server exposes three primitive types, and collapsing them loses the design distinction the protocol is built on. The distinction is not what a capability does but who initiates it: the model, the application, or the user. Exposing read-only reference data as a tool, or a reusable workflow as a tool, works but misplaces control --- the model ends up deciding things the application or the user should decide. Choosing the right primitive is the first design decision when building a server.

How It Works

Tools (model-controlled): callable functions the model chooses to invoke during a turn, each described by a name, a description, and a JSON input schema. The model decides when to call them. This is the primitive most people mean by “MCP,” and its design is the tool-design discipline of Volume 3.

Resources (application-controlled): addressable data the server exposes for the application to read and inject into context --- a file, a database record, a config snapshot --- each identified by a URI (file://, db://, config://) with a MIME type. The application, not the model, decides which resources to load and when; the model consumes what it is handed. Resources support listing (resources/list), reading (resources/read), and optional change subscriptions.

Prompts (user-controlled): parameterized prompt templates the server offers for the user to invoke deliberately --- a “summarize this document” action or a compliance-review workflow --- with typed arguments and a rendered message body. The user selects them, often from a menu; they are not fired by the model mid-turn.

Choosing the primitive: the decision rule is “who drives?” If the model should decide at runtime, it is a tool. If the application should supply data, it is a resource. If the user should trigger a reusable interaction, it is a prompt. One underlying system often exposes all three --- a tool to act, resources to read, prompts to launch common workflows.

When to Use It

Whenever you build, rather than merely consume, an MCP server. The primitive split is the server author’s core design choice; getting it right keeps the model’s decision surface small and puts application and user decisions where they belong. Consumers of existing servers benefit from the same mental model when deciding what to wire into context.

Alternatives --- exposing everything as a tool when the client only implements tools (some clients do not yet support resources or prompts); the base MCP pattern (above) for the protocol and ecosystem overview.

Sources

modelcontextprotocol.io/docs/concepts
spec.modelcontextprotocol.io

Example artifacts

Code.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-lookup")

@mcp.tool()                          # model-controlled: the model calls it
def refund_order(order_id: str) -> str:
    ...

@mcp.resource("db://customers/{id}")  # application-controlled: the app reads it
def customer(id: str) -> str:
    ...

@mcp.prompt()                        # user-controlled: the user invokes it
def quarterly_review(quarter: str) -> str:
    ...

MCP transports — stdio and Streamable HTTP

Source: modelcontextprotocol.io (transport specification)

Classification MCP transport options: local stdio and remote Streamable HTTP.

Intent

Choose how an MCP client and server exchange messages --- stdio for a local subprocess, Streamable HTTP for a remote service --- matching the transport to whether the server runs beside the client and whether it must serve multiple users.

Motivating Problem

An MCP server’s primitives are the same regardless of how bytes move between client and server, but the transport determines where the server can run, how it is deployed, and whether it holds state. A transport chosen by default rather than by fit leaves teams with a local-only server they cannot host, or a remote server carrying per-connection state it cannot scale. The transport is a deployment decision made at build time.

How It Works

stdio: the client launches the server as a child process and exchanges newline-delimited JSON-RPC over the process’s stdin and stdout. Zero network configuration, single-user, lifecycle tied to the subprocess, stateful by nature (in-memory state lives as long as the process). This is how local servers run --- and how Claude Code runs its MCP servers natively.

Streamable HTTP: the server binds to a port and serves JSON-RPC over HTTP, using Server-Sent Events (SSE) for the server-to-client streaming leg --- progress, notifications, long-running responses. SSE is a component of this transport, not a separate transport of its own. This suits remote servers, multiple concurrent clients, and production APIs.

Stateful vs. stateless sessions: stdio servers are stateful per process. HTTP servers choose --- a stateless server treats each request independently (simplest to scale behind a load balancer), a session-based server carries a session token across requests when continuity is required. The choice drives horizontal scalability.

Production requirements follow the transport: a remote HTTP server needs TLS, authentication on every request, and usually a load balancer; a local stdio server needs none of these because it is not reachable off-box. Selecting the transport therefore also selects the security surface (see least privilege in Volume 12).

When to Use It

stdio for local integrations, developer tools, and anything running beside the client (Claude Desktop, Claude Code, IDE plugins). Streamable HTTP for hosted servers, multi-user deployments, and any server exposed as a production API. Build the server so its primitives are independent of the transport, so the same server can be offered either way.

Alternatives --- the deprecated standalone HTTP+SSE transport that Streamable HTTP superseded; custom transports for constrained environments, which the SDKs allow but few need.

Sources

modelcontextprotocol.io/docs/concepts/transports
spec.modelcontextprotocol.io

Example artifacts

Code.

# The same server, two transports; the primitives do not change
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-lookup")
# ... register tools / resources / prompts ...

if __name__ == "__main__":
    # Local: Claude Desktop / Claude Code launch this as a subprocess
    mcp.run(transport="stdio")
    # Remote instead:
    # mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)

MCP sampling and roots

Source: modelcontextprotocol.io (sampling and roots specification)

Classification Server-initiated capabilities: sampling (LLM callbacks) and roots (scoped grants).

Intent

Use MCP’s client-directed capabilities --- sampling, which lets a server ask the client to run an LLM call on its behalf, and roots, which let the client grant a server a bounded set of directories --- so servers can reason and act without holding their own model credentials or unbounded filesystem access.

Motivating Problem

The base tool, resource, and prompt model is client-pulls-from-server. Two needs invert or bound that flow. First, a server sometimes needs LLM reasoning of its own --- to summarize a document it fetched, say --- but should not carry model API keys or pick a provider unilaterally. Second, a filesystem-touching server should not see the whole disk; it should operate only where the client has explicitly allowed. Sampling and roots are the protocol’s answers, and both keep control and credentials on the client side.

How It Works

Sampling (server-initiated LLM calls): a server issues a sampling request (createMessage) back to the client; the client, which owns the model access, runs the completion and returns the result. The server never holds credentials or picks the provider; it may express preferences --- an intelligence hint, a speed hint, a suggested model --- that the client is free to honor or override. Because the call routes through the client, it is the natural place for a human-in-the-loop gate: the client can surface the request for approval, modification, or rejection before it runs.

Roots (scoped grants): the client hands the server a set of root URIs (roots: [“file:///app/src”]) that bound where the server may operate. There is no implicit access --- a server sees only what its roots grant, and the grant is a hard boundary. This is least privilege applied at the protocol layer: the server author cannot widen its own scope.

Notifications, alongside: long-running server work reports back through progress notifications (a progressToken plus progress and total) and log notifications (with standard severity levels), so a synchronous-looking tool call can surface a progress bar and diagnostics rather than a silent wait.

The common thread: each of these keeps authority with the client. Sampling keeps model credentials and the approval gate on the client; roots keep filesystem authority on the client; notifications keep the client informed without handing the server more power. A server author reaches for them to do more while holding less.

When to Use It

Sampling when a server needs model reasoning but should stay credential-free and model-agnostic --- and when that reasoning warrants human review. Roots for any server that touches a filesystem or other scoped resource, to bound its blast radius. Notifications for tools whose work outlasts a quick request. Client support varies --- not every client implements sampling or roots --- so treat them as capabilities to negotiate, not assume.

Alternatives --- giving the server its own API key and model configuration when client-side sampling is unavailable, at the cost of credential sprawl; enforcing filesystem scope with OS-level sandboxing (Volume 12) when the client does not support roots.

Sources

modelcontextprotocol.io/docs/concepts/sampling
modelcontextprotocol.io/docs/concepts/roots

Example artifacts

Code.

# Client-side sampling handler: the server asks, the client runs the model
async def handle_sampling(request):
    # optional human-in-the-loop gate before the call runs
    if not await approve(request):
        raise PermissionError("sampling request rejected")
    return await client.messages.create(
        model="claude-sonnet-5",   # client may override the server's hint
        max_tokens=request.max_tokens,
        messages=request.messages,
    )

A2A — Agent-to-Agent Protocol

Source: github.com/google-a2a (Google; Apache-2)

Classification Purpose-built agent-to-agent communication protocol.

Intent

Provide a JSON-over-HTTP protocol designed specifically for agent-to-agent communication, with first-class agent discovery (via Agent Cards), capability advertisement, task delegation with streaming, and a well-defined task lifecycle.

Motivating Problem

When agent X needs to ask agent Y to perform a task, the natural pattern is request-response with capability negotiation: X discovers what Y can do, sends a structured task request, receives streaming progress updates, and gets a typed result. MCP can be bent to this pattern but wasn’t designed for it; A2A is the design. Agent Cards advertise capabilities; clients query the registry to find agents matching a capability; tasks are delegated through a defined lifecycle (created, in_progress, requires_input, completed, failed); streaming responses handle long-running work.

How It Works

An A2A-compatible agent publishes an Agent Card at a well-known URL: a JSON document describing the agent’s name, description, capabilities (what kinds of tasks it accepts), authentication requirements, and endpoint. Clients discover agents either by direct URL or by querying a directory (the AGNTCY directory in the open-governance case).

Tasks are submitted via HTTP POST to the agent’s endpoint with a task definition. The agent returns a task ID and begins work. Status polling or server-sent-events streaming gives the client visibility into progress. The task lifecycle includes a requires_input state for HITL scenarios (the agent needs more information from the user before continuing) and a completed state with the final structured output. Long-running tasks survive client disconnects --- the protocol assumes asynchronous delegation.

Multi-modal communication is first-class. A task can include text, images, structured data, and references to external content; agent responses can stream multiple parts (a chart, then a summary, then a follow-up question). The protocol’s design anticipates the agent-collaboration cases that emerged as 2025 progressed.

When to Use It

Agent-to-agent communication where the agents may be developed by different teams or organizations. Cases needing first-class capability discovery and advertisement. Streaming long-running task delegation. Production multi-agent systems with cross-organizational components.

Alternatives --- MCP when the underlying need is tool access more than peer communication. ACP for the open-governance alternative with substantially overlapping technical scope.

Sources

github.com/google-a2a
a2aproject.github.io/A2A/

Example artifacts

Schema / config.

// Agent Card published at
https://my-agent.example.com/.well-known/agent.json

{

"name": "Customer Research Agent",

"description": "Researches customer profiles from CRM and public
sources",

"version": "1.2.0",

"capabilities": [

{

"name": "research_customer",

"description": "Produces a structured research report on a
customer",

"input_schema": {

"type": "object",

"properties": {

"customer_id": {"type": "string"},

"depth": {"enum": ["shallow", "standard", "deep"]}

},

"required": ["customer_id"]

},

"output_schema": {"\$ref":
"#/components/schemas/ResearchReport"}

}

],

"endpoint": "https://my-agent.example.com/a2a",

"authentication": {"type": "bearer"}

}

ACP — Agent Communication Protocol (AGNTCY)

Source: github.com/agntcy (Linux Foundation; Apache-2)

Classification Open-governance peer-agent communication protocol.

Intent

Provide an alternative to A2A under Linux Foundation governance, with a multi-vendor consortium (Cisco, IBM, LangChain, Galileo) developing both the wire protocol and the surrounding ecosystem services (directory, identity, observability standards).

Motivating Problem

For organizations that prefer open-governance standards over vendor-controlled protocols --- a common preference in enterprise procurement, government deployments, and cross-organization integrations --- A2A’s Google stewardship is a friction point regardless of the technical quality. ACP under Linux Foundation provides the same technical surface (agent-to-agent communication, capability discovery, task delegation) under multi-stakeholder governance. AGNTCY is the broader consortium providing the directory, identity, and observability standards that complement the protocol itself.

How It Works

ACP’s technical model substantially overlaps A2A: agents advertise capabilities through descriptor documents, clients discover agents through the AGNTCY directory, tasks are delegated through a defined lifecycle, streaming responses handle long-running work. The wire format is JSON-over-HTTP with optional gRPC bindings; authentication and identity flow through standard OAuth/OIDC patterns extended with agent-specific identity claims via AGNTCY identity services.

The differentiator is the surrounding infrastructure. AGNTCY provides a directory service (cross-organization agent discovery), an identity layer (verifying which agent is calling), and observability standards (tracing across agent boundaries, integrated with OpenInference and OTel GenAI conventions from Volume 8). The full stack is more than a wire protocol --- it’s an attempt at the missing infrastructure layer that would make cross-organization agent deployments work the way cross-organization API deployments work today.

When to Use It

Cross-organization or cross-vendor deployments where open-governance matters for procurement. Enterprise deployments that prefer Linux Foundation standards over vendor protocols. Cases needing the broader AGNTCY infrastructure (cross-organization agent directory, identity, observability) rather than just the wire protocol.

Alternatives --- A2A for the Google-stewarded equivalent with substantially overlapping technical scope. MCP for tool access. Custom protocols when neither standard fits the use case.

Sources

github.com/agntcy
agntcy.org

Section B — Graph-and-supervisor frameworks

LangGraph multi-agent and AutoGen / AG2 --- the explicit-graph approach

Two frameworks dominate the graph-and-supervisor approach to multi-agent coordination. LangGraph (from the LangChain team) models multi-agent systems as state graphs where nodes are agents and edges are transitions, with the supervisor pattern as the canonical hierarchical topology. AutoGen (Microsoft, with the AG2 community fork) models multi-agent systems as conversations among agents with configurable conversation patterns (sequential, group chat, hierarchical), with the GroupChat abstraction as the canonical peer-network topology.

The two frameworks reflect two design instincts. LangGraph treats multi-agent as graph engineering: the structure is explicit, every transition is named, the state model is typed. AutoGen treats multi-agent as conversation engineering: agents talk to each other in conversations, the structure emerges from the conversation flow. Both work; teams choose by preference for explicit-graph design versus emergent-conversation design.

LangGraph (multi-agent)

Source: github.com/langchain-ai/langgraph (MIT; Python and TypeScript)

Classification State-graph framework with explicit multi-agent patterns (supervisor, hierarchical, network).

Intent

Provide a state-graph framework where multi-agent systems are designed as explicit graphs with named agent nodes, typed state, and named transitions, with built-in patterns for supervisor (one manager dispatches to workers), hierarchical (managers of managers), and network (peer-to-peer) topologies.

Motivating Problem

For teams that want their multi-agent system’s structure to be explicit, auditable, and testable, the conversation-based approach (AutoGen) is too emergent. LangGraph’s answer is graph-as-architecture: the multi-agent system is a directed graph, agents are nodes, transitions between agents are edges, the shared state is a typed object that flows through the graph. The structure is visible in the code; the runtime executes the graph faithfully; the trace shows exactly which agents ran in which order.

How It Works

Define an agent state schema (a TypedDict or Pydantic class). Add agent nodes --- each is a function that takes the state and returns a state update. Add edges connecting nodes --- either unconditional (always go from A to B) or conditional (a router function decides where to go next based on state). Compile the graph; invoke with an initial state; the runtime executes the graph until termination, producing the final state.

The supervisor pattern is the canonical hierarchical implementation. A supervisor node receives the current state, decides which worker should run next, and routes accordingly; worker nodes do specialized work and return state updates; the supervisor sees the worker’s updates and decides the next step. The pattern composes recursively: a worker can itself be a sub-graph with its own supervisor.

Shared state addresses the hand-off problem (Chapter 5) directly. All agents read from the same state object and write updates to it; no agent receives a compressed summary of another agent’s work --- it receives the full state object with all prior agents’ contributions visible. The pattern maps onto the blackboard topology naturally.

Integration with Volume 7’s HITL primitives (interrupts) and Volume 7’s tracing (LangSmith) is first-class. A multi-agent system built on LangGraph inherits the durable-execution and observability machinery without separate integration work.

When to Use It

Production multi-agent systems where the structure should be explicit and auditable. Hierarchical or supervisor-style coordination. Cases needing typed shared state and structured transitions. Teams that prefer graph engineering over conversation engineering.

Alternatives --- AutoGen / AG2 for the conversation-based equivalent. CrewAI for role-based with less graph engineering. Custom orchestration when the framework abstractions don’t fit.

Sources

github.com/langchain-ai/langgraph
langchain-ai.github.io/langgraph/concepts/multi_agent/

Example artifacts

Code.

from typing import TypedDict, Annotated, Literal

from langgraph.graph import StateGraph, END

from langgraph.types import Command

from langchain_anthropic import ChatAnthropic

class ResearchState(TypedDict):

task: str

research_notes: list[str]

draft: str

critique: str

final: str

llm = ChatAnthropic(model="claude-opus-4-7")

def supervisor(state: ResearchState) ->
Command[Literal["researcher", "writer", "critic",
"__end__"]]:

"""Decide which agent should run next based on current
state."""

if not state["research_notes"]:

return Command(goto="researcher")

if not state["draft"]:

return Command(goto="writer")

if not state["critique"]:

return Command(goto="critic")

if state["critique"] and "approved" not in
state["critique"].lower():

return Command(goto="writer") # revise based on critique

return Command(goto="__end__", update={"final":
state["draft"]})

def researcher(state: ResearchState) -> dict:

notes = llm.invoke(f"Research the topic:
{state['task']}").content

return {"research_notes": [notes]}

def writer(state: ResearchState) -> dict:

draft = llm.invoke(

f"Task: {state['task']}\nNotes:
{state['research_notes']}\n"

f"Previous draft: {state['draft']}\nCritique:
{state['critique']}\n"

"Produce or revise the draft accordingly."

).content

return {"draft": draft, "critique": ""} # clear critique on
revision

def critic(state: ResearchState) -> dict:

critique = llm.invoke(

f"Review this draft against the task. If acceptable, say
'APPROVED'.\n"

f"Task: {state['task']}\nDraft: {state['draft']}"

).content

return {"critique": critique}

graph = (

StateGraph(ResearchState)

.add_node("supervisor", supervisor)

.add_node("researcher", researcher)

.add_node("writer", writer)

.add_node("critic", critic)

.set_entry_point("supervisor")

.add_edge("researcher", "supervisor")

.add_edge("writer", "supervisor")

.add_edge("critic", "supervisor")

.compile()

)

result = graph.invoke({"task": "Survey current state of vector
databases",

"research_notes": [], "draft": "", "critique": "",
"final": ""})

AutoGen / AG2

Source: github.com/microsoft/autogen (Microsoft; MIT) and github.com/ag2ai/ag2 (community fork; Apache-2)

Classification Conversation-based multi-agent framework with configurable conversation patterns.

Intent

Provide a framework where multi-agent systems are designed as conversations among agents, with configurable conversation patterns (sequential, group chat, hierarchical), termination conditions, and the ability to mix LLM agents, code-executor agents, and human-input agents in the same conversation.

Motivating Problem

For teams that find conversation a more natural primitive than graph for multi-agent design, AutoGen’s answer is agents-as-conversational-participants. An AssistantAgent (LLM-driven), UserProxyAgent (human-in-the-loop or code executor), and GroupChatManager (orchestrator) compose into conversations where the agents take turns responding. The structure is emergent within the constraints of the configured conversation pattern; the framework handles message routing, turn-taking, and termination.

How It Works

Define agents with system messages describing their roles. The simplest two-agent pattern: AssistantAgent and UserProxyAgent in a back-and-forth conversation. The GroupChat pattern: multiple AssistantAgents plus a GroupChatManager that selects which agent speaks next based on configurable speaker-selection logic (auto-LLM-driven, round-robin, or custom). Termination conditions stop the conversation when met (max rounds, specific message pattern, agent vote).

The conversation history is the shared state --- every agent in a GroupChat sees the full history of what every other agent has said. This naturally addresses the hand-off problem (Chapter 5) because there is no hand-off; there’s shared conversational context. The trade-off is that the conversation can grow large quickly, and context-window limits become a practical concern in long multi-agent runs.

AG2 (the community fork) emerged in late 2024 / early 2025 over governance disagreements with Microsoft. The technical surface remains very similar; the project paths diverge in priorities and community direction. As of mid-2026 both projects are active; the choice between them is governance-influenced more than technical.

AutoGen 0.4 (Microsoft’s 2024 rewrite) shifted to an actor-model architecture with stronger async support and explicit agent-runtime separation. The conceptual model remains conversation-based but the engineering substrate is more production-ready than the original 0.2 implementation.

When to Use It

Multi-agent systems where conversation is a natural primitive for the design. Cases needing flexible speaker-selection logic (LLM-driven “whose turn is it”). Mixing LLM agents with code-executor agents and human-input agents in a single coordinated flow. Research workflows where emergent conversational behavior is part of the value.

Alternatives --- LangGraph for explicit-graph engineering. CrewAI for the role-based equivalent with simpler ergonomics. OpenAI Agents SDK for the hand-off-centric alternative.

Sources

github.com/microsoft/autogen
microsoft.github.io/autogen/
github.com/ag2ai/ag2

Section C — Role-based and hand-off frameworks

CrewAI, OpenAI Agents SDK, and Claude Agent SDK subagents

Three frameworks dominate the role-based and hand-off approach. CrewAI is the canonical role-first framework: agents are configured with explicit roles, goals, and backstories; tasks have assigned agents; the crew executes the tasks in defined order. OpenAI Agents SDK (the production successor to OpenAI Swarm, released in 2025) is the hand-off-centric framework: agents are functions, hand-offs between agents are tool calls, the framework handles the orchestration. Claude Agent SDK provides subagents --- specialized sub-instances of Claude that the main agent can spawn for delegated work, popularized by Claude Code’s subagent feature.

All three share a design instinct: the structure is conceptually closer to how humans organize work than how engineers organize code. Roles, hand-offs, and subagent spawns map onto familiar mental models, which makes these frameworks accessible to non-engineers configuring agent behavior. The trade-off is that the structure is less explicit than LangGraph’s graphs and less flexible than AutoGen’s conversations.

CrewAI

Source: github.com/crewAIInc/crewai (Python; MIT)

Classification Role-first multi-agent framework with task-and-crew orchestration.

Intent

Provide a framework where multi-agent systems are designed as crews of role-defined agents executing structured tasks, with explicit roles, goals, and backstories per agent and clear task definitions including expected outputs.

Motivating Problem

For teams that find the role abstraction natural --- the Researcher, the Writer, the Critic, the Editor --- CrewAI structures multi-agent design around that abstraction. Each agent has a role (a one-line description of what kind of work it does), a goal (what it’s optimizing for), a backstory (the context that shapes its perspective), and optionally a set of tools. Tasks are first-class objects with descriptions, expected outputs, and assigned agents. A crew is a collection of agents executing a sequence of tasks; the framework handles the orchestration.

How It Works

Define agents: each gets role, goal, backstory, optional tools, optional LLM choice, optional max iterations. Define tasks: each gets description, expected_output, agent (the assigned agent), optional context (which tasks’ outputs feed in). Build a Crew with the agents and tasks, choose a process (Process.sequential for pipeline ordering or Process.hierarchical for manager-dispatches-workers), and call kickoff() to execute.

Context-sharing across tasks is the canonical hand-off mechanism: each task’s output becomes available as context to subsequent tasks. The framework handles the plumbing automatically. For finer control, task definitions can specify which prior tasks’ outputs they need, and the framework wires only those contexts.

Hierarchical process mode adds a manager: instead of pre-defining the task sequence, a manager agent (an LLM with a manager-style prompt) decides which agent should execute next based on the task and current state. The pattern is the supervisor topology from Chapter 2, implemented at the CrewAI abstraction level.

The framework’s opinionated structure is the trade-off. CrewAI is fastest to adopt for the cases it fits well --- content production, research workflows, structured-output generation --- and least flexible for cases that don’t map onto roles-and-tasks. Teams hitting those edges often migrate to LangGraph or AutoGen for more direct control.

When to Use It

Multi-agent systems where roles map cleanly onto the work. Content production, research synthesis, structured analysis tasks. Teams that want to configure agents declaratively rather than engineering coordination logic. Quick prototyping of role-based agent designs.

Alternatives --- LangGraph for explicit-graph control. AutoGen for conversation-based design. OpenAI Agents SDK for hand-off-centric orchestration. Custom code when CrewAI’s opinionated structure fights the problem.

Sources

github.com/crewAIInc/crewai
docs.crewai.com

Example artifacts

Code.

from crewai import Agent, Task, Crew, Process

researcher = Agent(

role="Senior Research Analyst",

goal="Find and synthesize information about the vector database
market",

backstory="You are an experienced market analyst with deep knowledge
of "

"the infrastructure software space and a track record of producing
"

"comprehensive market overviews.",

tools=[web_search_tool, scrape_website_tool],

verbose=True,

)

writer = Agent(

role="Technical Writer",

goal="Produce a clear, well-structured market overview",

backstory="You write for a technically sophisticated audience and
excel at "

"organizing complex topics into readable structure.",

verbose=True,

)

critic = Agent(

role="Editorial Reviewer",

goal="Ensure factual accuracy and clear structure in the final
draft",

backstory="You are an experienced editor with a strong eye for
unsupported "

"claims and a low tolerance for muddy prose.",

verbose=True,

)

research_task = Task(

description="Survey the current state of vector databases as of
2026. Cover "

"market leaders, open-source landscape, and recent trends.",

expected_output="A bulleted list of findings with sources, organized
by theme.",

agent=researcher,

)

write_task = Task(

description="Write a 1500-word market overview based on the research
notes.",

expected_output="A complete article in markdown with clear section
structure.",

agent=writer,

context=[research_task],

)

review_task = Task(

description="Review the draft. Flag any unsupported claims, weak
prose, or "

"structural issues. Either return 'APPROVED' or specific revision
notes.",

expected_output="Either 'APPROVED' or a bulleted list of specific
issues.",

agent=critic,

context=[write_task],

)

crew = Crew(

agents=[researcher, writer, critic],

tasks=[research_task, write_task, review_task],

process=Process.sequential,

verbose=True,

)

result = crew.kickoff()

OpenAI Agents SDK

Source: github.com/openai/openai-agents-python (OpenAI; MIT)

Classification Hand-off-centric multi-agent framework, production successor to OpenAI Swarm.

Intent

Provide a lightweight framework where multi-agent systems are designed as networks of agents connected by hand-offs (agent-to-agent transitions implemented as tool calls), with the framework handling routing and the agents themselves being simple functions.

Motivating Problem

OpenAI Swarm (released 2024 as an educational reference) demonstrated that multi-agent coordination could be modeled as hand-offs: agent A finishes its work and explicitly transfers control to agent B via a tool call. The Agents SDK (released 2025 as the production successor) productized this pattern with first-class hand-off primitives, structured outputs, integrated tracing, and the support tier OpenAI couldn’t provide for Swarm. The result is a framework that’s lightweight relative to LangGraph’s graph engineering and AutoGen’s conversation patterns, with hand-off as the central abstraction.

How It Works

Define agents: each is a function with a name, instructions (the system prompt), tools, and a list of agents it can hand off to. The framework treats hand-offs as a special kind of tool call --- the agent decides at runtime whether to invoke a tool or transfer to another agent. The Runner executes the conversation: it invokes the current agent, processes the response (which may be a hand-off, a tool call, or a final output), and continues until a final output is produced.

The hand-off is explicit and traceable: the agent’s decision to hand off appears in the trace as a typed event; the receiving agent gets the full conversation history; the routing logic is visible in code (which agents can hand off to which other agents).

Integration with structured outputs (Pydantic-style schemas as the agent’s output type), tracing (OpenAI’s built-in tracing UI or external observability platforms), and guardrails (input and output checks per Volume 8’s patterns) is first-class. The framework is intentionally small --- a few thousand lines of code --- because the abstractions are deliberately minimal.

When to Use It

Multi-agent systems where hand-offs (route to billing agent, escalate to manager agent, transfer to specialist) are the natural coordination primitive. Customer-support routing systems. Hub-and-spoke topologies (Chapter 2) with explicit routing logic. Teams that prefer lightweight frameworks over heavyweight orchestration.

Alternatives --- CrewAI for role-based with task-and-crew structure. LangGraph for explicit graph control. AutoGen for conversation-based design. Direct API code when the framework abstractions don’t add value.

Sources

github.com/openai/openai-agents-python
openai.github.io/openai-agents-python/

Example artifacts

Code.

from agents import Agent, Runner, function_tool

\@function_tool

def check_account_balance(account_id: str) -> dict:

return fetch_balance(account_id)

\@function_tool

def look_up_order(order_id: str) -> dict:

return fetch_order(order_id)

billing_agent = Agent(

name="Billing Specialist",

instructions="You handle billing questions. Always verify account
before discussing charges.",

tools=[check_account_balance],

)

order_agent = Agent(

name="Order Specialist",

instructions="You handle order-related questions: status, returns,
exchanges.",

tools=[look_up_order],

)

triage_agent = Agent(

name="Customer Support Triage",

instructions=(

"You are the first line of customer support. Identify whether the "

"customer's issue is about billing or orders, and hand off to the
"

"appropriate specialist. Do not attempt to resolve the issue
yourself."

),

handoffs=[billing_agent, order_agent], # explicit routing options

)

result = Runner.run_sync(

starting_agent=triage_agent,

input="I think I was charged twice for order #12345",

)

print(result.final_output)

Claude Agent SDK — subagents

Source: docs.claude.com/en/api/agent-sdk (Anthropic; commercial SDK)

Classification Sub-instance spawning pattern for hierarchical agent delegation.

Intent

Provide a primitive where a main Claude agent can spawn specialized subagents --- sub-instances of Claude with focused prompts and constrained tool sets --- to handle delegated sub-tasks, with the subagent’s context isolated from the main agent’s and its output returned as a structured result.

Motivating Problem

The pattern emerged from Claude Code’s subagent feature in 2024—2025 and was generalized into the broader Claude Agent SDK. The motivation: a main agent working on a large task often needs to delegate focused sub-tasks (research this specific topic; write this specific function; review this specific section) where the sub-task has its own context budget and its own specialized prompt. Doing the delegated work in the main agent’s context window consumes context budget the main agent needs to keep; doing it in an external subprocess loses the LLM’s ability to reason about the result. The subagent pattern threads this needle: a fresh Claude instance handles the sub-task with its own context, completes the task, returns a structured result that the main agent integrates.

How It Works

The main agent invokes a subagent with a task description and any necessary context. The framework spawns a fresh Claude instance with a specialized system prompt for the task type (a research subagent, a coding subagent, a critique subagent). The subagent works on the task within its own context window, possibly using its own tools, and produces a result. The result returns to the main agent as a structured output --- the equivalent of a tool-call result.

Context isolation is the design point. The subagent doesn’t see the main agent’s full conversation; it sees only what the main agent chose to pass. The main agent doesn’t see the subagent’s full reasoning; it sees only the structured result. This produces the hand-off problem (Chapter 5) by design --- the context isolation is a feature, not an accident. The mitigation is careful design of what context to pass and what structure to return.

Claude Code’s implementation popularized the pattern with named subagent types (general-purpose, statusline-setup, output-style-setup, output-mode) and per-subagent tool restrictions. The Claude Agent SDK exposes the primitive for general use; teams configure their own subagent types matching their domain.

When to Use It

Tasks where context isolation is desirable: the main agent should focus on the high-level work and delegate context-heavy sub-tasks. Coding agents that delegate file-level work to subagents with specific file context. Research agents that delegate topic research to specialized subagents. Cases where hierarchical decomposition naturally fits the work.

Alternatives --- LangGraph or CrewAI for the hierarchical pattern with shared state instead of isolated context. OpenAI Agents SDK hand-offs for the routing-style pattern. Custom subprocess spawning when full control over the subagent’s environment matters.

Sources

docs.claude.com/en/api/agent-sdk
docs.claude.com/en/docs/claude-code

Section D — Event-driven and research frameworks

LlamaIndex Workflows and Microsoft Magentic-One

Two frameworks sit outside the dominant graph-supervisor-and-role-and-conversation patterns. LlamaIndex Workflows takes an event-driven approach: agents (or workflow steps) react to typed events; the workflow is a graph of event-producers and event-consumers; the structure emerges from event flow. Microsoft Magentic-One is a research-oriented framework (released late 2024 and evolving) targeting open-ended tasks that mix web browsing, file system access, and code execution --- the kinds of tasks that benefit from a manager agent orchestrating specialists with concrete tools.

Both frameworks have less production adoption than the entries in Sections B and C, but both contribute distinctive design ideas worth understanding. Event-driven coordination scales differently from graph-driven coordination; the manager-with-specialists pattern Magentic-One demonstrates is a concrete instance of the hierarchical topology with strong empirical results on the GAIA and other agent benchmarks.

LlamaIndex Workflows

Source: github.com/run-llama/llama_index (Python; MIT)

Classification Event-driven workflow framework for multi-agent and multi-step agent designs.

Intent

Provide an event-driven workflow framework where steps (which may include agent invocations) react to typed events and emit new events, with the workflow structure emerging from the event flow rather than being explicitly graphed.

Motivating Problem

For multi-step agent designs where the control flow is dynamic and difficult to capture in an explicit graph --- conditional branches, parallel branches that merge based on which finishes first, retry-and-fallback patterns, mixed sync-and-async steps --- the event-driven model is a natural fit. LlamaIndex Workflows treats each step as an event consumer that may emit new events; the runtime routes events to consumers; the workflow completes when a designated termination event is emitted.

How It Works

Define event types (Pydantic models). Define a workflow class with @step-decorated methods, each of which takes specific event types as parameters and returns new event types. The framework introspects the method signatures, builds the event-routing graph automatically, and executes the workflow. Parallel execution happens naturally when multiple steps consume the same event or when a step emits multiple events; sequential execution happens when steps depend on each other through event chains.

Multi-agent designs in LlamaIndex Workflows typically have one step per agent invocation, with events carrying the context between them. The shared-state pattern is also supported through Context objects that persist across steps. Integration with LlamaIndex’s broader stack (the RAG and indexing primitives from Volume 6) is first-class.

The framework is newer than the entries in Sections B and C and has a smaller production track record, but the event-driven model maps cleanly onto certain workflow shapes (incident response, complex data pipelines, multi-modal processing) where explicit graphs become unwieldy.

When to Use It

Workflows with complex conditional branching, parallel-then-merge patterns, or mixed sync-and-async steps. Multi-agent designs already integrated with the LlamaIndex retrieval stack. Cases where the event-driven model maps more naturally onto the work than explicit graphs do.

Alternatives --- LangGraph for the explicit-graph equivalent. AutoGen for conversation-based. Custom event-driven orchestration (asyncio, RxPy) when the framework abstractions don’t add value.

Sources

github.com/run-llama/llama_index
docs.llamaindex.ai/en/stable/understanding/workflows/

Microsoft Magentic-One

Source: github.com/microsoft/autogen/tree/main/python/packages/autogen-magentic-one (MIT)

Classification Research-oriented multi-agent framework with manager-plus-specialists pattern.

Intent

Provide a research-grade implementation of the manager-plus-specialists hierarchical pattern, with concrete specialists for web browsing (WebSurfer), file operations (FileSurfer), code execution (Coder, ComputerTerminal), and an Orchestrator that manages the team.

Motivating Problem

For the class of agent tasks that mix web browsing, file system access, and code execution --- the kind of work that benchmark suites like GAIA, AssistantBench, and WebArena test --- a manager-plus-specialists architecture with concrete tools has produced state-of-the-art results in 2024—2025 research. Magentic-One is Microsoft’s reference implementation: an Orchestrator agent that maintains a task ledger and a progress ledger, deciding what to do next and which specialist to dispatch; four specialist agents with focused tools; the whole thing layered on top of AutoGen v0.4.

How It Works

The Orchestrator maintains two ledgers. The task ledger tracks what needs to be done: known facts about the task, key facts to gather, plan steps. The progress ledger tracks whether each step is making progress, with explicit decisions to advance, retry, or replan. On each iteration, the Orchestrator consults the ledgers and dispatches the appropriate specialist.

WebSurfer drives a browser (Playwright-based) to navigate, read pages, and interact with web UIs. FileSurfer reads and navigates the file system. Coder writes code; ComputerTerminal executes shell commands. Each specialist has a tightly scoped tool surface, which makes their behavior predictable and their errors interpretable.

The pattern’s contribution is the explicit ledger design and the demonstration that hierarchical-with-concrete-specialists outperforms more emergent patterns on benchmark suites. Magentic-One as a deployable framework is research-grade rather than production-grade; the pattern itself is influential and is reimplemented in production frameworks (LangGraph and CrewAI both have manager-plus-specialists tutorials echoing the design).

When to Use It

Research benchmarking on open-ended agent tasks. Reference implementation for the manager-plus-specialists pattern. Studies of how explicit ledger design affects agent behavior. Cases where the specialist-toolset model (web, file, code, terminal) matches the deployment target.

Alternatives --- production frameworks (LangGraph, CrewAI, AutoGen) implementing similar patterns for production use cases. Browser-use, Aider, Devin-style products for the specific verticals Magentic-One demonstrates.

Sources

microsoft.github.io/autogen/dev/user-guide/agentchat-user-guide/magentic-one.html
arxiv.org/abs/2411.04468

Section E — Coordination patterns as code

The planner-executor and critic-and-reflection patterns

Two coordination patterns recur across frameworks and warrant explicit treatment as patterns rather than as framework features. The planner-executor pattern splits work into planning (one agent decomposes the task into structured steps) and execution (a second agent or set of agents carries out each step). The critic-and-reflection pattern adds a critic agent that reviews the executor’s outputs and a reflection loop that incorporates the critic’s feedback into subsequent attempts.

Both patterns are implementable in any framework from Sections B and C. Their value as patterns is that they can be reasoned about independently of any framework: when the design calls for one of these patterns, the framework is the substrate; the pattern is the actual architectural decision.

The planner-executor pattern

Source: Implementable in LangGraph, CrewAI, AutoGen, OpenAI Agents SDK

Classification Coordination pattern: planning agent decomposes; executor agent(s) carry out.

Intent

Separate task decomposition from task execution by using one agent (planner) to produce a structured plan and another agent (or set of agents) to execute each step, with the executor optionally able to surface failures back to the planner for replanning.

Motivating Problem

For complex tasks where the right decomposition isn’t obvious, single-agent designs conflate two concerns: figuring out what to do and actually doing it. The agent’s context window holds both the high-level task and the low-level execution details; the planning logic and the execution logic share the same prompt; failures in one obscure the other. The planner-executor split addresses this by giving each concern its own agent with its own prompt, its own context budget, and its own success criteria.

How It Works

The planner agent receives the high-level task and produces a structured plan: a list of steps, each with a step description, expected output, success criteria, and (optionally) which executor or tool should handle it. The plan is a typed data structure --- not free-form text --- so the executor can parse and follow it deterministically.

The executor agent(s) consume the plan one step at a time. For each step, the executor reads the step description, executes (possibly calling tools or other agents), validates against the success criteria, and reports the result. Successful steps advance the plan; failed steps either retry or surface back to the planner for replanning.

The replan loop is the pattern’s most interesting design decision. The simple version: when the executor reports a failed step, control returns to the planner with the failure context, and the planner produces a revised plan. The more sophisticated version: the planner periodically reviews progress regardless of failure, replanning based on what was actually learned during execution rather than what was assumed at plan time. Magentic-One’s task-and-progress ledger design (Section D) implements this sophisticated version explicitly.

When to Use It

Complex tasks where the decomposition is itself a meaningful design decision. Cases where execution failures should trigger replanning rather than just retry. Tasks requiring explicit progress tracking. Multi-step workflows where the steps are heterogeneous.

Alternatives --- ReAct-style single-agent loops when the task is simple enough that decomposition is implicit. Hierarchical multi-agent without the explicit planner role when the manager can dispatch directly without producing a plan artifact.

Sources

langchain-ai.github.io/langgraph/tutorials/plan-and-execute/plan-and-execute/
arxiv.org/abs/2305.04091 (Plan-and-Solve prompting)

Example artifacts

Code.

from pydantic import BaseModel

from typing import Literal

from langgraph.graph import StateGraph, END

class PlanStep(BaseModel):

step_number: int

description: str

expected_output: str

success_criteria: str

status: Literal["pending", "in_progress", "done", "failed"]
= "pending"

result: str | None = None

class PlannerExecutorState(BaseModel):

task: str

plan: list[PlanStep] = []

current_step: int = 0

final_answer: str | None = None

def planner(state):

"""Produce or revise the structured plan based on current
state."""

if not state.plan:

# Initial plan

plan_text = llm.invoke(

f"Task: {state.task}\nProduce a numbered plan of 3-8 steps."

).content

plan = parse_plan(plan_text)

return {"plan": plan}

# Replan based on a failed step

failed = [s for s in state.plan if s.status == "failed"]

if failed:

plan_text = llm.invoke(

f"Original task: {state.task}\nFailed step: {failed[0]}\n"

f"Produce a revised plan from this point forward."

).content

revised = parse_plan(plan_text)

return {"plan": state.plan[:state.current_step] + revised,
"current_step": state.current_step}

return {}

def executor(state):

"""Execute the current plan step."""

step = state.plan[state.current_step]

result = llm.invoke(

f"Execute this step: {step.description}\nSuccess looks like:
{step.success_criteria}"

).content

# Validate against success criteria

if validate_result(result, step.success_criteria):

step.status = "done"; step.result = result

return {"plan": state.plan, "current_step": state.current_step +
1}

step.status = "failed"

return {"plan": state.plan}

def router(state):

if state.current_step >= len(state.plan):

return "finalize"

if any(s.status == "failed" for s in state.plan):

return "planner" # replan

return "executor"

def finalize(state):

final = synthesize_results([s.result for s in state.plan if
s.result])

return {"final_answer": final}

graph = (

StateGraph(PlannerExecutorState)

.add_node("planner", planner)

.add_node("executor", executor)

.add_node("finalize", finalize)

.set_entry_point("planner")

.add_conditional_edges("planner", router)

.add_conditional_edges("executor", router)

.add_edge("finalize", END)

.compile()

)

The critic-and-reflection pattern

Source: Implementable in any multi-agent framework

Classification Coordination pattern: critic agent reviews; reflection loop incorporates feedback.

Intent

Improve output quality by separating production (a writer or executor agent) from review (a critic agent) and iterating: the critic reviews the producer’s output, identifies issues, and the producer revises based on the critique.

Motivating Problem

Single-agent designs that ask the same agent to produce and self-review often produce worse outputs than a producer-and-critic pair. The reason is prompt collision: an agent prompted to be both creative and critical typically picks one tendency and underweights the other. Separating the roles gives each agent a clean prompt: the producer’s system prompt is about producing the best output; the critic’s system prompt is about adversarial review. The cleaner prompts produce better outputs in their respective directions, and the iteration loop integrates them.

How It Works

The producer agent receives the task and produces an output. The critic agent receives the producer’s output and the task, and produces a critique --- either “approved” (the output is acceptable) or specific issues with the output. If approved, the loop terminates. If issues exist, the producer receives the critique and produces a revised output. The loop continues until approval or until a maximum iteration count.

The critic’s prompt design matters. A weak critic produces vague “this could be better” feedback that doesn’t help the producer revise. A strong critic produces specific, actionable feedback (“the second paragraph claims X but the cited source says Y; revise to match the source”). The strongest critics use structured output (a list of specific issues with severity and recommended fixes) so the producer can address each issue systematically.

Termination conditions matter. A loop that runs until the critic approves can loop indefinitely if the producer and critic disagree fundamentally. The pragmatic pattern includes a maximum iteration count (typically 3—5) and a fallback action (return the best version so far with the unresolved critique attached). Some implementations also include a third agent --- a judge --- that decides whether to continue iterating when the producer and critic seem stuck.

The pattern works best when the critic uses a different model from the producer (Volume 8’s LLM-as-judge guidance), to avoid self-preference bias. A Claude producer reviewed by a GPT critic, or vice versa, produces less biased critique than self-review.

When to Use It

Content production where output quality matters more than throughput. Code generation where a separate review pass catches issues. Reasoning tasks where verification adds genuine value. Multi-step decisions where post-hoc critique reveals issues the producer missed.

Alternatives --- single-agent self-review when latency matters more than quality. Multiple producer agents with voting when diversity of approaches helps. Human review when the critique requires judgment outside the LLM’s training.

Sources

arxiv.org/abs/2303.11366 (Reflexion)
arxiv.org/abs/2305.10601 (Tree of Thoughts as related pattern)

Example artifacts

Code.

from langchain_anthropic import ChatAnthropic

from langchain_openai import ChatOpenAI

producer = ChatAnthropic(model="claude-opus-4-7")

critic = ChatOpenAI(model="gpt-5") # different family to avoid
self-preference

MAX_ITERATIONS = 5

def produce_and_revise(task: str) -> str:

output = producer.invoke(

f"Task: {task}\nProduce the best output you can."

).content

critiques = []

for iteration in range(MAX_ITERATIONS):

critique = critic.invoke(

f"You are an adversarial reviewer. Task: {task}\n"

f"Output to review: {output}\n\n"

"Return either 'APPROVED' or a numbered list of specific issues,
"

"each with severity (critical/important/minor) and a recommended
fix."

).content

if "APPROVED" in critique.upper():

return output

critiques.append(critique)

output = producer.invoke(

f"Task: {task}\nPrevious output: {output}\n"

f"Reviewer feedback: {critique}\n"

"Produce a revised output addressing the feedback."

).content

# Max iterations reached; return best version with unresolved
critique attached

return f"{output}\n\n[Unresolved review notes after
{MAX_ITERATIONS} iterations:\n{critiques[-1]}]"

Section F — Shared-state patterns

Shared scratchpad and blackboard --- avoiding hand-offs by avoiding hand-offs

The hand-off problem (Chapter 5) is structurally avoidable when agents share state instead of passing messages. Two patterns dominate. The shared scratchpad pattern is the lightweight version: agents read and write to a common state dictionary, with each agent aware of the others’ contributions through the shared state. The blackboard pattern is the structured version: a central workspace where each agent contributes typed artifacts (research findings, draft sections, critiques) and agents are activated based on what artifacts are present.

Both patterns are implementable in LangGraph (via the shared state model), CrewAI (via context-sharing), and AutoGen (via group-chat shared history). The choice is between lightweight (scratchpad) and structured (blackboard), with the lightweight version sufficient for most cases and the structured version paying for itself when many agents contribute many artifact types.

The shared scratchpad pattern

Source: Implementable as LangGraph State, CrewAI context, AutoGen group chat

Classification Coordination pattern: agents share a common state dictionary.

Intent

Avoid the hand-off problem by giving all agents read-and-write access to a common state object, so coordination happens through state changes rather than message-passing.

Motivating Problem

Message-passing between agents loses information (Chapter 5). Shared state eliminates the loss by making the state visible to all agents at all times. The pattern is the multi-agent equivalent of a global variable --- conceptually simple, occasionally messy in practice, and almost always better than the alternative of explicit hand-offs for the cases where it applies.

How It Works

Define a state schema (TypedDict, Pydantic model, or framework-specific equivalent). Each agent’s function takes the current state and returns a state update. The framework applies the update and passes the new state to the next agent. All agents see the cumulative state --- not just what they were explicitly told.

Naming conventions matter. The state schema effectively defines the coordination contract between agents --- which fields each agent reads, which fields each agent writes, in what order. A well-named state schema makes the multi-agent design readable; a poorly-named one produces the multi-agent equivalent of global-variable spaghetti.

Conflict resolution matters when multiple agents can write the same field. The simple version: last write wins. The more sophisticated version: each field has a defined merge function (concatenation for lists, set-union for sets, max for numbers). LangGraph’s state model supports custom reducers per field for exactly this case.

When to Use It

Multi-agent designs where the agents naturally coordinate around a shared task state. Sequential and hierarchical topologies. Cases where the hand-off problem would otherwise dominate the design.

Alternatives --- the blackboard pattern when the state has enough structure to warrant the additional discipline. Explicit message-passing when the agents legitimately should not share state (cross-organization boundaries, trust zones).

Sources

langchain-ai.github.io/langgraph/concepts/low_level/#state

The blackboard pattern

Source: Classical AI pattern (1980s); modern implementations in LangGraph and custom frameworks

Classification Coordination pattern: structured shared workspace with typed artifacts.

Intent

Provide a structured shared workspace where each agent contributes typed artifacts and agent activation depends on which artifacts are present, supporting opportunistic and emergent coordination patterns.

Motivating Problem

The shared-scratchpad pattern works for designs where the coordination is fixed at design time. For designs where the coordination is dynamic --- different agents activate based on what’s currently on the workspace, the order is not predetermined, multiple agents may contribute to the same artifact type --- the blackboard pattern adds structure: typed artifact slots, agent activation conditions, and explicit support for opportunistic coordination.

How It Works

The blackboard is a structured workspace with named artifact slots: research_findings (list), draft_sections (dict by section name), critiques (list), decisions (list), and so on. Each agent has an activation condition (“activate when research_findings has at least 3 entries and draft_sections is empty”) and a contribution pattern (“when activated, produce draft_sections entries”). A scheduler activates agents whose conditions are met; agents contribute artifacts; the scheduler reevaluates conditions; the loop continues until termination (no agents activate, or a designated completion condition is reached).

The pattern’s strengths are emergent coordination (the system can handle workflows that weren’t pre-specified) and opportunistic parallelism (agents whose conditions are simultaneously met can run in parallel). The trade-offs are debugging difficulty (the trace shows what happened but not why this particular order emerged) and termination correctness (designing termination conditions that catch all the legitimate completion cases without false positives or infinite loops).

Modern multi-agent frameworks implement the pattern with varying degrees of explicitness. LangGraph state with conditional routing approximates it; CrewAI’s context-and-task model leans toward it for hierarchical processes; AutoGen’s group chat with speaker selection logic implements a version of it. Dedicated blackboard frameworks exist but have less adoption than the general-purpose alternatives.

When to Use It

Multi-agent designs with dynamic coordination patterns. Workflows where agent activation depends on workspace state rather than a fixed plan. Research-style workflows with opportunistic discovery. Production systems where the work shape varies enough that a fixed pipeline would be wrong.

Alternatives --- shared scratchpad for simpler designs. Explicit graphs (LangGraph supervisor pattern) for designs that can be fixed at design time. Custom orchestration when the framework abstractions don’t fit.

Sources

Engelmore & Morgan, Blackboard Systems (Addison-Wesley, 1988)

Section G — Multi-agent observability

How Volume 7’s tracing tools extend to multi-agent systems

Multi-agent systems need everything Volume 7’s observability layer provides (trace trees, audit records, debug surfaces) plus several multi-agent-specific concerns: cross-agent trace correlation (when agent A calls agent B, the trace must span both), agent identity (which agent made this decision), conversation reconstruction (replaying the agent-to-agent conversation), and inter-agent metrics (how often did agent A invoke agent B, what was the success rate of that hand-off). The good news is that the major observability platforms (LangSmith, Phoenix, Langfuse) extend naturally to multi-agent through the same trace-tree mental model; the additional structure is handled through span metadata rather than fundamentally new abstractions.

Multi-agent tracing across LangSmith, Phoenix, Langfuse

Source: Volume 7’s observability platforms, extended via OpenInference multi-agent conventions

Classification Observability pattern for tracing multi-agent systems.

Intent

Extend the trace-tree mental model from Volume 7 to multi-agent systems by adding agent identity to spans, correlating cross-agent calls, and capturing inter-agent metrics, with the major observability platforms providing native support through OpenInference’s multi-agent attribute conventions.

Motivating Problem

A trace tree for a single-agent run has a clear shape: one thread, one or more runs, steps within runs, LLM and tool calls within steps. A trace tree for a multi-agent run has the same shape but with additional structure: each span belongs to a specific agent; cross-agent calls span agent boundaries; the trace UI needs to make agent identity visible so engineers can answer questions like “which agent made this decision” and “how many times did the supervisor invoke each worker.”

How It Works

The OpenInference specification (Volume 7 Section D) added agent-specific attributes in 2025: openinference.span.kind = “AGENT” for spans representing whole agent invocations; openinference.agent.name for the agent’s identifier; conversation_id for grouping cross-agent calls into a single logical conversation. Spans tagged with these attributes render in observability platforms with agent identity made visible.

Cross-agent call correlation works through standard OTel context propagation. When agent A calls agent B (whether through an in-process function call, an MCP server call, an A2A protocol call, or any other mechanism), the trace context propagates with the call, and agent B’s spans become children of agent A’s span in the trace tree. The trace then shows the full cross-agent flow as a single connected tree.

Inter-agent metrics are derivable from the trace data. Hand-off counts, agent latency distributions, success and failure rates per agent, conversation length distributions, branching factor at supervisor decisions --- all of these are aggregations over the trace data that the observability platforms compute and surface. Production deployments use these metrics to identify which agents are bottlenecks, which hand-offs frequently fail, and which conversation patterns produce the best outcomes.

When to Use It

Any production multi-agent deployment needing observability. Debugging cross-agent failures. Operational visibility into which agents are doing what. Audit and compliance documentation for agent decisions.

Alternatives --- custom logging when the framework integration doesn’t fit. Print-debugging during early development before observability matters. Application-specific dashboards for the high-level metrics that the generic platforms surface but don’t prioritize.

Sources

github.com/Arize-ai/openinference

Section H — Discovery and directories

AGNTCY directory and ecosystem-curation resources

For multi-agent systems that may need to discover agents from outside the local deployment --- agents in other organizations, agents from third-party vendors, agents whose endpoints aren’t known at design time --- a directory service is the missing piece. The AGNTCY directory under Linux Foundation governance is the primary cross-organization agent directory as of mid-2026. For ecosystem discovery (which frameworks exist, which patterns are documented, which research is recent), the awesome-X GitHub lists continue to be the right entry point.

AGNTCY directory and ecosystem resources

Source: agntcy.org and various awesome lists

Classification Discovery infrastructure and ecosystem resources.

Intent

Provide the cross-organization agent directory (AGNTCY) and the community-curated awesome lists that track the multi-agent ecosystem as it evolves.

Motivating Problem

Multi-agent systems that stay within one organization can use private agent registries; multi-agent systems that span organizations need cross-organization discovery, identity, and capability advertisement. AGNTCY’s directory addresses this. Separately, the multi-agent ecosystem moves fast enough that any printed catalog (including this one) goes partially stale within months; the community-curated awesome lists are the right resource for keeping current.

How It Works

AGNTCY directory: agents publish Agent Cards (compatible with A2A or ACP) to the AGNTCY directory; clients query the directory by capability, organization, or identity. The directory provides discovery, capability matching, identity verification (which agent is this, who operates it), and the observability standards complementing the wire protocols.

Awesome lists: awesome-llm-agents, awesome-ai-agents, awesome-multi-agent-systems, awesome-agent-orchestration, and many more. The standard GitHub awesome-X format with frameworks, papers, products, and resources organized by category and updated by community pull requests.

Research conferences and arXiv: NeurIPS, ICLR, ICML, ACL, COLM track the academic state of multi-agent systems. arXiv’s cs.MA (Multi-Agent Systems) category is the right firehose for keeping current on research --- noisier than the awesome lists but more current.

When to Use It

Cross-organization agent discovery. Periodic surveys of the multi-agent ecosystem. Discovery when surveying products for a specific multi-agent need. Cross-checking that this catalog’s recommendations match the current consensus.

Alternatives --- vendor-specific resources (Anthropic, OpenAI, Google, Microsoft documentation). Direct framework documentation. Conference proceedings for academic state of the art.

Sources

agntcy.org
github.com/topics/multi-agent-systems
arxiv.org/list/cs.MA/recent

Appendix A --- Topology Reference Table

Cross-reference between the five coordination topologies (Chapter 2) and their representative implementations.

Topology	Characteristic	Representative implementations
Hierarchical	Manager dispatches; workers execute	LangGraph supervisor, CrewAI hierarchical
Sequential	Pipeline; each agent transforms	CrewAI sequential, linear LangGraph chains
Peer network	Agents talk freely; emergent structure	AutoGen GroupChat
Hub-and-spoke	Central router; specialist endpoints	OpenAI Agents SDK handoffs
Blackboard	Shared workspace; opportunistic	LangGraph shared state, CrewAI context

Appendix B --- The Nine-Volume Series

This catalog joins the eight prior volumes to form a nine-layer vocabulary for agentic AI. The volumes are independent and the reading order is flexible; the cross-references make whichever path the reader chooses coherent.

Volume 1 --- Patterns of AI Agent Workflows --- the timing of agent runs.
Volume 2 --- The Claude Skills Catalog --- model instructions in packaged form.
Volume 3 --- The AI Agent Tools Catalog --- the function-calling primitives.
Volume 4 --- The AI Agent Events & Triggers Catalog --- the activation layer.
Volume 5 --- The AI Agent Fabric Catalog --- the infrastructure substrate.
Volume 6 --- The AI Agent Memory Catalog --- the state and context layer.
Volume 7 --- The Human-in-the-Loop Catalog --- the human-agent interaction layer.
Volume 8 --- The Evaluation & Guardrails Catalog --- the governance layer.
Volume 9 --- The Multi-Agent Coordination Catalog (this volume) --- the agent-to-agent communication layer.

Nine layers. The first eight describe what a single agent system needs (patterns, skills, tools, events, fabric, memory, human interaction, governance). The ninth describes what happens when a single agent isn’t enough --- when multiple agents need to coordinate, communicate, and collaborate. Volume 9 builds on every prior volume: agents need patterns for their runs, skills and tools for their work, events to trigger them, fabric to run on, memory to persist, humans to approve and observe, evaluation and guardrails to govern, and --- in the multi-agent case --- protocols and frameworks to communicate.

The series can be read top-down for the agent designer’s sequence: how do individual agent runs compose, what do they know, what can they call, what triggers them, where do they run, what state do they carry, how do humans interact with them, how is the system tested and defended, and --- if more than one agent --- how do they coordinate. It can be read bottom-up for the operator’s sequence: how do humans approve and observe, what state lives where, what fabric supports it, what events drive it, what tools the agents use, what skills ship with them, what patterns their runs follow, how the whole thing is governed, and how the agents communicate when there are multiple. The multi-agent volume sits at the top of the stack in the sense that it presumes everything beneath; it sits at the periphery of the stack in the sense that most applications don’t need it.

Appendix C --- The Multi-Agent Anti-Patterns

Seven recurring mistakes that distinguish working multi-agent designs from the rest. Avoiding these is most of the practical wisdom in the field:

Defaulting to multi-agent. Most applications don’t need multi-agent. Single-agent-with-good-tools handles most cases better. Multi-agent is a power tool; reaching for it as a default produces slower, less reliable, more expensive systems with no quality lift to justify the cost.
Ignoring the hand-off problem. Free-form summary hand-offs between agents lose information at every boundary. Multi-agent systems that don’t address the hand-off problem (with structured hand-offs or shared workspaces) frequently underperform single agents on the same task.
Theatrical role specialization. Splitting a single agent’s work across multiple agents whose roles are nominally different but functionally similar produces complexity without benefit. If the researcher, writer, and critic all have the same tools and similar prompts, they’re the same agent under three names.
Peer-network designs without orchestration scaffolding. Letting agents talk to each other freely without termination conditions, conversation budgets, or supervisor scaffolding produces conversation explosion: agents talking endlessly to each other, consuming tokens, not converging.
Coupling agent logic to specific protocols. Agents whose capabilities are tangled with MCP-specific or A2A-specific code can’t migrate when the protocol landscape shifts. Separate the agent’s capability from the protocol binding.
Skipping multi-agent observability. The trace-tree from Volume 7 is harder to extend to multi-agent but more necessary, not less. Multi-agent systems without proper tracing are nearly impossible to debug in production.
Over-engineering the topology. Mixing all five topologies in a single design produces incoherent systems. Most working multi-agent designs use two topologies thoughtfully (typically hierarchical and shared-state, or hub-and-spoke and sequential) rather than mixing all of them ad-hoc.

Appendix D --- Discovery and Standards

Resources for tracking the multi-agent coordination ecosystem as it evolves:

modelcontextprotocol.io --- MCP specification and ecosystem hub.
a2aproject.github.io/A2A/ --- A2A protocol specification and reference implementations.
agntcy.org --- Linux Foundation’s AGNTCY consortium for ACP, directory, and identity standards.
Anthropic’s multi-agent research papers --- honest empirical work including the 15× token-usage finding that motivates this catalog’s opening warning.
LangGraph multi-agent documentation --- the explicit-graph reference for hierarchical and supervisor patterns.
CrewAI documentation --- the role-based reference design.
OpenAI Agents SDK --- the hand-off-centric reference design.
Magentic-One paper (arxiv.org/abs/2411.04468) --- the manager-plus-specialists pattern with benchmark results.
Conference proceedings: NeurIPS, ICLR, ICML, ACL, COLM. arXiv cs.MA for the multi-agent firehose.

Two pragmatic recommendations. First, before adopting multi-agent for any production application, do the single-agent baseline. Build the single agent with good tools first; measure its performance against the target metrics; only then evaluate whether multi-agent earns the additional cost. The discipline saves significant engineering investment and produces clearer evidence of when multi-agent actually helps. Second, default to shared-state designs (LangGraph state, CrewAI context, AutoGen group chat) for the cases where multi-agent is warranted. The hand-off problem is the single biggest source of multi-agent quality issues; shared-state designs dissolve it. Use explicit hand-offs only when the boundary is genuine (across organizations, trust zones, or compliance boundaries) and accept the engineering cost of designing the hand-off structure carefully when it must be done.

Appendix E --- Omissions

This catalog covers about 16 substrates across 8 sections. The wider multi-agent ecosystem is significantly larger; a non-exhaustive list of what isn’t here:

General distributed-systems frameworks (Akka, Erlang OTP, Ray Actors) when used outside the AI-specific multi-agent context.
Game-theoretic multi-agent systems research (cooperative game theory, mechanism design, market mechanisms for agent coordination) that hasn’t productized into working frameworks.
Agent simulation environments (Stanford generative agents, ChatDev, Voyager) when treated as research artifacts rather than production substrates.
Vertical-specific multi-agent products (Devin and similar coding agents, Manus and similar generalist agents) that implement specific multi-agent patterns under the hood without exposing them as frameworks.
Specialized multi-agent infrastructure (E2B, Modal, Replicate as deployment substrates for agents) when treated outside the multi-agent coordination context.
Multi-agent benchmarks (GAIA, AssistantBench, WebArena, AgentBench) which are evaluation resources rather than coordination substrates.

Appendix F --- A Note on the Moving Target

MCP shipped in November 2024. A2A in April 2025. ACP and AGNTCY consolidated in 2025. LangGraph’s supervisor pattern emerged across 2024. CrewAI matured through 2024 and 2025. OpenAI Agents SDK released in 2025 as the production successor to Swarm. AutoGen v0.4 rewrote the framework on an actor model. AG2 forked from AutoGen. Anthropic’s multi-agent research paper landed in 2025 with its 15×-token finding. Magentic-One published in late 2024. The category moved from “research and prototypes” to “production frameworks with empirical guidance” across roughly 24 months.

The deepest structural facts are stable. Multi-agent is a power tool with real costs and conditional benefits; default to single-agent unless the multi-agent design solves a specific problem the single-agent design can’t. Five coordination topologies (hierarchical, sequential, peer, hub-and-spoke, blackboard) capture the working patterns; production systems combine two or three thoughtfully. Three communication protocols (MCP, A2A, ACP) compete in overlapping space; consolidation is likely but uncertain. Three specialization axes (role, domain, skill) carve up the work; production systems combine them. The hand-off problem is the unsolved structural difficulty of multi-agent design; shared-state patterns are the strongest mitigation. An architect who internalizes these structural facts can map any new framework or protocol onto the design space; an architect who learns only the products has to relearn the field every year.

Nine volumes. Patterns, Skills, Tools, Events, Fabric, Memory, Human-in-the-Loop, Evaluation & Guardrails, Multi-Agent Coordination. The vocabulary covers the full design space of agentic AI as of mid-2026. The products will change; the vocabulary will adapt; the structural understanding will hold up. That’s the value of catalogs over manuals.

--- End of The Multi-Agent Coordination Catalog v0.1 ---