Article · X · Long read · 06-22 · 12:40

How to Build Your First Team of AI Agents Using Claude (Full Course)

Most people are using Claude to answer one question at a time.

Save this :)

A small group of people are using Claude to run an entire team of agents that research, write, code, review each other's work, and ship finished output while they sleep.

The difference between those two groups is not intelligence.

It is orchestration.

A single agent is an assistant. A team of agents is a workforce. One Claude instance answering your prompt is useful. Five Claude instances, each with a defined role, handing work to each other and checking each other's output, is a system that does in twenty minutes what used to take you a full day.

And right now almost nobody knows how to build this properly.

That is the opportunity. Multi-agent systems sound like something that requires a PhD and a research lab. They do not. With the tools available in 2026, you can build your first working agent team this week, with zero machine learning background, using nothing but Claude and a clear head.

Here is exactly how to do it, from the ground up.

First, Kill the Mental Model That's Holding You Back

The reason most people never build an agent team is that they think of Claude as a chat window.

You type, it responds, you type again. That is the consumer experience, and it caps you immediately.

Here is the better model. Think of Claude as a brain you can spin up as many times as you want. Each copy can be given a different job, a different personality, a different set of instructions, and a different set of tools. One copy never has to know what the others are doing. You, the orchestrator, decide who talks to whom and in what order.

That is all a multi-agent system is. It is not magic. It is a group of specialized Claude instances, plus a plan for how work flows between them.

Once that clicks, everything else is just plumbing.

The Three Roles Every Agent Team Needs

Before you build anything, understand the three core roles. Almost every useful agent team is some combination of these.

The Orchestrator. This is the manager. It takes your goal, breaks it into tasks, decides which specialist handles each task, and assembles the final result. It does not do the deep work itself. It delegates and integrates. In a well-built system, this is the only agent you talk to directly.

The Specialists. These are the workers. Each one is narrow and excellent. A research specialist that only gathers and verifies facts. A writer that only turns research into prose. A coder that only writes and tests code. A designer that only produces layout and visual specs. The narrower the role, the better the output, because a focused instruction beats a vague one every time.

The Critic. This is the role almost everyone skips, and it is the one that separates amateur systems from professional ones. The critic's only job is to review the specialists' output against a standard and send it back if it falls short. A team without a critic produces fast garbage. A team with a critic produces work you can actually ship.

Get these three roles right and you have the skeleton of every agent team worth building.

Your Build Path: Five Stages

You do not build a five-agent system on day one. You build one agent, then two, then a team. Here is the path.

Stage 1: Build a Single Excellent Agent

Before you orchestrate anything, you need one agent that does one job extremely well.

Open a Claude Project. This is your walled-off workspace. Drop in the instructions, reference files, and examples that define the job. A Project keeps context isolated so the agent does not get confused by unrelated conversations.

Now write the system instruction. This is the single most important thing you will do in this entire course. A weak instruction produces a weak agent no matter how many of them you stack. A strong instruction defines the role, the standard, the format, and the boundaries.

Here is the structure of a strong agent instruction:

  • Role: "You are a research specialist. Your only job is to gather and verify factual claims on a given topic."
  • Standard: "Every claim must be supported by a credible source. If you cannot verify a claim, you mark it as unverified rather than including it."
  • Format: "Return findings as a numbered list. Each item: the claim, the source, a confidence level."
  • Boundaries: "You do not write prose. You do not give opinions. You gather facts and hand them off."
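
One way to keep the four parts explicit, and to reuse the instruction as a template later, is to store them as separate fields and join them into one system instruction. This is a minimal sketch: the `AgentInstruction` class and its field names are assumptions made for illustration, not part of any Claude API. The text it prints can be pasted into a Project's instructions as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInstruction:
    """The four parts of an agent instruction: role, standard, format, boundaries."""
    role: str
    standard: str
    output_format: str
    boundaries: str

    def to_system_prompt(self) -> str:
        # Label each part so the instruction can be reviewed and edited section by section.
        return "\n\n".join([
            f"Role: {self.role}",
            f"Standard: {self.standard}",
            f"Format: {self.output_format}",
            f"Boundaries: {self.boundaries}",
        ])


RESEARCHER = AgentInstruction(
    role="You are a research specialist. Your only job is to gather and "
         "verify factual claims on a given topic.",
    standard="Every claim must be supported by a credible source. If you cannot "
             "verify a claim, you mark it as unverified rather than including it.",
    output_format="Return findings as a numbered list. Each item: the claim, "
                  "the source, a confidence level.",
    boundaries="You do not write prose. You do not give opinions. "
               "You gather facts and hand them off.",
)

print(RESEARCHER.to_system_prompt())
```

Keeping the parts separate makes it easy to change one of them, say the format, while testing on real inputs without touching the rest.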

What to Do This Stage

  • Pick one real task you do often that involves a clear, repeatable process
  • Build a single agent in a Claude Project with a full role/standard/format/boundaries instruction
  • Test it on ten real inputs and refine the instruction until the output is consistent
  • Save the final instruction as a reusable template

Stage 2: Add a Second Agent and Pass Work Between Them

Now you learn the core move of all multi-agent work: handoff.

The simplest two-agent team is a worker and a critic. The worker produces a draft. The critic reviews it. If it passes, you keep it. If it fails, it goes back with specific feedback.

You can run this manually at first. Open two conversations. Paste the worker's output into the critic. Paste the critic's feedback back into the worker. Watch the quality climb with each loop.

This feels clunky by hand, and that is the point. Feeling the friction teaches you exactly what you will later automate. You will understand viscerally why the handoff format matters, why structured output beats free text, and why a vague critic is worse than no critic at all.
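
To make "structured output beats free text" concrete, here is one possible shape for the critic's verdict. It is a sketch under assumptions: the JSON keys `approved` and `issues` are invented for illustration, and you would have to ask the critic for exactly this format in its instruction. The parser refuses a rejection that names no specific issue, which is the vague critic described above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Review:
    """Structured critic verdict (field names are illustrative assumptions)."""
    approved: bool
    issues: list[str] = field(default_factory=list)


def parse_review(raw: str) -> Review:
    """Parse the critic's reply and reject vague verdicts instead of guessing."""
    data = json.loads(raw)
    approved = data["approved"]
    issues = data.get("issues", [])
    if not isinstance(approved, bool):
        raise ValueError("'approved' must be true or false")
    if not approved and not issues:
        # A rejection without specifics gives the worker nothing to act on.
        raise ValueError("a rejection must list at least one specific issue")
    return Review(approved=approved, issues=list(issues))


def feedback_for_worker(review: Review) -> str:
    """Turn a rejection into the message handed back to the worker."""
    lines = [f"{i}. {issue}" for i, issue in enumerate(review.issues, start=1)]
    return "Revise the draft to fix these issues:\n" + "\n".join(lines)


raw_reply = '{"approved": false, "issues": ["Claim 2 has no source", "Missing conclusion"]}'
review = parse_review(raw_reply)
print(feedback_for_worker(review))
```

When you run the loop by hand, the same format still helps: you paste the numbered issues back to the worker instead of a paragraph of impressions.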

What to Do This Stage

  • Build a worker agent and a critic agent, each in its own Project or conversation
  • Define the exact format the worker outputs and the critic consumes
  • Run five full worker-critic-worker loops by hand on a real task
  • Write down every point of friction. Those are your future automation targets

Stage 3: Give Your Agents Tools

An agent that can only talk is a chatbot. An agent that can act is a worker.

This is where Claude's connectors and the Model Context Protocol (MCP) come in. MCP is an open standard that lets Claude connect to external tools and data sources through a single consistent interface. In practice, it means your agent can read your documents, search your files, query a database, pull from an API, or take an action in another app.

With connectors enabled, your research agent can search the web and read your own files instead of relying only on what it already knows. Your writing agent can pull from a shared style guide. Your coding agent can read your actual repository.

Tools are what turn a clever conversation into real work. The moment an agent can fetch its own inputs and act on its own outputs, you stop being a copy-paste middleman and start being a manager.

A word of caution that the hype crowd skips: an agent with tools can take real actions, so you give it the narrowest set of tools it needs and you keep a human in the loop for anything irreversible. Reading a file is safe. Sending an email on your behalf is not something you let an agent do unsupervised on day one.
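
If you wire tools up in your own code, the same caution can be written down as a rule. The sketch below assumes a hypothetical `ToolBox` wrapper: an agent can only call tools it was explicitly granted, and anything marked irreversible goes through an approval function first. The tool functions and names are placeholders, not real connectors.

```python
from typing import Any, Callable


class ToolBox:
    """Allowlist of tools for one agent, with a human gate on irreversible ones."""

    def __init__(self, approve: Callable[[str, dict], bool]):
        self._tools: dict[str, tuple[Callable[..., Any], bool]] = {}
        self._approve = approve  # asks a human; returns True to allow the call

    def grant(self, name: str, fn: Callable[..., Any], irreversible: bool = False) -> None:
        self._tools[name] = (fn, irreversible)

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} was not granted to this agent")
        fn, irreversible = self._tools[name]
        if irreversible and not self._approve(name, kwargs):
            return f"BLOCKED: {name} requires human approval"
        return fn(**kwargs)


# Hypothetical tools, for illustration only.
def read_file(path: str) -> str:
    return f"(contents of {path})"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


def deny_all(name: str, args: dict) -> bool:
    # Safe default for day one: nothing irreversible runs unattended.
    return False


toolbox = ToolBox(approve=deny_all)
toolbox.grant("read_file", read_file)
toolbox.grant("send_email", send_email, irreversible=True)

print(toolbox.call("read_file", path="style_guide.md"))
print(toolbox.call("send_email", to="a@example.com", body="hi"))
```

Swapping `deny_all` for a function that prompts you on the command line gives you a simple human gate while keeping read-only tools frictionless.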

What to Do This Stage

  • Enable the connectors your agents actually need, one at a time, per conversation
  • Give your research agent web search and file access and watch its output quality jump
  • Connect one agent to one real data source you use daily
  • Test what happens when a tool returns nothing or an error, and instruct the agent how to handle it
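
For the last item, it helps if every tool reports back in one predictable shape, so the agent's instruction can say what to do in each case. A minimal sketch, assuming a hypothetical `safe_call` wrapper and a stand-in `search` tool:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    status: str        # "ok", "empty", or "error"
    data: Any = None
    detail: str = ""


def safe_call(tool: Callable[..., Any], *args: Any, **kwargs: Any) -> ToolResult:
    """Run a tool and report failures and empty results explicitly."""
    try:
        data = tool(*args, **kwargs)
    except Exception as exc:
        return ToolResult("error", detail=f"{type(exc).__name__}: {exc}")
    if data is None or (hasattr(data, "__len__") and len(data) == 0):
        return ToolResult("empty", detail="the tool returned no results")
    return ToolResult("ok", data=data)


# Stand-in tool, for illustration only.
def search(query: str) -> list[str]:
    if not query:
        raise ValueError("query must not be empty")
    return [] if query == "zzz" else [f"result for {query}"]


for q in ["claude", "zzz", ""]:
    print(repr(q), safe_call(search, q).status)
```

The agent's instruction can then cover each status, for example: on "empty", say that nothing was found rather than inventing an answer; on "error", report the detail and stop.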

Stage 4: Automate the Orchestration

Now you stop being the middleman.

You have felt the friction of manual handoffs. You know the formats. Now you build the orchestrator, the manager agent that does the passing for you.

The orchestrator's instruction looks different from a specialist's. It is about delegation and assembly, not execution:

  • "You are the orchestrator. You receive a goal. You break it into subtasks. You assign each subtask to the correct specialist. You collect their outputs. You send drafts to the critic. You return the final assembled result only when the critic approves."

In 2026 you have two clean ways to run this. Inside Claude's agentic tooling, you can set up sub-agents that the main agent spawns and coordinates for parallelizable work, with the orchestrator splitting a job across several workers at once and stitching the results together. Or, if you are comfortable with a little code, you call the Claude API directly, sending the orchestrator's plan to each specialist as a separate request and feeding the responses back in.

You do not need both. Pick the one that matches your comfort level and ship it.
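
If you take the code route, the plumbing can be very small. The sketch below is an assumption-heavy illustration, not a reference implementation: each agent is modeled as a plain function from prompt text to reply text, and the stubs stand in for what would be separate Claude API requests, each carrying that agent's own system instruction. The `run_team` function, the `plan` format, and the "APPROVED" convention are all invented for this example.

```python
from typing import Callable

# An agent is modeled as a function from prompt text to reply text.
# In practice each one would wrap a separate Claude request (not shown here).
Agent = Callable[[str], str]


def run_team(
    goal: str,
    plan: Callable[[str], list[tuple[str, str]]],
    specialists: dict[str, Agent],
    critic: Agent,
) -> str:
    """Delegate subtasks in order, then release the result only if the critic approves."""
    handoff = ""  # output of the previous specialist, passed to the next one
    for role, subtask in plan(goal):
        if role not in specialists:
            raise KeyError(f"no specialist registered for role {role!r}")
        prompt = subtask if not handoff else f"{subtask}\n\nInput from previous step:\n{handoff}"
        handoff = specialists[role](prompt)
    verdict = critic(handoff)
    if not verdict.strip().upper().startswith("APPROVED"):
        raise RuntimeError(f"critic rejected the result: {verdict}")
    return handoff


# Stub agents so the sketch runs without network access.
def plan(goal: str) -> list[tuple[str, str]]:
    return [
        ("researcher", f"Gather facts for: {goal}"),
        ("writer", f"Write a draft for: {goal}"),
    ]


specialists = {
    "researcher": lambda prompt: "1. Fact A (source: example)",
    "writer": lambda prompt: "DRAFT based on -> " + prompt.splitlines()[-1],
}


def critic(draft: str) -> str:
    return "APPROVED" if draft.startswith("DRAFT") else "REJECTED: not a draft"


print(run_team("agent teams", plan, specialists, critic))
```

Here a rejection simply raises an error. Sending the work back to the writer with feedback is a separate concern, and the loop limit from Stage 5 applies to it.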

What to Do This Stage

  • Write an orchestrator instruction focused purely on delegation and assembly
  • Wire it to your existing specialists and critic
  • Run one full goal end to end without touching anything between input and output
  • Add one rule that pauses the system and asks you before any irreversible action

Stage 5: Make It Reliable and Repeatable

Anyone can get an agent team to work once. Professionals make it work the hundredth time.

This stage is about durability. You add three things.

Evaluation. Build a small set of test inputs with known good outputs. Run your whole team against them after any change. If quality drops, you catch it before your users do. This is the single habit that separates a toy from a tool.
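
An evaluation set does not need a framework. A minimal sketch, assuming your whole team can be called as one function and that you supply your own `passes` check (both are assumptions for illustration): it runs every case and reports the pass rate and which inputs failed.

```python
from typing import Callable


def run_eval(
    team: Callable[[str], str],
    cases: list[dict[str, str]],
    passes: Callable[[str, str], bool],
) -> tuple[float, list[str]]:
    """Return the pass rate and the inputs that failed."""
    if not cases:
        raise ValueError("the evaluation set is empty")
    failed = [c["input"] for c in cases if not passes(team(c["input"]), c["expected"])]
    return 1 - len(failed) / len(cases), failed


# Stand-in team and cases, for illustration only.
def team(text: str) -> str:
    return text.upper()


cases = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "agent", "expected": "AGENT"},
    {"input": "team", "expected": "Team"},
    {"input": "ok", "expected": "OK"},
]

rate, failed = run_eval(team, cases, passes=lambda out, exp: out == exp)
print(rate, failed)
```

Exact matching is used here only to keep the example short. For written output you would likely need a looser check, such as required facts being present or a critic-style rubric.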

Memory. Give your team persistent context so it does not start from zero every session. With Claude's project memory and the persistent storage now available in artifacts, your team can remember decisions, preferences, and past work across sessions.
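
If you run your team through your own code rather than inside Claude's apps, you have to supply that persistence yourself. One simple option, sketched here as an assumption rather than a description of how Claude's memory works, is a small JSON file whose contents are added to each agent's prompt. The `TeamMemory` class is hypothetical.

```python
import json
import tempfile
from pathlib import Path


class TeamMemory:
    """Tiny JSON-backed store for decisions and preferences."""

    def __init__(self, path: Path):
        self.path = path
        self.data: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def remember(self, key: str, value: str) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")

    def as_context(self) -> str:
        """Render the stored notes as text to prepend to an agent's prompt."""
        return "\n".join(f"- {k}: {v}" for k, v in sorted(self.data.items()))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    TeamMemory(path).remember("tone", "plain and direct")
    # A new object simulates a new session reading the same file.
    print(TeamMemory(path).as_context())
```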

Failure handling. Decide in advance what happens when a specialist returns garbage, a tool fails, or the critic and worker get stuck in a loop. A professional system has a defined escape hatch. An amateur one just breaks and you find out from an angry user.
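
The stuck critic-worker loop is the easiest of these to guard against in code. A minimal sketch, with hypothetical function names and a made-up result format: the loop runs at most `max_rounds` times, and if the critic never approves, the last draft and feedback are handed to a human instead of looping again.

```python
from typing import Callable


def refine(
    task: str,
    worker: Callable[[str, str], str],
    critic: Callable[[str], tuple[bool, str]],
    max_rounds: int = 3,
) -> dict:
    """Run worker-critic rounds with a hard limit and a defined escape hatch."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = worker(task, feedback)
        approved, feedback = critic(draft)
        if approved:
            return {"status": "approved", "draft": draft, "rounds": round_no}
    # Escape hatch: stop and escalate rather than spin forever.
    return {"status": "needs_human", "draft": draft,
            "feedback": feedback, "rounds": max_rounds}


# Stub agents, for illustration only.
def worker(task: str, feedback: str) -> str:
    return f"{task} with sources" if feedback else task


def critic(draft: str) -> tuple[bool, str]:
    if draft.endswith("with sources"):
        return True, ""
    return False, "add sources"


print(refine("summary", worker, critic))
print(refine("summary", worker, lambda draft: (False, "still vague")))
```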

What to Do This Stage

  • Build a ten-case evaluation set and run it after every change to your system
  • Add persistent memory so the team carries context between sessions
  • Define explicit failure behavior for each agent: what to do when inputs are bad
  • Set a hard limit on critic-worker loops so the team never spins forever

A Real Example: The Content Team

Let me make this concrete with a team you could build this weekend.

Say you want to produce researched, written, fact-checked articles on autopilot. Here is the team:

The orchestrator takes a topic and a target length. The research specialist searches the web, gathers verified facts, and returns a structured brief. The writer turns that brief into a full draft in your voice, pulling tone from a style guide you connected as a file. The critic checks the draft against three standards: factual accuracy versus the research brief, adherence to your style guide, and structural completeness. If anything fails, it goes back to the writer with specifics. Only an approved draft reaches you.

You give the orchestrator one line: "Write a 1,500 word article on X." Twenty minutes later you get a draft that has already been researched, written, and reviewed twice. You do final edits and ship.

That is not a fantasy. Every piece of that is buildable today with the stages above. The only thing standing between you and that team is sitting down and building it one stage at a time.

The Mistakes That Kill Agent Teams

A few traps catch almost everyone. Skip them and you will move twice as fast.

Building five agents before one works. You will be tempted to design the whole org chart first. Do not. One excellent agent beats five mediocre ones wired together. Earn each new agent.

Vague roles. "Help with research" is not a role. "Gather and verify factual claims, return as a structured list, never write prose" is a role. Specificity is everything.

No critic. A team that only produces and never reviews produces fast, confident garbage. The critic is not optional.

Over-trusting tools. An agent with the power to act needs the narrowest permissions and a human gate on anything that cannot be undone. Speed is not worth a deleted file or an email you did not mean to send.

Skipping evaluation. If you cannot measure whether your team got better or worse after a change, you are not building a system. You are gambling.

The Honest Truth About Multi-Agent Systems

A team of agents will not fix a process you do not understand.

If you cannot describe how a task should be done step by step, you cannot delegate it to agents, because each agent needs a clear instruction and you are the one writing it. The work of building an agent team is mostly the work of thinking clearly about your own process. The agents are easy. The clarity is hard.

But here is what makes this worth it. The people who learn to orchestrate agents are not going to be replaced by AI. They are the ones using AI to do the work of a whole team by themselves. That is the leverage. One person, a clear process, and a team of agents that never sleeps.

The window where building this puts you years ahead of everyone else is open right now.

Six weeks from today, you can still be typing one question into a chat box and waiting for one answer.

Or you can be running a team that works while you sleep.

The difference is whether you start building stage one today.

If you found this useful, follow me @eng_khairallah1 for more AI content like this. I post breakdowns, courses, and tools every week.

hope this was useful for you, Khairallah ❤️
