Anthropic官方：拆开“思考”“执行”，Claude怎么把Agent跑起来

好文章来自Anthropic 官方发布 Managed Agents(CMA)产品时的技术博客,既讲产品发布,也讲背后的工程哲学。

划重点：1.

Decoupling the brain from the body

让"大脑"和"身体"解耦

为什么要做 Managed Agents

A running topic on the Engineering Blog is how to build effective agents and design harnesses for long-running work. A common thread across this work is that harnesses encode assumptions about what Claude can't do on its own. However, those assumptions need to be frequently questioned because they can go stale as models improve.

我们工程博客一直在讨论的话题是:如何构建高效的 agent、如何为长任务设计 harness。这一系列工作有个共同主线——

harness本质上是在'对Claude 自己做不到的事情做出假设'。但这些假设需要经常被重新审视,因为模型在进步,假设可能就过时了。"harness的本质挑明了——harness不是给 Claude 加能力,而是在弥补 Claude 自己暂时做不到的事。这是个非常深刻的工程哲学。harness和模型能力是一对此消彼长的关系。

As just one example, in prior work we found that Claude Sonnet 4.5 would wrap up tasks prematurely as it sensed its context limit approaching—a behavior sometimes called 'context anxiety.' We addressed this by adding context resets to the harness. But when we used the same harness on Claude Opus 4.5, we found that the behavior was gone. The resets had become dead weight。

"Context anxiety"

这个词翻译过来就是"上下文焦虑"——模型自己感受到上下文快满,就慌张地草草收尾。这是模型的"心理状态"导致的工程问题。工程师为此做了"context reset"零件——但 Opus 4.5 出来后,这个心理问题没了,这个零件就成了累赘。

这就是 harness 设计的核心矛盾：你针对当前模型问题做的工程，有可能在下一代模型上就废了。

解法——做一个不绑定具体 harness 的"元 harness"

We expect harnesses to continue evolving. So we built Managed Agents: a hosted service in the Claude Platform that runs long-horizon agents on your behalf through a small set of interfaces meant to outlast any particular implementation."

我们预期 harness 会继续演化。所以我们做了 Managed Agents:一个托管在 Claude Platform 上的服务，通过一组小而稳定的接口，替你跑长任务 agent——这些接口的设计目标是比任何具体实现活得更久。" Managed Agents(CMA)的真正定位——它不是一个具体的 harness,而是"harness的容器"。接口本身可能有不同类型(创建、操作、流式)，但它们都是 Anthropic 对外承诺稳定的"契约"。你的代码依赖这套接口写。

你的代码"绑定"在 CMA 的这套接口上，意思是你的代码必须按 CMA 接口的规范来写,不能随便改。

Managed Agents是harness 的容器——它自己不规定具体 harness 怎么做，而是定义了一组稳定的接口(session),让任何符合这组接口的 harness 都能在上面跑。这个设计哲学完全模仿了操作系统对硬件的抽象——OS 不规定你用什么硬件,只规定了"进程、文件、内存"这些抽象接口。

类比操作系统——"为还没想到的程序做设计"

Building Managed Agents meant solving an old problem in computing: how to design a system for 'programs as yet unthought of.' Decades ago, operating systems solved this problem by virtualizing hardware into abstractions—process, file—general enough for programs that didn't exist yet. The abstractions outlasted the hardware. The read() command is agnostic as to whether it's accessing a disk pack from the 1970s or a modern SSD.

一个古老的问题：如何为'还没被想到的程序'设计一个系统?几十年前,操作系统通过把硬件虚拟化成抽象概念(进程、文件)来解决这个问题——这些抽象足够通用,通用到能服务那些还不存在的程序。

这些抽象比它们最初服务的硬件活得更久。read()命令并不关心它访问的是 1970 年代的磁盘组,还是现代的 SSD。

操作系统(过去)

Managed Agents(现在)

虚拟化硬件

虚拟化 agent 组件

进程、文件作为抽象

session、harness、sandbox 作为抽象

read()

不关心底下是磁盘还是 SSD

Managed Agents 不关心底下用什么具体 harness

抽象层稳定,实现层自由演化

接口稳定,Claude 模型和 harness 自由演化

好的抽象能"穿越时代"。OS 的read()命令从 1970 年代到现在没变,但底下硬件换了无数代。Managed Agents 想做的是同一种事——让今天写的 agent 代码，5 年后还能跑,即使中间 Claude 模型迭代了 5 代。

Managed Agents follow the same pattern. We virtualized the components of an agent: a session (the append-only log of everything that happened)，session 是 append-only 的——这是个非常重要的工程选择。意思是发生过的事永远不能改,只能往后加。这让整个系统可审计、可回放、可调试；因为你永远能看到"agent 在每一步都看到了什么、做了什么"。

meta-harness 的概念

With Managed Agents, we aimed to design a system that accommodates future harnesses, sandboxes, or other components around Claude. Managed Agents is a meta-harness in the same spirit, unopinionated about the specific harness that Claude will need in the future. Rather, it is a system with general interfaces that allow many different harnesses.

元 harness(meta-harness)——它对未来 Claude 需要什么具体 harness 不持立场,而是提供一组通用接口,容纳各种不同 harness。把context 做成外部对象，让模型按需查询。Anthropic 不打算让客户绑定到任何特定的 harness 设计——它知道 harness 会演化。所以它把"产品形态"放在更高一层:你绑定 Managed Agents 这套接口,我保证它能容纳从 2026 年到未来任何 Claude 的 harness 演化。

Claude Code 是其中一种 harness

For example, Claude Code is an excellent harness that we use widely across tasks. We've also shown that task-specific agent harnesses excel in narrow domains. Managed Agents can accommodate any of these, matching Claude's intelligence over time.

举例说,Claude Code 就是一个优秀的 harnes，我们在很多任务上广泛使用它。我们也证明过：

针对特定任务定制的 agent harness，在窄领域里表现卓越。Managed Agents 能容纳任何这些 harness—长期跟随 Claude 智能的演进。

Anthropic 的态度是:通用 harness(Claude Code)和专用 harness(领域定制)都有价值,我的 Managed Agents 平台都能跑。

这就避免了"我只有一套 harness,客户必须接受我的设计选择"的局限——CMA 是平台,不是单一产品。

核心难题——"哪些 token 该留,哪些该扔"

But irreversible decisions to selectively retain or discard context can lead to failures. It is difficult to know which tokens the future turns will need.

但是,不可逆地选择保留或丢弃哪些 token,会导致失败——很难提前知道未来对话会需要哪些 token。工程师做 compaction、trimming 时,要决定"哪些信息保留、哪些扔掉"。但这个决策是不可逆的——一旦扔了,以后想用就拿不回来了。

而模型未来需要什么信息,很难提前预测——可能 3 小时前一句无关紧要的话,3 小时后突然变成关键线索。

这就是为什么传统的"压缩 + 修剪"方案有上限——它在做赌博,赌"扔掉的东西未来不需要"。

更优解——把上下文做成可查询对象

If messages are transformed by a compaction step, the harness removes compacted messages from Claude's context window, and these are recoverable only if they are stored. Prior work has explored ways to address this by storing context as an object that lives outside the context window. For example, context can be an object in a REPL that the LLM programmatically accesses by writing code to filter or slice it.如果消息被 compaction 步骤转换了,harness 就会把这些被压缩的消息从 Claude 上下文窗口里移除——只有存起来的消息才能恢复。之前的工作探索过解决这个问题的方法——把上下文当作一个对象,存在上下文窗口之外。比如:上下文可以作为 REPL 里的一个对象,LLM 通过写代码去过滤、切片来访问它。

这一段提出了关键解法——把上下文从"塞在 token 里"升级成"塞在外部对象里"。

Managed Agents 里的 session 实现这个思想：

"In Managed Agents, the session provides this same benefit, serving as a context store for Claude that lives outside the context window."

在 Managed Agents 里,session 就提供了同样的能力——它作为一个存在于上下文窗口之外的上下文存储,服务于 Claude。"这里把前面的理论落地到产品上——Managed Agents 的 session 就是这个"外部上下文对象"。

这是 Anthropic 最深的工程巧思:他们把"长上下文管理"这件原本由 harness 做的事情,提升到了平台层。也就是说,以前每个 harness 都要自己解决"上下文怎么管",现在 CMA 直接把它做到了平台里——任何跑在 CMA 上的 agent 都自动获得这个能力。

"对话"在 session 和 context 里都有

——但 session 是"原始档案",context 是"此刻的工作视野"。

Session(完整记录)

Context Window(工作视野)

包含什么

2000 个完整事件

任务摘要 + 进度摘要 + 最近 3 步

token 量

5000 万

12 万

Claude 调用工具时是"靠读描述自己决定怎么传参数",所以函数名、参数命名、描述写法、错误信息全都要为"模型能用对"做优化——这和"为人类设计"的传统 API 哲学有时候冲突。Anthropic 在 Claude Code 里每个工具的 description 都被反复打磨,就是因为它直接决定 agent 的成功率。

agent 时代的 API 工程师,要同时驾驭"几十年验证过的接口设计哲学"和"全新的模型可读性原则",这种复合能力是 Anthropic、OpenAI 这种公司真正稀缺的核心人才。

本质上，Anthropic 把它的"agent infra 层"正式产品化。

agent 工程的复杂度被严重低估"——这件看似朴素的事,真做对极其难。

铭鸿体育资讯网

Anthropic官方：拆开“思考”“执行”，Claude怎么把Agent跑起来

热门分类