Article X · In-depth long read · 06-22 · 12:40

What's Next for Agentic Coding Products?

2026: AI Editor → Agent Command Center

As of 2026, there are four major forms of agentic coding products:

Transition period · Editor & IDE + chat side-panel: VS Code, Cursor 1.0, Antigravity 1.0
Transition period · Coding Agent CLI: Claude Code, Codex CLI, Gemini CLI
🤖 AI-native · Agent coworker chat apps: Slack, ClickUp, Slock
🤖 AI-native · Agent command center apps: Codex App, Cursor 3, Antigravity 2.0

We can split them into two phases:

Phase 1: Transition from human → agentic coding (2023 ~ 2025). Products: Editor & IDE + chat side-panel, Coding Agent CLI

Editors predate LLMs, so adding a side panel enabled AI coding without breaking the classic interface.

Coding agent CLIs were first built as an experiment (according to Claude Code's creator). Nobody thought they would work this well, so the creators chose a CLI over a GUI simply because it was easier to build.

Phase 2: 🤖 Agent-native workflows (2025 ~ now). Products: Agent coworker chat apps, Agent command center apps

Compared to Phase 1, where humans still wrote code by hand or constantly steered the agents, the primary user journey has shifted to orchestrating agents that handle work from start to finish. The agent is no longer an assistant living in a side panel; it is the core of the product.

In 2026, there is a notable trend: companies are redesigning their existing products to be agent-focused. For instance, Google launched Antigravity 2.0 at I/O, replacing the VS Code-based editor with an agent-first layout (i.e., an agent command center). Cursor 3 did this even earlier in April. To quote from their announcement:

We're introducing Cursor 3, a unified workspace for building software with agents. The new Cursor interface brings clarity to the work agents produce, pulling you up to a higher level of abstraction, with the ability to dig deeper when you want. It's faster, cleaner, and more powerful, with a multi-repo layout, seamless handoff between local and cloud agents, and the option to switch back to the Cursor IDE at any time.

This shift is more significant than you might realize. Leading players in the market have set aside incredibly successful products used by millions of people to create something completely new. Clearly, these are rational, careful decisions backed by comprehensive internal studies and metrics. Is this the final form of agentic coding products? Not necessarily, but for the near future, agent command centers will continue to dominate.

What about agent coworker chat apps? I love the concept, and I use OpenClaw from Telegram every day. But the question remains: do I really want another chat app? Maybe not. That's why I think WhatsApp, Telegram, Slack, and Discord will maintain their positions while becoming more AI-native. Besides, a good coding environment still requires features like code editing, debugging, and linting, which chat apps don't provide out of the box.

What's Next?

Today, if you use Codex, Cursor, or Antigravity, they look and feel almost identical.

A natural question is: what's next? I see a few directions for evolution in agent capabilities and product design:

Better Agent Harnesses

Needless to say, agents and their harnesses will continue to improve. There is a ton of exciting research into dynamic context, safer sandboxes, memory/skill management, and subagents. What I find most interesting is that Anthropic "abandoned" their belief that "You don't need a new agent, just create skills," and instead created agent teams and dynamic workflows. If I understand correctly, this is a loop:

Previously, when there weren't enough agent trajectories in the training data, you needed a harness layer or prompting tricks to make things work. Now, as LLMs are trained directly on that data, they can complete long-running, multi-tool tasks without much steering, allowing the harness layer to become thinner and more standardized.

Consequently, people have created more sophisticated architectural patterns like agent teams because LLMs aren't inherently trained to handle them yet. Will LLMs handle this natively someday, creating a harness on the fly to fit a given task? It might not be that far off.

Proactivity: Can Agents Suggest What to Do Next?

People have been talking about proactive agents for a while, but few products actually support them. Jules is one of the exceptions. For those who haven't tried this feature: Jules scans your codebase for TODOs and performance bottlenecks, generates suggestions, and notifies you. Once you approve a suggestion, Jules automatically creates a PR. I find the performance tips quite useful and have already merged multiple PRs this way.
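To make the scan-then-approve flow concrete, here is a minimal sketch in Python. It is not how Jules works internally (the article does not describe that); it only shows the general shape of collecting TODO comments and holding each one as a suggestion until a human approves it. The names `Suggestion` and `find_todos`, and the comment conventions matched by the pattern, are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed convention: TODOs appear in "#" or "//" comments.
TODO_PATTERN = re.compile(r"(?:#|//)\s*TODO\b[:\s]*(.*)")


@dataclass
class Suggestion:
    # Hypothetical record for one proposed change.
    path: str
    line: int
    text: str
    approved: bool = False  # a human flips this before any PR is created


def find_todos(path: str, source: str) -> list[Suggestion]:
    """Return one pending suggestion per TODO comment found in `source`."""
    suggestions = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            suggestions.append(Suggestion(path, number, match.group(1).strip()))
    return suggestions
```

Every suggestion starts unapproved, which mirrors the approval step described above: nothing becomes a PR until someone says yes.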

Beyond that, agents should be able to proactively maintain a codebase by completing tasks such as:

Fixing code style issues

Creating tests

Upgrading dependencies

Fixing bugs

Some tasks may require external triggers, like a pipeline that automatically converts GitHub issues into user prompts. What is even more interesting, and where there is no consensus yet, is whether we should let agents proactively suggest new features, let alone submit those changes without human approval. This is very different from standard maintenance work, as the problem space is open-ended with no single right answer. I'm looking forward to hearing stories from teams trying this.
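The issue-to-prompt trigger could look roughly like the sketch below. The `Issue` shape, the `AUTO_LABELS` policy, and the prompt wording are all hypothetical; a real pipeline would read issues from the GitHub API and would need its own rules for what is safe to dispatch automatically.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Issue:
    # Hypothetical, simplified shape; a real GitHub issue has many more fields.
    number: int
    title: str
    body: str
    labels: list[str] = field(default_factory=list)


# Assumed policy: only routine maintenance labels are dispatched automatically.
AUTO_LABELS = {"bug", "dependencies", "style", "tests"}


def issue_to_prompt(issue: Issue) -> Optional[str]:
    """Build an agent prompt, or return None if a human should triage first."""
    if not AUTO_LABELS.intersection(issue.labels):
        return None  # e.g. feature requests stay with humans
    return (
        f"Resolve issue #{issue.number}: {issue.title}\n\n"
        f"{issue.body.strip()}\n\n"
        "Open a pull request with the fix and wait for human review."
    )
```

Returning `None` for anything outside the maintenance labels is one way to encode the distinction drawn above: routine work flows to the agent, while open-ended feature ideas stay with people.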

Team Collaboration

Imagine a team working like this: previously, engineers produced 5 commits a day; with coding agents, they produce 12.

This brings two major challenges, which will only intensify as coding agents become more capable:

Context Isolation: Vital engineering context gets trapped inside isolated agent conversations, leading to a breakdown in shared team knowledge. With engineers shipping code they don't read, even humans lose the full context of how the code functions.

Coordination Hurdles: Codebases evolve 10x faster with AI. Without a central hub to synchronize intent and progress, teams face broken APIs, redundant efforts, and code conflicts. Existing coordination problems are magnified tenfold.

A possible solution is to build a coordination layer that syncs context across agents and accumulates a team memory. This would act as your team's ultimate PM—someone who knows what everyone is working on and proactively unblocks or helps people when needed. Much like the human brain, long-term memory would store lasting team knowledge (e.g., design choices), while short-term memory tracks immediate team progress and workspace status. SageOx is one startup embracing this exact idea.
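A toy version of such a coordination layer, assuming the long-term/short-term split described above, might look like this in Python. The class `TeamMemory` and its methods are invented for illustration and say nothing about how SageOx or any other product is built.

```python
from dataclasses import dataclass, field


@dataclass
class TeamMemory:
    # Long-term: lasting knowledge such as design choices, keyed by topic.
    long_term: dict[str, str] = field(default_factory=dict)
    # Short-term: the files each engineer's agents are touching right now.
    short_term: dict[str, set[str]] = field(default_factory=dict)

    def record_decision(self, topic: str, decision: str) -> None:
        self.long_term[topic] = decision

    def report_progress(self, engineer: str, files: set[str]) -> list[str]:
        """Update one engineer's working set and return overlap warnings."""
        warnings = []
        for other, other_files in self.short_term.items():
            shared = files & other_files
            if other != engineer and shared:
                warnings.append(
                    f"{engineer} and {other} both touch {sorted(shared)}"
                )
        self.short_term[engineer] = set(files)
        return warnings
```

For example, if one engineer's agent reports work on `api.py` while another is already editing it, `report_progress` returns a warning before the two changes collide. A real system would track intent and semantics, not just file names.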

Verification: Can an Agent Verify if a Change Actually Works?

I'm putting verification at the end, as it is always the final step in an agent loop (if there is one) 😉.

Many people don't realize how important verification is. However, if you examine agent trajectories, almost all coding agents perform some form of validation at the end. For example, an agent might launch a local server to test a newly added endpoint. If a test fails, it will attempt to fix the issue and continue looping until it passes.
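The check, fix, repeat pattern can be sketched in a few lines of Python. The callables `run_checks` and `attempt_fix` are placeholders for whatever the agent actually does (running a test suite, editing files), and the `max_attempts` cap is an assumption added here so the loop cannot run forever; the trajectories described above may simply loop until the checks pass.

```python
from typing import Callable


def verify_loop(
    run_checks: Callable[[], list[str]],
    attempt_fix: Callable[[list[str]], None],
    max_attempts: int = 3,
) -> bool:
    """Run checks, ask the agent to fix failures, and retry up to a limit."""
    for _ in range(max_attempts):
        failures = run_checks()
        if not failures:
            return True  # everything passed
        attempt_fix(failures)
    # One final check after the last fix attempt.
    return not run_checks()
```

`run_checks` returns the names of failing checks, so an empty list means success. The function returns `False` when failures remain after the last attempt, which is the point where a human would need to step in.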

Does this mean verification is a solved problem? Certainly not. The biggest bottleneck remains setting up the test environment. Not all verification is as simple as sending HTTP requests. Changes to a frontend, for instance, typically require browser automation, or even recording video and screenshots to be evaluated by an LLM auto-rater, which is a non-trivial process on its own. What if you're fixing bugs for an Android app but don't have the necessary toolchains installed? Yes, you can connect to a remote sandbox containing all dependencies, but then you have to figure out communication protocols and address privacy concerns. While it isn't too difficult to build a custom solution for a specific requirement, turning this into a generic, out-of-the-box component for any coding agent still has a long way to go.

P.S. I'm open to work. If you're looking for an AI agent engineer in the Bay Area, feel free to DM.
