认识一下 "Built with Opus 4.7" Claude Code 黑客松的获奖者们
Meet the winners of the Built with Opus 4.7 Claude Code hackathon
From medical training and electronics repair to coding education and factory maintenance, see the projects built by the winners of our latest virtual hackathon.
Last week, we hosted Claude Build Day, our latest hackathon where builders got together in San Francisco to put their ideas to work using Claude Opus 4.8.
While we wait to see what they built, we chatted with the winners of our Built with Opus 4.7 hackathon about their projects. They tackled medical training, electronics repair, computer science education, interactive play, home repair, and factory maintenance.
Congratulations to the winners and to everyone who participated! We hope their ideas will inspire you.
First place: Medkit, Bedirhan Keskin
Bedirhan Keskin, an Istanbul-based physician-turned-software engineer, used Claude Managed Agents to build Medkit: a learning tool for medical residents or junior doctors, simulating real-life patient encounters in a gamified medical clinic.
"When you're alone in an emergency department with 50 patients waiting and you realize there are cases you never practiced in medical school, you end up practicing them on real patients in real time," says Bedirhan.
With Medkit, medical students practice diagnosing and treating simulated patients, using the tool to take medical history, order labs, read imaging, diagnose, and prescribe treatment. At the end, an agentic grader assesses the full encounter against the same published clinical guidelines that a board examiner would use.
Bedirhan built Medkit across four separate Claude Code sessions (voice engine, content generation, 3D game layer, and a core app), keeping each context clean and progressing on all at once. He followed a “talk, don’t type” approach, working almost entirely by voice.
Medkit is already gaining traction, with three medical faculties and a pharma company, all based in Istanbul, set to start running pilots in the coming weeks.
Advice to other builders: Work with Claude as a thought partner, not just a coding agent.
Bedirhan’s first instinct was to self-host the voice engine, but Claude suggested using a cloud provider to move faster. “What I value most about Claude is it’s not just a code generator, but a thought partner helping me see options I'd otherwise miss,” he says.
Second place: Wrench Board, Alexis Chapellier
Alexis Chapellier from Reignier-Ésery, France, spent years fixing electronics before creating RepairMind, an AI-powered management platform for repair shops. His Opus 4.7 hackathon project, Wrench Board, helps independent technicians figure out complex repairs. Users drop in a schematic and a boardview and describe the symptoms, and the agent creates a unified electrical graph, reasons over it, points to the exact pad to probe, reads measurements, and updates its hypotheses until it diagnoses the issue.
Alexis prototyped Wrench Board in Claude Design, separating the app’s responsibilities (design, schematic ingestion, boardview, diagnostic agent) and producing first a spec and then a plan for each one. He executed in Claude Code’s multi-agent mode, benchmarking at every step by running five or six agents in parallel during debugging, with one dedicated agent per domain.
Alexis’s big bet was on Opus 4.7’s new ability to understand visual schematics; he says he knew it was working when he asked the model to trace a power path on a motherboard.
“I watched Wrench Board’s boardview light up step by step, arrows appearing, components getting pointed at, names surfacing. At that moment, I understood the idea was holding up," he says.
Wrench Board’s next phase is to build a community of electronics repairers interested in trying the app and experts able to enrich the tool with their field experience. His Claude credits will go toward RepairMind, those first users, and all the infrastructure currently in flight.
“This hackathon is proof that a self-taught person coming out of a repair shop can ship an ambitious system in five days,” says Alexis, who was applying for “survival jobs” when he entered. “Claude Code amplifies whoever has an idea and the endurance to execute on it, regardless of their starting point.”
Advice to other builders: Go deep in the brainstorm and push back on the model.
Alexis uses Superpowers, a skills framework integrated into Claude that structures the brainstorm-then-plan steps, sometimes running brainstorms in parallel to make progress on different fronts. He starts in Claude Design, then hands off to Claude Code using the built-in button that shares the project directly, and pushes the model when it tells him no.
“During the hackathon, Claude told me several times that this or that wouldn't fit in the time available. In reality, I had plenty of time,” he says. “You have to know how to tell it, I'm going to try anyway."
Third place: Maieutic, Paula Vásquez-Henríquez
Paula Vásquez-Henríquez, who teaches computer science at Universidad del Desarrollo in Concepción, Chile, says over the past two years she is seeing more students pass tests without understanding their own code.
"Students now use AI to manufacture code, but they have no idea what the code does," she says. “They never learn to state a problem precisely, to draft a plan before coding, or to read their own code critically and notice where it drifted from what they intended. The autocomplete delivers working code before they've even finished forming the question, so the metacognitive loop, the thinking-about-your-thinking that actually creates a programmer, never closes. They graduate able to generate code but not reason about it.”
Paula, who is currently working on a PhD in Artificial Intelligence researching student–AI interaction patterns, entered the hackathon to solve this problem from both student and instructor perspectives.
Maieutic is an IDE designed to make students slow down at key moments. Students must describe in plain language what their program should do before writing any code; Claude asks targeted clarifying questions and keeps the editor locked until the spec is detailed enough that a competent programmer could implement it without guessing.
Students can then start writing Python but autocomplete is off; a chat panel answers reference questions directly but responds to reasoning questions with counter-questions rather than fixes, refusing to do the student's thinking for them.
The Intent-Diff Review, the core of the tool, has Claude compare the spec against the final code, classify each divergence as drift, revision, or bug, and then surface a neutral, non-accusatory question prompting the student to explain the issue themselves.
For instructors, a live dashboard shows one row per student with a one-sentence cognitive summary (e.g., "written the spec three times, still hasn't considered empty input"). Teachers can click on individual students to monitor their specific interactions with Claude, which also analyzes the full cohort to identify and surface any shared misunderstandings across the whole class so instructors can close that gap.
Since the hackathon ended, researchers at the University of Houston have reached out about co-authoring a paper, and Paula is putting her prize credits toward developing the tool further. She says hackathon week showed her that the gap between understanding a problem and shipping a tool for it has collapsed.
“I’m an educator in Concepcion, Chile, not Silicon Valley,” she says. “I shipped a working full-stack product in a week because the tools let me stay in the role I'm genuinely expert in while they handle the rest. The people closest to real problems can now build for them directly.”
This project was Paula dogfooding her own philosophy: specify before you build. “Maieutic exists because students jump straight to code, and the only way I built it well was by refusing to do exactly that myself,” she says. She dedicated two days to pure thought work, creating the design spec and the technical spec before writing a single line of code.
“Those two days of spec felt slow at the time, there's real pressure in a hackathon to start shipping immediately, but they were what let the rest of the week move fast,” she says.
Most Creative Use of Opus 4.7: Virtual Puppet Theater, Rene Hangstrup Møller
Intrigued with Opus 4.7’s spatial reasoning capabilities, full stack developer Rene Hangstrup Møller built Virtual Puppet Theater, a browser-based app that turns webcam video and voice into a dynamic interactive puppet show. A real-time animated puppet mirrors a user’s movements while a second AI-driven companion puppet banters with the user; spoken prompts can transform the scenery and spawn 3D props on the fly.
Rene used Claude across the full pipeline: concept discussion, planning, and code writing, while he handled direction, architecture, review, and decision-making. The app is based on Bun, Vite, and TypeScript, using MediaPipe hand tracking (running in WASM) and Three.js to render the puppet stage in 3D at 60 fps. A small WebSocket server connects to Claude Opus 4.7 via the Anthropic SDK to drive the AI puppet's dialogue and generate 3D props on the fly, while voice is handled by the Web Speech API for input and ElevenLabs for output (with browser speech synthesis as a fallback). Opus's spatial reasoning capabilities, refined through a screenshot-based feedback loop, handle the visual output.
There's no objective in Virtual Puppet Theater beyond open-ended play and Rene says she has no product plans for his winning project. For him, it was about learning and fun. "I tested it with my youngest son and he had a blast,” Rene says. “Seeing him interact with the puppet, describe scenes, and giggle at the responses was really the only user validation I needed.”
Virtual Puppet Theater’s source code is available on GitHub under MIT licensing, he adds, “if anyone wants to take it further.”
Advice to other builders: If you’re participating in a hackathon, plan time to create the demo video.
"It takes way longer than you think to produce a 3-minute video,” Rene says, noting that he went up against the hackathon deadline using Claude and Hyperframes to create and edit his Virtual Puppet Theater video. “Many people in the hackathon Discord warned about this, and they were right. Next time, I'd reserve that entire last day just for producing the demo."
"Keep Thinking" Prize: MaestrIA, Benjamin Torralbo
Benjamin Torralbo grew up apprenticing alongside his father, Juan Rodrigo Torralbo, a certified Maestro Mayor carpenter in Chiloé, Chile. “My father has 30 years of craft, has restored UNESCO-listed churches, but is still invisible to the Chilean system, like hundreds of thousands of other tradespeople,” Benjamin says. “Meanwhile, people needing home repairs don't know what is wrong, what it costs, who to call, and whether they're being charged fairly.”
His MaestrIA hackathon project solves both sides as a web app that gives ordinary people master-level home repair diagnostics while giving skilled tradespeople a way to demonstrate expertise.
With MaestrIA, users photograph their problem, describe it in voice or text, and share their location. Claude streams its reasoning in real time with animated bounding boxes over the photos, then delivers structured diagnoses: what's broken, material, severity 1–5, project budget and time estimate. The agent then renders a map of nearby maestros filtered by trade while a second agent drafts a WhatsApp message to send.
MaestrIA’s technical heart is a JSON file, injected into every diagnosis, that contains 17 diagnostic rules, 7 native Chilote woods, 16 terms of local trade dialect, 19 benchmark prices, and 9 common mistakes of the craft all distilled from hours of interviews Benjamin did with his father. Without touching the system prompt, that single file lifted his eval seven points (74% to 81% against a human master's judgment) and is how MaestrIA can diagnose "rising damp on alerce wood siding" instead of generic "wood damage."
With no prior programming experience, Benjamin says his role was site foreman overseeing Claude’s technical execution. “Before writing any feature, I asked Claude Code to design the specs, the staged action plan, and the security model: input sanitization against prompt injection, rate limiting, origin validation, and Zod schemas as the single source of truth,” he says. “Then I reviewed each feature diff by diff.”
Benjamin wants MaestrIA to grow into new builds, hardware-store integration, formal budgets, contracts, reviews, and a certification system. Eventually, each trade will have its own Maestro Mayor encoded inside, including carpenters, architects, plumbers, electricians, and masons.
His prize credits go toward developing the app, digitizing his father's company as a live pilot, and his own technical growth. “Claude Code lets a 20-year-old from Chiloé with no programming experience build software that his own dad can use and that can help 280,000 more maestros like him in Chile,” he says. “And it opens the door for millions of people who've always had valuable ideas but no way to bring them to life."
“The single most important thing I did was build an auditable 9-dimension eval against 12 real cases with ground truth recorded by my dad,” Benjamin says. “That eval, not my intuition, told me what was working and what wasn't. If I did another hackathon, the eval would be the first commit.”
Best Use of Claude Managed Agents: ARIA, Idriss Benguezzou & Adam Hnaien
Most factories have that one veteran technician who can tell when a machine is about to break, just by the sound it makes. The Best Use of Claude Managed Agents prize-winning project, ARIA (Adaptive Runtime Intelligence) turns an experienced maintenance engineer’s instincts into an affordable, fast-to-set-up AI system that continuously watches factory machines and generates custom diagnostics and repair plans the moment trouble appears.
With ARIA, a maintenance engineer uploads a manufacturer's PDF, answers four plain-language calibration questions, and within 15 minutes the plant is profiled. From there, five agents watch live signals. If an agent detects a failure or predicts one is imminent, it produces a work order analyzing component, failure mode, urgency, parts, and intervention window
The project’s builders, both of whom have on-the-floor industrial experience, met in the hackathon’s teammate-finding Discord channel. Idriss Benguezzou, a French industrial-software engineer with a Master's in data/AI, had been mapping out the idea and most of its architecture for a while. Adam Hnaien, a self-taught engineering student experienced with Claude Code and multi-agent workflows, immediately recognized ARIA as a valuable solution for industrial maintenance.
Idriss and Adam spent all of the hackathon’s second day in planning mode with a GitHub Project board, scoping every milestone, issue, and acceptance criterion before writing the first line of code. “We wanted to go in at 200% from M2 onward,” Adam says. “One day of planning let us spend the rest of the week executing, not improvising.”
Both estimate that Claude Code wrote ~80% of the raw lines while they made domain logic and design decisions by hand. Idriss handled threshold evaluation, KB schema, and anomaly detection because, he says, “you can't prompt your way to knowing what a maintenance technician actually looks at." Adam took on UX, visual language, and ARIA’s constellation concept because, he says, “you can't prompt your way to taste.”
Managed Agents handled agent infrastructure. “Without Claude Managed Agents, we'd have spent the week building infrastructure that Anthropic already hosts: a sandboxed Python environment, secure execution, session persistence, MCP dispatching,” Adam says. “Instead, we spent that week building the product around that infrastructure. That's the difference between shipping ARIA in five days and shipping ARIA in five weeks.”
After the hackathon’s results were announced, companies working on exactly this problem reached out about the project. Idriss will fold ARIA's agent architecture, KB schema, and signal pipeline into his own industrial IoT platform; his credits will go toward more building and experimentation. As for Adam, his plan is to continue exploring opportunities in industrial agentic AI and use the API credits to continue building and experimenting.
Advice to other builders: Let Claude audit. Ask Claude to find if there’s anything wrong with what you've already built before building the next thing, says Idriss. “That loop is underrated.”
Learn about our Claude Community programs, including meetups, hackathons, and more.
认识一下 "Built with Opus 4.7" Claude Code 黑客松的获奖者们
原文:Meet the winners of the Built with Opus 4.7 Claude Code hackathon
*从医学训练、电子维修到编程教育、工厂维护,来看看我们最近这届线上黑客松的获奖者们都做出了什么。*
上周,我们举办了 Claude Build Day——这是我们最近的一场黑客松,构建者们齐聚旧金山,用 Claude Opus 4.8 把自己的想法变成现实。
在等着看他们这次做出什么的同时,我们和上一届 "Built with Opus 4.7" 黑客松的获奖者们聊了聊他们的项目。他们的方向涵盖医学训练、电子维修、计算机科学教育、互动游戏、家庭维修和工厂维护。
恭喜各位获奖者,也恭喜所有参赛者!希望他们的点子能给你一些启发。
第一名:Medkit,Bedirhan Keskin
Bedirhan Keskin 是一位常驻伊斯坦布尔、从医生转行做软件工程的开发者,他用 Claude Managed Agents 做出了 Medkit:一款面向住院医师和年轻医生的学习工具,在一个游戏化的虚拟诊所里模拟真实的接诊场景。
「当你独自一人在急诊科,外面还有 50 个病人在排队,这时你意识到有些病例你在医学院根本没练习过,结果只能在真实的病人身上实时练手。」Bedirhan 说。
用 Medkit,医学生可以练习诊断和治疗模拟病人:采集病史、开化验单、读影像、做诊断、开处方。整个接诊结束后,一个 agent 化的评分系统会对照公开的临床指南来评估整场接诊——和资格考的考官用的是同一套标准。
Bedirhan 用四个独立的 Claude Code 会话来构建 Medkit(语音引擎、内容生成、3D 游戏层和核心 app 各一个),让每个会话的上下文保持干净,同时齐头并进。他奉行「能说就别打字」的做法,几乎全程靠语音操作。
Medkit 已经有了势头:三所医学院和一家制药公司(都在伊斯坦布尔)将在未来几周内开始试点。
给其他构建者的建议: 把 Claude 当成思考伙伴,而不只是写代码的 agent。
Bedirhan 一开始的本能是自己搭建语音引擎,但 Claude 建议他用云服务商以加快进度。「我最看重 Claude 的一点是,它不只是个代码生成器,更是一个思考伙伴,帮我看到那些我本来会错过的选项。」他说。
第二名:Wrench Board,Alexis Chapellier
来自法国 Reignier-Ésery 的 Alexis Chapellier 做了多年电子维修,之后创办了 RepairMind——一个面向维修店的 AI 管理平台。他这次的 Opus 4.7 黑客松项目 Wrench Board,专门帮独立技师搞定复杂的维修。用户把电路原理图和 boardview(电路板视图)丢进去,再描述故障现象,agent 就会生成一张统一的电路图、在上面推理、指出该用探针测哪个焊盘、读取测量值,并不断更新假设,直到诊断出问题所在。
Alexis 先在 Claude Design 里做出了 Wrench Board 的原型,把这个 app 的职责拆开(设计、原理图导入、boardview、诊断 agent),先为每一块写规格,再为每一块写计划。然后他在 Claude Code 的多 agent 模式里执行,每一步都做基准测试——调试时同时跑五六个 agent,每个领域配一个专属 agent。
Alexis 押下的大注,是赌 Opus 4.7 新增的看懂可视化电路图的能力。他说,当他让模型在一块主板上追踪一条供电路径时,他就知道这条路走通了。
「我看着 Wrench Board 的 boardview 一步步亮起来,箭头一个个冒出来,元件被一个个指出来,名称浮现出来。那一刻,我明白这个想法站得住脚。」他说。
Wrench Board 的下一阶段,是建立一个社区:既有愿意试用这个 app 的电子维修者,也有能用自己的现场经验来丰富这个工具的专家。他的 Claude credits 会投入 RepairMind、最早那批用户,以及目前还在推进的全部基础设施。
「这次黑客松证明了,一个从维修店里走出来的自学者,可以在五天里做出一个有野心的系统。」Alexis 说——他参赛时正在四处申请「能糊口的活儿」。「Claude Code 会放大每一个有想法、又有韧劲去执行的人,不管他从哪儿起步。」
给其他构建者的建议: 头脑风暴要往深里走,并且要敢于反驳模型。
Alexis 用了 Superpowers——一个集成进 Claude 的 skills 框架,把「先头脑风暴、再做计划」这套步骤结构化,有时他会并行跑多个头脑风暴,在不同方向上同时推进。他从 Claude Design 起步,再用内置按钮把项目直接共享给 Claude Code,完成交接;而当模型说「不行」时,他会去推它一把。
「黑客松期间,Claude 好几次告诉我,这个那个在有限时间里做不完。可实际上,我时间多得很。」他说。「你得知道怎么跟它说:*我就是要试试看。*」
第三名:Maieutic,Paula Vásquez-Henríquez
Paula Vásquez-Henríquez 在智利康塞普西翁的 Universidad del Desarrollo 教计算机科学。她说,过去两年里,她看到越来越多的学生能通过考试,却看不懂自己写的代码。
「现在的学生用 AI 来批量生产代码,可他们根本不知道这些代码是干什么的。」她说。「他们从来没学会怎么把一个问题表述清楚、怎么在写代码前先打草稿,也没学会怎么批判性地读自己的代码、发现它在哪里偏离了本来的意图。自动补全在他们连问题都还没想清楚之前,就已经把能跑的代码递了过来——于是那个元认知循环,那个真正能造就一个程序员的『思考自己的思考』,就永远闭不上了。他们毕业时能生成代码,却没法对代码进行推理。」
Paula 目前在攻读人工智能方向的博士,研究学生与 AI 的互动模式。她参加这次黑客松,是想从学生和教师两个角度同时解决这个问题。
Maieutic 是一个 IDE,专门设计来让学生在关键时刻慢下来。学生必须先用大白话描述清楚自己的程序应该做什么,然后才能写代码;Claude 会有针对性地追问,把编辑器锁住,直到这份规格详细到一个合格的程序员不用猜也能照着实现为止。
之后学生可以开始写 Python,但自动补全是关掉的;一个聊天面板会直接回答查阅类的问题,但面对推理类的问题,它给的是反问而不是答案——它拒绝替学生思考。
这个工具的核心是 Intent-Diff Review(意图—代码差异审查):让 Claude 把规格和最终代码做对比,把每一处偏差归类为「漂移」「修订」或「bug」,然后抛出一个中立、不带指责意味的问题,引导学生自己来解释这个问题。
对教师而言,有一个实时仪表盘,每个学生占一行,配一句对其认知状态的总结(比如「规格写了三遍,还是没考虑空输入」)。老师可以点开某个学生,查看他和 Claude 的具体互动;Claude 还会分析整个班级,找出并呈现全班共有的误解,好让老师把这个缺口补上。
黑客松结束后,休斯顿大学的研究者主动联系她,想合写一篇论文;Paula 正把奖励的 credits 投入到工具的进一步开发上。她说,黑客松这一周让她看清:从理解一个问题,到做出一个能解决它的工具,这中间的距离已经坍缩了。
「我是智利康塞普西翁的一名教育工作者,不是硅谷的人。」她说。「我能在一周里做出一个能用的全栈产品,是因为这些工具让我可以待在我真正擅长的角色里,剩下的交给它们。最接近真实问题的人,现在可以直接为这些问题动手构建了。」
给其他构建者的建议: 先想清楚,再动手。
这个项目就是 Paula 在 dogfooding(自己用自己的产品)自己的理念:先写规格,再构建。「Maieutic 之所以存在,是因为学生总是一上来就写代码;而我能把它做好的唯一办法,恰恰是拒绝自己也这么干。」她说。她拿出整整两天做纯粹的思考工作,先写出设计规格和技术规格,然后才写下第一行代码。
「那两天写规格,当时感觉很慢——黑客松里有一种实实在在的压力,逼你马上开始出活儿——但正是这两天,让接下来的一周跑得飞快。」她说。
最具创意地使用 Opus 4.7 奖:Virtual Puppet Theater,Rene Hangstrup Møller
全栈开发者 Rene Hangstrup Møller 对 Opus 4.7 的空间推理能力很感兴趣,于是做了 Virtual Puppet Theater——一个基于浏览器的 app,把摄像头画面和语音变成一场动态的互动木偶剧。一只实时动画木偶会模仿用户的动作,另一只由 AI 驱动的搭档木偶则会和用户斗嘴;说出的指令可以即时改变场景、变出 3D 道具。
Rene 在整条流水线上都用了 Claude:概念讨论、规划、写代码;而方向把控、架构、评审和决策则由他自己负责。这个 app 基于 Bun、Vite 和 TypeScript,用 MediaPipe 做手部追踪(跑在 WASM 里),用 Three.js 以 60 fps 在 3D 中渲染木偶舞台。一个小型 WebSocket 服务器通过 Anthropic SDK 连到 Claude Opus 4.7,来驱动 AI 木偶的对白、即时生成 3D 道具;语音方面,输入用 Web Speech API,输出用 ElevenLabs(浏览器自带的语音合成作为后备)。视觉输出则交给 Opus 的空间推理能力,并通过一套基于截图的反馈循环来打磨。
Virtual Puppet Theater 除了开放式的玩耍之外没有别的目标,Rene 说他对这个获奖项目也没有任何产品化的打算。对他来说,这件事关乎的是学习和乐趣。「我让我最小的儿子试了试,他玩得不亦乐乎。」Rene 说。「看着他跟木偶互动、描述场景、对回应咯咯直笑,这就是我唯一需要的用户验证了。」
他补充说,Virtual Puppet Theater 的源代码已经在 GitHub 上以 MIT 许可证开源,「如果有人想把它往前推一推的话。」
给其他构建者的建议: 如果你要参加黑客松,记得留出时间来做演示视频。
「做一个 3 分钟的视频,花的时间远比你想的要长。」Rene 说。他提到自己当时是赶在黑客松截止前,用 Claude 和 Hyperframes 来制作和剪辑 Virtual Puppet Theater 的视频的。「黑客松的 Discord 里很多人都提醒过这一点,他们说得没错。下次我会把最后一整天专门留出来做演示视频。」
*Virtual Puppet Theater 的 GitHub 仓库*
"Keep Thinking" 奖:MaestrIA,Benjamin Torralbo
Benjamin Torralbo 从小跟着父亲 Juan Rodrigo Torralbo 当学徒——他父亲是智利奇洛埃岛一位持证的 *Maestro Mayor*(大师级木匠)。「我父亲有 30 年的手艺,修复过列入 UNESCO 名录的教堂,可在智利的体制里,他至今仍然是隐形的——和成千上万其他手艺人一样。」Benjamin 说。「与此同时,需要修房子的人也搞不清楚到底哪儿出了问题、要花多少钱、该找谁、自己有没有被宰。」
他的黑客松项目 MaestrIA 是一个 web app,把这两头的问题一起解决:给普通人提供大师水准的家庭维修诊断,同时给有技术的手艺人一个证明自己专业能力的渠道。
用 MaestrIA,用户拍下自己遇到的问题,用语音或文字描述,再共享自己的位置。Claude 会实时把推理过程流式呈现出来,在照片上叠加动态的边界框,然后给出结构化的诊断:哪里坏了、是什么材料、严重程度 1–5 级、项目预算和工期估算。接着 agent 会渲染一张地图,按工种筛选出附近的 maestros(师傅),同时另一个 agent 会起草一条可以发出去的 WhatsApp 消息。
MaestrIA 的技术核心是一个 JSON 文件,会被注入到每一次诊断里。这个文件包含 17 条诊断规则、7 种奇洛埃本土木材、16 个当地行话术语、19 个基准价格,以及这门手艺的 9 个常见错误——全都是 Benjamin 从对父亲的多个小时访谈里提炼出来的。在完全没动系统提示词的情况下,单单这一个文件就把他的 eval(评测得分)提升了 7 个点(从 74% 升到 81%,对照的是一位人类大师的判断),也正是它让 MaestrIA 能诊断出「alerce 木(智利柏)墙板上的上升潮气(墙体毛细返潮)」,而不是笼统的「木材受损」。
Benjamin 此前没有任何编程经验,他说自己的角色是工地工头,监督 Claude 的技术执行。「在写任何一个功能之前,我都会让 Claude Code 先设计好规格、分阶段的行动计划和安全模型:针对 prompt 注入的输入消毒、限流、来源校验,以及把 Zod schema 作为唯一可信源。」他说。「然后我再一个 diff 一个 diff 地评审每个功能。」
Benjamin 希望 MaestrIA 能成长起来,扩展到新建工程、五金店对接、正式报价、合同、评价,以及一套认证体系。最终,每个工种里都会编码进它自己的 Maestro Mayor,包括木匠、建筑师、水管工、电工和泥瓦匠。
他的奖励 credits 将用于开发这个 app、把父亲的公司数字化作为一个真实的试点,以及他自己的技术成长。「Claude Code 让一个来自奇洛埃、没有编程经验的 20 岁年轻人,能做出他自己的父亲也能用的软件,还能帮到智利另外 28 万个像他父亲一样的 maestros。」他说。「它也为千百万一直怀揣着宝贵想法、却苦于没有办法把它们落地的人打开了一扇门。」
给其他构建者的建议: 先做 eval,再做功能。
「我做过的最重要的一件事,就是建了一个可审计的 9 维 eval,对照 12 个真实案例,而这些案例的标准答案是我父亲亲自记录的。」Benjamin 说。「是这个 eval、而不是我的直觉,告诉我什么管用、什么不管用。如果我再参加一次黑客松,eval 会是我的第一个 commit。」
最佳 Claude Managed Agents 使用奖:ARIA,Idriss Benguezzou 和 Adam Hnaien
大多数工厂里都有那么一位资深技师,光听机器发出的声音,就能判断它是不是快坏了。最佳 Claude Managed Agents 使用奖的获奖项目 ARIA(Adaptive Runtime Intelligence,自适应运行时智能),就是把一位经验丰富的维护工程师的直觉,变成一套既便宜、又能快速部署的 AI 系统:它持续监视工厂里的机器,一旦出现麻烦的苗头,就立刻生成定制的诊断和维修方案。
用 ARIA,维护工程师上传一份厂商的 PDF,回答四个大白话的校准问题,15 分钟内整个工厂就被建好了画像。从那以后,五个 agent 会盯着实时信号。一旦某个 agent 检测到故障、或预测到故障即将发生,它就会生成一张工单,分析故障部件、失效模式、紧急程度、所需零件和干预时间窗。
这个项目的两位构建者都有一线工业经验,他们是在黑客松的找队友 Discord 频道里认识的。Idriss Benguezzou 是一位法国工业软件工程师,有数据/AI 方向的硕士学位,他已经把这个点子和它的大部分架构琢磨了好一阵子。Adam Hnaien 是一位自学成才的工程专业学生,熟悉 Claude Code 和多 agent 工作流,他一眼就认出 ARIA 对工业维护来说是个有价值的方案。
Idriss 和 Adam 把黑客松的第二天整天都用在了规划上:他们用一块 GitHub Project 看板,在写下第一行代码之前,把每一个里程碑、每一个 issue、每一条验收标准都梳理清楚。「我们想从 M2 起就 200% 全力投入。」Adam 说。「一天的规划,换来了整周都在执行、而不是临场瞎凑。」
两人都估计,原始代码里约 80% 是 Claude Code 写的,而领域逻辑和设计决策则由他们亲手把控。Idriss 负责阈值评估、KB schema(知识库结构)和异常检测,因为他说「光靠提示词,你问不出一位维护技师真正在看的是什么」。Adam 接手了 UX、视觉语言,以及 ARIA 的「星座」概念,因为他说「光靠提示词,你堆不出品味」。
Managed Agents 则负责处理 agent 的基础设施。「要是没有 Claude Managed Agents,我们这一周本来得花在搭建那些 Anthropic 已经替我们托管好的基础设施上:沙箱化的 Python 环境、安全执行、会话持久化、MCP 调度。」Adam 说。「而现在,这一周我们用来围绕这套基础设施去打磨产品本身。这就是『五天交付 ARIA』和『五周交付 ARIA』之间的差别。」
黑客松结果公布后,正在攻克这一模一样问题的公司主动找上门来。Idriss 会把 ARIA 的 agent 架构、KB schema 和信号流水线整合进他自己的工业 IoT 平台,他的 credits 会用于更多的构建和实验。至于 Adam,他的计划是继续探索工业 agent 化 AI 领域的机会,并用 API credits 继续构建和实验。
给其他构建者的建议: 让 Claude 来做审计。Idriss 说,在动手做下一件东西之前,先让 Claude 检查一下你已经做出来的东西有没有什么毛病。「这个循环被严重低估了。」
*了解**我们的 Claude 社区项目,包括线下聚会、黑客松等等。*