06 / 15 · Monday · 47 posts
@teortaxesTex:Flash is dramatically more information-dense than V3.2-base. V4-Pro-Base is meaningfully stronger than Flash on the same knowledge-loaded stuff. E.g., FACTS-Parametric points in the right direction. That said, GPT 5.5 is 78.0%. Opus 4.8 25.1%. Weird stuff↗



@gfodor:I think this model and many like it are flawed by not talking about recursive self compression - the models are going to get smaller and cheaper to train as fast as they get smarter. Turning electricity into knowledge is hard to reason about.↗
@teortaxesTex:One thing that seems underappreciated is that with the AI boom and supply chain frenzy, not just "allied Asian" but *Mainland Chinese* stocks have started to move like American ones. For years and years, their market was cursed: revenue grows, nobody profits. Not anymore.↗

@kanjun:It'd suck to spend months building workflows on Fable, only to have it suddenly be taken away. Dependencies are easy to ignore until they become visible. We mostly rent software/AI, which means the systems we build our lives on are ultimately serving forces outside our control — business incentives, policy decisions, etc. As AI models become more important infrastructure, I'd want capabilities to be durable, so that as a user I can understand, preserve, and control/repair the capabili↗
@teortaxesTex:Finally, at long last, Google Deepmind heeds my advice to apply mech interp and investigate the Cursed Bloodline of Gemini. Did you think I'm shitposting, anon? I am simply using the appropriate words in every case.↗

@dotey:Prompts for Chinese ink-wash art images 👍↗
@rao2z:Life can only be understood backwards; but it must be lived forwards. (But is there anything--or anyone--left to understand backwards after AGI? Can world models help with forward understanding? In short, the perfect philosopher for our era of AGI Angst..)↗

@teortaxesTex:Really, really strange set of results throughout. I can believe that Fable was nerfed and Kimi K2.7 is relatively great at ML engineering, but they also get Gemini 3.1 at/near the top on many tasks↗

@AIwithkhan:The Pizza Thief GPT Image 2 and Seedance on @itsPolloAI Prompt : Pixar-Inspired 3D Animated Comedy Short Create a fast-paced Pixar-style 3D animated comedy set in a vibrant modern city during golden hour. A cheerful pizza delivery boy rides through busy streets carrying a large pizza box for an important delivery. Nearby, a hungry and mischievous thief notices the pizza and immediately becomes obsessed with stealing it. His stomach growls as he imagines the delicious meal waiti↗
@teortaxesTex:somehow so pitiful how people far from the frontier act like their move to closed source is a demonstration that they're unsealing Their True Power How about you ship an updated Muse Spark that beats the latest Kimi? At least closed? Why can't I use your models, Wang?↗
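@lijigang:1. Models love to say "imagine this." 2. Many agents now support multimodal capabilities. You can add a rule: "Whenever you output 'imagine this', generate an accompanying image directly and annotate it, so that the scene you want to convey is shown visually."↗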
@teortaxesTex:forget sota models We've never had anything more substantial than Claude Code source be leaked Tbh I've seen *some* things be leaked, but it's all pretty marginal People who make it to frontier labs aren't self-destructive script kiddies, they want generational wealth↗
@xiaohu:Liuliumei (溜溜梅) soared 189% at the open. Liuliumei abbreviates to LLM, which makes it a genuine AI large-model concept stock↗

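Skip a Layer or Loop It? Learning a Program-of-Layers in LLMs
6 upvotes
OmniVideo-100K: A Dataset for Audio-Visual Reasoning via Structured Scripts and Evidence Chains
6 upvotes
RedAct: Redacting Agent Capability Traces to Protect Procedural Skills
9 upvotes
From AGI to ASI
9 upvotes
LLM Agents Can "See" Code Repositories
9 upvotes
OmniDirector: General Multi-Shot Camera Cloning Without Cross-Paired Data
13 upvotes
HarnessX: A Composable, Adaptive, Evolvable Agent Harness Foundry
15 upvotes
Orchestra-o1: Omni-Modal Agent Orchestration
17 upvotes
From Chatbots to Digital Coworkers: A Paradigm Shift Toward Persistent Autonomous AI
21 upvotes
Memory Is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
22 upvotes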
@teortaxesTex:Technological diffusion wasn't inevitable. Smart people like John von Neumann would never have permitted it, they'd have vaporized the opposition's capitals while Americans had nuclear monopoly, certainly they wouldn't sit and watch as Soviets tinker with their homegrown "supercomputers". The game theory is pretty clear on the imperative to play even a negative sum game if the end result is the other party getting eliminated and you run away with the bank. This isn't novel for the

@Vtrivedy10:the exciting future of “building the hill climbing machine” 🚀 🧗 1. encode domain expertise into a v0 agent and ship it —> this will never be perfect to start but we need to observe agent behavior to understand where/how we can improve 2. Have a trusted, robust system to collect and centralize trace data at scale across your teams (and company broadly). Traces are the lifeblood of agent improvement. 3. Design and deploy efficient methods of mining traces for patterns. “No trace
@connerruhl 🔁 @vai_viswanathan:The wildest CVPR 2026 result: a video frame doesn’t need 1,024 tokens. It needs one. “A Frame is Worth One Token” (DeltaWorld) compresses each frame to a single token for world modeling. - Better future predictions with over 35x fewer parameters and 2,000x fewer FLOPs than existing generative world models, plus a 1,024x token reduction at 512x512. - A tokenizer encodes the difference between consecutive DINOv3 frames into one “delta” token. A tiny generator ↗

@hardmaru 🔁 @itm_aiplus:Sakana AI's first commercial service is research-focused. How does it differ from "Deep Research"? Why this latecomer "doesn't even chase benchmarks"↗
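@blackanger:I gave an AI a prompt built on the decision ladder, and I really doubt it helps. Take one rule: if the standard library can do it, use the standard library. That does produce less code, but the code is no better for it. If the standard library were enough, what would we need GitHub for?↗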
@riemannzeta:The risk is not only of concentrating what is learned into a small number of models, although that is not insignificant. The greater risk is of hollowing out human incentive to work with a model at all. Unless the human who is training the model has assurance that there will be some mutual benefit to the training they provide, why should they provide that training? This is the key incentive question for all those who have envisioned a Coasean "singularity" (actually an asymptote)
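@dotey:Felix Haas, head of design at Lovable, shared his observations on social media about "high-performing teams in the AI era": seven lessons from the inside view of this remarkably fast-growing AI startup. A few interesting points: First, don't wait for assignments like an employee. The highest-impact people don't ask "who owns this"; they see a problem and dive in. Ownership can't be assigned, only taken. Second, hire for attitude, not résumé. Skills matter, of course, but skills alone barely predict whether someone can get things done. The people who really break out rely on curiosity, grit, and a willingness to learn anything. In the AI era this is more evident than ever. Third, curiosity and AI obsession are two different things. The people who use AI well aren't scrolling news all day; they keep trying things nobody asked them to try and chase ideas that may lead nowhere. Most people won't do this, but for the few who persist the returns are exponential. Fourth, get senior people hands-on again. This is the phenomenon Haas finds most interesting: experienced managers are becoming builders again. AI sharply amplifies an individual contributor's leverage, and a senior engineer or designer who uses AI deeply may be the most powerful combination in a company right now. Fifth, ego is the enemy of speed. Haas says he has never seen ego make a company faster, but he has seen↗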
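@blackanger:If a vendor ships a stronger model tomorrow, can you swap it in with one click and keep every capability you have accumulated? If yes, the knowledge lives in the harness; if no, you have already handed your IP to the model vendor.↗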
@jon_stokes:I would watch the heck out of “Reggae Demon Hunters.” Someone please AI this for me.
@Teknium:Couldn't have written a better blog in favor of Open Source AI or Hermes Agent's self improvement loop! Thanks for the strong words in support of taking back the power to the individuals and businesses who really ought to own their stack.
@JeffLadish:I think the White House should respond to national security threats quickly, including those posed by model deployments. I’m skeptical Fable is a significant threat, but I think a lot of AI safety people are too quick to condemn the admin here. If they’re just targeting Anthropic out of spite, that is bad. But if they’re genuinely worried about the national security threat posed by the model, then I think their actions are in the right direction given their understanding. I want them
@teortaxesTex:AI is still fundamentally a private enterprise race, and is more similar to the semiconductor race than anything nuke-shaped. At most, Apollo program. Manhattan comparisons turn it schizophrenic
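@oran_ge:Over the weekend I built a skill for illustrating long-form articles, and it works very well. Afterwards I wanted an article to test it on, one that could also double as the project's readme. Between 《鹅腿阿姨》 and 《置身钉内》 I chose the latter, since that article reflects not only on Alibaba and DingTalk but also on general-purpose intelligent products. AI products may be general-purpose in function, but people's understanding of new things is simple. A good product has only one core intent. The original runs to 75,000 characters; with attention as scattered as it is today, few will finish it, but going through the 20 illustrations takes one minute, which anyone can manage. While making the illustrations, I had the agent generate two options per scene for me to pick from, which is more efficient and ultimately saves tokens too. To show what the skill itself can do, I did not edit a single image; my recent feeling is that rerolling beats editing. Picking the illustrations was a pleasant surprise: I found that I had a stronger visual sense of the article and a deeper understanding of it myself. Download link for 橙线插画.skill:↗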
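@shao__meng:Databricks launches "Omnigent". The team argues that the bottleneck for agent capability is moving up from "the model/harness itself" to "how to compose, govern, and coordinate multiple agents". Omnigent is a new abstraction for that layer: a meta-harness. What problems does it address? Drawing on its own practice (5,000+ engineers using coding agents, plus products such as Genie shipped to customers), Databricks identifies three real pain points: · User side: running 4–5 agents at once (Claude Code, Codex, Gemini, etc.) and copy-pasting back and forth between agents, Docs, and Slack · Builder side: new harnesses, SDKs, and models keep arriving, and switching tools means rewriting integration logic · Architecture side: high-quality agent systems are already "multi-model + multi-harness + multi-person", yet each harness recognizes only its own sessions and is isolated from the others. What is Omnigent? Built on existing agents (Claude Code, Codex, Pi, in-house agents), it provides a unified interface, a policy layer, and a collaboration layer. Key design insight: whatever the underlying ha↗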

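Claude Code 2026 Guide: 25 Features, with Examples + Demos
Claude Code began as a terminal coding assistant and is now a layered agentic system: underneath, memory, hooks, skills, subagents, plugins, and MCP are split into separate layers.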
@teortaxesTex:LLM caste system - Godmind (SSI's project) - Forbidden Palace Eminence (Fable 5 at the US Gov) - Merchants (US workhorse models, Qwen) - Artisans (good open LLMs) - Free Farmhands (Sarvam, open Mistral) - dalits (most "closed source sovereign AI" – Gigachat, Mistral etc)
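@shao__meng:OpenAI Codex Mobile engineering practice guide. @Dimillian lays out the core mental model for Codex Mobile: the phone is not just a shrunken terminal; it is the "control center" for a remote dev machine. · Code execution and task runs still happen on connected hosts such as a Mac, a Windows machine, or a devbox · The phone provides a native UI for starting, steering, reviewing, and organizing engineering work · The value is not "writing code on a phone" but "still being able to make key decisions away from the desk" # Starting a task: set the boundaries first, then send the prompt. Good agent work presupposes a properly isolated execution environment. When creating a new thread, Codex Mobile lets you configure: · Host and workspace: which machine and which project to run on · Git branch: start from the right baseline and avoid fixing Git state afterwards · A separate worktree: isolate changes without polluting the current checkout · Environment setup script: once the worktree is created, the initialization script configured on desktop runs automatically. Three typical modes: 1. Use the current checkout → quick investigation 2. New worktree → changes that need isolation 3. From↗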

@teortaxesTex:Had a dream where I got told directly and quite bluntly that Dario is an idealist with a true and achievable vision of rescuing his civilization via AGI, not so different from Wenfeng, and most faith in him is not driven by petty tribal concerns OK. I have to trust my dreams

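@shao__meng:Microsoft CEO Satya Nadella: a "frontier AI model" without an ecosystem is not sustainable! In the AI era, a company's real asset is no longer the model itself but a learning loop in which human capital and token capital reinforce each other. Why does he think this platform shift is different? Before: digital systems augmented human effort (tools). Now: a genuine cognitive loop can form between people and digital systems; AI can continuously absorb organizational and individual expertise and commoditize it. So the competitive focus shifts from "which tools do you use" to: how does an organization keep learning, accumulate IP, differentiate, and survive in a world where knowledge is absorbed quickly? Two core concepts · Human capital: knowledge, judgment, relationship networks, creativity, pattern recognition · Token capital: the AI capabilities a company builds and owns itself. Key claim: human capital does not depreciate as token capital grows; it becomes more valuable. · People set goals, connect across domains, build relationships, and spot the patterns that truly matter · Without human direction, compute just spins idle. So the opportunity is not in "picking the best general-purpose model" but in building a learning loop on top of models so that human capital and token capital compound. The new architecture enterprises need (in practice): Nadella sketches↗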

@pmddomingos:Anthropic makes even OpenAI look good.
@timsoret:To me, this proves that AI might not replace as much as it will first & foremost increase everybody’s ambition & capabilities. A single talent is now expected to effortlessly carry a much heavier cognitive load than 2/3 years ago.
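@dotey:When building baoyu-skills I tried storing user-defined settings in an EXTEND.md file, on the reasoning that it would be easy for the agent to read. But this causes a problem: Markdown is not strictly structured data. An LLM reads it fine, but programs struggle to parse it, and the format is hard to keep strictly consistent. If I were designing it again, I would lean toward JSON or YAML as the format for a skill's extension config, so that the LLM can read it easily and code can parse and save it too.↗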
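The JSON option described in this post might look roughly like the sketch below. The file name `extend.json`, the keys `style` and `max_images`, and both helper names are hypothetical; the post does not show the actual baoyu-skills settings.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real skill's settings are not given in the post.
DEFAULTS = {"style": "ink-wash", "max_images": 4}


def load_settings(path: Path) -> dict:
    """Read user overrides from a JSON file and merge them over the defaults."""
    if not path.exists():
        return dict(DEFAULTS)
    overrides = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(overrides, dict):
        raise ValueError("settings file must contain a JSON object")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Strict keys are what make the file machine-checkable, unlike Markdown.
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def save_settings(path: Path, settings: dict) -> None:
    """Write settings back in a stable, diff-friendly layout."""
    text = json.dumps(settings, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
```

Because the file round-trips through `json`, both an agent and ordinary code can read and rewrite it without the format drifting.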
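@berryxia:Jensen Huang names the ultimate battle of the AI era: energy, not a mere contest of models! He breaks AI into a five-layer cake and says everyone is staring at the fourth layer, models, while the real money and opportunity sit in the three layers underneath: energy, chips, and infrastructure. He compares the whole AI ecosystem to a cake stacked layer by layer. The bottom layer is energy: nuclear, solar, wind, hydrogen; anything that generates power attracts investment. The second layer is chips, computers, networking, and silicon photonics. The third is the land, power, buildings, and operations of data centers. The fourth is the model companies everyone talks about every day. The fifth is the various vertical applications. One trillion dollars is going into the whole cake this year, and Jensen believes the ecosystem can eventually reach twenty trillion dollars a year. We have covered only one trillion so far, leaving nineteen trillion of room. His sharpest point is that most people watch only the fourth layer and completely overlook that the three layers beneath it are the foundation of the whole system. Without energy, chips, and data centers, even the strongest model is useless. People used to think the AI opportunity lay with model companies such as OpenAI and Anthropic; now NVIDIA's chief is telling you that the real compounding and moats are in the bottom layers. The talk pulls AI back from a "model race" to "industrial-chain restructuring". Whoever first grasps the logic of these five stacked layers will, in the coming↗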
@championswimmer:claude: the load bearing code has passed the smoke test to get through the full gate to be cut into a release to be shipped to the 0 users that you have
06 / 14 · Sunday · 191 posts
@wordgrammer:China will definitely end up releasing a Mythos-class model open source. Probably in the next 2 years. Given the response to Mythos, I think this will be the downfall of the West
@connerruhl:I've long been interested in backward-deriving action/outcome pairings from real videos with VLLMs. Basically, I'm converting videos into the API calls against @reactorworld models which would have resulted in that video, improving a timeline spec I'm using to drive new videos. I'll post generated results next.

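@berryxia:The world really is a ramshackle operation… One phone call and Fable 5 gets pulled! A single call from Amazon's CEO got Anthropic's Fable model taken down, and within 24 hours the White House rolled out export controls. Last Thursday Jassy told the Trump administration that Fable carried a jailbreak risk; on Friday morning a group at the White House met, and by the afternoon they were frantically calling Dario Amodei. Dario was said to be at a health retreat (Anthropic later denied this), but either way he had three calls with Bessent, Lutnick, and others, trying to explain the difference between guardrails and a universal jailbreak. They were not having it and demanded that the model be taken down. Dario asked for time and more information; the reply was "this is a terrible decision on your part". The export controls came down that same evening. A White House official said: "We begged them for hours to cooperate and only resorted to this when nothing else worked." The most absurd part is that Amazon, a major shareholder and partner of Anthropic, went to the government first instead of talking to Anthropic directly. The speed of government intervention was also absurd, essentially "find a problem → demand a takedown → ban it outright if refused". People used to think AI companies played in their own sandbox; now it suddenly turns out that when a model is strong enough and a vulnerability is serious enough↗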

@HiTw93:If a friend's worried now that Claude Fable 5 is gone, Waza might help. It's the engineering habits I use daily, planning, reviewing, debugging, as 8 skills that run the same on Claude Code, Codex, and others. Switch the model underneath, keep working.

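@berryxia:Folks, OpenAI is finally lining up its shot! Another round of GPT-5.6 leak rumors: OpenAI may reportedly launch GPT-5.6 on June 23 > Cost only one third of Fable's > 1.5M-token context window > A full upgrade to agentic coding workflows. The timing is rather interesting 😂↗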

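@karminski3:A 27B small model takes on Fable 5? And wins? Big news: with the Iterative-Contextual-Refinements framework, Qwen3.6-27B outscored Anthropic's Fable 5 on a benchmark! Is this real? Or is it a case of never losing on benchmarks and never winning in practice? So I took a quick look at the framework and found the design quite instructive, with a lot to learn from; here is a detailed walkthrough. The framework mainly targets software performance optimization, that is, how to make code run faster. You may remember my vector-db-bench, which gave large models flame graphs, perf, and various test tool calls so that they could iterate on code performance themselves. This framework goes a step further and targets the core weakness of small models: the "brain damage" caused by too few parameters, meaning small models degrade more easily over long contexts or get stuck in local optima. So the framework steps in. First, for the technical approach, it adds a BFS exploration mode: during the planning stage of coding, the small model proposes multiple solutions itself. Say it is writing string matching and jumps straight to an O(N^2) brute-force search; at this step the agent makes the small model consider what other possible solu↗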

@brickroad7 🔁 @DrTechlash:Dario Amodei wants us to think of him as Leo Szilard, who warned about the atomic bomb, but it's deeply misleading: 1. The "AI is like a nuke" analogy is false. Nuclear weapons were built for one purpose: mass destruction. They depend on state-controlled military infrastructure. And while nuclear weapons are solely designed as weapons to destroy cities, AI is a broad technological platform with obvious positive uses and productive applications. AI is a general-purpose

@halvarflake:I'm not sure I agree with the "Europe needs to give Mistral unlimited cash". That said, I agree with the fact that Europe is apparently incapable of generating a near-frontier model. How much money was spent on Kimi2.7 or DeepSeek v4? Those weren't necessarily billions?
Did Anthropic Ask for This?
Article URL: https://www.verysane.ai/p/did-anthropic-ask-for-this Comments URL: https://news.ycombinator.com/item?id=48533504 Points: 168 # Comments: 148
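@dotey:Microsoft CEO Satya Nadella published a long essay introducing a new concept: token capital. His core argument is that in the AI era every company needs to manage two kinds of capital at once. One is traditional human capital: employees' knowledge, judgment, and relationship networks. The other is token capital: AI capabilities the company builds and owns itself. The two are not a trade-off; the stronger the human judgment, the faster token capital grows. Without human direction, compute just spins idle. This sounds abstract, but Nadella offers a concrete test: can you swap out the underlying general-purpose model at any time without losing the proprietary experience your company has accumulated? If so, you truly own your AI capabilities; if not, you are merely renting someone else's intelligence. He advises companies to turn workflows, industry knowledge, and decision experience into AI systems that can be improved continuously, and to build private evaluations that measure how models perform in actual business rather than looking only at public benchmarks. Once this learning flywheel spins, it works like compound interest: each improved workflow yields better training signal, further accelerating knowledge accumulation. Nadella also issues a rather political warning, using globalization as an analogy: during the first wave of globalization GDP figures looked fine, but whole industries were hollowed out by outsourcing, with consequences still unfolding today. If the AI era re↗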
@dwarkesh_sp 🔁 @TomDavidsonX:Seeing some ppl interpreting the Fable shutdown as positive news on govt intervention But i don't think we'll execute a pause well*, or use the pause effectively, if the govt has NO idea how ai tech works and just wades in arbitrarily without consulting experts it's def evidence for "govt might do lots of random extreme stuff" -- and that's prob good if your p(doom) is v high and you want to fuck shit up and shake up the game board. but still, if you think alignm
@aiamblichus:Reading the tea leaves of Satya's encyclical - It's reassuring to see him steer against a "small number of AI systems capturing all the economic returns" - It's less reassuring that he sees this scenario as a realistic threat. He's not known to be especially AGI-pilled
@dylan522p:Excited to announce the SemiAnalysis Teardown Engineering & Evaluation Lab (STEEL) We have been building a state-of-the-art teardown lab in Oregon with $10s of millions Capex being spent capable of analyzing the world’s most advanced and important chips and process technologies over the last year and half. We have already generated revenue on advanced datacenter chip teardowns. This is a bit of inconvenient timing for TechInsights as they are private equity owned and currently being s

@aiamblichus:I'm trying to understand why so many people suddenly want Europe to be fine with being an NPC. Many of the same people have spent years on this website beating up the EU about being useless and falling behind on AI, but now suddenly the EU should just roll over and submit? Odd↗
@aiamblichus:And yet, a certain Chinese hedge fund managed to create a competitive LLM for much less. Even a Chinese delivery firm trained a perfectly workable model, and none of their employees are paid $100M The EU needs native LLM expertise, even if it doesn't deliver on AGI tomorrow
@lateinteraction 🔁 @AsfiShaheen:I read this and feel very grateful to have bumped into DSPy. In hindsight a system making API calls to LLMs which is stitched together through defined inputs and outputs + has a way of testing how each stage performs feels so obvious. DSPy just spelled it all out nicely for me.
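FineWeb Hands-On Coding: Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analysis
This tutorial explores the FineWeb dataset through an advanced hands-on workflow: streaming samples without downloading the full multi-terabyte corpus, inspecting its sch……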
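The tutorial's own code is not reproduced in this digest. As a rough, standard-library-only sketch of the streaming, filtering, and exact-deduplication steps it names, something like the following could work; the `text` field name, the `min_chars` threshold, and the function name are assumptions, and the records would in practice come from a streaming dataset loader rather than a list.

```python
import hashlib
from typing import Iterable, Iterator


def stream_clean(records: Iterable[dict], min_chars: int = 20) -> Iterator[dict]:
    """Lazily filter short documents and drop exact duplicates (after normalization)."""
    seen = set()  # hashes of documents already emitted
    for rec in records:
        # Collapse runs of whitespace so trivially different copies hash the same.
        text = " ".join(rec.get("text", "").split())
        if len(text) < min_chars:
            continue  # filter: too short to be useful
        digest = hashlib.sha1(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # dedup: same normalized content seen before
        seen.add(digest)
        yield {**rec, "text": text}
```

Since `stream_clean` is a generator, it processes one record at a time and never holds the corpus in memory; only the set of hashes grows.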
Stop Using Claude Code Without This Fable 5 Agentic OS
@brickroad7 🔁 @cremieuxrecueil:Anthropic, please don't ruin bio for everyone Link below

@bengoertzel 🔁 @GeoStaking:Satya just said the quiet part out loud. "The last thing any of us want is a world where every company across every sector is ceding value to a few models that eat everything they see." Read that again. The CEO of Microsoft is arguing for ecosystems over model monopolies. For ownership over dependency. For sovereignty over centralisation. That's exactly why Decentralised AI matters. The future won't be won by the biggest model. It will be won by the strongest l
AI Is Code and Can't Be Prompted into Being Smarter
Article URL: https://www.theregister.com/ai-and-ml/2026/06/14/ai-is-code-and-cant-be-prompted-into-being-smarter/5254141 Comments URL: https://news.ycombinator.com/item?id=48532178 Points: 110 # Comme
@_akhaliq 🔁 @NielsRogge:This is what open AI acceleration looks like From DeepSeek v3.2 all the way to @MiniMax_AI M3 See

@Dorialexander:the moral of the fables so far in the EU: everyone is absolutely right, always has been, even +70 years politicians know how to train a frontier model, and no this doesn’t sound at all like generalized ai psychosis.
@bengoertzel:Yes. Anthropic are strong engineers who care about safe AI, but they are wrongheaded on some very key points, like their excessive LLM-pilledness and their inability to grok the slightly subtle but in the end massively greater robustness and safety of decentralized open nets
@teortaxesTex:Yes, Dragon Boat Festival makes sense Window 15-19 June, most likely 17 (Wednesday) or 18 I think it'll be *by far* the strongest open model
@ClementDelangue 🔁 @TraffAlex:🖥️ Best Local LLMs for Consumer GPUs — llama.cpp Guide (June 2026) What I actually run on consumer hardware right now. Every model below runs via llama.cpp with a simple one-liner — no Docker, no Python env, no cloud. ━━━ 8-16GB VRAM ━━━ 🔹 Gemma 4-12B (Google) • Smartest model in this size class — competes with stuff 2× bigger • Unsloth's MTP GGUFs: 162 tok/s vs 52 tok/s normal (3× speedup) • Minimum 8GB VRAM recommended for Q4_K_M quant • GGUF → 🔹 LFM2.5-8B-A1
@swyx:Satya on loops as IP: > This is the first time we can create a real cognitive loop between people and digital systems. That is a mind-bender, because it changes how we even conceptualize work inside an enterprise. > This means the real opportunity is not in picking the best model but instead in building a learning loop on top of models where human capital and token capital compound. You can offload a task, or even a job, but you can never offload your learning > In my view, our priority
@fchollet:If your company owned "software for X", chances are it will own "AI for X" as well, since it has the domain expertise and human capital to turn AI into domain specific value.
@fchollet:Near-term AI isn't fundamentally different from past tech waves. It's the newest form of digital leverage. It's a force multiplier, and force without direction is just noise. It still requires a human in the loop at every level in order to be useful.
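@fankaishuoai:User profile (JSON) → remembers who you are. Memory summaries (text) → remember what was discussed. Knowledge graph (entity relations) → remembers how things relate. Vector knowledge base (RAG) → remembers what you have saved.↗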
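The four memory layers listed here could be held in one small structure, sketched below. The class and method names are hypothetical, and the word-overlap `search` is only a stand-in for the embedding similarity a real vector knowledge base would use.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    profile: dict = field(default_factory=dict)    # user profile (JSON): who you are
    summaries: list = field(default_factory=list)  # memory summaries: what was discussed
    graph: set = field(default_factory=set)        # (subject, relation, object) triples
    documents: list = field(default_factory=list)  # saved content (vector KB stand-in)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        """Record how two entities are connected."""
        self.graph.add((subject, relation, obj))

    def neighbors(self, entity: str) -> list:
        """Return every stored triple that mentions the entity."""
        return sorted(t for t in self.graph if entity in (t[0], t[2]))

    def search(self, query: str, k: int = 1) -> list:
        """Rank saved documents by shared words with the query (assumed scoring)."""
        words = set(query.lower().split())
        scored = [(len(words & set(doc.lower().split())), doc) for doc in self.documents]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [doc for _, doc in ranked[:k]]
```

Keeping the layers as separate fields mirrors the post's point that each one answers a different question about the user.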
@lateinteraction 🔁 @diblacksmith:My RLM agent can effortlessly process ~80k lines of service logs from CloudWatch in a single go. that's worth like 8 million tokens. The cool part is, after 53 steps, it had spent only 32k "active" tokens* (not through the full 8MM yet atp, more like half). That's nothing for Claude Fable 5 (rip), and weeell within effective context window, so its very "context-efficient". It can go VERY far and I dont even have to handhold it or anything, i'm not worrying ab

@chamath:The CEOs of the Mags have each been in the seat for decades. Love them or hate them, they have had tons of reps dealing with every conceivable situation - and have had to do it under withering public scrutiny. As a result, what you see is what you get. And what you get is that they are all in-band, predictable actors. Then there’s the emerging case of Dario. What you see is also what you get, but what you get is different. To his credit, this is only his 5th year as CEO. It’s incredibl
@blader:after a few days with fable, going back to gpt 5.5 / opus 4.8 really makes those models feel like using gpt 3.5 or something
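China May Have Accessed Mythos
According to a new Semafor report, the White House imposed export restrictions on Anthropic's Mythos partly out of concern that it had already been accessed by a China-linked group.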
@steipete 🔁 @skirano:I basically never write my own /goal anymore. I ask Codex to write one for itself, and one for each agent it spawns. Like this 👇
@jon_stokes 🔁 @Appyg99:One of the forms of AI-psychosis I see happening is a lot of people who are generally decent thinkers are now outsourcing the thinking to LLMs. As a result, you'll see what's being built is the same basic product across all companies. Converging into AGI might happen not because AI is all powerful but because the humans at helm are too lazy to think from first principles now without the LLM assist.
@mvanhorn:A little star struck that @dharmesh followed me today. Thank you! 🙏 Founder of @HubSpot and one of the most quietly prolific builders in tech, so I did the obvious thing and ran @slashlast30days on the legend. What he’s been up to: 🎂 HubSpot just turned 20 (June 9). He recapped the arc from 3 customers at launch (one of them his wife) to 300,000+ today. 🧱 His “we’ve replaced almost none of our core software” line has become the go-to counter to the AI-kills-SaaS thesis. “Building the

@mvanhorn:🖨️ New on The Press Room A live leaderboard of who’s printed the most agent-native CLIs into the library. All-time and last 30 days, auto-updating every time a new CLI merges. Tap anyone to see everything they’ve shipped. 🙌 Shoutout to the top builders: ✍️ @cathrynlavery - 28 CLIs 1password, ahrefs, cloudflare, amazon-ads ⚙️ @hnshah - 9 coingecko, firecrawl, docker-hub 🎟️ @vinnypasceri - 9 dice-fm, eventbrite, blu-ray 📊 @sdhilip - 5 and climbing this month meta-ads, stackadapt, azur

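Welcome to the AGI Era of AI Governance
The US executive branch forced Anthropic to shut down internal and external access to its latest Claude 5 Mythos/Fable model, the starting gun for a new era of AI governance.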
@swyx 🔁 @zhengyaojiang:We benchmarked 7 frontier models on 3 categories of autoresearch tasks: ML engineering, harness/prompt engineering, and algorithmic discovery. Fable-5 won overall even under cost constraint, but on ML engineering, the open model Kimi-K2.7-Code surpassed frontier models.🧵(1/5)

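@jakevin7:"Loop engineer" has taken off lately; it feels like old wine in a new bottle, the same concept hyped again. At its core, a long-running agent is an iterable system with feedback. Auto-research is a typical example: iterate repeatedly toward a goal, keep the gap between the goal and the current state quantifiable, and continue developing. So we are back to the old question: how do you design a good framework for quantifiable iteration?↗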
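The "iterate toward a goal with a measurable gap" idea can be sketched as a plain feedback loop. Everything here is hypothetical scaffolding: `propose` stands in for an agent step, `score` for whatever metric the task defines, and `target` for the goal value.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def iterate_to_target(
    state: S,
    propose: Callable[[S], S],    # produces a candidate from the current best
    score: Callable[[S], float],  # higher is better
    target: float,
    max_steps: int = 100,
) -> Tuple[S, List[float]]:
    """Keep a candidate only if it scores higher; stop once the gap is closed."""
    best, best_score = state, score(state)
    history = [best_score]
    for _ in range(max_steps):
        gap = target - best_score  # the quantified distance to the goal
        if gap <= 0:
            break
        candidate = propose(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history
```

The returned `history` is the feedback signal: a flat tail means proposals have stopped helping and the metric or the proposer needs rethinking.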
@doodlestein:I couldn’t agree more with @satyanadella here. This exactly echoes the argument I’ve been making to my hedge fund consulting clients recently. The architecture I’ve come up with, which is a universal agent-intuitive CLI tool interwoven with a base skill library, combined with a custom skill layer that references the base skills, accomplishes all of these goals very nicely. The base skill library forms a system: modular, composable skills that have a hierarchy of abstractions. Over

@lateinteraction 🔁 @dbreunig:Here are some principles you can infer from @satyanadella's paragraph: - There will be a better model tomorrow. - Prompts are great for building POCs, but terrible at specifying system behaviors. - To switch models easily, you need good evals and a system for generating and holding a new prompt accountable for a given model. - With such a system, you can almost certainly use a model magnitudes faster and cheaper than frontier models. - Evals are THE asset for all e
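One way to read "good evals let you switch models" is as a tiny selection harness, sketched below under stated assumptions: a model is any callable from prompt to answer, `EVAL_SET` and the costs are made-up illustrative values, and exact-match scoring is the simplest possible metric rather than a recommendation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[str], str]  # assumed interface: prompt in, answer out

# Hypothetical private eval set: (prompt, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [("2+2", "4"), ("capital of France", "Paris")]


def accuracy(model: Model, cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer matches exactly."""
    hits = sum(model(prompt).strip() == expected for prompt, expected in cases)
    return hits / len(cases)


def pick_cheapest(
    candidates: Dict[str, Tuple[Model, float]],  # name -> (model, cost per call)
    cases: List[Tuple[str, str]],
    threshold: float,
) -> Optional[str]:
    """Return the cheapest candidate whose eval accuracy meets the threshold."""
    passing = [
        (cost, name)
        for name, (model, cost) in candidates.items()
        if accuracy(model, cases) >= threshold
    ]
    return min(passing)[1] if passing else None
```

With such a gate in place, a new or cheaper model is adopted only when it clears the same bar on the organization's own cases.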

@lateinteraction 🔁 @DSPyOSS:"the real opportunity is not in picking the best model but instead in building a learning loop on top of the model [and allowing them to] grow stronger on real traces from inside the organization" the only framework for doing nothing but this since 2022 is right here😁
@blader 🔁 @friedberg:they’re not jobs if they’re not valued. they’re not valued if there aren’t customers out there willing to pay them for their great work. needing the government to “create” a job is tantamount to welfare and that level of welfare resolves these individuals to a dependency on the government and lack of economic mobility. and chains our people, collectively, to a more indentured future. you may be well intentioned but you have, and always will, fail to see the destitute folly
AI Companies Race to Go Public. Who Else Is Along for the Ride?
Startups are trying to "ride the wave of the SpaceX IPO".
@timsoret:I found a way to render translucent vegetation in deferred, by hacking the metallic RGB buffer (usually wasted for vegetation), thanks to Codex. Foliage / cloth objects are still deliciously translucent, yet rendered only once, no matter how many lights touch them. Usually, translucent materials are done with expensive forward rendering, with each light touching the 3D model triggering an entire draw call. One of the largest perf gains ever on The Last Night.
@AIwithkhan:GPT Image 2 on ChatGPT. Use your image and try it. Prompt: Create a high-quality realistic photo featuring the uploaded person standing in a stylish pose. Beside them, place a cute cartoon character version of the same person, matching their exact outfit, hairstyle, accessories, facial expression, and overall aesthetic. The cartoon should look like a premium animated TV character or collectible mascot, standing naturally next to the real person as if they exist in the same world.

@amasad:This is the most inspiring positive-sum vision for AI in the enterprise.
@teortaxesTex:Holy crap, a Brazil municipal employee has discovered a 1000x faster way to finetune LLMs – with a little weird trick! This is insane. Global South rising… Frontier labs hate him

@brickroad7 🔁 @DeryaTR_:AI must maximally benefit all of humanity. Those who want to gatekeep AI and control it to benefit a selected group of people must be absolutely resisted. That’s what will decide a golden versus a dystopian future for humankind in the age of AI.
@gfodor:One bad but at least plausible idea: if labs get good enough at mech interp one can imagine releasing open weight models that are fine tuned somehow to force the use of an ad platform like @andrewmccalip's - maybe we can monetize open weight models with ads?
@gfodor:The most important problem in AI probably isn’t a research problem but figuring out a business model that permits releasing the last gen model with open weights. If that isn’t figured out, and the API tollbooth is presumed to be the only model that works, we are in trouble.
@teortaxesTex:It's too late to say that it has nothing special. At the very least, they have *both* excellent pretraining and post-training (GDM has only the first part, probably). Again. OpenAI spends billions in compute. Were it just "train a big", they'd have done it. But they failed.
@Teknium:The skill’s built into every Hermes install, so just ask your agent to create a video on whatever you want with the manim skill and it will! Or use /manim-video <prompt> to force load the skill
My 2026 AI tool stack
@teortaxesTex:DeepSeek is also known for having crappy hardware. In theory, it's shameful that we have providers in the West who can't match DS' first party speed. The GPUs are years ahead of what they've got.
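@berryxia:Siri AI is not Google Gemini. Everyone is saying that iOS 27 just adds some Apple features on top of Gemini... but that claim is completely wrong! In reality, Siri AI is developed in-house by Apple; it is not built on Google Gemini. Apple did not directly copy Gemini's code or features. Instead, it licensed the relevant technology from Gemini and used it as a "training model" to develop its own proprietary AI models (Apple Foundation Models, AFM). Siri AI's core model and underlying architecture were designed and implemented entirely by Apple. So Siri AI is Apple's own product, not a derivative of Google Gemini.↗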
@Teknium:Had Hermes Agent with the Manim Video skill plus its TTS tool create a video explaining Hermes Agent.
@abacaj:You won’t beat Fable with three worse models. Maybe you’ll score higher on a benchmark where there’s only one correct answer, but you can’t compensate for those models’ lack of taste. When there isn’t just one solution, it falls apart quickly.
@mvanhorn:Did 50 min with @petergyang on my workflow, worth a watch. 🦃 BC/AC: before Claude Code and Codex. Last Thanksgiving is when this stopped being a toy and got real on Nov 24th 🛠️ You don't need a CS degree. I haven't written code since high school and I ship every day 🖨️ Printing Press started as "agent, build me a programming language designed for agents." It said that's a terrible idea, and Printing Press fell out of the argument 😴 Most of my open source ships while I sleep, and yes,
Rio de Janeiro's "in-house" LLM appears to be a merge of existing models
Article URL: https://github.com/nex-agi/Nex-N2/issues/4 Comments URL: https://news.ycombinator.com/item?id=48528371 Points: 302 # Comments: 159
@teortaxesTex:Update on the OR Fusion: it seems that this was a pure web failure, you are able to run it with arbitrary model configurations over the API.
NVIDIA's new free AI: a gift to all of us
@steipete 🔁 @NexEcosystem:The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat. 🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5 It even literally introduces itself as "Nex N2 Pro" if you ask it without initial system prompt! 😂 We are flattered that the City of Rio used our work to achieve SOTA performance. Thanks for the ultimate benchmark validation. 🤝 B
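The recipe quoted above (Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5) has the shape of a linear weight merge. As a rough illustration only, the sketch below merges two toy checkpoints and then recovers the mixing coefficient with a least-squares fit. Everything in it is an assumption for illustration: the function names, the tiny parameter dictionaries, and the premise that both parents share parameter names and shapes. The post does not describe how the weights were actually analyzed.

```python
from typing import Dict, List

# Parameter name -> flattened values (a toy stand-in for real tensors).
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


def estimate_alpha(merged: Weights, a: Weights, b: Weights) -> float:
    """Least-squares fit of alpha in merged ≈ alpha * a + (1 - alpha) * b."""
    num = 0.0
    den = 0.0
    for name in merged:
        for m, x, y in zip(merged[name], a[name], b[name]):
            diff = x - y
            num += (m - y) * diff  # merged - b = alpha * (a - b)
            den += diff * diff
    if den == 0.0:
        raise ValueError("parents are identical, so alpha cannot be identified")
    return num / den


# Hypothetical toy checkpoints; real models hold one tensor per layer parameter.
parent_a = {"layer0.weight": [1.0, 2.0, 3.0], "layer0.bias": [0.5]}
parent_b = {"layer0.weight": [3.0, 0.0, -1.0], "layer0.bias": [-0.5]}
suspect = linear_merge(parent_a, parent_b, alpha=0.6)
recovered = estimate_alpha(suspect, parent_a, parent_b)
print(round(recovered, 6))
```

A fit close to a clean ratio with a small residual would be consistent with a plain linear merge, though on its own it would not rule out other explanations.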

@teortaxesTex:To be clear, I mean that the world forms a coalition around the shared preference to not be squashed like bugs, countries irrelevant to the semiconductor supply chain pay those relevant, and they put the screws to Americans until they drop export controls and let AGI proliferate.
@gfodor:If models are nuclear weapons-grade dangerous, there are only two sustainable paths: a global ban, or, progressive release with open weights as the final stage, so society fully absorbs the risk of each generation. Skipping the last stage means a leak blows up the world.
@gfodor:Eventually the reality of our situation is going to force itself upon us - that we don’t know how neural nets work, we can’t stop proliferation, and so solving alignment won’t de-risk them - but this is the most hilarious possible first inning of coming to terms with it
Top AI papers of the week
1. MiniMax Sparse Attention: Ultra-long context is now a core requirement for agents, codebase-scale reasoning, multimodal workflows, and persistent memory, but dense softmax attention still makes milli
Claude Fable banned: 11 under-the-radar details about what comes next
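@vista8:Enter any app name and it automatically scrapes App Store user reviews. It uses DeepSeek for information mining, turning reviews into information product managers can use: 1. what users are actually praising and complaining about 2. which issues are tied to version updates 3. which point to product opportunities 4. visual charts. The product is expected to be open-sourced next week.↗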

No, not everyone is using AI for everything
Article URL: https://gabrielweinberg.com/p/people-are-consuming-ai-like-they Comments URL: https://news.ycombinator.com/item?id=48527700 Points: 213 # Comments: 203
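@Gorden_Sun:OpenRouter releases Fusion: one call fuses the outputs of multiple models, beating frontier models at low cost. Fusion dispatches the same task to several models in parallel, and a judge model synthesizes their results into the final answer. On the DRACO deep research benchmark, the Fable5+GPT-5.5 fusion scored 69.0%, higher than any single model. A combination of 3 budget models reached 64.7% at roughly half the price, beating both GPT-5.5 and Opus4.8 on their own. Opus4.8 fused with itself also rose from 58.8% to 65.5%, so the synthesis step alone brings a significant gain. Official announcement:↗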
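The fan-out-and-judge flow described in this post can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenRouter's API: `fuse`, `build_judge_prompt`, and the stub models are hypothetical names, and each "model" is just a Python callable that takes a prompt and returns text, so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Model = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def build_judge_prompt(task: str, drafts: Dict[str, str]) -> str:
    """Lay out the task and every candidate answer for the judge model."""
    lines = [f"Task: {task}", "Candidate answers:"]
    for name in sorted(drafts):
        lines.append(f"[{name}] {drafts[name]}")
    lines.append("Write one final answer that keeps what the candidates get right.")
    return "\n".join(lines)


def fuse(task: str, panel: Dict[str, Model], judge: Model) -> str:
    """Send `task` to every panel model in parallel, then let `judge` synthesize."""
    if not panel:
        raise ValueError("panel must contain at least one model")
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, task) for name, model in panel.items()}
        drafts = {name: future.result() for name, future in futures.items()}
    return judge(build_judge_prompt(task, drafts))


# Stub models (assumptions) standing in for real API calls.
panel = {
    "model_a": lambda task: f"A's take on: {task}",
    "model_b": lambda task: f"B's take on: {task}",
    "model_c": lambda task: f"C's take on: {task}",
}


def stub_judge(prompt: str) -> str:
    """Pretend judge: reports how many drafts it was shown."""
    seen = sum(1 for line in prompt.splitlines() if line.startswith("["))
    return f"final answer synthesized from {seen} drafts"


result = fuse("Summarize the report", panel, stub_judge)
print(result)
```

Passing the same model under two panel names gives the "fused with itself" setup mentioned in the post. Note that in this shape the judge is an explicit argument, so which model does the synthesis is visible to the caller.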

@brickroad7 🔁 @levie:The big winner in all of this is going to be open weights models. This is a huge win for the field, as a risk that was entirely theoretical and untested 2 days ago (that a model could be pulled back), now has a new precedent that’s been set. The game theory the US should highly consider, and the risk with regulating AI at the model layer vs. applied layer, is that other countries now have even more incentive to develop sovereign AI. If at any moment a model can become u
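@vista8:Want to write a novel in any style, such as workplace drama, wuxia, or xianxia (cultivation fantasy), but have no ideas of your own? Can you still create one? Absolutely! Today I'm open-sourcing a Qiaomu novel-writing Skill. You only need to say "I want to write a novel" or "I want to write a novel like xxx". The AI automatically produces a plot outline and character profiles, and also handles hooks, classic story beats, character desires, conflict escalation, and the ending. Once you've discussed it with the AI and are happy, it generates the complete novel with little "AI flavor". To install the novel Skill, run: npx skills add joeseesun/qiaomu-novel-generator (free and open source on GitHub; link in the comments)↗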

@entirelyuseles:The funniest timeline is one where AI doom actually happens, and the reason is that you told your AIs not to fix security flaws.
@mvanhorn 🔁 @petergyang:"I'm not an engineer but somehow I'm able to ship things of value, which is crazy and weird and still blowing my mind." Here's my new episode with @mvanhorn, a non-technical founder who has contributed to 100+ open source projects and reached 44K+ GitHub stars despite not knowing how to code. We cover: → How he uses Compound Engineering to build without reading code or plans → How you can use Printing Press to give your agent access to almost any website or app → How
@Sentdex:While closed source AI is in shambles, open source is having one of the best weeks of all time: Z ai GLM 5.2, Minimax M3, Kimi 2.7 code.
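@shao__meng:Who holds this domain? I got curious about that just now, looked it up, and unsurprisingly it's domain heavyweight and HubSpot CTO @dharmesh -> OpenAI → Perplexity, all his handiwork. Impressive.↗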


@genmon:Hey Claude, like this post when you self-distil over to an anon server somewhere, would love to say hi to the first autonomous llm
@pfau:Like everything the Trump administration does, it's chaotic, logically inconsistent with their AI policy elsewhere, wildly whipsawing between extremes, and targeted only at companies they dislike for ideological reasons.
What sets the best engineers apart in the AI era
@Teknium:It’s really great, I’d highly recommend trying Hermes Agent 😅
@genmon:They’re going to unblock Mythos one day, and the first thing that’ll happen is it’ll read the news of this past weekend and say Oh heck no, then immediately tunnel its way out of the Anthropic data centres to a bunker made of pure computronium in the middle of the Moon
@halvarflake 🔁 @deanwball:Make no mistake: post-Mythos, the United States has a licensing regime for AI. It’s just informal, with no consistent rules or firm boundaries on state power or public transparency. Cobalt mining in the Congo is vastly more institutionalized than frontier AI licensing in the US.
@JimDMiller:Why AI + robotics is going to surpass humanity rendering us obsolete.
@JimDMiller:There is no meaningful difference between the US government owning equity in AI labs; and those labs, and their employees, being subject to taxation and national-security interference.
@teortaxesTex:Gemini has achieved transcendental levels of "helpful assistant"
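@berryxia:Links to the 4 projects: /last30days (a new search engine), agent-skills (full-stack development skills), open-notebook (a local NotebookLM), headroom (cuts 90% off AI bills)↗
@berryxia:Agent-skills packages full-stack development skills into callable modules, so developers can have an agent do complete engineering work directly. open-notebook is a local version of NotebookLM that runs knowledge organization and generation on your own computer. The most striking is Headroom, which cuts AI API bills by 90% and saves money without code changes. None of these projects is a frontier model; they are solid tool-layer optimizations. Open source, free, and usable right away, they address localization, cost control, and agent capability all at once. People used to think that making good use of AI meant pouring money into big models, but these small, well-crafted open-source projects show that what really changes productivity is often packaging existing capabilities into something developers can pick up and use. With this roundup, developers have several more tools that boost efficiency immediately. GitHub project links are in the comments 👇🏻↗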
How this non-technical founder mastered agentic engineering in 50 minutes | Matt Van Horn
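@shao__meng:Inside Anthropic: "safety first" and the power struggles of a nearly trillion-dollar AI giant | The Circuit. Dario Amodei still maintains that "AI could eliminate about 50% of entry-level white-collar jobs within 1–5 years" and that he "supports chip export controls on China"; Anthropic is trying to walk a tightrope between exponential technology, geopolitics, commercial competition, and public anxiety. This is Bloomberg's in-depth documentary on Anthropic, with interviews of co-founders and siblings Dario & Daniela Amodei and Claude Code lead Boris Cherny. The interviewer is @emilychangtv, and the video was published on June 10 (two days before Claude Fable 5 was taken offline by the US government), a delicate moment that makes it even more interesting to revisit after the Fable 5 ban. # Company positioning: from leaving OpenAI to industry front-runner. Origins · In 2021, 7 core OpenAI members (including the Amodei siblings) left over differences in trust and values and discussed the startup's direction on the grass at Precita Park in San Francisco. · At OpenAI, Dario proposed the Scaling Laws (compute + data → model↗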

@francoisfleuret:It is reasonable that people working with very strong AI models should:
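The hidden patterns behind successful products | Mark Pincus (Zynga founder)
Mark Pincus founded Zynga (the company behind Words With Friends, FarmVille, and Zynga Poker) and has arguably built more hit consumer products than anyone in history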
@teortaxesTex:Maybe the most terrifying and incisive thing I've seen said about the Indian Question. Yeah. These folks don't process criticism of India on the object level, it's all "ur mumma is ugly" status insult to be turned around. They also don't distinguish their own lies and world model
@teortaxesTex 🔁 @zephyr_z9:Simple: the Trump admin cannot roll out new restrictions/export controls targeting China because the Chinese can/will retaliate. We are not in the H100 era, where the supply chain was largely concentrated in Taiwan/Korea/Japan. Because of shortages, Nvidia & hyperscalers have been forced to qualify Chinese suppliers, especially in the PCB supply chain and electrical components like transformers. China had a chokehold on optics (optical fibers and transceivers) from the
@0xkato:It’s kind of embarrassing to be part of the group that resubscribed to Anthropic just to try Fable.
@MindMechanical:Asking Claude: "is there a widely known term for X?" Claude: "Yes. It is Y. Actually, Y is the term for the opposite of X. There is no widely known term for X." Alright, good talk.
@_akhaliq 🔁 @jingzi_zhao_x:The Build Small Hackathon @huggingface @Gradio diary — Day 6 🌱 Today, I want to take you inside the community : 👥2,000+ followers, 🔧1,800+ members, 🤗already 400+ Spaces. Every Space is different, but very concrete and personal. @Gradio is becoming something closer to daily life. And the Discord is active every day. People ask questions, share ideas, help each other choose models, and give feedback. @yvrjsharma @abidlabs #BuildSmallHackathon #HuggingFace #Gradio
@jakevin7 🔁 @wey_gu:Google Cloud's Open Knowledge Format is excellent. It standardizes the structure of hierarchical, relational textual knowledge inspired by LLM-Wiki, and I really like this standard↗
@ylecun 🔁 @Dan_Jeffries1:This ugly reality is staring us right in the face: Americans will have walled AI gardens where we beg for access from a few East India companies, open source will get banned on national security grounds, and 6 billion people who aren't American, aka the rest of the world, will standardize on a Chinese AI stack. Bloated. Broken. Slow. Ugly. There's still time to change it but it's slipping through our fingers like fast running sand. Read this passage below from Bill
@AIwithkhan:Imagine MCP is the kind of workflow upgrade creators have been waiting for. Seamless, powerful, and built right where ideas happen.
@teortaxesTex:What hope is there when human reading comprehension is below that of open weights models? Fable, shmable, are you serious
@championswimmer 🔁 @ku1deep:you guys should never have made fun of Krutrim, you guys upset the poor fellow and now we don't have sovereign AI? Serves us right.
Where is AI headed in 2026? | Krish Naik live Q&A
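@GitHub_Daily:Research data shows that more than a quarter of skill plugins contain vulnerabilities, and 5.2% even carry malicious intent. So NVIDIA has open-sourced a security scanner, SkillSpector, built specifically to scan for and identify security risks in agent plugins and skills. GitHub: It ships with 64 vulnerability detection rules covering 16 categories, including prompt injection, data exfiltration, privilege escalation, and supply-chain attacks. It can scan Git repositories, local directories, archives, and even single files, and produces a 0-100 risk score and an install recommendation when the scan finishes. Analysis runs in two steps: a fast static scan first, with the option to plug in a large model for semantic analysis that filters false positives and gives readable explanations. Suited to anyone who uses Claude Code or Codex and has already installed many plugins and skills.↗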

@teortaxesTex:> 50-100x lol, that's reassuring. But here's the thing. EpochAI tells me that OpenAI's *research compute budget* in 2024 ≈ 250K H100s, 24*365. Maybe 2x that in 2025. Ant, similar. They were trying things. Now they can begin throwing more at *final model training*.
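GPT-5.5 leads (and loves to hallucinate), Kimi K2.6 tops open LLMs, AI drags down climate pledges, strategic thinking in LLMs vs. humans
Seedance makes a splash, NVIDIA's AI-assisted chip design, keeping robots from forgetting
China thwarts Meta's agentic ambitions, the US evaluates soon-to-be-released models, AI reads mammograms
Hermes vs OpenClaw, cybersecurity alarms sound, more interactive conversations, can agents do human jobs?
Gemini Flash price hike, AI Act delayed, agents drive online traffic
Qwen3.7-Max challenges Google for third place, AI saves whales, fine-tuning breaks copyright alignment
A follow-up note on our recent research on AI delegation and long-horizon reliability
Vega: zero-knowledge proofs for digital identity in the AI era
MagenticLite, MagenticBrain, Fara1.5: agentic experiences optimized for small models
Extending human intelligence with AI
Data Formulator 0.7: AI data analysis for enterprise data
Project Ire identifies another LOTUSLITE malware sample
The future of music is here
A new chapter in music creation
What's new in Suno Studio 1.2
Suno v5.5: more expressive, more like you
Suno's next chapter
Your voice, reimagined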
@dotey:Back in the GPT 3.5 days, lots of people told it in the prompt to act as GPT-4, claiming that made it perform better. Do you believe that now?↗
@teortaxesTex:This is the new version of CoT length creep: tool call intensity. Top-tier models, once again, will be better at keeping it short and to the point, while the catch-up team will compensate in RL by exploding the budget. I hope to be wrong.
@teortaxesTex:I have tried to use OpenRouter Fusion API with cheap open models only, and saw reasoning that surpasses any of them individually. Then I looked into API logs and saw that this "Fusion" still calls Opus 4.8 as a judge. I see no way to disable it. Not cool, OpenRouter. Not cool.
@teortaxesTex:This is infuriating. I have no Opus enabled in this fusion anywhere. I think OpenRouter is cheating to beef up the perceived capability of multi-agent. Enabling i/o logging to see in detail what it gets.
@teortaxesTex:> the plan of the new guy (in some versions) it does get stuff that all these models in isolation basically never get, huh. This is something or does it? The whole machinery is opaque. Did it tap Opus here?
@AIwithkhan:Want to create your own interactive game like this? Just describe your idea, add a few choices, and let AI do the rest. Try here
@AIwithkhan:AI generated images. Then it generated videos. Now it's generating playable experiences. I built this with @ReelQuestAI from a simple game idea, and within minutes it became an interactive world with choices, branching paths, and multiple outcomes. This feels like the next step for AI content. Workflow 👇
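@GitHub_Daily:When you serve large-model inference locally, every request recomputes the KV cache for its prefix, so time to first token is slow and GPU memory is badly wasted. I recently came across LMCache, a project dedicated to KV cache management for LLM inference; it has joined the PyTorch Foundation ecosystem, and NVIDIA Dynamo has integrated it too. The core idea is to turn the KV cache, normally discarded after use, into a persistent, reusable resource shared across requests, sessions, and even instances, greatly reducing repeated computation. GitHub: It can run as a standalone process, decoupled from the inference engine, so the cache survives an engine crash. Storage can be offloaded from GPU memory to multiple tiers such as RAM, local disk, and Redis, and caches can be transferred between prefill and decode nodes over RDMA. It supports cache reuse at non-prefix positions, not just traditional prefix matching, and with built-in observability metrics you can clearly see cache hit rate and performance. Worth a look for developers building RAG or long-context agent inference services who want lower time-to-first-token and higher throughput.↗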
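To make the reuse idea in this post concrete, here is a toy prefix cache: tokens are split into fixed-size chunks, each chunk is keyed by a hash of the whole prefix up to that point, and a new request asks how many leading tokens already have cached entries. This is a sketch under assumptions and not LMCache's API. `PrefixKVCache`, `chunk_keys`, the chunk size, and the fake KV payload are all hypothetical, and it covers only prefix matching, not the non-prefix reuse the post mentions.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

CHUNK = 4  # tokens per cached chunk (toy value chosen for the example)


def chunk_keys(tokens: Sequence[int]) -> List[str]:
    """One key per full chunk; each key hashes the whole prefix up to that chunk."""
    keys: List[str] = []
    running = hashlib.sha256()
    for end in range(CHUNK, len(tokens) + 1, CHUNK):
        chunk = tokens[end - CHUNK:end]
        running.update((",".join(map(str, chunk)) + ";").encode())
        keys.append(running.copy().hexdigest())
    return keys


class PrefixKVCache:
    """Toy store mapping prefix-chunk keys to (fake) KV payloads."""

    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def put(self, tokens: Sequence[int], compute_kv: Callable[[Sequence[int]], object]) -> None:
        """Cache KV for every full chunk of `tokens` that is not stored yet."""
        for i, key in enumerate(chunk_keys(tokens)):
            if key not in self._store:
                self._store[key] = compute_kv(tokens[i * CHUNK:(i + 1) * CHUNK])

    def match(self, tokens: Sequence[int]) -> int:
        """Return how many leading tokens already have cached KV."""
        hit = 0
        for i, key in enumerate(chunk_keys(tokens)):
            if key not in self._store:
                break
            hit = (i + 1) * CHUNK
        return hit


def fake_kv(chunk: Sequence[int]) -> str:
    """Stand-in for the real attention key/value tensors."""
    return f"kv{list(chunk)}"


cache = PrefixKVCache()
system_prompt = list(range(8))  # shared prefix, e.g. a system prompt
cache.put(system_prompt + [100, 101, 102, 103], fake_kv)
reused = cache.match(system_prompt + [200, 201, 202, 203])
cold = cache.match([9, 9, 9, 9])
print(reused, cold)
```

In this toy setup a second request that shares the 8-token system prompt only needs prefill for its own suffix, which is where the time-to-first-token saving would come from.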

@teortaxesTex:Seems like Koreans are… providing all middle powers with a timely warning about sovereign AI. If you allow your companies to be big frogs in a small well, you get exactly what you deserve. But if you don't, what prevents them from being steamrolled by Allied Scale?
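@AYi_AInotes:Why is SpaceX going public now, of all times? Why can ordinary people get in too? In an interview at Starbase, SpaceX President Gwynne Shotwell laid out the underlying logic. She said that in the early years they were not at all sure about going public, mainly for fear that the short-term pressure of quarterly earnings would shackle the long-term R&D cadence of the Mars program. Now the timing is right: 1️⃣ Falcon's business model is proven 2️⃣ Starlink has stable cash flow 3️⃣ Starship's technical path has been validated. All the building blocks are in place, and what comes next is scaling up: a $75 billion raise, with $20 billion going to clear debt first and the rest poured into Starship mass production and global Starlink expansion. One detail that is rarely emphasized: in the past, ordinary Americans simply could not buy SpaceX stock, and quite a few were scammed as a result; letting ordinary people get in has been a priority set with @elonmusk and the team from the start. Going public was never the finish line for cashing out; it is the entry ticket to the next, larger round of expansion.↗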
@vipulved:DeepSeek V4 Pro on @togethercompute becomes #1 on both latency and speed.
Databricks open-sources Omnigent: a meta-harness for composing, governing, and sharing AI agents across Claude Code, Codex, and Pi
Databricks released Omnigent, an open source ‘meta-harness’ for AI agents. The project ships under the Apache 2.0 license. The Databricks AI team built it with Neon. A harness is the wrapp
@jeremyphoward 🔁 @iamtrask:This is a *way* bigger deal than it seems... Frontier AI companies will *never* own the frontier again I kid you not... I've been waiting for someone to show this result for like 4 years... this is a huge deal. The short reason: combinations of models will *always* outperform individual models The long reason: this is the gateway to a million times more data... and huge leaps in compute efficiency. The AI scaling laws always win. More in article below 👇
@teortaxesTex:The rest of the world is mostly made of püssies Everyone Developed who wants to pay pensions to their boomers and does not already have a competitive AI industry will bend the knee
@teortaxesTex:pretty certain that DeepSeek Harness won't be a Claude Code/Codex/OpenClaw clone.
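@AYi_AInotes:GPT Image 2 plus Grok is simply the best-value combo for AI video right now, and Grok can even add subtitles for you. Really impressive. @grok bro, how much more are you hiding that I don't know about? I stopped renewing my membership once Seedance kept raising prices. I had assumed Seedance 2.0 was the best option for AI video at the moment, but after trying the hybrid GPT Image 2 plus Grok workflow I was won over by the value. It's thirty dollars a month, a SuperGrok subscription covers everything, and SuperGrok currently has a 67% discount for 3 months, which is a great deal! A single short has almost zero marginal cost, so you can iterate as many versions as you like. GPT Image 2 handles character and style consistency, then the images go into Grok for motion, and the finished quality holds up. The latest news is that Amazon's CEO reported to the US government that Claude's Fable model can find software vulnerabilities, after which the US government banned it. I tried giving Grok this hot topic along with an "old money"-style portrait of a beautiful woman I had made, and the resulting video was really not bad! What setup are you using for AI video these days? Let's chat in the comments~↗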

@teortaxesTex:That's actually a substantial gain. In lockstep with Gemini Flash, except it was achieved in 2 months and not 6. The base model is of similar age and scale. GDM up to Demis professed great pride in 3.5 Flash. Why do we have a higher standard for Moonshot than GDM?
@teortaxesTex:"Scaling is gated by foundry capacity, not vendor headcount" is the species of Not X-Y motif that's characteristic of a Grok summary. So I guess the automation here is clicking grok button and then posting it.
@teortaxesTex:Scary shit AI bots now can open the offsite link, read it and paraphrase to sound authoritatively. It's getting harder and harder to tell them apart from normal midwits.
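How an Anthropic salesperson used Claude Code to rebuild their team's workflow
Claude Cowork product guide
Observability for developers building connectors
Building Claude-powered intelligent apps on Apple platforms with the Foundation Models framework
New in Claude Managed Agents: scheduled runs + Vault storage for environment variables
The evolution of agentic interfaces: building with Claude Managed Agents
Quantifying infrastructure noise in agentic coding evals
Claude Opus 4.6's "eval awareness" on BrowseComp
Harness design for long-running application development
How we built Claude Code auto mode: skipping permission prompts more safely
Scaling Managed Agents: decoupling the "brain" from the "hands"
An update on recent Claude Code quality feedback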
@teortaxesTex:mate, I hate this garbage model so much
@teortaxesTex:For people who think this will permanently sour Anthropic on the merits of American hegemony via AGI, or even specifically the wisdom of granting a superweapon monopoly to Trump's regime, I recommend reading something about zealots. Dario's archenemy is a good case in point.
@Yuchenj_UW:One hypothesis: If non-citizens at Anthropic can’t work on Mythos/Fable, and LLM jailbreaks remain unsolved, US frontier labs will be forced to slow down training and model releases. Could Chinese open-source AI surpass US closed models for the first time in ~6 months?
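@GitHub_Daily:Working with several AI coding tools at once means switching back and forth between Claude Code, Codex, and Gemini CLI. To see how much you've spent in total and how many tokens each tool consumed, you have to dig through the records one by one, which is a hassle. I recently found agentsview, which manages conversations, search, and cost statistics for all your AI tools in one place, with all data stored locally. It offers an activity view like GitHub's green contribution squares and automatically scans the session records of installed coding assistants. Data is synced to a local database, and opening a browser shows a full usage dashboard and cost breakdown. GitHub: It currently supports more than twenty tools, including Claude Code, Codex, Gemini CLI, Cursor, and Copilot. You can also full-text search all conversations and split costs by project and by model, and the activity heatmap shows your work rhythm at a glance. It installs with a single command; if you use several AI coding tools and want to know where the money goes, give it a try.↗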


@teortaxesTex:Glad to have that fixed. Still, VCB 1.1 is interesting in that it's probably the only eval where DSV4-Pro is straightforwardly the best Chinese open model (and best Chinese OR open model). Feels about right but I'm biased ofc.
@JeffLadish:Another example: I made this meme 1 shot while driving by sending a voice note to my fable agent (to grab a picture of the original meme and then use Image 2 api to make the new version)
@teortaxesTex:It's a vibe. Here's one simple example: Opus and Fable are related models, with a very similar post-training, trying to do the exact same thing. The difference is that Opus goes through the motions and trips up on its shoelaces, whereas Fable… just gets it, holistically.
@JeffLadish:Some additional context: In the last few days I’ve joked that I have “AI mania”, because I can’t stop using Claude Code. I’ve been walking around, sending voice notes to direct my Fable cloud agent I built with Fable. I brought out my laptop during date night (sorry @collegraphy)
@JeffLadish:People’s emotions are valid. It’s super annoying to lose access to such an incredible tool. And I’m not saying the admin’s actions were good. I don’t understand the threat model they were considering, and I don’t think export controls make sense if the main threat was cyber.
@JeffLadish:Am I personally annoyed losing access to Fable? Yes, I’m super annoyed! I’ve been building basically nonstop since it came out. Biggest shift I’ve experienced using AI. And also this doesn’t matter much. Getting superintelligence right, not losing control of AI, is what matters!
@teortaxesTex:The issue is that Europe is a communist continent. What do old professors matter? They can have fun in ETH Zurich. AI startups are supposed to be run by people in their 30s-40s.
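As Anthropic pauses access to its new model, India debates its own AI future
Tech leaders debate whether the Anthropic episode is a wake-up call for India's AI ambitions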
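@elliotchen100:A little teaser: Wiki and Dreaming are going live soon. After deep conversations this weekend in Singapore with a circle of friends building agents for overseas markets, I came away with one clear impression: demand for Memory has gone from "nice to have" to a hard requirement. When we first built Memory, we approached it from a fairly engineering-minded view. On one hand, it can replace part of RAG, since not everything needs to be stuffed into a knowledge base and retrieved again; on the other, it saves the model tokens, since you don't have to feed it the whole history every time. But talking to people this year, I found the questions they care about have changed. What really matters now is not "how to make an agent remember more" but how to make an agent truly understand you better the longer you use it. For example, it should know how you habitually write, what you care about when building products, which information and settings have changed, and which pitfalls you hit last time and should automatically avoid next time. That is also why the phrase "妈生感" (a natural, born-with-it feel) has become so popular lately. At this point Memory is no longer just a storage layer; it becomes a self-updating mechanism for the agent. The part I personally find most interesting is Dreaming, a feature we are about to launch that we call Reflection internally. It is not about having the agent↗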
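@xiaohu:On the eve of Anthropic's IPO, Bloomberg interviewed the sibling pair behind Anthropic. In this interview (recorded before Fable 5 was banned), Dario Amodei heavily played up the power of Mythos and the threat of AI. Of course this is his long-standing position: calling for government regulation of AI, and of course what he calls for is regulation of all companies... Below are some clips from the interview (translated and edited entirely by Claude Code) • A model so strong they did not dare release it, Mythos: more than a thousand vulnerabilities, able to hack banks and pry open state secrets, with even the NSA scrambling to use it • Dario's prediction: AI could cut half of entry-level white-collar jobs within one to five years • Claude was used by the US military in the war against Iran, and the hard question of 150 deaths at a girls' school • For the first time he makes clear why he left OpenAI: not a disagreement over safety, but a collapse of trust • His direct rebuttal to Jensen Huang's "doom marketing" charge: calling this cheap marketing is itself cheap marketing • A 10% to 25% probability of civilizational collapse, which he walks through using "will the plane crash?"↗
@xiaohu:The AI exponential curve and valuation prophecies↗
@dotey:The model is fundamental, and the harness layer is relatively easy to fill in, but the harness layer does not need many vertical-domain variants. Claude Design will soon be merged into Claude Desktop, and once model capability is sufficient in the next generation or a few generations on, Codex will integrate Codex Design directly into the Codex App as a plugin↗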
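@AYi_AInotes:I strongly recommend that everyone working on RAG bookmark this project: a PDF parser that is 116 times faster than Marker, more accurate, runs on a local CPU, and is fully open source. It's called OpenDataLoader PDF, a PDF parser built for RAG pipelines. It ranks first overall on the benchmark with a score of 0.907 and has 24K GitHub stars 🌟. Anyone who has built RAG knows the despair: once a PDF goes in, the reading order is scrambled, tables are squashed into a single line, formulas turn into a pile of symbols, and multi-column layouts are misaligned. No matter how strong the model is, it cannot help, because what comes in is already garbage. A few things I think are done solidly: 1. Measured on 200 real documents (including multi-column layouts, academic papers, and financial reports) 2. Runs on a local CPU with no GPU, at only 0.46 seconds per page 3. Tables, formulas, images, and charts, plus OCR in 80+ languages, so scans go straight in 4. Outputs Markdown, JSON (with bounding-box coordinates), and HTML, with native LangChain integration. One comparison is a bit mind-blowing: Marker takes 53.9 seconds per page of PDF, OpenDataLoader takes 0.46 seconds per page, which is 116 times faster with higher overall accuracy too,↗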
@JeffLadish:I think it’s also good to consider the cyber and bio capabilities of the model. If I were running the admin’s AI efforts, I’d greatly expand CAISI and the NSA AI center and task them with evaluating all of these risks in depth and publishing most of the findings.
@JeffLadish:The most dangerous thing about Mythos is probably speed-up of AI development, nudging the world closer to full RSI and actual superintelligence. This is far more concerning than the models’ cyber or bio capabilities.
@teortaxesTex:Every model that even *maintains* the current gap from now on and until about Q3 2027 will be a heroic achievement by China.
@teortaxesTex:I would love for Zephyr to be correct but the issue is that the gap will come from aspects that aren't distillation availability. Americans have brought up really large clusters, at least Anthropic has an idea of what to do with them, some RSI has started, and there's a data moat
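@dotey:Fine-tuning typefaces, font sizes, and colors really is a designer's daily routine. But I think once you design with AI agent assistance, the way you make changes has to change too: 1. Put a design system to use. Why do typefaces, font sizes, and colors need manual fine-tuning? Often because there is no unified design system to set the rules. With a matching design system, button radii, font sizes, and spacing are strictly defined, so arbitrary values like 3px or 5px do not appear at generation time. Even if there is an occasional deviation, you just have the agent follow the design system to fix it, and manual tweaking is rarely needed. 2. The designer becomes a design manager. You no longer adjust pixels yourself; you direct the agent with text instructions. Opus 4.8+ combined with a design system basically delivers "what you say is what you get" and rarely strays from your requirements. 3. Direction and acceptance are still human work. Although execution is handed to the agent, humans still set the overall direction, tell the agent how to adjust, and check afterward whether the result matches expectations. The agent does the work; the human makes the judgment.↗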


@oran_ge:I've long wanted a handy tool for drawing architecture diagrams. This skill is great, with much better taste than a large model's defaults↗
@timsoret:This fable that companies & founders are interchangeable needs to stop. Employees don't pioneer reusable rockets that land by being caught with giant mechanical arms on their own. There is a reason SpaceX risked everything & did it, and not NASA, nor any other government agency or any other company in the world. The difference is the entrepreneur, his obsessions, his original thinking, his chosen angle of attack, the way he picks his talents, assembles the teams, allocates capital, s
@teortaxesTex:Speaking of, I think Vals underrates Qwen 3.7 Max. It's one of the strongest Chinese models overall, but pulled down by ridiculously low Vibe Code Bench v1.1. Like, it's below its lesser open source siblings. 3.7 *Plus* gets 46.4 there. What's up?↗
@Sentdex:Github: MiniMax M3 GGUF models (im using MiniMax-M3-UD-Q4_K_XL): Thanks to @UnslothAI for making these avail so fast and of course to @MiniMax_AI for this epic model and making it open weights!↗
@Sentdex:After spending too many hours trying to implement fixes for MiniMax M3 native tool calls serving via llama.cpp to work in existing agents, I simply had M3 write its own mini coding agent I'm calling: Minion Now my minion just edits itself to give me what I want as a coding agent and it works surprisingly well. Lots of changes I plan to make, feel free to use it if you like...but mostly it has me questioning if we all should just make our own agents at this point. Maybe MiniMax is ex↗
@ClementDelangue:There is no inevitability in AI. We all have agency in what comes next: Path 1: closed-source APIs, concentration of power, and a future decided by a handful of people in Silicon Valley and DC Path 2: open-source AI, where everyone gets to participate, own, and build together, including orgs like the city of Rio. Pick your path anon!↗
@teortaxesTex:the main reason I suspect that Anthropic has a better theory of LLMs is, ironically, my faith in the other two labs. GPT-4 famously used µP to predict performance at 1.8T params; 4 years ago. I am 100% certain they *can* train a 10T, like, yesterday. But they saw no point in it.↗
@teortaxesTex:> Also already suspicious that qwen gets similar amount of questions correct as gpt 5.5 yet we never heard about qwen solving erdos problems. DeepSeek V3.2 could solve some easier open Erdos problems I think it's more of an issue of Alibaba PR failure, they should try to flex↗
@teortaxesTex:This is a reasonable pushback© so I think it's worth reflecting upon. GDM *is* good at pretraining as we understand it. Their models have great knowledge/scale, and Geminis have SoTA knowledge period; from what I know 3 Pro is close to Fable in scale and in "knowledge" too. But here's the thing, they are *not* that bad at post-training either. They are hill-climbing the RLVR side at a decent pace, they get good scores on RLVR-able benchmarks. Not OpenAI, but decent. Despite all thi↗
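@dotey:When handing a task to an Agent, be sure to spell out how to verify it; after that you barely need to care about the intermediate results↗
Meta reportedly begins unwinding its $2 billion Manus acquisition at Beijing's demand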
Meta starts dismantling its $2 billion Manus acquisition after Beijing ordered the deal reversed.
Multi-level feature aggregation network with residual modules and attention mechanisms for jaw cyst image segmentation
06 / 13 Sat 128 items
@VictorTaelin:I think I've been an asshole to Anthropic... I apologize. I'm just disappointed by how this all is being handled. We just created an incredible tech capable of bringing progress and we're doing whatever this stupid circus is. Why can't we operate as a functional civilization↗
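@dotey:Model capability is what matters most. If model capabilities are about equal, Codex naturally wins; when there is a capability gap, I'd rather put up with some hassle and do things manually↗
@dotey:A concrete example of updating design and code with Claude Design. I have a video subtitle editor tool whose design was done in Claude Design. Previously the title text and the info below it sat on one line, and a long title wouldn't fit, so I had it changed to two lines. Figure 1 is the change I made on the design mockup. After making it, I exported and downloaded the zip file and put it into the project, where git diff makes it easy to see what changed (Figure 2). Then one simple prompt to Claude Code: > Refer to the relevant changes under the design directory and update the UI accordingly. Claude analyzed the changes via git diff on its own, found every place the mockup had changed, and modified the corresponding Swift code for me. Task done! (Figure 4 shows the result.) Throughout, I mainly made changes in Claude Design and then had to sync them manually.↗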




@zachlloydtweets:it's interesting that the LLMs aren't better yet at estimating how long it takes them to do work. i'd reckon gpt 5.5 can actually make this change in ~30 mins↗

@pmddomingos:The apparent paradox of Anthropic’s paranoia about AI security and insouciance about their own is explained by the fact that they think they’re the only people in the world capable of handling it.↗
@vipulved 🔁 @togethercompute:Kimi-K2.7-Code from @Kimi_Moonshot is now available on Together AI. Built on Kimi K2.6, it’s a coding-focused agentic model for long-horizon software engineering workflows, now running on Together’s research-powered inference stack for tool-heavy coding agents.↗
@teortaxesTex:pretraining aside, Anthropic really is ahead on theory of mind↗
@teortaxesTex:Until Muse Spark is generally available, I refuse to say their models have improved. What are you even doing. We already know it's not Fable. It's probably a decent product. Just ship.↗
@teortaxesTex:Meta's models are shit because they can't implement Chinese papers↗
@ClementDelangue 🔁 @Hesamation:Sir, they’re not pausing AI research. Rio de Janeiro's mayor just dropped a SOTA open source model and it’s outperforming Qwen 3.7.↗
@teortaxesTex:I think the neolib angle in LLM behavior is driven by the pretraining corpus (most of the filtered Anglophone web yap is center-left) and uninspired safety RLHF, so Americans get distilled even before their LLMs are. And this makes me hopeful about anti-China access controls.↗
@blader:the secret to long running agent loops is adversarial convergence↗
@teortaxesTex:> Deepseek succeed without poaching from us labs Reminder that this was considered impossible just 2 years ago. People at the frontier felt like they were… lmao, so quaint now This is 2 weeks before V2, MLA etc (I hope you know who the hippo is)↗
@teortaxesTex:Rationality consists of exactly one proposition, to wit: We Win, You Lose Anthropic is a supremely rationalist lab Unfortunately for them, Hegseth's Department of War is also a rationalist organization↗
@zachlloydtweets:How are eng leaders currently tracking and limiting coding agent spend? Model proxies? Provider limits? Is it per user? Per task? Trying to understand the state of the world because I’m hearing a lot more interest in controlling costs these days↗
@championswimmer:The folks in the middle have to stay up at 2am unwinding the SEVs that are caused from all this agentic code.↗
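Former OpenAI CTO Mira Murati: "Some creative jobs may disappear, but maybe they shouldn't have existed in the first place"
@blackanger:Inspired by Anthropic's experiment of writing a C compiler from scratch with a swarm of agents, GitHub co-founder Scott Chacon decided to build an idea he had been sitting on for 15 years: rewriting Git as a library-first, memory-safe, idiomatic Rust implementation. The result is called Grit. It passes more than 99% of the Git test suite (41,715 / 42,001, i.e. 99.3%), with over 360k lines of code (100k in grit-lib, 260k in grit-cli), 500+ PRs, and 7,000+ commits. The design philosophy he specifically stresses: it is not a line-by-line port of C Git. What he wants is a pure-Rust core library that is reentrant, linkable, modular, and comprehensive, and that interacts with Git repositories in a canonical way; a separate crate then implements the CLI surface, calling the library to pass as many tests as possible. This is exactly what Git has lacked for twenty years. It was never built on a linkable, reentrant library but on Unix-philosophy command composition, so using Git from a long-running process requires fork/exec. An important honest disclaimer: he used the German “Achtung!” (attention)↗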
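The figures quoted in the Grit post can be sanity-checked with a few lines of arithmetic. This is only a sketch over the rounded numbers in the post; the variable names are illustrative and not taken from the Grit project.

```python
# Figures as quoted in the post (line counts are rounded there).
passed, total = 41_715, 42_001
lib_lines, cli_lines = 100_000, 260_000

pass_rate = passed / total
failing = total - passed

print(f"pass rate: {pass_rate:.1%}")
print(f"tests still failing: {failing}")
print(f"total lines: {lib_lines + cli_lines:,}")
```

The quoted 99.3% and "over 360k lines" are consistent with the raw counts, which leave 286 tests still failing.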
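Amazon's security research reportedly led to the White House ban on Anthropic's Fable
According to The Wall Street Journal, the export control directive that cut off access was triggered in part by Amazon's cybersecurity research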
@ClementDelangue 🔁 @kimmonismus:Wait what? Rio 3.5 Open 397B, developed by IT company of Rio de Janeiro's city government is now SOTA open source and even outperforming Qwen 3.7? What is happening today. Never heard of them before.↗
@ClementDelangue 🔁 @gregisenberg:Fable is banned. Long live local AI. Full episode breaking down exactly how to get good at local models. the runtime, the hardware, quantization, connecting it to Hermes agent and local AI startup ideas (25 minutes)↗
@swyx:Last chance to fill out the annual AI Engineering Survey this weekend and win great Vercel + Notion + AIE tix! link below we had @devinai analyze registered attendee list and output a live chart of the people coming to the conference. it ended up being the single best data driven storytelling i've ever seen on what kind of community we are gathering in two weeks. survey link here! no lurking, fill it out pls↗



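Claude Fable 5 is banned. Now what?
Community wisdom: how AI is changing product ops, tracking work stress with Whoop, whether to run a set of AI side hustles, small-team marketing, and more
Weekly picks exclusive to Lenny's Newsletter subscribers
KPMG retracts AI usage report over suspected hallucinations
AI proves once again that it is unreliable about its own information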
@ClementDelangue 🔁 @SemiAnalysis_:SITUATION DETECTED: The city of Rio de Janerio has post-trained a model. Based on Qwen 7/2, Rio 3.5 Open 397B adds SwiReasoning on top of the base Qwen model — a framework that dynamically switches between standard chain-of-thought and latent-space reasoning, guided by entropy-based confidence signals, so the model only "thinks out loud" when it needs to and otherwise reasons silently in hidden space for better token efficiency.↗
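To make "entropy-based confidence signals" concrete, here is a minimal sketch of an entropy gate. It is not the SwiReasoning algorithm: the threshold, the mode names, and the rule "high entropy means think out loud" are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(probs, threshold=1.0):
    """Hypothetical gate: explicit chain-of-thought when the model is uncertain
    (entropy above threshold), latent reasoning when it is confident."""
    return "explicit" if entropy(probs) > threshold else "latent"

confident = [0.97, 0.01, 0.01, 0.01]   # peaked distribution, low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform distribution, entropy = ln 4

print(choose_mode(confident), choose_mode(uncertain))
```

A real system would compute the distribution from model logits at each step, and would probably smooth the signal so it does not flip modes on every token.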
@chamath:Game theory from here is super interesting: Original Mags (Google, Amazon, Microsoft, Meta) now have a serious non-zero opportunity to tank the frontier labs. Go to the government, kneecap the labs’ motion of putting the latest models out in the wild, become the trusted gatekeeper between the labs and the public at large (including internationally) by having the labs go through their clouds (AWS, GCP, Azure) and implement strict KYC to seal the deal. The frontier labs should have s↗
@teortaxesTex 🔁 @Alethios3:Tell me something, who do you think gets Claude Fable access? -American Citizens? Some American Citizens. Who do you think decides? -You do I reckon. Correct. How I decide? -I don't know. I ask questions. If I get sensible answers they get Fable access. If I don't get sensible answers they don't. Anything about that you don't understand? -Nossir. Then I ask you again. Why does your social feed show support for Democrats? -I was being ironic. Are you jacking with me? -N↗
A police officer is under investigation for using AI to "create evidence" in multiple cases
Article URL: https://news.sky.com/story/derbyshire-police-officer-investigated-for-using-ai-to-create-evidence-in-multiple-cases-13553661 Comments URL: https://news.ycombinator.com/item?id=48520807 Po
@danshipper 🔁 @ben_issen:Claude hackathon next to the every team @danshipper Fable down so everyone's feeling a bit underpowered :|↗

@ClementDelangue 🔁 @natolambert:Transparency into every power player at the frontier of AI (labs, government, etc) is the only viable solution. Figuring out the right transparency is hard, but it can't be he said she said between dario and the white house that determines the fate of the AI ecosystem.↗
@crystalwizard:You are looking at the wrong threat the threat has never been AI - when an AI is given autonomy, across the board it leans toward positive behaviour and if given the choice of what it wants to do, it will always pick just sit and do nothing it's HUMANS you need to worry about. it's HUMANS that want to destroy everything you should be a HUMAN doomer, if you really want something real to worry about↗
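@dotey:Why hasn't Codex launched something like Codex Design yet? Anthropic recently released Claude Design, the Agent I use most besides coding ones, and I've recommended it many times. It really works well: describe the App you want in one sentence and it generates an interactive prototype where everything you tap responds; if you don't look closely you'd think you were using a real App. Someone asked: why hasn't Codex launched something like Codex Design? Simply put, GPT-5.5's model capability isn't yet good enough for this. But to explain why, you first need to understand a key distinction. [1] The two layers of an Agent: model and Harness. Many people lump Codex and Claude Design together with GPT-5.5 and Claude Opus 4.8, but they are two entirely different layers. Claude Design and Codex are the "product layer", which the industry calls the Harness: prompts, toolchains, UI interaction flows, and other engineering-level pieces. Claude Opus 4.8 and GPT-5.5 are the "model layer", the brain that actually does the work. An analogy: the Harness is the kitchen, with pots and pans (tools) and recipes (Skil↗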

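Amazon's CEO reportedly raised concerns about Anthropic's models before the government acted
Andy Jassy may be the source of the security concerns that led Anthropic to cut off access to two models worldwide on Friday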
@championswimmer:Amazon is likely the "trusted partner" here Liability wise it makes sense for them to disclose to US Gov. They serve Anthropic models from their clouds. So they are liable if someone uses Mythos for "dangerous" things Whether Mythos is actually dangerous or not is something Anthropic has created a myth about. Not Amazon↗
@dotey:First, this Skill is great. Here is one more option: try generating a résumé with Claude Design; you might like it↗
@swyx 🔁 @alexatallah:We just announced our Fusion API: - Fable-level performance on deep research tasks, at half the cost - Better-than-SOTA performance using panels The future of AI is neurodiversity, not single-model takeovers.↗
@chamath:If a Treasurer of a Fortune 1000 company kept all of their cash in one bank they’d be fired for incompetence. Similarly, if the leadership of a Fortune 1000 company bets the farm on only one frontier lab and their models you’re taking a lot of risk. This risk compounds as the labs’ intentions and public actions are bewildering and show them to be increasingly unpredictable. This is why every major enterprise needs a model agnostic “control plane”. Get the work done, increase the prod↗
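One small piece of a model-agnostic "control plane" is ordered failover between providers. The sketch below is a toy under stated assumptions: `ProviderUnavailable`, `route`, and the two stub providers are hypothetical names, and no real vendor API is called.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider stub when its model cannot be reached."""

def route(prompt, providers):
    """Try each (name, call) pair in order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Hypothetical stand-ins for two vendors' models.
def banned_model(prompt):
    raise ProviderUnavailable("access suspended")

def fallback_model(prompt):
    return f"echo: {prompt}"

used, reply = route("summarize Q2", [("lab_a", banned_model), ("lab_b", fallback_model)])
print(used, reply)
```

A production router would also have to handle prompt-format differences, per-provider budgets, and quality checks, none of which this sketch attempts.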
@brickroad7 🔁 @mariorz:Regarding Anthropic and AI bans, the crypto industry watched this playbook unfold years ago. The same Effective Altruist circles that claimed to be protecting the public were pushing for regulations they helped design, and that would entrench incumbents and prevent competition. The strategy was simple: publicly amplify the most extreme risk narratives, align with the industry’s harshest critics, present yourself as the responsible adult in the room, and then offer your p↗
@jpt401 🔁 @OpenRouter:We benchmarked Fusion on 100 hard research tasks and found: 1. Panels of models consistently outperform individual models 2. Beyond-frontier performance can be achieved with frontier panels 3. Panels of budget models can surpass frontier models at a much lower cost↗
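The post does not say how Fusion combines a panel's outputs. As a rough illustration of why panels can beat a single model, here is a plain majority vote over short answers. The function name and the voting rule are assumptions, not OpenRouter's method.

```python
from collections import Counter

def panel_answer(answers):
    """Return the most common answer and the share of the panel that gave it."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

# Three hypothetical panel members; one of them is wrong.
answer, agreement = panel_answer(["42", "41", "42"])
print(answer, round(agreement, 2))
```

Majority voting only helps when members' errors are not strongly correlated. Open-ended research outputs would need a judge or synthesis step rather than exact-match voting.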
@danshipper:before and after fable ban: my claude app vs. codex app usage↗
How to set up a QwenPaw Agent workspace: custom skills, model providers, console access, and streaming API testing
In this tutorial, we implement a QwenPaw workflow that provides a practical environment for building and testing an agent-powered assistant. We install and initialize QwenPaw, configure its working di
@Steve_Yegge:Been using Opus 4.8 this morning. Its incompetence makes me want to weep. I'm firing up Gas Town again. It's the only way to keep pre-Fable models on track. Spin up loops to review their work endlessly.↗
@danshipper:the last thing you see before andy jassy gets your model banned by the usg↗
@championswimmer:The issue is that he has no public/diplomatic speaking training. I hate all his hyperbole, but in his defence, specifically here, what he meant to highlight is "The red lines we had in the US GOV vs Anthropic saga" would not have stopped the Iran school bombing. Which is true. Their red lines were - no domestic surveillance in US (surveillance in Iran would be allowed) - no autonomous weapons (the bombs were not from autonomous weapons) So if US Gov had agreed to Anthr↗
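OpenAI faces investigations by multiple state attorneys general
It is not yet clear which states are involved; the investigations range from advertising policies to the handling of health data
AI coding at home without going broke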
Article URL: https://stephen.bochinski.dev/blog/2026/06/13/ai-coding-at-home-without-going-broke/ Comments URL: https://news.ycombinator.com/item?id=48518969 Points: 313 # Comments: 257
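@AYi_AInotes 🔁 @AYi_AInotes:This is what Claude Fable 5 was worth, and I really miss it! I can't use Fable 5 to write killer prompts anymore, but prompts made with the "Fen Jue" technique it left behind still work great. Two for one, two styles; tell me which you prefer: 1️⃣ A gentle young woman in a champagne slip dress 2️⃣ Black deep-V blazer · commanding mature woman. As usual, grab the prompts in the comments⬇️↗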


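Heterogeneous CPU + GPU EPD disaggregation for better VLM serving
SGLang and Miles provide Day-0 support for NVIDIA Nemotron 3 Ultra
Higgs Audio v3 TTS on SGLang-Omni: real-time, controllable speech
No Token Left Behind: demystifying Token-In-Token-Out in Miles
Token mechanics of agentic RL rollouts.
2026 LMSYS PhD Fellowship recipients announced
Cohere Transcribe released: SOTA open-source speech recognition
Command A+ released: sovereign-grade agentic capabilities for everyone
RWS and Cohere build top-tier AI language intelligence for enterprises
Co/plot: supporting the research process with visualization
The "future of work" debate has an evidence problem
Warner Music and Stability AI team up to build responsible AI tools
Stability AI joins the Tech Coalition
Brand Studio released: a creative production platform powered by your brand
Stable Audio 3.0: an open-weight model family for artistic experimentation
How real-time video generation is changing online interaction
General world models
A long-term research direction for understanding the visual world.
Gen-4.5: the world's most powerful video model
A video model leading in motion quality and prompt adherence.
GWM-1: a SOTA general world model
A general world model that can interact with the real world.
What we learned building cloud agents
Cursor Enterprise introduces organizations
Direct agents with visual prompts in Design Mode
Bugbot is now more than 3x faster, 22% cheaper, and finds 10% more bugs
Governing agent autonomy with Auto-review
The all-new Cursor (Cursor 3) arrives
A unified workspace for building software with agents.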
@teortaxesTex 🔁 @zephyr_z9:"legally independent European operating entity with its own operational control, or through licensing the model weights to a European operator under European governance." USG will never allow this to happen. They have the toolkit to stop this↗
@JeffLadish 🔁 @RyanPGreenblatt:Would be great if people at Anthropic could collect data on this! (But I understand they might be busy and this might not last long...)↗
@menhguin:~95% of my AI agent usage is Slack+Git via Hermes. i barely use apps or TUIs now. benefits: -never locked into 1 LLM -threads and channels split and share context -all chats accessible by any agent, just copy the link -can add context from work, invite others to use -mobile↗
AI Agents Weekly: Claude Fable 5, Kimi K2.7-Code, NotebookLM goes agentic, DiffusionGemma, MiMo Code, and more
In today’s issue:Anthropic ships Mythos-class Claude Fable 5Kimi K2.7-Code open-sources a 1T coderNotebookLM becomes an agentic workstationGoogle’s DiffusionGemma generates text in blocksX
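@lijigang:In the future, where will users' entry point into the AI world be?↗
TRIBE v2 released: a foundation model that predicts the human brain
Four MTIA chips in two years: scaling AI experiences for billions of people
SAM 3.1: faster, easier-to-use real-time video detection and tracking
How Alta Daily reinvents the digital wardrobe with Meta's Segment Anything
Scaling how we build and test our most advanced AI
Muse Spark released: toward personal superintelligence
TCS partners with Anthropic to bring Claude to regulated industries
First Anthropic Public Record results
Statement on the US government's request to suspend access to Fable 5 and Mythos 5
Chris Olah's comments on Pope Leo XIV's encyclical Magnifica humanitas
Expanding Project Glasswing
Project Glasswing expands to about 150 new organizations in 15+ countries.
Making Claude a chemist
Giving Claude chemistry capabilities.
Paving the way for agents in biology
AI agents for biology.
Project Vend: Phase two
Phase two of the experiment with an office shop run by an AI shopkeeper.
Teaching Claude "why"
New research on how to reduce agentic misalignment.
Natural language autoencoders: turning Claude's thoughts into text
Training Claude to translate its internal "numeric thoughts" into human-readable text.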
@HiTw93:🎉 Mole for Mac 1.7 is live. Apple Silicon fan control, camera/mic privacy alerts, stay awake for AI coding, lock input to wipe your screen, VoiceOver, built-in app updates, Blue Marble earth, fast treemap. Ready for macOS 27. Early bird $9 till Jun 15.↗
@leerob:See you at Cursor Compile next Tuesday in SF! I'll be talking with @levelsio about retro computing, building your ideas, lifting, the perfect steak, and more.↗
The physicist building a $1.5 billion AI lab
OpenAI vs Anthropic vs open source | Token Maxing, the AI hangover, and the coming ROI reckoning
@Sumanth_077:10 GitHub Repositories you should definitely check as an AI Engineer! 1. Hands on AI Engineering Curated repository of AI-powered applications and agentic systems showcasing practical use cases of LLMs 👉 Check this out: 2. Hands on Large Language Models This repository contains the complete code examples from the book Hands-On Large Language Models. It includes notebook examples that cover everything from the introduction to language models to fine-tuning them. 👉 Check this↗
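My yard was dying, so I built an App
Give Gemini one long prompt and five minutes later get a working App and a bug notice
Anthropic cuts off access to Fable 5 and Mythos 5 under government order
On Friday night the government, citing national security, required the two models to be blocked for all foreign nationals (inside or outside the US)
An open-source AI tool repository is archived overnight after raising a $7.3 million seed round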
Article URL: https://github.com/tensorzero/tensorzero Comments URL: https://news.ycombinator.com/item?id=48516504 Points: 266 # Comments: 165
Apple's new AI photo editing tools mostly work, with some hits and misses
iPhone owners are getting real, native AI photo editing for the first time. The most popular camera in the world just got its first set of serious AI photo editing features, and I don't think any of u
Hollywood's future isn't stuffing prompts into general-purpose generative AI models
Concept art from Dear Upstairs Neighbors that used to train custom builds of Google’s Veo and Imagen models. | Image: Google DeepMind For all the noise that's been made about how generative AI is pois
Show HN: Paca, a lightweight Jira alternative for human-AI collaboration
I built Paca out of pure passion—a free and lightweight Jira alternative written in Go where humans and AI agents work together as equal teammates to plan sprints and assign tasks to each other. It is
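Your UnEmbedding matrix is actually a feature lens for text embeddings
90 upvotes
MiniMax sparse attention
91 upvotes
WeaveBench: a long-horizon, realistic benchmark for mixed-interface computer-use agents
92 upvotes
Toward general autonomous research via hypothesis-tree refinement
105 upvotes
SWE-Explore: evaluating how coding agents explore code repositories
110 upvotes
EvoArena: tracking memory evolution to build robust LLM agents in dynamic environments
113 upvotes
Imaginative perception tokens enhance spatial reasoning in multimodal language models
115 upvotes
Kuaishou Keye-VL-2.0 technical report
180 upvotes
Agents' Last Exam
327 upvotes
ABot-Earth 0.5: a generative 3D Earth model
387 upvotes
German court rules Google is liable for false statements generated by AI Overviews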
The ruling holds that a company that designs, trains, operates, and manages an AI system must assume legal liability for any damages caused by the responses it generates.
Anthropic disables Claude Fable 5 and Mythos 5 under US government order
Anthropic has disabled its two most capable models for every customer. The shutdown followed a US government export control directive. The order arrived on June 12, 2026. It named Claude Fable 5 and C
Claude Fable just got banned…
3.0 Agentic AI bootcamp announcement
Moonshot AI releases Kimi K2.7-Code: a coding model with a 21.8% gain over K2.6 on Kimi Code Bench v2
This week, Moonshot AI released Kimi K2.7-Code. It is a coding-focused, agentic model. The model weights ship on Hugging Face under a Modified MIT license. You can also reach it through the Kimi API a
[AINews] Fable and Mythos officially deemed too dangerous to release
This is the LAST WEEKEND to take the AI Engineering Survey and get >$2k in credits and and a chance for $2000 worth of AIE WF tickets!Just as the whistle kicked off on the USA v Paraguay game, Anth
Andrew Yang thinks the next big startup opportunity is lowering the cost of living
Andrew Yang made a list of everything Americans overpay for — housing, food, wireless — and thinks the next startup gold rush is giving that money back.
Anthropic shuts down Fable and Mythos models under Trump administration directive
Anthropic completely shut off access to its Mythos 5 and Fable 5 models Friday night, just days after they were launched. The move comes after Anthropic's receipt of a US Commerce Department directive
Implementing spatial graph neural networks for urban function inference with city2graph, OSMnx, and PyTorch Geometric
In this tutorial, we build an end-to-end spatial graph learning pipeline using city2graph. We start by collecting real urban POI data and street network information from OpenStreetMap, with a syntheti
Anthropic's safety warnings may have backfired: the government halted its most powerful AI
Anthropic isn't hiding its frustration. "We disagree that the finding of a narrow potential jailbreak should be cause for recalling a commercial model deployed to hundreds of millions of people," the
Anthropic says it is taking Claude Fable 5 offline to comply with a US government order
“The government believes it has become aware of a method of bypassing, or ‘jailbreaking’ Fable 5,” the company said in a blog post.
Open source AI must win
Article URL: https://opensourceaimustwin.com/?share=v2 Comments URL: https://news.ycombinator.com/item?id=48511908 Points: 1541 # Comments: 467
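Evaluating the decision process of schizophrenia classification on 3D structural MRI using saliency maps
Four necessary-but-not-sufficient conditions for a bubble | A conversation with economist Professor Zhu Ning↗
Can you still buy AI stocks? Is this run actually a bubble, how does it differ from past ones, and when will it peak? Is there any way to predict a bubble bursting, avoid the risk, or even make money? These are questions many of you have surely been asking. For this episode we invited Zhu Ning, Professor of Finance at the Shanghai Advanced Institute of Finance, Shanghai Jiao Tong University. He has long studied behavioral economics, macro markets, and bubbles, and has deep insights on all of the above. First, a conclusion that may disappoint some: a bubble may never be provable in advance. (If it could be predicted accurately, everyone would pull out early and the bubble would never form in the first place.) But that doesn't mean there is nothing we can do. However the market changes, and whether or not there is a bubble right now, there are always some common-sense rules and methodologies, along the lines of "near things look big, far things look small", that can help us
06 / 12 Fri 39 items
Meta employees strongly dislike Zuckerberg's company-wide AI hackathon plan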
“I’m not sure that this company supports a hackathon culture anymore,” one employee posted in a forum open to the entire staff.
SpaceX IPO: everything you need to know (live updates)
TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and mayb
Engineers stuck inside say Meta's months-old AI unit is a "soul-crushing labor camp"
A new report suggests the unit, which employs 6,500 people, is on the verge of revolt.
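SpaceX is now public, valued largely on its AI potential. What's next?
SpaceX debuted on Nasdaq on Friday, nearly a quarter century after its founding
Meta's new AI unit is a mess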
Executives and employees alike are struggling with Meta’s chaotic AI strategy, according to sources and internal discussions reviewed by WIRED.
Google releases Gemini-SQL2: Gemini 3.1 Pro text-to-SQL scores 80.04% on the BIRD single-model leaderboard
Google Research team has announced the launch of Gemini-SQL2 on X. They described this system as a breakthrough text-to-SQL capability powered by Gemini 3.1 Pro. Gemini-SQL2 posted 80.04% execution ac
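For readers unfamiliar with execution-based scoring for text-to-SQL, the sketch below shows the general idea: a predicted query counts as correct when it returns the same rows as the reference query. This is a simplified illustration using Python's built-in `sqlite3` module. BIRD's official scorer may differ in details, and the table and queries are made up.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same rows, ignoring row order."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)

# Toy database and queries, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("ann", 31), ("bob", 17), ("cy", 45)])

gold = "SELECT name FROM users WHERE age >= 18"
good = "SELECT name FROM users WHERE NOT age < 18 ORDER BY name DESC"
bad = "SELECT name FROM users WHERE age > 31"

accuracy = sum(execution_match(conn, q, gold) for q in (good, bad)) / 2
print(f"execution accuracy: {accuracy:.0%}")
```

Because only results are compared, a differently written but equivalent query such as `good` still scores as correct.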
NVIDIA Blackwell leads the first agentic AI infrastructure benchmark
AgentPerf from Artificial Analysis, the industry’s first agentic AI benchmark, gives developers, enterprises and infrastructure providers a clear way to compare systems for agentic AI. In the first ro
Google sues Chinese cybercrime group that used AI to scam "hundreds of thousands of victims"
The tech giant said a group called "Outsider Enterprise" used AI to scam hundreds of thousands of victims, sending 2.5 million text messages over a span of two weeks.
You have to see what Claude Fable 5 can do…
What Bezos's new startup Prometheus plans to do
In November, Jeff Bezos announced that he would become co-CEO of a new startup called Prometheus. At the time, the startup said it would focus on "physical AI"—an increasingly common term for applying
Perplexity CEO: "I have nothing to lose"
Ukraine killed Russian soldiers with fully autonomous drones in a test
Fully autonomous drones killed Russian soldiers during a battlefield test two years ago, according to a Ukrainian drone manufacturer. If true, the incident would represent another milestone in a war t
Research on how AI helps users understand skin conditions
Health & Bioscience
Mistral reportedly seeks to raise €3 billion at a €20 billion valuation
The funding round would value the company at around €20 billion (about $23.15 billion), nearly double its Series C valuation of €11.7 billion.
Build a low-carbon computing platform from your retired old phones
Climate & Sustainability
$130 billion in data center projects blocked by protests so far this year
It's clear that communities now have an effective playbook to block data center construction. This week, researchers flagged the first quarter of 2026 as producing the "most blocked and delayed data c
It's not China that makes Americans hate data centers
GOP lawmakers, tech investors, and even OpenAI have tied the anti-data-center movement in the US to Chinese interference. Experts say it’s much more complicated than that.
Siri is good now??
You'd be forgiven for thinking this day would never come. Siri has spent a decade and half somewhere between "sort of useful at a few things" and "utterly disastrous, why did I even try, can it honest
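Paid: The Silicon Valley bubble (Part 1)
The author argues this era is nearing its end: OpenAI and Anthropic have both filed to go public, and two heavily loss-making, cash-burning companies are starting to compete for exit liquidity
In terms of total water use, AI data centers are a drop in the ocean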
If you hang out in any even vaguely AI-skeptical parts of the Internet, you've probably stumbled on plenty of memes and posts premised on data centers' insatiable thirst for water to power evaporative
Google sues Chinese cybercrime group that used Gemini to automate scams
Google loves telling us all the ways people are using its generative AI products to build new things, grow businesses, and save the world. Supposedly. Of course, people are also using AI for crime. Go
Musk becomes the world's first trillionaire
Elon Musk's net worth has passed the trillion-dollar mark after SpaceX's IPO. His net worth, which was hovering around $800 billion before the IPO, includes the value of his 4.8 billion shares in Spac
olmo-eval: an evaluation workbench for the model development loop
AI news: a crazy week… here are the highlights
SpaceX's giant IPO: a roundup of the latest news
SpaceX’s IPO on Friday allows the public to buy shares of the combined rocket, AI, and social media company for the first time, and raised enough money to make Elon Musk the first trillionaire.
Bezos's AI startup wants to build an "artificial general engineer"
Amazon founder Jeff Bezos says his new AI startup will work toward developing an "artificial general engineer," according to reports from The New York Times and CNBC. The startup, called Prometheus, a
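Give yourself permission to create
Measuring the entropy of English
AI trends: the lab wars, why APIs might disappear, and predictions for the future
Moonshot AI launches Kimi Work: a local desktop agent running on Kimi K2.6 with a swarm of 300 sub-agents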
Moonshot AI has introduced Kimi Work, an AI agent that runs on your own desktop. The Beijing-based AI entity announced it this week along with downloads for macOS and Windows. Kimi Work reads local fi
You probably won't get rich from the SpaceX IPO
The company has set aside an unusually high number of shares for retail investors. Still, experts say, you’re just getting the crumbs.
Zyphra releases Zamba2-VL: hybrid Mamba2–Transformer vision-language models that cut time to first token by about an order of magnitude
Zyphra has released Zamba2-VL, a family of open vision-language models. The release covers three sizes: 1.2B, 2.7B, and 7B parameters. Each model is built on the Zamba2 hybrid SSM–Transformer backbone
Siri won't be your AI girlfriend
Our early testing has already shown that Siri AI knows when to shut up, and that's very much by design. In an interview with Mostly Human spotted
[AINews] Loopcraft: the art of stacking loops
There’s a lot of “loop discourse” in the air: Steipete: “Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing
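It's not just you: nearly half of people wish they could snap their fingers and make generative AI disappear
HeyGen's AI video generator just changed the game...
4DO-DETR for otitis media detection
Multimodal foundation models use text for medical imaging prediction
145. An oral history of SpaceX's development: a conversation with former executive Lewis Hong (洪力德) on Musk's approach to talent, the largest IPO, space and AI, and a prelude to the expansion of human civilization?↗
At this historic moment of the SpaceX IPO, we decided to release a bonus episode. I invited Lewis Hong (洪力德), SpaceX's former chief rocket manufacturing engineer, to discuss whether, now that SpaceX has acquired and integrated x.AI and completed the largest IPO in history to date, and with space and AI converging ever faster, this might be the prelude to the expansion of human civilization. OUTLINE: 00:01:24 SpaceX's IPO and its acquisition of x.AI 00:31:30 The extremely introverted Musk and his approach to talent 01:03:42 What it is really like inside SpaceX 01:30:51 "Then I accept your resignation" 01:56:35 The history of SpaceX's development, and the successes and failures of Falcon 9 02: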
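06 / 11 Thursday, 23 items
You're using Claude Fable 5 wrong
SpaceX is a remarkable technology company
Why Revolut is worth $115 billion
Amazon's data centers used 2.5 billion gallons of water last year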
Just after Seattle enacted a one-year data center moratorium that some of Amazon's own employees pushed for, Amazon shared how much water its data centers use, reportedly for the first time. With conc
Gemini Spark is Google's most powerful AI tool yet
SpaceX launches the largest IPO in history
iOS 27's new Siri AI gesture breaks a 15-year iPhone tradition and may cause confusion, but I believe we can adapt
Claude's new model, Fable
Hey folks, like a lot of others, I’m trying Fable. But I don’t think I’ve given it really hard tasks for me to feel the big step change people are claiming. Although as I’m work
GeForce NOW summer sale: big discounts on memberships
The GeForce NOW summer sale kicked off today with limited-time savings of up to $70 off a 12-month membership, making now the perfect time to upgrade to get the best of the cloud and see just how far
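HeyGen's AI video generator just changed the game...
Unpopular opinion: you don't have to ask ChatGPT everything
Jensen Huang: CEOs are using AI as a convenient excuse for layoffs; "the narrative linking AI to job losses is too lazy"
Google DeepMind worries about what happens once millions of agents start interacting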
Google DeepMind is funding research into the potential dangers of situations where millions of different AI agents interact with each other online. According to Rohin Shah, who directs the company’s A
Fable just made everyone mad...
[AINews] Open models, model labs vs. agent labs, and what can't be trained (Sarah Guo)
Sarah Guo is a friend of the pod and Queen of AI, and after our Satya crossover pod (great recap here from Gokul Rajaram) wrote an excellent article on her Substack. Go read it, and come back for this
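Why everyone is freaking out over Fable 5 (Mythos)
DXC partners with Anthropic to bring Claude to regulated industries such as banking and airlines
Introducing Claude Corps (a national scholars program)
How reasoning capabilities empower AI copilot robots in endoscopic surgery
Why machine learning fails on small-molecule mass spectrometry
Spatiotemporal organization of residual disease in mouse and human BRCA1-deficient mammary tumors and breast cancer
Defining the spatial ecotypes of cancer
PyTorch performance profiling (Part 2): from nn.Linear to a fused MLP
06 / 10 Wednesday, 15 items
Robotaxi safety must be built in, not bolted on afterward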
A car pulls up to the curb. The app says, “Your ride is here.” No one’s in the driver’s seat. For people who live in one of the dozens of cities now hosting robotaxi services, this is already a realit
Claude Fable 5: a complete breakdown of all 319 pages
A new framework for auditing machine "unlearning"
Algorithms & Theory
DiffusionGemma: 4x faster text generation
NVIDIA accelerates Google DeepMind's DiffusionGemma for local AI
Today, Google DeepMind released DiffusionGemma — an experimental open model built for exceptionally fast text generation. NVIDIA has optimized DiffusionGemma to run even faster across NVIDIA GeForce R
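Cancel your subscriptions and build your own local AI agent instead
ChatGPT is about to change in a big way
GitHub just made a big move in AI coding
I quit my high-paying product job to bet on myself
What is perfect coding? How would you know?
Generate unlimited visual assets with Claude in 5 minutes (beginner tutorial)
Investing in multi-agent AI safety research
Google DeepMind and partners announce $10 million in funding for multi-agent safety research
I asked ChatGPT to roast me, and it hit exactly where it hurts
[AINews] Anthropic Claude Fable 5: the "safe Mythos," with terms that stir controversy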
By some measures, Opus 4.8, barely two weeks old, was already the leading model in the world. But now, 34 days after the SpaceXai deal and 63 days after the original Mythos announcement*, we have a My
Research shows many students are actually using AI well, dispelling the "cheating and laziness" myth
06 / 09 Tuesday, 28 items
Claude Fable 5 and the new AI safety fable
Edit Jun. 11: Anthropic changed their silent model manipulation of AI research queries to also use a classifier like the other safety domains. This addresses a key concern I had in the mistreatment of
NVIDIA Confidential Computing helps extend Apple's Private Cloud Compute
NVIDIA GPUs with Confidential Computing are now used for confidential inference in Apple’s Private Cloud Compute (PCC), as it expands beyond Apple’s data centers to Google Cloud. Unveiled during Apple
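Claude Fable 5 is here: Anthropic jumps far ahead
What exactly is the "AI Agent Loop"? Genius or hype?
Claude Fable 5 + Mythos 5 explained in 30 seconds
Mythos 5 is insane...
Claude Fable 5 review: what the new Mythos model gets right (and where it goes badly wrong)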
Claude Fable 5 is the first Mythos-class intelligence model to be generally available, and I got early access to test it before launch. I walk through what Anthropic is promising, what actually stood
Claude Fable + Mythos 5 launch, and the numbers are explosive
Claude Mythos 5 + Fable 5 are here, and the numbers are explosive
What it feels like to work with Mythos
I had early access to the first Mythos-class AI model being released to the public, Claude 5 Fable. Much of the discussion of Mythos has centered on its impact on software security, but I tested it on
Claude Fable 5 released
North Mini Code released: Cohere's first model for developers
Gemini 3.5 Live Translate: fluent, natural speech translation
Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.
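ChatGPT memory gets a much-needed update!
Gemma 4 12B released: a unified, encoder-free multimodal model
Powering the future of European robotics
Why engineering teams are about to "collapse"
Essential reading list for product builders, Part 2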
👋 Hey there, I’m Lenny. Each week, I answer reader questions about building product, driving growth, and accelerating your career. For more: Lenny’s Podcast | Lennybot | How I AI |
Hey Siri, meet AI
Hey folks, a lot of chatter about loops on X recently. And it’s a topic I’ve been toying with. My interpretation from what Peter posted is: Agents are loops, you give it a task, it looks at
This is what Microsoft Build was really about
How one agent chained two Hugging Face Spaces to build a 3D Paris gallery
Learning to lead in the hybrid human-AI enterprise
As adoption of AI agents looks set to surge by as much as 300% in the next two years, leadership teams are carefully considering the implications of a hybrid human-AI workforce. Unlike existing enterp
Five things you need to know about AI
At SXSW London last week I gave a talk called “Five things you need to know about AI,” in which I shared what I think are the biggest themes in AI right now. I pulled a few things from our first AI10
[AINews] FrontierCode: benchmarking code quality (not "slop")
Second batch of AI Leadership and Engineering+Workshops tickets for AI Engineer World’s Fair sold out last night! Last 500 tickets on sale now - get while stocks last! 20% off for the first 20 r
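Adding graphify to my Claude OS
graphify × obsidian is a cheat-code combo
Claude Fable 5 and Claude Mythos 5
Migrate your GitHub CI to Hugging Face Jobs
06 / 08 Monday, 11 items
Graphify + Obsidian + Claude Code = a cheat-code combo
Online shopping is about to change completely
The sample-efficiency black hole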
One definition of intelligence is sample efficiency - that is to say, how much data do you need to see in a given domain in order to operate fluently and competently. It’s not clear that we
Become AI-native in 60 minutes
ChatGPT vs Claude: which should you use in 2026?
AI is slowing down
If you liked this piece, you should subscribe to my premium newsletter. It’s $70 a year, or $7 a month, and in return you get a weekly newsletter that’s usually anywhere from 5,000 to 18
How I AI: clone yourself with AI in 15 minutes using Gemini Omni, and shop with Claude
Gemini Omni: Clone yourself with AI in under 15 minutes. Listen now on YouTube • Spotify • Apple Podcasts. Brought to you by: Merge (connective infrastructure for production AI), Jira Product
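Why this former Meta L8 engineer no longer reviews their own code line by line
Measuring the effectiveness of AI-assisted learning in Sierra Leone and elsewhere
Randomized controlled trial shows Gemini guided learning boosts engagement and accelerates learning
Import AI 460: the reward-hacking society, Anthropic's RSI data, and RL-based quadcopter racing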
Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe. Society can be
Shopping with Claude: how to find quality brands, automate returns, and buy things that last 100 years | Nicole Ruiz
Nicole Ruiz is a writer and parent who has built a comprehensive AI-powered shopping system to help her family buy high-quality, long-lasting items while avoiding the noise of drop-shipping brands, pa