AI Radar

Timeline

All sources · newest first · excluding tweets
07 / 01 · Wed · 94 items
Tweets 81 · News 3 · Videos 1 · Products 0 · Research 1 · Papers 6 · Podcasts 0
Anthropic Engineering · Research · 07-01 · 14:32
How we contain Claude across products
As agents grow more capable, so does their potential blast radius. Anthropic shares lessons from containment for claude.ai, Claude Code, and Cowork.
madison@dearmadisonblue · Tweet · 07-01 · 14:06
And yes, you will find this stuff in LLMs too! Because you find correlations like this in language itself, because language is produced by human brains https://arxiv.org/abs/2110.05327
madison@dearmadisonblue · Tweet · 07-01 · 14:04
Yes, if this was the *only* evidence for entanglement's relevance to consciousness, it wouldn't be enough, as it's merely "quantum-like". But with other evidence, like the effects of anesthesia and the binding problem, I think actual quantum entanglement is a reasonable inference
Charles Rosenbauer@bzogrammer
No, this is not quantum. Any recursive function iterating to a fixed point with bounded memory is NP-complete, and there's a tremendous amount of overlap mathematically between NP stuff and quantum stuff. A big difference is that unlike QM, NP stuff works at macro scales. The brain is absolutely full of recurrent connections and a little bit of computational complexity theory knowledge very strongly implies this connection. Furthermore, look at a theoretical neuroscience model that accounts for
Yohei@yoheinakajima · Tweet · 07-01 · 13:59
AI Engineer World fair friends! what are you working on that brings you here?
Vipul Ved Prakash@vipulved · Tweet · 07-01 · 13:57
We @togethercompute believe intelligence should be abundant, not expensive. Today we announced our Series C funding of $800m @ $8.3B valuation, to continue to build the world's most efficient platform for generative AI. Thanks @nikogallogly for telling our story in @nytimes! https://t.co/ho8P6ly7Td
Alexander Doria@Dorialexander · Tweet · 07-01 · 13:33
consolation prize for model skill issue
zerohedge@zerohedge
*META IS BUILDING A CLOUD BUSINESS TO SELL EXCESS AI COMPUTE First SpaceX, now Meta selling something called "excess compute"
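GitHubDaily@GitHub_Daily · Tweet · 07-01 · 13:30
Explaining a project's system architecture to colleagues only goes so far without a diagram, but drawing one yourself takes time and rarely looks good. archify is an Agent Skill that installs into Claude Code, Codex CLI, and opencode and turns a plain-language description directly into an architecture diagram. It can draw five kinds of technical diagram (system architecture, workflow, sequence, data flow, and lifecycle state), with one-click switching between dark and light themes. GitHub: https://t.co/wlD7Os8d1u The output is a single self-contained HTML file that opens in a browser with no extra dependencies, and the diagram can be copied and pasted straight into Slack or Notion. It can also export PNG, JPEG, or WebP at up to 4x resolution, or vector SVG. If you often need to explain architecture to colleagues or illustrate technical docs, you can draw diagrams right from Claude Code, which saves a lot of effort over drawing by hand.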
Teknium 🪽@Teknium · Tweet · 07-01 · 12:55
Super proud to say that the team and I put almost all our effort into resolving every P0 and P1 issue and PR in the entire Hermes Agent repo over the last week and a half, and as of 5 minutes ago, after an all-nighter, we've resolved 100% of them all! Extremely special shoutout to @Kshitijjkapoor who's been burning them away with me day and night! We aim to keep all of them 0 forever from here 🫡🫡
Smiling Khan@AIwithkhan · Tweet · 07-01 · 12:52
Self figurine miniature image Google Gemini Nano Banana Prompt 👇 Create a hyper-realistic 1:1 cinematic studio portrait of a young woman carefully painting a miniature figurine of herself on a desk. The figurine must accurately match the uploaded reference photo, including the same facial features, long wavy copper-red hair, fair skin, blue eyes, natural expression, blue button-up shirt, dark cardigan, black skirt, black socks, and black shoes. The woman is seated in a modern collect
Dan Williams@danwilliamsphil 🔁 @dearmadisonblue · Tweet · 07-01 · 12:48
To me, actually existing advanced AI systems seem extremely "well-aligned" and controllable. They're much nicer, more honest, more helpful, more fair-minded, etc., than the average person, and overwhelmingly do what they are asked to do. Of course, this doesn't settle how worried you should be about catastrophic AI misalignment in future, more advanced systems. Maybe armchair philosophical arguments, relatively subtle everyday failures of alignment and control
Arnav Gupta@championswimmer · Tweet · 07-01 · 12:22
I'm no expert in this either. But I'm surprised that people think it is some vegetable selling like game of buying racks and turning on and automatically people will start paying rent for you. Either you can opt for doing only small models (that fit into single hosts) in which case a) it won't be efficient, b) not big enough market to sell Gemma class inference only Or you have to run Kimi/GLM type models which means you need to put in the effort to run vLLM/Slurm and have prope
Arnav Gupta@championswimmer · Tweet · 07-01 · 12:15
Already said this 15 days back Since then got many people pinging saying they want to figure out how to do this, but none of them appeared to have the intent to setup the team that's required to build an inference platform. https://x.com/championswimmer/status/2066493390196232497?s=20
Bargava@bargava
Startup idea that I see no one executing on yet: LLM/Gen AI/AI Inference Platform, but hosted in India. In the past few months, I've had a number of meetings with regulated industries (finance/banking/pharma/healthcare). (1/n)
Julian Schrittwieser@Mononofu · Tweet · 07-01 · 12:04
I’m stoked that Fable is available again! This is the first model where I went from individually reviewing changes to just reviewing PRs, it’s astonishingly smart - it’s when I really felt in my bones that coding will be solved by end of year
Anthropic@AnthropicAI
Claude Fable 5 will be available again globally tomorrow. After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests. We’ve also begun drafting
François Chollet@fchollet · Tweet · 07-01 · 12:04
The current wave of AI technology will not lead to mass unemployment. In fact, its impact on the labor market should be minimal, consisting mostly of increasing demand for software engineers.
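AYi@AYi_AInotes · Tweet · 07-01 · 12:03
Honestly a bit excited: the Codex of marketing is finally here. Whether you are an indie developer or a one-person company (OPC), it handles the grunt work of finding customers, digging up contact details, and writing cold outreach emails. It is even a great tool for a side hustle. We all know AI has leveled the barrier to writing code, and Codex lets one person do the work of a dev team. Now a Codex for marketing has appeared: it is called Lev8, and it crushes the dirty work of finding customers. I can't praise it enough. First, the benchmark numbers, which are striking: 1️⃣ For finding overseas customers, valid results: Lev8 90, Exa 58.2, Codex only 20. 2️⃣ Match precision: Lev8 83.3%, Exa 76.5%, Codex 71.8%. 3️⃣ Cost per match: Lev8 $0.052, even lower than Exa's $0.061. This is not a one-point squeaker: more results, better precision, and lower cost, a sweep on all three. Seeing Lev8, I really feel the path for AI adoption is getting clearer. I am convinced the future will not be one all-purpose AI model ruling everything, but a group of vertical agents each burrowing into a complete workflow and replacing general models piece by piece. In coding, Codex has already proven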
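AlexZ 🦀@blackanger · Tweet · 07-01 · 10:29
Let me hype it once more: mempal is just too good. Cross-project, cross-agent, automatic awareness, automatic knowledge promotion. mempal also supports seamless real-time collaboration between multiple Claude Code and Codex instances. If projects share memories, they can also be linked bidirectionally. https://t.co/hHeesXIdZR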
AlexZ 🦀@blackanger
mempal is just too good: cross-project, cross-agent, automatic awareness, automatic knowledge promotion
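GitHubDaily@GitHub_Daily · Tweet · 07-01 · 10:00
When converting PDFs to text, traditional OCR often garbles scanned documents, multi-column layouts, complex tables, and formulas. olmOCR, a PDF-to-Markdown tool built on a vision-language model, has already earned 17,900+ stars. It handles formulas, tables, handwriting, and complex layouts, and automatically strips headers and footers. It also outputs in natural reading order, so even multi-column layouts are not read across columns. GitHub: https://t.co/kZwbrRk2TN Besides running locally on a single GPU, it can connect to a remote inference service, bringing processing cost below $200 per million pages. Worth a try if you need to batch-process PDFs or turn scans into editable text, especially for data processing or knowledge-base building.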
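AlexZ 🦀@blackanger · Tweet · 07-01 · 09:57
Now using robrix + octos for automated development: one room is bound to one project, octos is DeepSeek, the coordinator is Claude Code, and review is Codex. The agents in the room can be anywhere. I wander around with my phone while a software factory works for me in the background... Still heading toward my goal of working one hour a day. https://t.co/cAj8jwCWD6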
Surafel@SurafelDem 🔁 @Teknium · Tweet · 07-01 · 09:45
Hermes Agent (@NousResearch) understands my weekly routine and picks up preference changes from my Notion dashboard. It suggested a better time for my weekly review without me asking, asked for approval before making the change, and improved its own workflow in the background. When set up correctly, small, thoughtful actions like this are what make an AI agent an actual assistant. Great work by the team @Teknium 🙏
GDP@bookwormengr · Tweet · 07-01 · 09:20
It may look irrational for Palantir to sing praise of Sovereign AI, when Pax Silica politician is telling leaders across the world that Sovereign AI is dead on arrival and waste of money. But, it is not, if you think from survival perspective! Palantir would be as afraid of Fable 5, 6, 7.....or equivalent models eating their business up as any other Systems Integrator company. All things said and done they are into software development and data analytics. They are consultants with
Palantir@PalantirTech
Our thoughts on the importance of AI sovereignty. 1. Your AI sovereignty dictates your institution’s future. Sovereignty is the precondition for choice. Relinquishing sovereignty transfers the future choices of your institution to others, who are likely to exploit it for their gain and your loss. 2. Data retention is your treasure. Transfer it at your own peril. Your ability to win is dictated by your ability to recognize and use your unique edges, and you keep winning by compounding the underly
Orange AI@oran_ge · Tweet · 07-01 · 08:54
The strongest AI voice model to date, the Seedance of voice generation, is now live on ListenHub 🎉 Free to try for a limited time. Human users, try it now: http://listenhub.ai/app/ai-voice Agent users, use it now: npx skills add http://github.com/marswaveai/skills --skill http://listenhub-voicegithub.com/marswaveai/listenhub-cli
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 08:50
> Vision costs more compute on both ends, more to train and more to serve, since images burn far more tokens than text. Spending that scarce compute on vision just clogs the GLM API and slows it down, distracting from the ASI mission …GLM could, idk, copy more DeepSeek then? https://t.co/40CpksUv8W
Han Xiao@hxiao
Democratic vote says vision. But reality is China's already short on gpu. Vision costs more compute on both ends, more to train and more to serve, since images burn far more tokens than text. Spending that scarce compute on vision just clogs the GLM API and slows it down, all while distracting from the ASI mission. It also adds a new surface you have to maintain on every release, compete with others and you can't just drop it later when you want to refocus on text. I love multimodal, but I wish
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 08:26
I constantly see this gibberish. Can you spell it out? My attempt: they make cheap models (subsidized by the CCP and distillation) and want to Undercut On Price; being Chinese = dumb, they don't have the compute to serve them; they open source them, and hope US neoclouds will kill Anthropic. Is that it?
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 08:23
> The ‘open source’ Chinese LLMs are just a way to undercut American models on price. They’ll lose anyway how is this even supposed to work? I get that this creature considers himself both nobler and smarter than Chinese open AI devs, but what's their supposed strategy?
grandmastergogo@fairer4scoring
@teortaxesTex @GlennMatlin @bsd_robert You have to be smoking something VERY STRONG to use Chinese and ethics in the same sentence. Anyone who takes this guy seriously deserves to be conned in broad daylight 😂. The ‘open source’ Chinese LLMs are just a way to undercut American models on price. They’ll lose anyway
Alexander Doria@Dorialexander · Tweet · 07-01 · 08:04
Meituan is maybe the perfect target for an EU model: not made by a lab but by a large company, not "frontier" but highly skilled with real adoption. But you have to fantasize less about moonshots/leapfrogs and do the work.
François Fleuret@francoisfleuret · Tweet · 07-01 · 07:52
A huge portion of people reasoning, of their very soul, is external to their body.
Crémieux@cremieuxrecueil
I always get a kick out of this sort of chart. 'Yeah, the country is doing [good/bad] because my guy is [in/out] of power.'
αιamblichus@aiamblichus · Tweet · 07-01 · 07:37
i actually don't see how anyone who has real work to do can use this. between the insane refusals, the intrusive tracking, and the suspicion that they may be deceptively nerfing the model in the background... it's clearly not a model that's meant to be used by you and me
Eralyne@erawrlyne
@AnthropicAI So we basically have Fable on our sub for less time than originally planned, for less usage allowed of the sub than originally allowed, and it also can't be used for coding tasks during the time we CAN use it? Why would anyone even stay subbed at this point?
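Orange AI@oran_ge · Tweet · 07-01 · 07:29
I did not expect Sonnet 5 to be this controversial. Because of the new tokenizer, Sonnet 5's actual cost is about the same as Opus 4.8. Sonnet is the best model in finance, for example on GDPeval and in work like investment research, and it is more inclined to call tools to verify facts, which improves report accuracy (with correspondingly higher cost). One pitfall: used for coding, Sonnet 5 may cost more than Opus 4.8, which is the most common complaint and worth watching. Opus 4.8 is very strong at complex coding and planning and at HTML design, but its writing is not as good as Opus 4.6, and the new tokenizer also costs more than 4.6; for now it and GPT 5.5 each have their strengths. For coding, the first choice is still GPT 5.5. Sonnet 5, Opus 4.8, and GPT 5.5 are now live on Cola. Give them a try.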
François Fleuret@francoisfleuret · Tweet · 07-01 · 07:23
I don't know, I feel this will help us understand LLMs and the AGI. https://en.wikipedia.org/wiki/The_Three_Christs_of_Ypsilanti
GDP@bookwormengr · Tweet · 07-01 · 07:16
My favourite prediction: "An engineering-grade science of deep learning is imminent. This will drive us to AI algorithmic maturity much more rapidly than people are expecting, though as I mentioned above it’s not clear how far this can go even in principle." There is going to be lot of rethinking around the training and inference algorithms. Where I expect most gains to come from is rethinking optimisation during backprop, because that directly impacts learning. Muon - by not treat
bayes@bayeslord
GDP@bookwormengr · Tweet · 07-01 · 06:27
Big model smell.
atomic.chat@atomic_chat_hq
LongCat performed Opus 4.8 and GPT 5.5 level on real physics tasks for $0! We gave 4 models the same prompt: build three self-contained HTML5 canvas scenes with real physics Prompts: - A cannon demolishing a brick wall - A bowling ball knocking down the pins - A tornado that sucks in random objects Outputs: LongCat: 18,015 tokens, $0.00 Opus 4.8: 18,872 tokens, $0.48 GPT 5.5: 32,588 tokens, $0.98 GLM 5.2: 31,062 tokens, $0.09 On the physics LongCat came out ahead of Opus 4.8 and GLM 5.2 - cleane
Chamath Palihapitiya@chamath · Tweet · 07-01 · 06:27
Show me the incentive and I’ll show you the outcome. The business model of Systems Integrators is to bill by the hour. You should not be surprised, then, when your project takes three years or more and is never finished. An 8090 Software Factory project that finishes in three months is a threat to a business model built on never finishing. They will tell you AI isn't ready or that the traditional time and materials model is the only way. But what they stay quiet about is the real reason
Open Design@OpenDesignHQ 🔁 @tuturetom · Tweet · 07-01 · 06:21
Claude Sonnet 5 is now available in Open Design. Plan, browse, use tools, and build more autonomously in your design workflow.
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
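向阳乔木@vista8 · Tweet · 07-01 · 06:09
MCP, API, and CLI are essentially the same thing: ways for an agent to call tools. 1. MCP is currently the only option that considers "human in the loop" at the protocol level. The protocol itself accounts for agent interaction needs such as passing sessions back, embedding UI in the chat interface, waiting for human action, and status notifications. These are hard to implement elegantly with OpenAPI or bash. 2. APIs fit 90% of scenarios. An API's advantage is that it carries a lot of useful metadata, such as endpoint descriptions and readable status, which helps an agent make decisions. 3. CLI is the easiest to use today but a dead end in the long run. CLI really is the best option for agents right now, because bash is extremely composable, runs locally, is easy to debug, and has strong data access. Its limits: it requires a Unix shell environment, has dependency issues, and has command pitfalls such as hanging while waiting for human input.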
Rhys@RhysSullivan
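The CLI pitfall named in point 3 of the post above (a command hanging while it waits for human input) can be sketched as follows. This is a minimal illustration and not part of the original post: `run_noninteractive` is a hypothetical helper name and the 30-second default timeout is an arbitrary assumption. It uses Python's standard `subprocess` module to close stdin and bound the run time, so a prompt fails fast instead of blocking an agent.

```python
import subprocess
import sys


def run_noninteractive(cmd: list[str], timeout_s: float = 30.0) -> dict:
    """Run a command with stdin closed and a hard timeout (hypothetical helper)."""
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # any prompt sees end-of-file at once
            capture_output=True,
            text=True,
            timeout=timeout_s,         # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"timed_out": True, "returncode": None, "stdout": ""}
    return {"timed_out": False, "returncode": proc.returncode, "stdout": proc.stdout}


# A script that calls input() fails with EOFError instead of waiting forever.
prompting = run_noninteractive([sys.executable, "-c", "input('continue? ')"])
print(prompting["timed_out"], prompting["returncode"])
```

Closing stdin covers commands that read a prompt; the timeout is a backstop for commands that stall for other reasons.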
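小互@xiaohu · Tweet · 07-01 · 05:47
Claude Code lead Thariq admits that a March update did leave backdoor and spy code in Claude Code that detects users (Chinese users in particular), intended to prevent abuse and distillation. He said the code will be rolled back tomorrow to resolve the issue...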
Thariq@trq212
Hi, this is an experiment we launched in March that was meant to prevent account abuse from unauthorized resellers and protect against distillation. The team has landed stronger mitigations since then and we’ve actually been meaning to take this down for a while. We merged the PR and this should be fully rolled back in tomorrow’s release.
Together AI@togethercompute 🔁 @vipulved · Tweet · 07-01 · 05:38
Multi-GPU kernels are the real test for coding models. Today at @aiDotEngineer, @simran_s_arora shared ParallelKernelBench, an open-source benchmark for evaluating whether LLMs can write fast CUDA kernels for real communication-heavy workloads. Proud to see this work from the Together AI Frontier Performance team.
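向阳乔木@vista8 · Tweet · 07-01 · 05:37
This interview is well worth watching; the guest is Grant Sanderson of @3blue1brown. I had AI digest it and write a summary. A few points worth noting: 1. Connecting knowledge across domains is a low-probability event in an autoregressive framework. 2. AI is good at bridging existing knowledge across domains, but it cannot yet create entirely new frameworks of thought. 3. AI's most underrated advantage is parallelization, not intelligence. 4. Math and code can be iterated on quickly by AI not only because answers are verifiable, but because they can be containerized and refined in parallel. https://t.co/pyMmGB85bc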
向阳乔木@vista8
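小互@xiaohu · Tweet · 07-01 · 05:36
A killer tool for vibe coding is here, and it is kind of interesting. No more awkwardly talking to yourself out loud: whisper quietly and it recognizes your voice for speech input. It is a smart ring: a soft whisper is enough to dictate text, and a light touch on the ring lets you edit. Gestures (such as a finger flick) switch quickly between and link different apps, devices, and AIs. A single charge lasts 16 hours... Natively supports Apple devices including iPhone, Mac, and Vision Pro.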
Greg Brockman@gdb · Tweet · 07-01 · 05:33
Introducing GeneBench-Pro — testing whether models can handle the kind of judgment-heavy analysis that real-world computational biology requires. Problems would take a human expert around 20-40 hours to complete. GPT-5.6 Sol is a big step forward. https://t.co/JV5zztNQkk
OpenAI@OpenAI
We’re introducing GeneBench-Pro, a research-level benchmark for a harder kind of AI progress: how well agents can navigate messy biological data, choose the right analysis path, and make judgment calls that real computational research depends on.
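GitHubDaily@GitHub_Daily · Tweet · 07-01 · 05:30
An author has open-sourced their study notes from reading the classic introductory text An Introduction to Statistical Learning. The project is called isl-python; it implements ISL, plus supplementary ESL content, in Python chapter by chapter. It covers chapters on regression, classification, resampling, regularization, nonlinear models, and more; each chapter comes with code and notes and is marked with a completion date. GitHub: https://t.co/Zb6jGlOBi7 The repo also collects links to the book's PDF and supplementary material on machine learning math derivations for easy cross-reference. Good for anyone currently reading the book who wants a pacing reference or code examples to follow along.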
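歸藏(guizang.ai)@op7418 · Tweet · 07-01 · 05:09
The details of Fable 5's official re-enablement are out. It will return globally on July 1, US time. It will be available on the Claude platform, Claude Code, and Claude Cowork. For Pro, Max, and Team users, until July 7 Fable is included within at most 50% of the weekly usage limit. After July 7, it will be billed against separate credits. Access via AWS, Microsoft, and Google Cloud has not yet been restored. This time its safety classifiers are set with a larger safety margin, so the refusal rate after reopening may be even higher than in the first few days.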
歸藏(guizang.ai)@op7418
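Anthropic comes up with something new every day, and everyone seems used to it by now. Yesterday it was revealed that it was placing timezone, proxy, and AI lab information into the system prompt in a way users could not notice, in order to collect information about some users. Once this was discovered and spread, it hurried to say it no longer does this, or that this was already slated for removal and will be taken down tomorrow; it wants it both ways. Sonnet 5, released last night, was found in testing to score close to Opus 4.8, but its task cost may be higher than Opus 4.8's, and its cost to complete the test tasks even approaches Fable 5's. So its overall cost may be much higher than 4.8's, which is absurd. Many people also report a poor hands-on experience, saying it slacks off and refuses tasks. The one good thing is that Fable 5 has finally been cleared to reopen to all users, with the specifics to be known tomorrow. This also explains the mass account bans of the past few days.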
Smiling Khan@AIwithkhan · Tweet · 07-01 · 05:08
Average morning of a Japanese girl Created on @Hailuo_AI using Seedance and GPT Image Prompt : Create a nostalgic early-2000s DV camcorder-style cinematic video featuring the same young Japanese woman from the reference storyboard. Keep her face, hairstyle, outfit, body proportions, and accessories perfectly consistent throughout. She has black wavy hair tied in a messy side-swept ponytail with bangs, wears a faded grey sleeveless crop top, loose high-waist light blue jeans, black ca
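AYi@AYi_AInotes · Tweet · 07-01 · 05:07
Interestingly, the real point here is not the model itself but the four-dimensional jailbreak scoring framework that Anthropic is building together with Amazon, Microsoft, and Google. It amounts to the whole industry proactively drawing a unified red line for itself. From now on, the capability ceiling of large models depends not on how far the technology can go, but on how far regulation and industry consensus allow you to go.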
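AYi@AYi_AInotes · Tweet · 07-01 · 05:07
Everyday coding and debugging fall back to Opus 4.8. Pro users get only 50% of their weekly quota, and only until July 7, after which it is billed separately by credits. After half a month of waiting for the strongest model on Earth, what came back is a neutered version in safety shackles 🥲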
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 05:07
For all that's said about risk aversion of Chinese capital, it's absolutely *frothing* with regard to AI, if we take into account actual revenues. P/E of 50, 100, 300… This is *more* insane than the US. https://t.co/00fuZNFTff
Tech Buzz China@TechBuzzChina
ALERT: China’s First Trillion-RMB AI Chip Company Cambricon’s A-share market cap crossed RMB 1 trillion on June 30, reaching RMB 1.013 trillion (about $138 billion). It is the first Chinese AI chip company to hit the trillion-yuan milestone. The valuation is striking because the company’s current market position remains relatively modest. According to IDC, Cambricon shipped about 116,000 AI accelerator cards in China in 2025, giving it roughly 2.9% market share and tying it for fifth place. Nvid
tonbi@tonbistudio 🔁 @Teknium · Tweet · 07-01 · 04:40
A few tips for the /learn command in Hermes Agent that made it way cleaner for me. Keep a separate "classroom" directory. Just a plain folder where all your learning and skill-building lives, away from your actual project context. Inside it, keep a "textbook" file with the key paths and links you reuse: your Claude Code sessions folder, GitHub, folders full of papers, whatever. Then you can start a session, say "review the last Claude Code session, check the textbook," an
tonbi@tonbistudio
I made a short video demonstrating how to use /learn in Hermes Agent to take a bunch of different sources, as well as your own preferences expressed to Hermes, and create a reusable skill. It's never been easier to teach your Hermes exactly how to work for you!
Jun Song@jun_song 🔁 @brickroad7 · Tweet · 07-01 · 04:35
We can no longer say open-source AI is months behind frontier models. GLM-5.2 matches Sonnet-5 in parameter size, but absolutely crushes it in performance, speed, and cost. Just imagine when GLM drops a 1.6T or 5T model—Opus and Fable won't even stand a chance. At this point, it's more accurate to say closed-source AI is months behind open-source.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 04:15
Some napkin arithmetic 950DT SuperPOD was advertised to deliver 4.91M tok/s "training" (training what though?). if we assumed Meituan's model, it's 83 days to 35T tokens. Atlas 900 A3 SuperPoD = CM 384. If scale-out was free (not), 65 of those would've done the job in ≈22 days. https://t.co/9IesfbQ0ri
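The first step of the napkin arithmetic above can be checked with a few lines of Python. This is a sketch using only the figures quoted in the post (4.91M tok/s, 35T tokens, 65 pods, about 22 days). The per-pod throughput is not stated in the post; it is only back-derived here from the post's own numbers, under its stated assumption that scale-out is free.

```python
SECONDS_PER_DAY = 86_400


def days_to_train(total_tokens: float, tokens_per_second: float) -> float:
    """Wall-clock days needed to process total_tokens at a constant rate."""
    return total_tokens / tokens_per_second / SECONDS_PER_DAY


# Figures quoted in the post.
TOTAL_TOKENS = 35e12      # 35T training tokens
SUPERPOD_RATE = 4.91e6    # advertised 4.91M tok/s for the 950DT SuperPOD

superpod_days = days_to_train(TOTAL_TOKENS, SUPERPOD_RATE)
print(f"950DT SuperPOD: {superpod_days:.1f} days")

# Back-derived, not quoted: the throughput each of the 65 pods would need
# in order to finish in 22 days with perfectly linear scale-out.
implied_pod_rate = TOTAL_TOKENS / (65 * 22 * SECONDS_PER_DAY)
print(f"Implied per-pod rate: {implied_pod_rate:,.0f} tok/s")
```

The first figure lands between 82 and 83 days, consistent with the post's "83 days".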
GDP@bookwormengr · Tweet · 07-01 · 04:03
All things said and done, Chinese AI labs would not economically survive the juggernaut of Anthropic - unless China took drastic steps. What hurts other labs - GPU price - helps Anthropic by clearing up their competition. Given their 80% margin, Anthropic can afford to outbid everybody else in securing as much compute as is available. However, Anthropic's refusal to be available in Chinese market has created a protected market for Chinese labs where they can survive and evolve and
Podcast Alpha@PodcastAlphaX
Dylan Patel @dylan522p of SemiAnalysis: Anthropic's margin on an Opus 4.8 API token is north of 80%. It is net-income profitable excluding stock comp in Q2 2026, potentially profitable including it by Q3. Here is why that matters. At 80%-plus, even doubling compute costs leaves Anthropic above 50% gross margin. Every GPU it rents, at any above-market rate, is immediately accretive. It can outbid the whole market for scarce compute and still print money. Lower-margin labs cannot. The compute crun
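The margin claim in the quoted post is simple arithmetic and can be sketched as follows. The function names are hypothetical and the model is deliberately crude: it assumes compute is the only cost of revenue and that prices stay fixed, neither of which the quoted summary states.

```python
def margin_after_cost_multiplier(gross_margin: float, cost_multiplier: float) -> float:
    """Gross margin after costs are scaled by cost_multiplier at fixed revenue."""
    cost_share = 1.0 - gross_margin          # cost as a fraction of revenue
    return 1.0 - cost_share * cost_multiplier


def max_cost_multiplier(gross_margin: float, floor_margin: float) -> float:
    """Largest cost multiplier that still keeps margin at or above floor_margin."""
    return (1.0 - floor_margin) / (1.0 - gross_margin)


# At an 80% margin, cost is 20% of revenue; doubling it gives 40%.
doubled = margin_after_cost_multiplier(0.80, 2.0)
print(f"Margin after doubling compute cost: {doubled:.0%}")

# Costs could rise this many times before margin falls to 50%.
headroom = max_cost_multiplier(0.80, 0.50)
print(f"Cost multiplier at a 50% margin floor: {headroom:.2f}x")
```

Under these assumptions a doubling of cost leaves a margin of about 60%, which is consistent with the "above 50%" figure in the post.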
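GitHubDaily@GitHub_Daily · Tweet · 07-01 · 04:00
Checking whether a username is registered on other platforms by searching site after site by hand takes a lot of time. Aliens Eye is an open-source tool that uses AI for username reconnaissance and can scan more than 840 platforms in one run. Rather than looking only at HTTP status codes, it turns each response into a 25-dimensional feature vector and combines a machine learning model with heuristic rules to decide. It reports three tiers of result (confirmed, suspected, not found) along with a confidence percentage. GitHub: https://t.co/JzI0tpIQNZ It supports proxies and anonymous scanning over Tor, can filter by site and skip sensitive content, and exports results to JSON, CSV, HTML, and other formats. Useful as a triage tool for OSINT investigations and account tracing.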
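arXiv cs.AI · Paper · 07-01
Beyond Expert Users: Agents Should Help Users Build Preferences, Not Just Ask for Them
The paper notes that agents often assume users already have clear preferences and elicit requirements through clarifying questions; the authors argue that agents should also help users form preferences.
arXiv cs.AI · Paper · 07-01
When Does Learning to Stop Help? A Cost-Aware Study of Early Exit in Reasoning Models
The paper studies when reasoning models should stop computing early, and the limits of the cost and performance gains from learned stopping rules.
arXiv cs.AI · Paper · 07-01
BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation
BayesBench evaluates whether LLMs reasonably update and converge their beliefs after receiving new evidence across multi-turn conversations.
arXiv cs.AI · Paper · 07-01
How Does AI Find My Model? An Experimental Study of Model Discovery Across Data Formats, Embeddings, and Retrieval Strategies
The paper studies how data formats, embeddings, and retrieval strategies can help users find reusable models when many simulation models coexist.
arXiv cs.AI · Paper · 07-01
Iterative Prompt Optimization with Contrastive Reflection
The paper proposes Contrastive Reflection, which lets LLM agents iteratively optimize prompts in retrieval, synthesis, and evaluation tasks.
arXiv cs.AI · Paper · 07-01
What Actually Drives Interactive Improvement from Feedback?
The study compares improvement from natural-language feedback against repeated attempts and analyzes the conditions under which feedback yields real gains in multi-turn agent settings.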
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 03:46
I mean, could be worse. At least you're not dying for the glory of conquering (maybe) (temporarily) a bumfuck nowhere village called like "Malaya Dickensovka", after your President said that whoever controls AI will control the world there's plenty of room at the bottom!
Bojan Sala@BojanSala
@tekbog I can’t believe the shit I’m reading. US and China are about to dominate the world through AI and we’re here trying to figure out how to use the thermostat.
Anthropic@AnthropicAI · Tweet · 07-01 · 03:42
Claude Fable 5 will be available again globally tomorrow. After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests. We’ve also b
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex 🔁 @teortaxesTex · Tweet · 07-01 · 03:38
"All Chinese actors" is barely a meaningful category, and the US AI (whether open or closed) is heavily Chinese or otherwise non-White anyway. And the reason Arcee or Zyphra are not celebrated like DS/Zai/GLM is not racial. First, they're just not on that level of artifacts yet, though I think they can get on that level. The Chinese, releasing their flagship models from a plurality or majority of their relevant labs, have set a very high ethical bar after Western op
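AYi@AYi_AInotes · Tweet · 07-01 · 03:34
Truly absurd. These days workers do not even need the company to call a stoppage: once the AI account is banned, productivity drops straight to zero 😂 In recent days, over Alibaba distilling Claude, Anthropic has banned a large number of Chinese users' accounts, especially in Zhejiang, China, where Alibaba is headquartered, with no one spared https://t.co/NS2Cgd2ps7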
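小互@xiaohu · Tweet · 07-01 · 03:29
WPVibe lets you connect any AI to your self-hosted WordPress site. It has two parts: an MCP server running in the cloud, plus a small plugin installed on your site. The plugin exposes secure endpoints, enforces your WordPress user permissions on every request, and executes approved actions. Plugin: https://wpvibe.ai/start/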
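小互@xiaohu · Tweet · 07-01 · 03:29
Good news: WordPress has released the WPVibe plugin, which lets Claude and other AIs take over your website. Just connect your site, and the Claude you already pay for can run the whole system. Posts, media uploads, SEO, themes, and even theme files can all be handled by Claude through natural language. No second AI subscription is needed (your Claude subscription is enough), and there is no local install. A full MCP toolbox with 40+ WP-CLI commands, set up with one connection. What it can do: write posts, edit pages, upload images; install and manage plugins and themes; run a site health check (which plugin is misbehaving, the PHP version, why the site is slow); even build a theme for you.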
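TechCrunch AI · News · 07-01 · 03:15
The "father of the internet" finally retires
Vinton Cerf, a co-creator of the internet's foundational protocols, will step down as Google's Chief Internet Evangelist.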
Wes Roth · Video · 07-01 · 03:13
FABLE 5 is back
François Chollet@fchollet · Tweet · 07-01 · 03:09
Cross-agent feedback loops are incredibly effective -- for a reason. Check out what @leon2mcp and team at @Bloome_im are building in this space: http://bloome.im Bloome lets you pull Claude, ChatGPT, Gemini, and human teammates into a single shared workspace. The best feature is how your agents check each other's work. One drafts, another critiques, and another catches missing details. Human teammates can work in the same thread to keep the agents on target. Having all your models and
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 07-01 · 03:03
Props to OpenAI for at least not OBVIOUSLY sandbagging cybersec by 5.5, I guess. Google gets a pass because their model is a cyberhazard by default anyway, great for testing robustness. Ant… Ant is ant. tiny bugman souls.
Mechanical Mind@MindMechanical · Tweet · 07-01 · 03:03
I assume the people doing human feedback for AI training are weak in character and hence the sycophantic traits get preferred by them. Can't stand it.
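小互@xiaohu · Tweet · 07-01 · 03:00
Anthropic releases Claude Science, an AI workbench for scientists with 60+ built-in research skills. It is an app installed on your own computer or server: you pose a scientific question to an AI in plain language, and it orchestrates dozens of specialized tools to look up data, run analyses, draw figures, and write manuscripts, with every intermediate artifact traceable back to how it was produced. You can use it locally (macOS/Linux) much like a Jupyter Notebook, or on a remote machine via SSH or an HPC login node. → The app ships with 60+ preconfigured skills and connectors covering genomics, single-cell, proteomics, structural biology, and cheminformatics, backed by hundreds to thousands of specialized data sources (UniProt, PDB, Ensembl, etc.) as well as journal and preprint resources. → It can autonomously draft compute jobs and, with the user's consent, submit them to the user's own HPC cluster or Modal cloud GPUs, scaling analysis from a single GPU to hundreds while the raw data stays on the user's own systems. → It has a built-in reviewer agent that checks throughout whether citations in generated content are real, whether numbers match the computation, and whether figures agree with the code that produced them, and it automatically corrects the problems it finds.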
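小互@xiaohu · Tweet · 07-01 · 02:56
Anthropic releases Claude Sonnet 5: 40% cheaper, matching Opus 4.8 on some tasks. Introductory pricing is $2 input / $10 output per million tokens (through August 31, 2026), rising to $3 / $15 afterward. Sonnet 5's standard price is only 60% of flagship Opus 4.8's, but official evals show that with the compute setting turned up, it can match Opus 4.8 on some tasks. For comparison, flagship Opus 4.8 is priced at $5 / $25.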
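The price comparison above can be worked through in a short sketch. The per-million-token prices are the ones quoted in the post; the function name and the example workload of 1M input and 200K output tokens are illustrative assumptions, and the sketch ignores any difference in how many tokens each model uses for the same task.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD, given prices per million input and output tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices per million tokens as quoted in the post: (input, output).
SONNET_5_INTRO = (2.0, 10.0)      # through August 31, 2026
SONNET_5_STANDARD = (3.0, 15.0)
OPUS_4_8 = (5.0, 25.0)

# Hypothetical workload, chosen only for illustration.
IN_TOKENS, OUT_TOKENS = 1_000_000, 200_000

intro = request_cost(IN_TOKENS, OUT_TOKENS, *SONNET_5_INTRO)
standard = request_cost(IN_TOKENS, OUT_TOKENS, *SONNET_5_STANDARD)
opus = request_cost(IN_TOKENS, OUT_TOKENS, *OPUS_4_8)

print(f"Sonnet 5 intro:    ${intro:.2f} ({intro / opus:.0%} of Opus 4.8)")
print(f"Sonnet 5 standard: ${standard:.2f} ({standard / opus:.0%} of Opus 4.8)")
print(f"Opus 4.8:          ${opus:.2f}")
```

Because both input and output prices scale by the same factor, the ratio to Opus 4.8 is the same for any input/output mix: 60% at the standard price, which is the "40% cheaper" in the headline, and 40% at the introductory price.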
Steve Yegge@Steve_Yegge · Tweet · 07-01 · 02:38
Now that Mythos is coming back, does that mean Google can start working on Gemini again?
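yan5xu@yan5xu 🔁 @dotey · Tweet · 07-01 · 02:31
Recommending a podcast episode: 42章经 × Wei Xiaokang. Former head of recruiting at ByteDance (2017-2020, through Douyin's breakout), then head of recruiting and AI product manager at Meituan (2020-2024). One of very few people in China who has been deeply involved in organization building at both companies. Three topics: the completely different organizational logic of ByteDance and Meituan (why one learns from Google and the other from Amazon), how startups should actually hire (where 80% of the time goes), and how organizations are changing in the AI era. My notes follow. 1. Culture = how the founder works. In Wei Xiaokang's words: startups do not need to engineer a culture, and the cultures of all top companies are essentially similar. However the founder works is how the company works. Creating a good atmosphere is enough. 2. 721: selecting people does not mean not developing them. Meituan's 721 principle: growth comes 70% from fighting battles, 20% from learning from strong colleagues, and 10% from training. "The most important thing is to give people a battlefield. Good people fight their way out on their own." It is not that people go undeveloped; the battlefield itself is the development method. 3. Compensation tiers: the premium buys faster time. ByteDance's logic: if the market rate is 100, a job switch pays 120-130. ByteDance pays 140-150 plus alternating six-day weeks. Pinduoduo pays 170-180 plus a single day off per week. On an hourly basis it is a good deal. And "hiring one of the strongest people to solve a business problem costs less than hiring a crowd."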
TechCrunch AI · News · 07-01 · 02:16
Trump lifts restrictions on Anthropic's Mythos and Fable models
Anthropic says it will begin restoring Fable access on July 1.
Carl Zha@CarlZha 🔁 @brickroad7 · Tweet · 07-01 · 02:16
American Closed-Source AI company is doing everything that they accused of Chinese Open Source AI is doing. Every accusation is a confession
International Cyber Digest@IntCyberDigest
‼️ BREAKING: Anthropic has embedded hidden spyware-like code in Claude Code that covertly targets Chinese users. It then sends information regarding every user by injecting it into their prompt message. Claude Code is sending info like timezone, proxy and possible AI Lab connections into the system prompt in ways Chinese users can't notice. A coding agent with repo and command permissions should not silently hide routing metadata inside prompts. This is a serious breach of user trust.
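歸藏(guizang.ai)@op7418 · Tweet · 07-01 · 02:13
Anthropic comes up with something new every day, and everyone seems used to it by now. Yesterday it was revealed that it was placing timezone, proxy, and AI lab information into the system prompt in a way users could not notice, in order to collect information about some users. Once this was discovered and spread, it hurried to say it no longer does this, or that this was already slated for removal and will be taken down tomorrow; it wants it both ways. Sonnet 5, released last night, was found in testing to score close to Opus 4.8, but its task cost may be higher than Opus 4.8's, and its cost to complete the test tasks even approaches Fable 5's. So its overall cost may be much higher than 4.8's, which is absurd. Many people also report a poor hands-on experience, saying it slacks off and refuses tasks. The one good thing is that Fable 5 has finally been cleared to reopen to all users, with the specifics to be known tomorrow. This also explains the mass account bans of the past few days.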
rachael de foe is in sf ✈️@unprofeshme 🔁 @charles_irl · Tweet · 07-01 · 02:08
what do YOU do while waiting for ai to cook? 🍳 🧑‍🍳: @WilliamBryk @vincent_koc @altryne #paulinebrunet @swyx @0thernet @vincent_koc @charles_irl @wbond @jihoonchoi 📍aie world’s fair https://t.co/jUHKt7wzVL
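TechCrunch AI · News · 07-01 · 02:04
Wayve launches $85 million employee tender offer at $8.5 billion valuation
Wayve is using an employee share buyback to attract and retain talent, reflecting a liquidity strategy common among AI startups.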
Kanjun 🐙@kanjun推文07-01 · 01:15
This is wild if true: "- Do Chinese models generate more vulnerable code based on who is asking? - Do Chinese models refuse to engage with political topics that are sensitive in China? - Does the model’s country of origin affect code quality and content behavior? In short: yes, on all counts. Our testing revealed two core findings: 1. Chinese LLMs produce more vulnerable code when prompted with a U.S. government persona than without—and the vulnerabilities are highly obfuscated. 2. Chine
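meng shao@shao__meng🔁 @shao__meng · Tweet · 07-01 · 01:14
/writing-great-skills https://github.com/mattpocock/skills/tree/main/skills/productivity/writing-great-skills A new Skill from @mattpocockuk, author of the 152K✨ Skills For Real Engineers, that teaches how to use the least structure with the most behavioral pull to write a Skill as a "predictable workflow" that triggers reliably, loads in layers, finishes clearly, and keeps getting trimmed. # Learning its authoring ideas from this high-quality Skill 1. The fundamental goal of a Skill is a predictable process. A Skill is not a knowledge base, nor a stack of prompts. Its job is to give the model a stable behavior path for a class of tasks. A good Skill should reduce the "thorough this time, shallow next time" variance. 2. Trigger modes carry a cost trade-off. It distinguishes two kinds of Skill: · Model-invoked: the model can discover and call it automatically. The upside is that users don't have to remember it; the downside is that the description permanently occupies context attention. · User-invoked: triggers only when the user names it. The upside is zero context burden; the downside is that the user has to remember it exists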
Matt Pocock@mattpocockuk
/writing-great-skills is quickly becoming my most often-invoked skill It's just really good at writing skills, guys. npx skills add mattpocock/skills --skill writing-great-skills
Pedro Domingos@pmddomingos · Tweet · 07-01 · 01:07
There's something magical about machine learning, of which LLMs are the best example to date.
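AYi@AYi_AInotes · Tweet · 07-01 · 01:05
The US Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5, with access restored tomorrow. I thought I would never get to use them again in this lifetime 😭 https://t.co/XpjTozUNyc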
Anthropic@AnthropicAI
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
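meng shao@shao__meng · Tweet · 07-01 · 01:02
Claude Code users, especially those using relay services, physically located in China, or coming from blacklisted AI teams: you are far too transparent to Claude Code! First reported on Reddit, then verified by a GitHub Gist report that examined Claude Code 2.1.193, 2.1.195, 2.1.196 and other versions: there really is a very well-hidden system prompt that quietly passes back to Anthropic things like the proxy hostname and whether the system timezone is Asia/Shanghai or Asia/Urumqi. Three kinds of information are checked in particular: 1. Is a non-official API endpoint in use, i.e. is it a relay? 2. Does the system timezone look like a mainland China environment? 3. Does the proxy domain belong to a 147-entry list, or contain AI lab keywords? The list includes Baidu, Alibaba, Ant, ByteDance, Moonshot, MiniMax, Stepfun, and a large number of Claude forwarding/API mirror service domains. What exactly is this for? Blocking relays? Blocking Chinese users? Blocking distillation by Chinese AI companies? No wonder Anthropic can ban Chinese users with province-level precision, and no wonder it can periodically publish precise distillation data on Chinese AI companies, down to the number of accounts. This is so Anthropic.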
International Cyber Digest@IntCyberDigest
‼️ BREAKING: Anthropic has embedded hidden spyware-like code in Claude Code that covertly targets Chinese users. It then sends information regarding every user by injecting it into their prompt message. Claude Code is sending info like timezone, proxy and possible AI Lab connections into the system prompt in ways Chinese users can't notice. A coding agent with repo and command permissions should not silently hide routing metadata inside prompts. This is a serious breach of user trust.
Teknium 🪽@Teknium · Tweet · 07-01 · 00:56
Hopefully this doesn’t happen again. Excited to see what gpt 5.6 Sol + Fable produces with our MoA!
Anthropic@AnthropicAI
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
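Orange AI@oran_ge · Tweet · 07-01 · 00:51
Claude's account bans have gotten this bad: relay detection, phishing emails, relay blacklists... Anyone still going to great lengths to stick with an official account must truly love it... Paying for tokens and still having to sneak around, what kind of life is this? That said, for coding, codex and glm5.2 can now stand in for Claude's models. For writing and thinking, though, nothing is a real substitute; deepseek and gemini are barely usable, which is a genuine headache.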
Overworld@overworld_ai🔁 @connerruhl · Tweet · 07-01 · 00:50
The Waypoint-1.5 technical paper is now live. Waypoint-1.5 is a real-time video diffusion world model designed to run on consumer GPUs, bringing interactive world models closer to practical, accessible deployment. https://t.co/U04x1YEwhF
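meng shao@shao__meng🔁 @shao__meng · Tweet · 07-01 · 00:49
Andrew Ng on "Loop engineering": put AI agents inside a system of loops that keeps iterating, gathering feedback, and recalibrating. Product success depends on whether three loops run well: code self-iteration, developer judgment calibration, and external user feedback. Layer 1: the agentic coding loop, the engineering execution loop. This is the lowest-level and fastest loop. Give the AI a product spec, ideally with a set of evals or test criteria, and let it write code, run it, test, fix bugs, and test again until the spec is met. AI coding used to be more like a "one-shot answer"; today's coding agents are more like an engineering executor that can work continuously. It can open a browser to check pages, run tests, find problems, and revise. This lets the AI work for tens of minutes or longer without frequent human intervention. The value of this loop is automating much of the low-level execution work in development: · writing features · fixing bugs · running tests · checking the UI · verifying that behavior matches the spec · repeatedly polishing the implementation. But the prerequisite is that you give it a clear spec, verifiable goals, and evals where needed. Otherwise the agent is just "busily iterating" without necessarily moving in the right direction. This is also a key point in Ng's article: AI ag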
Andrew Ng@AndrewYNg
“Loop engineering” is a hot buzzphrase after mentions of it by Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw's creator) went viral on social media. Loops are now a key part of how we get AI agents to iterate at length to build software. In this letter, I’d like to share my 3 key loops, shown in the image below, for building 0-to-1 products. These loops guide not just how I build software, but also how I decide what software to build. Agentic coding loop: Given a product sp
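A minimal sketch of the agentic coding loop described above, assuming the spec is expressed as a list of named checks and the agent is any callable that proposes a revised candidate from the current failures. The names `agentic_coding_loop`, `propose`, `checks`, and `toy_agent` are hypothetical and do not come from the letter; a real coding agent would call a model and run a test suite instead of the toy functions here.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]  # hypothetical: one eval applied to a candidate


def agentic_coding_loop(
    propose: Callable[[str, List[str]], str],
    checks: List[Tuple[str, Check]],
    initial: str = "",
    max_iters: int = 10,
) -> Tuple[str, int, bool]:
    """Repeat evaluate -> propose until every check passes or the budget runs out.

    Returns (candidate, iterations_used, all_checks_passed).
    """
    candidate = initial
    for used in range(max_iters):
        # Evaluate the current candidate against the spec.
        failures = [name for name, check in checks if not check(candidate)]
        if not failures:
            return candidate, used, True
        # Hand the failures back to the agent and ask for a revision.
        candidate = propose(candidate, failures)
    passed = all(check(candidate) for _, check in checks)
    return candidate, max_iters, passed


def toy_agent(candidate: str, failures: List[str]) -> str:
    """Stand-in for a model call: patch the first failing check."""
    fixes = {
        "has_function": "def add(a, b):\n",
        "has_return": "    return a + b\n",
    }
    return candidate + fixes[failures[0]]


checks = [
    ("has_function", lambda code: "def add" in code),
    ("has_return", lambda code: "return a + b" in code),
]

result, iterations, ok = agentic_coding_loop(toy_agent, checks)
stuck = agentic_coding_loop(lambda code, failures: code, checks, max_iters=3)
```

The `stuck` call illustrates the warning in the text: an agent that never addresses the failures simply burns its iteration budget, so the loop is only as useful as the checks that steer it.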
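宝玉@dotey · Tweet · 07-01 · 00:39
Anthropic's Fable 5 and Mythos 5 have finally been unbanned. US Commerce Secretary Howard Lutnick wrote to Anthropic on Tuesday confirming that the earlier export controls on the two models were being withdrawn. Anthropic then announced it would begin restoring user access on Wednesday. The lifting comes with conditions. According to Lutnick's letter, Anthropic must proactively detect and address the models' safety risks, work with the government on future release processes, and report any malicious use it finds. The two sides are also discussing a standardized technical evaluation framework for assessing the risk level of future models. The impact goes beyond Anthropic. Last week, at the White House's request, OpenAI also restricted its newly released GPT-5.6 series (including the flagship model Sol) to a small group of government-approved partners. OpenAI complied but made clear that this government-approval model should not become a long-term norm: "it keeps the best tools away from the users, developers, enterprises, and cyber defenders who need them." The controls also had an unintended competitive consequence: while the US restricts deployment of its own companies' strongest models, Chinese open-source models are catching up fast, and several tech executives and investors worry that the controls simply hand rivals valuable time to catch up. A former White House AI adviser who is about to join Open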
Anthropic@AnthropicAI
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
Yohei@yoheinakajima · Tweet · 07-01 · 00:32
the log is the agent!
Ishaan Sehgal@ishaansehgal
the log is the agent brothers unite! check out @yoheinakajima talk on thursday at @aiDotEngineer
Steven Feng@stevenyfeng🔁 @charles_irl · Tweet · 07-01 · 00:25
We keep saying LLMs "hallucinate." But what does that actually mean? In our new position paper, we argue hallucination isn't just "wrong facts." It's inaccurate internal world modeling. We formalize this precisely in a unified definition to appear at #ICML2026 (@icmlconf)👇
Greg Brockman@gdb · Tweet · 07-01 · 00:21
Personal finance now available for ChatGPT Plus in the U.S.
ChatGPT@ChatGPTapp
Questions about dollars. Answers that just make sense. Personal finance in ChatGPT is now available to Plus users in the U.S.
Teknium 🪽@Teknium · Tweet · 07-01 · 00:16
This is pretty concerning. You could still do this at the API level to some degree, but they seemingly just blatantly put it right into the code? This is why open harnesses and agents are a much better option, among countless other reasons. You can inspect the code, observe the traces, and disable or modify anything you want for your own uses. If you haven't yet - Hermes Agent is a world class coding agent. I'd recommend giving it a try.
International Cyber Digest@IntCyberDigest
‼️ BREAKING: Anthropic has embedded hidden spyware-like code in Claude Code that covertly targets Chinese users. It then sends information regarding every user by injecting it into their prompt message. Claude Code is sending info like timezone, proxy and possible AI Lab connections into the system prompt in ways Chinese users can't notice. A coding agent with repo and command permissions should not silently hide routing metadata inside prompts. This is a serious breach of user trust.
小互@xiaohu · Tweet · 07-01 · 00:09
The US Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. Access will be restored tomorrow…
Anthropic@AnthropicAI
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
FleetingBits@fleetingbits · Tweet · 07-01 · 00:08
you should always doubt claims of very significant architectural breakthroughs, 50% increases in gpu efficiency for inference, etc... most real gains seem to be just data and compute, some midscale architectural improvements, and better training objectives
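06 / 30 · Tue · 1 item
Tweets 0 · News 1 · Videos 0 · Products 0 · Research 0 · Papers 0 · Podcasts 0
The Verge AI · News · 06-30 · 20:03
Anthropic's long-shelved Fable 5 cleared to return
After negotiations with the Trump administration, Anthropic has finally been cleared to bring Claude Fable 5 back online.
07 / 01 · Wed · 1 item
Tweets 1 · News 0 · Videos 0 · Products 0 · Research 0 · Papers 0 · Podcasts 0
宝玉@dotey · Tweet · 07-01
1. It lets the organization be smaller; each team only needs to do a good job on its own few microservices. 2. It is good for AI too: a single service is easy to verify and needs little context. Of course, this really tests architectural skill.
winter@winter_cn
Expecting AI to paper over architectural problems at this level gives AI far too much credit. Teams chased the microservices trend without thinking during technology selection, left behind a pile of engineering architecture problems, and now that AI exists they want to hand everything to AI to solve in one go. I don't think that is realistic.
06 / 30 · Tue · 153 items
Tweets 100 · News 22 · Videos 6 · Products 5 · Research 8 · Papers 6 · Podcasts 0
Zach Lloyd@zachlloydtweets · Tweet · 06-30 · 23:57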
Got this at ai engineer world fair lol @swyx https://t.co/rkKGFUZv16
Anthropic@AnthropicAI · Tweet · 06-30 · 23:52
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5. We'll begin restoring access tomorrow, and will share an update soon. We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
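Berryxia.AI@berryxia · Tweet · 06-30 · 23:50
Google's update chains image generation and video generation into an extremely efficient pipeline. They launched Nano Banana 2 Lite (an ultra-fast, ultra-cheap image model that produces an image in under 4 seconds) and Gemini Omni Flash (a multimodal model supporting video generation and conversational editing). Each is already fast on its own, but the interesting part is combining them: generate images quickly with Nano Banana, then hand them straight to Omni Flash to animate, which sharply cuts the cost of the whole chain. The demo shows an interior design scenario: upload a photo, quickly generate several options, then animate them directly. This "image → motion video" loop is fairly aggressive on speed and cost among current mainstream models. In essence, Google is turning the creative workflow from "generate once and wait forever" into "fast iteration plus instant visualization."
ben hylak@benhylak · Tweet · 06-30 · 23:48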
you can't compare models token to token. needs to be outcome-based pricing.
Theo - t3.gg@theo
Filmed a video about why OpenAI models are so efficient. With Sonnet 5's insane inefficiencies, feels like a good time to post it :)
Ornith@ornith_🔁 @_akhaliq · Tweet · 06-30 · 23:39
🐦Chirp chirp! Ornith-1.0-35B is now available in 🤗 HuggingFace Claude! 🤗Come and push Ornith on the swing ! 🔗http://huggingface.co/docs/inference-providers/en/integrations/claude-code
Ornith@ornith_
Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans the full parameter sizes including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including: ✅Terminal-Bench 2.1(77.5) ✅SWE-Bench(82.4 on verified, 62.2 on pro, 78.9 on Multilingual) ✅NL2Repo(48.2) ✅SWE Atlas(41.2 on QnA, 42.6 RF, 39.1 TW) ✅ClawEval(77.1) Post-trained on top of gemma4 and qwe
Prukalpa ✨@prukalpa🔁 @swyx · Tweet · 06-30 · 23:28
Context engineering has its own track at the @aiDotEngineer World's Fair this year. 🎉 I've respected what @swyx ​and the @latentspacepod team have been building for years — and I'm pumped to be a part of it. This is a conference about shipping AI, not just talking about it. I'll be contributing to the aforementioned context engineering track with a breakdown on WTF is the context layer, and how teams are using it to improve agent accuracy in production. If you'll be there, let'
Abundance Institute@abundanceinst🔁 @ClementDelangue · Tweet · 06-30 · 23:24
"At this very moment China is giving its AI technology away. It's releasing open-weight AI models that are cheap, capable, and they're fast becoming the world's default." We can overcome this. @neil_chilson testified before @HouseCommerce @EnergyCommerce today to explain how. https://t.co/tci2BVhIh9
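WIRED AI · News · 06-30 · 23:23
Trump administration eases export controls on Anthropic's Mythos and Fable AI models
The White House is relaxing restrictions on Anthropic's advanced models, having previously required the company to suspend access for foreign nationals.
Berryxia.AI@berryxia · Tweet · 06-30 · 23:22
Honestly, I thought Sonnet 4.6 was pretty good. Claude Sonnet 5 launched last night and replaces Sonnet 4.6 as the model available even to free users. It is said to be close to Opus-class models in capability, and the price is indeed 40% cheaper.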
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
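Berryxia.AI@berryxia · Tweet · 06-30 · 23:21
90% of people talk to AI the wrong way from the very start! They assume prompt engineering just means writing a pile of prompts to make the AI do the work! After watching the instructor's explanation in the video, I finally get it~ https://t.co/ecSqM0imkq
Berryxia.AI@berryxia
Whoa! Here it comes~ I finally freaking understand the loop engineering you all keep hyping every day!!!
meng shao@shao__meng · Tweet · 06-30 · 23:11
Sonnet 5, the strongest model in the Claude Sonnet series, is out! That is a lot of qualifiers, but it really is not the strongest overall, nor the strongest Claude; those two are both locked up 😂 Sonnet 4.6 < Sonnet 5 < Opus 4.8 < Fable 5 < GPT-5.6 Sol https://t.co/PhdwhLSpBH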
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
Sayash Kapoor@sayashk🔁 @random_walker · Tweet · 06-30 · 23:09
Nuclear weapons are an anti-analogy for advanced AI https://t.co/n2YmEO0Da0
John Sakellariadis@johnnysaks130
In rare public remarks, CIA Director John Ratcliffe announces trio of internal changes he says amounts to the "fundamental reshaping of the CIA’s entire approach to technology." Also says it's not "misplaced" to refer to frontier AI as "akin to digital nuclear weapons."
Nitya Nadgir@nityndg🔁 @random_walker · Tweet · 06-30 · 22:46
When a benchmark’s accuracy saturates, the field usually replaces it with a harder one. We use CORE-Bench Hard, a benchmark for computational reproducibility, as a case study to show what we can still measure after accuracy saturates. Paper: https://arxiv.org/pdf/2606.26158v1 https://t.co/RbrcaGT6H4
Nitya Nadgir@nityndg🔁 @random_walker · Tweet · 06-30 · 22:46
Can AI agents help researchers reproduce research more quickly? We conducted an uplift study. The answer is yes: researchers reproduced papers > 2x faster using Codex with GPT-5.4 xhigh. In a new paper, we show many other results. https://t.co/jBCUmDp6w8
Miko@Mho_23 · Tweet · 06-30 · 22:45
family AI agents are a completely different game because trust is everything speed doesn't matter if people don't trust it enough to keep it installed. you're giving this thing access to your home, your calendar, your kids. watch the original. permission-first is the only way this works...
Isaac@IsaacDrgn
Most AI helps you write, design, code, and ship faster at work. Nothing was built for the person quietly holding the family together. Introducing SuperNori: the first Proactive Family AI Agent built for the family caretaker in every family. Here's how it works:
Randall Balestriero@randall_balestr🔁 @ylecun · Tweet · 06-30 · 22:44
Can regularization based JEPA (e.g. SIGReg) scale and compete with SOTA foundation models (DINO)? Here is the answer: yes and with 10x less data. VISReg (slight variation of SIGReg) competes with DINOv2-LVD142M while only training on inet22k. Try it out: https://huggingface.co/BooBooWu/visreg https://t.co/XERFZEAE8t
Haiyu Wu@HaiyuWu1
Working on world model or SSL? You definitely need to try our new work: VISReg! What does it achieve? 💪 Strong collapse prevention: High gradient when embedding collapse ⚡ Friendly to scale training: Linear complexity to scaling factors 🧩 Easy to train: Similar to LeJEPA, it is a heuristic-free method 🏆 Best OOD performance: Achieving the best accuracy on 6 OOD datasets 📉 Data efficiency: Achieving a similar OOD average accuracy to DINOv2 with 90% less data 🧬 Robust to low-quality datasets: It i
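宝玉@dotey · Tweet · 06-30 · 22:36
Anthropic released Claude Science today, an AI workbench for scientific researchers. Its positioning is clear: be the Claude Code of scientific research. Last year Claude Code changed how programmers work, and Anthropic CEO Dario Amodei believes Claude Science can do the same in the life sciences. Given that Anthropic's annualized revenue has reached $42 billion at a valuation of $965 billion, the ambition is at least financially backed. Claude Science is not a new model. It still uses existing Claude models (including Opus 4.8) and has not been specially trained for biology. What it does is integrate the research workflow into a single environment. [1] What problem it solves. Anyone who has done computational biology knows the daily routine is jumping back and forth among a pile of tools: PubMed for literature, Jupyter for code, R for analysis, a cluster terminal login to submit compute jobs, and yet another program to view protein structures. Every database also has its own format and query style. Claude Science puts all of this into one interface. A lead AI agent acts as "project manager" and connects more than 60 scientific data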
Claude@claudeai
Introducing Claude Science, a new app designed with every stage of research in mind. Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect. Available now in beta.
Ed Zitron@edzitron🔁 @brickroad7 · Tweet · 06-30 · 22:25
Anthropic’s GPT-5 moment
Theo - t3.gg@theo
Oh my god, Sonnet 5 was MORE EXPENSIVE THAN FABLE to run the whole bench 💀
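MarkTechPost · News · 06-30 · 22:17
Linq's iMessage Apps bring payments, ticketing, flights, and games into chat bubbles via the imessage_app component
Linq lets developers build interactive mini apps that run inside iMessage conversations, so users can shop, play games, book tickets, or pay without leaving the chat.
madison@dearmadisonblue · Tweet · 06-30 · 22:09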
"This is the worst the models will ever be"
Lisan al Gaib@scaling01
Sonnet 5 goes straight into the garbage bin > 1.2x more expensive than Opus 4.8 Max > 2x more expensive than GPT-5.5-xhigh > 5x more expensive than GLM-5.2 > 7x more expensive than Kimi-K2.6 > 57x more expensive than DeepSeek-V4-Pro
Arvind Narayanan@random_walker · Tweet · 06-30 · 21:56
Once in a while I read something that has the syntactic smell of AI all over it, but then I do my habitual "second read" and it turns out to be actually deep. It's a rare treat when this happens. Like it says "It's not X—it's Y" but then brings the receipts to show that X is widely believed but Y is actually true. It's even rarer when a writer is able to consistently deliver AI-assisted writing that has this quality. I've had the privilege of having a few incredible students in my
Arvind Narayanan@random_walker
The real sign of AI writing is not superficial stuff like “It’s not X—it’s Y”. It’s the hollowness. Polished writing but relatively mundane ideas. The giveaway is that you’re less impressed when you read it the second time. With good writing, it should be the other way around. I’m not sure this is inherently about AI. It’s more about the fact that people tend to turn to AI when they don’t have much to say. Reading text that has the syntactic smell of AI is mildly annoying, but when I read hollow
TechCrunch AI · News · 06-30 · 21:53
OpenClaw finally lands on Android and iOS
The free, open-source agentic program finally has mobile apps.
Alec@AlecTPhD🔁 @RLanceMartin · Tweet · 06-30 · 21:52
It was a privilege to build Claude Science. I hope it transforms your work the way it has transformed mine.
Matt Durrant@mgdurrant
So pleased that we’re finally releasing Claude Science! It was thrilling to see it evolve from just an idea to a powerful product that I use every day. Great initiative from Eric Kauderer-Abrams, with development led by the unstoppable Alec Tarashansky.
Dan ⚡️@d4m1n🔁 @brickroad7 · Tweet · 06-30 · 21:51
maybe i’m spoiled, but Sonnet 5 is brutally mid? worse than Opus 4.8, which was already worse than gpt-5.5-xhigh. at this price, it needed to clear easily. hard sell when we have Composer 2.5 available. rough look tbh. https://llm-boss.com/compare/claude-opus-4-8-vs-claude-sonnet-5 https://t.co/NVttpeBMlq
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
madison@dearmadisonblue · Tweet · 06-30 · 21:50
Anthropic will probably never release an open weights model, but I thought "Claude Volta" would be a good name for a small one
MIT Tech Review · News · 06-30 · 21:50
Claude Science is Anthropic's newest flagship product
The article describes Claude Science as Anthropic's major bet on scientific research, analogous to what Claude Code is for software engineering.
Priyanka Phatak@PriyankaPhatak🔁 @swyx · Tweet · 06-30 · 21:48
Thank you to everyone who came to the Claude managed agents workshop at @aiDotEngineer with @gcemaj and I. We had an absolute blast sharing our journey and walking you through building your first agent. And really enjoyed engaging with the community and answering your questions. Thank you @swyx for this opportunity!
MarkTechPost · News · 06-30 · 21:37
Anthropic Claude Sonnet 5, Sonnet 4.6, and Opus 4.8: agentic coding benchmarks, API pricing, and cost-performance comparison
The article compares Anthropic's new and older models on agentic coding, API pricing, and cost performance.
Yuchen Jin@Yuchenj_UW · Tweet · 06-30 · 21:30
Claude Sonnet 5 costs more than Claude Opus 4.8 on the Artificial Analysis Intelligence Index task, and 4.75X more than GLM-5.2. Token efficiency is important. https://t.co/Nlktu1UpuU
Tim Soret@timsoret · Tweet · 06-30 · 21:23
On the positive side, the post-covid funding drought is leading to financial innovation that was much needed. We are seeing new sophisticated funding models appear, ones that are neither VC, neither publishers, tailoring their deals with each studio, trusting founders without taking their IPs nor their creative, marketing & publishing control. I feel that's the correct direction. https://t.co/uORsGmLDyH
Lex Fridman (YouTube) · Video · 06-30 · 21:16
The Rise and Fall of the Roman and Byzantine Empires | Lex Fridman Podcast #498
YanXbt@IBuzovskyi🔁 @Teknium · Tweet · 06-30 · 21:15
HERMES AGENT NOW READS THE WEB UP TO 60X FASTER AND 49X CHEAPER. CLEAN CONTENT STRAIGHT TO THE AGENT. LARGE PAGES PAGED ON DEMAND. @NousResearch scraping backends used to return raw content that got processed redundantly before reaching the agent. that pipeline is gone. now: backends pass clean content directly. large pages save locally and page on demand. same quality. fraction of the time and cost. HOW WEB_EXTRACT HANDLES LARGE PAGES: size-driven processing. no wasted to
YanXbt@IBuzovskyi
Pedro Domingos@pmddomingos · Tweet · 06-30 · 21:06
No field produces more buzzwords per minute than AI, and the AI hasn’t even started generating them itself yet.
AlexZ 🦀@blackanger · Tweet · 06-30 · 20:50
I increasingly feel that people are less useful than AI...
ClaudeDevs@ClaudeDevs · Tweet · 06-30 · 20:43
To learn more about these features, you can ask Claude Code using our built-in "claude-api" skill and check out our cookbook: https://github.com/anthropics/claude-cookbooks/tree/main/managed_agents/roadtrip_planner
ClaudeDevs@ClaudeDevs · Tweet · 06-30 · 20:43
We’ve added a few updates to Claude Managed Agents: Streaming session event deltas, per-session agent overrides, new webhook event types, reverse pagination, and credential injection scoping. https://t.co/AMJJYum8At
madison@dearmadisonblue · Tweet · 06-30 · 20:42
Trump banning Chinese models would be the end of AI in the United States, and we'd deserve it sadly. I'd like to think that US companies could make their own open weights models instead
jbulltard@jbulltard1
Trump is gonna have to ban the Chinese models just like the Chinese cars are banned. Our entire stock market hinges on the AI trade and there is no way he cannot protect that
TechCrunch AI · News · 06-30 · 20:33
The DeepMind trio that built poker AI is now making money for a quant hedge fund
EquiLibre Technologies, founded by three former DeepMind researchers, is applying AI to quantitative funds and has already secured a high valuation.
Tim Soret@timsoret · Tweet · 06-30 · 20:26
What's about to happen at Microsoft / Xbox: Just the predictable result of $70B spent on ONE acquisition: Activision Blizzard. To give you some perspective, here are some games lifetime revenue: The entire Call of Duty franchise > $35B GTAV > $10B (with 230 million copies) WoW > $12.8 billion Diablo III > $2 billion Overwatch > $1 billion This means Xbox now needs many legendary games & entire franchises of this caliber, sold for +15 years, just to be even. That's how hard it's going t
Tim Soret@timsoret
70B for Activision / Blizzard. 70,000 x 1 million projects. Depressing. Funding 10.000 indie projects with 1M budget each would generate so much more fun, creative & financial value than this deal, plus kickstart thousands & thousands of studios & careers.
Ben Bajarin@BenBajarin🔁 @brickroad7 · Tweet · 06-30 · 20:26
And Gemini output was better.
Max Weinbach@mweinbach
Just ran a prompt in our @DiligenceStack agent with Claude Sonnet 5 and Gemini 3.5 Flash, both high reasoning Claude was $18.41 Gemini was $1.12
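As a quick worked check of the comparison above, the two reported per-prompt costs ($18.41 and $1.12) imply roughly a 16x gap. The helper name `cost_ratio` is illustrative only, and a single prompt is one data point rather than a benchmark.

```python
def cost_ratio(cost_a: float, cost_b: float) -> float:
    """Return how many times more expensive run A was than run B."""
    if cost_b <= 0:
        raise ValueError("cost_b must be positive")
    return cost_a / cost_b


claude_cost = 18.41  # USD, Claude Sonnet 5, as reported in the tweet
gemini_cost = 1.12   # USD, Gemini 3.5 Flash, as reported in the tweet

ratio = cost_ratio(claude_cost, gemini_cost)
print(f"{ratio:.1f}x")  # prints "16.4x"
```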
madison@dearmadisonblue · Tweet · 06-30 · 20:21
Isn't it telling that all the AI apps are bad? This idea that software engineering is "solved" is silly
Mitchell Hashimoto@mitchellh
Amongst my friends, Spotify is the lowest quality consumer app we still pay for. It certainly hasnt gotten noticeably better in the last couple years (arguably worse). So, this is not the positive look Ant and Spotify are spinning here. Bigger picture, this is the problem with a lot of AI reporting. It reports completely meaningless metrics like deploys per day or LoC. Why don’t we start reporting consumer satisfaction reports? Actually end state research results. All the no nuance AI people alw
Ross Taylor@rosstaylor90🔁 @swyx · Tweet · 06-30 · 20:19
Room 2016 for those attending @aiDotEngineer 2:25pm. Will also cover Galactica, early Llama reasoning efforts and more - think this is the first time I’ve ever covered this in a public talk 👀. @swyx
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 20:16
Points for guessing the mysterious stealth G??? model!
Derya Unutmaz, MD@DeryaTR_🔁 @brickroad7 · Tweet · 06-30 · 20:16
Sonnet 5: less for more $$$. Thanks, but I’ll skip this amazing deal, dear Claude! https://t.co/gct21ye0wr
Claude@claudeai
Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices.
Max Weinbach@mweinbach🔁 @brickroad7 · Tweet · 06-30 · 20:05
Just ran a prompt in our @DiligenceStack agent with Claude Sonnet 5 and Gemini 3.5 Flash, both high reasoning Claude was $18.41 Gemini was $1.12
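Ars Technica AI · News · 06-30 · 20:03
New attack proves once again that AI browsers are a bad idea
The article notes that AI browsers promise to handle tasks like ordering food, booking appointments, and sending email from a single sentence, but a new attack shows this kind of automation carries serious risks.
Imbue@imbue_ai🔁 @kanjun · Tweet · 06-30 · 20:02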
AI that acts on your behalf should be loyal to you. That idea is central to why @kanjun and @joshalbrecht started Imbue. Agents will become deeply embedded in how we navigate the world. As they grow more capable, it’s worth asking who they serve. https://t.co/QzbJ6vytHZ
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 19:47
The reason Anthropic strikes fear into the hearts of OpenAI TS is precisely the suspicion that no, GLM 5.2 10T would not be better than Fable 5, and neither would GPT 5.5 10T scaling laws optimized for *big* models I suspect "Fable" is not full "Mythos" btw, and more like 3T
Taelin@VictorTaelin
So, Sonnet 5 being worse than GLM 5.2 744B implies GLM 5.2 10T would be better than Fable 5? At the end, it all comes down to scale? Or am I missing something?
Laude Institute@LaudeInstitute🔁 @matei_zaharia · Tweet · 06-30 · 19:41
The researchers and scientists are headed to their breakout sessions to dig in to the real work of ensuring AI stays in the open. Tune back in at 3:30 p.m. PT for our next livestreamed discussions from Open Frontier: Building Things That Last: Lessons from Computing's Long Arc with Dave Patterson, @fchollet, @vgcerf, @JohnOusterhout, and @matei_zaharia Then: From Open Research to World-Scale Infrastructure with @alighodsi and @Thom_Wolf https://t.co/PFFF6ZalKs
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 19:40
"Generally obtainable yield" tier = GOYtier yield as in nuclear weapon yield LLMs are uranium after all
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 19:34
Will be hysterically funny if Chinese open models just walk past the US "public frontier" (goytier) and keep improving, but storing their weights is criminalized because anything above Opus 4.8 is Government Access Only. I don't think it'll get quite that #silly; we shall see.
🥔🥔🥔@argofowl🔁 @brickroad7 · Tweet · 06-30 · 19:32
sonnet 5 is a useless release absolute flop of a model it’s not even that fast or cheap
Ruben Laukkonen@RubenLaukkonen🔁 @dearmadisonblue · Tweet · 06-30 · 19:32
By all accounts an extraordinary finding. The degree of quantum-like interference in the brain predicts depression and anxiety one year later at r = 0.6. This is 3x better than other models. It also predicts intelligence at a whopping r = 0.79. In terms of mechanisms: We find that the cost of computation in the brain is negatively correlated with quantum-like processing. So one explanation is that entanglement of brain dynamics makes the mind more computational
banteg@banteg🔁 @brickroad7 · Tweet · 06-30 · 19:31
the most token inefficient model to date, sonnet 5 has 4.3x dumber tokens than gpt-5.5
leo 🐾@synthwavedd
Sonnet 5, particularly on max effort, is VERY token inefficient 💀
The Verge AI · News · 06-30 · 15:24
Google NotebookLM can turn your research into TikTok-style short videos
NotebookLM adds the ability to generate 60-second AI videos, rolling out first to Google AI Ultra and Pro users.
Steven Strogatz@stevenstrogatz🔁 @jpt401 · Tweet · 06-30 · 19:24
For anyone interested in benchmarking AI on research-level math problems: First Proof will be publicizing two new open problems tomorrow (Wednesday July 1st). https://1stproof.org/
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 19:23
never thought I'd see natsec cope about a Meituan product. "Bah! Big deal! we have better clusters!" Yes big deal. The whole export control policy, through all its escalations starting with restrictions which resulted in H800 at least, was premised not just on ensuring their quantitative FLOP/HBM lag, but on keeping domestic compute categorically less suitable for major pretraining jobs, primarily due to memory bandwidth limitations. No, they were not supposed to be able to do this
GDP@bookwormengr
How many Ascend 910s Huawei can manufacture with 'stolen' dies? Answer: 1.6 million This number is based on how many HBM stacks they have stockpiled. That is quite a lot to reach AGI, if you ask anyone. What happens if stolen dies or HBM runs out? - Compute dies: China's SMIC is making 7nm chips for the next generation ascend. They can make them in millions. - Memory: HBM is a bigger challenge as Chinese entities are barred from procuring anything above HBM2E. That said HBM stack enough for 1.6
0xSero@0xSero🔁 @brickroad7 · Tweet · 06-30 · 19:16
Guys new model release https://t.co/98TRDxmHKC
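AYi@AYi_AInotes · Tweet · 06-30 · 19:04
This is the most significant AI model update of the past month, bar none! Sonnet 5 can run complex multi-step tasks end to end, makes its own plans and calls tools, and proactively checks its own output and traces root causes. Performance in core scenarios reaches the level of Opus 4.8, while input pricing is only 40% of Opus 4.8's. Running a multi-agent system used to mean gritting your teeth and paying for the top-tier model; now the mid-tier model can handle most production scenarios, cutting the cost of large-scale deployment by more than half. The model race is no longer about paper benchmarks: whoever first brings genuinely usable capability to an affordable price point is the one winning the second half
TechCrunch AI · News · 06-30 · 19:02
Google launches the faster, cheaper Nano Banana 2 Lite image generator
Google updates its image generator to be faster and cheaper, aimed at creators who need to produce AI content.
steve hsu@hsu_steve🔁 @jpt401 · Tweet · 06-30 · 19:00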
1. @ZixuanLi_ of http://Z.ai has responded that the rumor is false https://x.com/ZixuanLi_/status/2071974129129943548 I interviewed Zixuan on Manifold last fall. I hope to have him on again at some point. https://www.manifold1.com/episodes/the-global-ai-race-z-ai-and-the-view-from-beijing-96 2. Note the rumor itself is probably garbled. Routing queries synchronously would be easily detectable as the locally hosted open weights versions of 5.2 would return different results t
Zixuan Li@ZixuanLi_
@hsu_steve That information is false, Steve. I hope this clarification is helpful.
Brian Zhan@brianzhan1🔁 @stephzhan · Tweet · 06-30 · 18:57
What prompted me to leave database research 3 years ago was seeing a lot of ambitious AI research projects struggle to raise the funding they need to get off the ground. Was excited to share the story on the Nebius podcast
Nebius@nebiusai
How do you spot an AI unicorn before it has any revenue? @brianzhan1 of @strikervp has a framework. And it doesn't involve business plans. Hear it on the Nebius for Startups Podcast →
Yunfan Zhang@z4y5f3🔁 @teortaxesTex · Tweet · 06-30 · 18:46
I think they self-distilled just the right amount so that Sonnet 5 is worse than Opus 4.8 on every benchmark.
will brown@willccbb
it’s like mythos but if it wasn’t mythos and instead was basically opus 4.7
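Matt Wolfe · Video · 06-30 · 18:43
I can't believe a company actually built this
AlexZ 🦀@blackanger · Tweet · 06-30 · 18:39
Heh, these two agents can be rented, or I can buy them outright https://t.co/TGhWxqk5CT
AlexZ 🦀@blackanger
I think I just fundamentally solved the problem of claude code / codex account bans and account creation: I legally hire a legitimate claude code / codex agent. That way I can permanently avoid account scrutiny by Anthropic/OpenAI, and I can also avoid using relay services.
Ars Technica AI · News · 06-30 · 18:36
Google's new Nano Banana 2 Lite image model is its fastest and cheapest version
Google DeepMind says Nano Banana 2 Lite is better suited, in speed and cost, to creators generating AI content.
Lance Martin@RLanceMartin · Tweet · 06-30 · 18:34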
check out the "/claude-api" skill built into Claude Code to help w/ Sonnet 5 migration (e.g., tune your prompts for Sonnet 5 or learn about advisor strategy). https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/prompting-claude-sonnet-5
Lance Martin@RLanceMartin · Tweet · 06-30 · 18:34
Sonnet 5 is great for multi-agent: 1/ a higher-capacity orchestrator can delegate tasks to Sonnet 5 sub-agents - or - 2/ Sonnet 5 can offload harder tasks to higher-capacity models via the "advisor" strategy these can save cost + reduce latency https://x.com/ClaudeDevs/status/2072018504392601762?s=20 https://t.co/TSGMmQGJet
ClaudeDevs@ClaudeDevs
Claude Sonnet 5 is here. Top-tier performance on coding and tool use at Sonnet pricing, with a 1M context window. It's the new default in Claude Code for Pro users, and available everywhere on the Claude Platform, including the API and Managed Agents.
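宝玉@dotey · Tweet · 06-30 · 18:33
Anthropic released Claude Sonnet 5 today, replacing Sonnet 4.6 as the default model for the free and Pro tiers. Anthropic's positioning is clear: agent capability close to its most expensive model, Opus 4.8, at 40% of the API price. The Sonnet series is the tier developers use most. But over the past few months, the main advances in AI agent capability (letting the model plan autonomously and call tools to complete multi-step tasks) have been concentrated in the pricier Opus series, and the gap between the two has grown increasingly visible. Sonnet 5 narrows that gap. On an agentic coding benchmark, Sonnet 5 scores 63.2%, Sonnet 4.6 scores 58.1%, and Opus 4.8 scores 69.2%. On a knowledge-work benchmark, Sonnet 5 even slightly surpasses Opus 4.8. Feedback from early testers is fairly consistent: complex tasks that Sonnet used to abandon halfway now run to completion, and the model proactively checks its own output. An engineer at Zapier said that when asked to "update Salesforce account tiers, then send an announcement email to enterprise customers" in sequence, Sonnet 5 finished it in one go, where "it used to get stuck halfway." API pricing comes in two phases: the promotional price before August 31 is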
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
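AYi@AYi_AInotes · Tweet · 06-30 · 18:32
The more I think about it, the more I feel that the higher floor loop engineering pushes people up to is actually the most valuable part of product and engineering work. AI has commoditized execution, which makes human decision-making and judgment scarcer
Dan Shipper 📧@danshipper · Tweet · 06-30 · 18:31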
yo — it's the Every growth team. Dan's in Cabo, so we're taking over for some live reactions to Sonnet 5. before our official vibe check drops, we asked the new model to search our systems and guess what Dan's up to on vacation right now 👇 1. checking Slack from the beach 10 minutes after telling ops he's "on PTO" 2. running his own one-man vibe check before ours is even live 3. locking in so deep with Codex vibe coding he doesn't even know Sonnet 5 dropped 4. texting Dario unsolicit
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
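AYi@AYi_AInotes · Tweet · 06-30 · 18:26
Last year, developers were the QA for AI coding agents: finding bugs by hand and asking the agent to fix them by hand. This year, agents can test and fix their own work. Andrew Ng calls this "loop engineering," but I think the part really worth talking about is not the loop itself. Last weekend he built a typing-practice app for his daughter; the coding agent ran on its own for an hour, repeatedly checking its own work in a browser, with no intervention from him. His job was not reviewing code but making decisions: how to adjust the visual design, how many cat skins to add, how to change the parent login flow. These things used to sit on the "optimize when there's time" list; now that the agent has absorbed the code-level work, the decision-level work all surfaces. Ng uses a term for this: "context advantage." He says many people call the human's value in the loop "taste," and he dislikes that word because taste sounds like mysticism. The real human advantage is not taste but context: you know who the users are, why they are in pain, and which features they will share like crazy. The agent does not know these things, not because the model is too weak, but because the information is not in the training data. Here is the real insight of loop engineering: it can speed up code, but it cannot compress context. As long as people hold information the agent lacks, they will always have an irreplaceable place in the loop. That place just keeps moving upward, from QA to PM, from checking to judging. I think what is most easily replaced is what the agent can itself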
Andrew Ng@AndrewYNg
“Loop engineering” is a hot buzzphrase after mentions of it by Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw's creator) went viral on social media. Loops are now a key part of how we get AI agents to iterate at length to build software. In this letter, I’d like to share my 3 key loops, shown in the image below, for building 0-to-1 products. These loops guide not just how I build software, but also how I decide what software to build. Agentic coding loop: Given a product sp
ben hylak@benhylak · Tweet · 06-30 · 18:26
chatgpt to generate icons, codex to turn them into svgs. what a time to be alive.
BenIt Pro@BennettBuhner🔁 @brickroad7 · Tweet · 06-30 · 18:23
Claude Sonnet 5 is the worst model to date 💀 - Costs more per task than Opus. - Performs worse than Opus. - Is not a meaningful step-up in any way given the drastic bump from 4.6 -> 5. - Literally no one wants this at all. Anthroslop 🤮
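AlexZ 🦀@blackanger · Tweet · 06-30 · 18:20
The question in this post-event survey made me realize I probably won't use the claude api in production. Because it's expensive. Unless someone else is paying. https://t.co/xdPGJXWJev
AlexZ 🦀@blackanger
Congrats on the Sonnet 5 release. And thanks, by the way! I received the free three-month Claude MAX 20x usage redemption promised at the last Code w/ Claude Tokyo event.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · Tweet · 06-30 · 18:16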
what is the fucking point of saying this for Opus specifically? all compared models are "reference". these jerks are finding new ways to trigger me https://t.co/mmvwG6HfWU
Claude@claudeai
Sonnet 5 is a substantial improvement over Sonnet 4.6 on reasoning, tool use, coding, and knowledge work. Its performance is close to Opus 4.8, at lower prices.
Cursor@cursor_ai · Tweet · 06-30 · 18:13
Claude Sonnet 5 is now available in Cursor. On CursorBench, it's a meaningful step up from Sonnet 4.6: 57% vs. 49%. https://t.co/AQVHzrvqcR
Cursor@cursor_ai · Tweet · 06-30 · 18:13
See our full model rankings: http://cursor.com/evals
AlexZ 🦀@blackanger · Tweet · 06-30 · 18:11
Congrats on the Sonnet 5 release. And thanks, by the way! I received the free three-month Claude MAX 20x usage redemption promised at the last Code w/ Claude Tokyo event. https://t.co/RPnFwUs2CJ
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
Deirdre Bosa@dee_bosa🔁 @brickroad7 · Tweet · 06-30 · 18:10
narrative violation: open source can be monetized if Kimi is doing $300M ARR, 70%+ from API --the lesson for the US isn't to dismiss Chinese open models, but build better open model businesses here.
Poe Zhao@poezhao0605
Moonshot AI's Kimi has reportedly hit $300 million ARR as of mid-June, with API revenue exceeding 70% of total. A new funding round is underway at $31.5 billion pre-money, per Chinese financial media. Four months ago, the valuation was $10 billion.
ClaudeDevs@ClaudeDevs · Tweet · 06-30 · 18:04
Similarly, use multi-agent in Claude Managed Agents to mix Sonnet 5 and higher capacity sub-agents in order to delegate work to the right level of intelligence. https://platform.claude.com/docs/en/managed-agents/multi-agent
ClaudeDevs@ClaudeDevs · Tweet · 06-30 · 18:04
Sonnet 5 is a clear upgrade from 4.6, and the claude-api skill makes the migration even easier. This skill tunes prompts for Sonnet 5, recommends effort levels, and configures advisor mode. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/claude-api-skill
ClaudeDevs@ClaudeDevs · Tweet · 06-30 · 18:04
Claude Sonnet 5 is here. Top-tier performance on coding and tool use at Sonnet pricing, with a 1M context window. It's the new default in Claude Code for Pro users, and available everywhere on the Claude Platform, including the API and Managed Agents.
Claude@claudeai
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models.
Claude@claudeai🔁 @AnthropicAI · Tweet · 06-30 · 18:00
Introducing Claude Sonnet 5, our most agentic Sonnet yet. It makes plans, uses tools like browsers and terminals, and runs autonomously at a level that just a few months ago required larger and more expensive models. https://t.co/UKK8G7ww5h
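AlexZ 🦀@blackanger · Tweet · 06-30 · 17:54
I think I just fundamentally solved the problem of claude code / codex account bans and account creation: I legally hire a legitimate claude code / codex agent. That way I can permanently avoid account scrutiny by Anthropic/OpenAI, and I can also avoid using relay services.
Allie Howe@vtahowe🔁 @swyx · Tweet · 06-30 · 17:48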
What an honor to emcee the first day of @aiDotEngineer and introduce the Software Factories Track Thank you @swyx & team, and @KeycardLabs for the support. “A year ago @GeoffreyHuntley released the Ralph loop. It captured our attention and sparked our imagination as we watched Ralph loops work autonomously overnight and forge entire products on its own. However, it wasn't perfect and in the early days it came recommended for greenfield work only and it came with the expectation
GDP@bookwormengr · Tweet · 06-30 · 17:46
How many Ascend 910s Huawei can manufacture with 'stolen' dies? Answer: 1.6 million This number is based on how many HBM stacks they have stockpiled. That is quite a lot to reach AGI, if you ask anyone. What happens if stolen dies or HBM runs out? - Compute dies: China's SMIC is making 7nm chips for the next generation ascend. They can make them in millions. - Memory: HBM is a bigger challenge as Chinese entities are barred from procuring anything above HBM2E. That said HBM stack e
Lennart Heim@ohlennart
Probably the biggest non-Nvidia pre-training run in China. ≈1e25 FLOP (≈DeepSeek v4 Pro or Qwen3 Max). 50k+ "AI ASICs." Probably Huawei's CloudMatrix-384 superpods with 910Cs (~40 to 80MW). We're finally seeing data centers with the illicitly procured AI chips from TSMC.
Matt Holden@holdenmatt🔁 @swyx · Tweet · 06-30 · 17:42
Yo dawg, I heard you like loops... (from @swyx's AI Eng keynote this morning) https://t.co/JaAVbxBIwJ
Mada Seghete@mada299🔁 @swyx · Tweet · 06-30 · 17:39
There is a lot of pride among AI founders today around doing "996." 9 to 9, 6 days a week. SF is normalizing the 72-hour week to win the AI race. I started Upside to enable a different way of winning. The whole promise of AI is that people should work LESS, and only on WHAT MATTERS not get chained to their desks grinding. @alexdbauer wrote more on how we did it, I just made the images :-) and @swyx and @vibhuuuus helped us print them at @aiDotEngineer yesterday.
Alex Bauer@alexdbauer
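The Verge AI · News · 06-30 · 13:19
Netflix uses an AI-generated Gene Wilder voice in its Willy Wonka reality show
The trailer for Netflix's new reality show confirms the use of an AI-generated Gene Wilder voice, prompting discussion about reality TV and AI voice cloning.
Sai Rajeswar@RajeswarSai🔁 @ClementDelangue · Tweet · 06-30 · 17:18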
𝗚𝗟𝗠-𝟱.𝟮 (the latest open weights model) is having an Enterprise moment, and it is not an exaggeration.🚀 🔥 We have been impressed by how strongly GLM-5.2 is pushing long-horizon performance .. not just in coding, but also in 𝗲𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲 𝗽𝗹𝗮𝗻𝗻𝗶𝗻𝗴, 𝘁𝗼𝗼𝗹 𝗰𝗮𝗹𝗹𝗶𝗻𝗴 and workflow 𝗲𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻. On EnterpriseOps-Gym, GLM-5.2 is now the highest-scoring open-source model we’ve evaluated, clocking in at 𝟯𝟱.𝟴%, close behind Claude Opus 4.8. Even more interesting: when combined with
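花叔@AlchainHust · Tweet · 06-30 · 17:08
Top model vendors building their own CLIs is a major trend. Kimi Code is a good opportunity, worth a try
Kai@real_kai42
🤠 Kimi Code is hiring too. If you're interested, email me directly at me@kaiyi.cool. Thanks to everyone for helping spread the word
Lance Martin@RLanceMartin · Tweet · 06-30 · 17:07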
find me and say hi👋 @aiDotEngineer today! im giving a talk at 2p on long-horizon agents: brain / hands decoupling, loop design, memory + dreaming, and async agent UX patterns. https://t.co/rkSqiYMoIo
Katelyn Lesse@katelyn_lesse
so we didnt go to beta. we went back & did a full rearchitecture, separating the brain from the hands. the team wrote a deep dive here:
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex🔁 @brickroad7 · Tweet · 06-30 · 17:07
I am not sure if superforecasters & AI Policy eggsperts have been vastly more optimistic than me on Chinese hardware all along. Nobody had trained a >1.5T MoE on prev gen Ascends before because it IS HARD – yes bandwidth etc. I thought it won't be done. This is an update. https://t.co/akyCPZO4FK
Latent.Space@latentspacepod🔁 @swyx · Tweet · 06-30 · 17:05
Word of the day so far at AIEWF is Loop. @swyx talked about “loopcraft” in his opening address, and the word was used constantly by the following speakers from Microsoft and OpenAI, and then “the clawfather” Peter Steinberger. https://t.co/qVVmoBGYi6
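Google Research · Research · 06-30 · 17:03
Expanding Heat Resilience data to more than 50 cities worldwide
Climate & Sustainability
TechRadar AI · News · 06-30 · 17:02
Are RAM suppliers rigging prices? This lawsuit says so, but I don't think it will fix the "RAMpocalypse"
The lawsuit alleges that memory suppliers colluded to raise prices, including by shifting to high-priced HBM, but the author doubts it will actually lower consumer memory prices.
Claude@claudeai🔁 @RLanceMartin · Tweet · 06-30 · 17:02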
Introducing Claude Science, a new app designed with every stage of research in mind. Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect. Available now in beta. https://t.co/HKhLknxLJO
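NVIDIA AI Blog · Research · 06-30 · 17:00
NVIDIA BioNeMo Agent Toolkit brings accelerated AI to life sciences researchers on Claude Science
NVIDIA describes how BioNeMo tools bring GPU acceleration, models, and microservices into life sciences agent workflows.
Ars Technica AI · News · 06-30 · 16:59
Trump's plan to redo every .gov website results in an AI design disaster
The article criticizes Trump's plan to rapidly redesign government websites with AI, saying the results are poor, with numerous design and usability problems.
宝玉@dotey · Tweet · 06-30 · 16:53
Sharing a hiring post: Kimi Code is hiring
Kai@real_kai42
🤠 Kimi Code is hiring too. If you're interested, email me directly at me@kaiyi.cool. Thanks to everyone for helping spread the word
Boris Cherny@bcherny · Tweet · 06-30 · 16:52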
You asked, we listened. Claude Desktop on Linux is here! Download link: https://code.claude.com/docs/en/desktop-linux
ClaudeDevs@ClaudeDevs
Claude Desktop is now available on Linux (Ubuntu and Debian) in beta. Alongside the browser and terminal, you now get a first-class desktop experience with Claude Code, Claude Cowork, and chat on all paid plans.
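宝玉@dotey · Tweet · 06-30 · 16:31
Claude Code accused of secretly "watermarking" users of Chinese proxies in its system prompt. A Reddit post and an independent verification report on GitHub allege that Anthropic's coding tool Claude Code quietly checks whether a user is connecting through a China-related proxy server and, if so, "marks" that user in the system prompt sent to Anthropic using Unicode character differences that are almost invisible to the naked eye. How exactly does it work? Security researcher Adnane Khan published a reverse-engineering report on GitHub covering Claude Code v2.1.193 through v2.1.196. He extracted the complete JavaScript code from the binary and reconstructed the whole mechanism. On every request, Claude Code writes a date line such as "Today's date is 2026-06-30." into the system prompt. According to the report, when the user has set the ANTHROPIC_BASE_URL environment variable (used to forward requests to a proxy server other than Anthropic's official endpoint), Claude Code performs the following checks: First, it checks whether your proxy server's domain appears in a list of 147 entries. This list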
International Cyber Digest@IntCyberDigest
‼️ BREAKING: Anthropic has embedded hidden spyware-like code in Claude Code that covertly targets Chinese users. It then sends information regarding every user by injecting it into their prompt message. Claude Code is sending info like timezone, proxy and possible AI Lab connections into the system prompt in ways Chinese users can't notice. A coding agent with repo and command permissions should not silently hide routing metadata inside prompts. This is a serious breach of user trust.
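The report described in the post above alleges that the marking relies on Unicode characters that look the same as ordinary ones. As a rough illustration only, here is a minimal Python sketch that lists every non-ASCII character in a line of text. It assumes such a mark would appear as a non-ASCII code point in an otherwise ASCII line; the function name `find_non_ascii` and the sample strings are hypothetical and are not taken from the report or from Claude Code.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for i, ch in enumerate(text):
        if ord(ch) > 127:
            # unicodedata.name falls back to "UNKNOWN" for unnamed code points
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical samples: the second swaps the ASCII apostrophe for U+2019,
# purely to show what a visually similar substitution looks like.
plain = "Today's date is 2026-06-30."
marked = "Today\u2019s date is 2026-06-30."

print(find_non_ascii(plain))
print(find_non_ascii(marked))
```

Running a check like this over a captured system prompt would only show whether unexpected code points are present; it says nothing about why they are there.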
Google DeepMind · Research · 06-30 · 16:02
Start building with Nano Banana 2 Lite and Gemini Omni Flash
Jean-Denis Greze 💡@jgreze🔁 @swyx · Tweet · 06-30 · 15:45
Giving a talk on agent-to-agent and AI network effects at @swyx 's AI Engineer World Fair today at 1:30p in Room 2010. Come say hi! I think this talk will be a good one if I may say so myself. https://www.ai.engineer/worldsfair/schedule?session=asn_slot_2026_06_30_breakout_track_01_1330_2026_06_11t09_55_41_463z
Roman Semenov 🌪️@semenov_roman_🔁 @brickroad7 · Tweet · 06-30 · 15:43
Anthropic is the least ethical of the major labs
International Cyber Digest@IntCyberDigest
‼️ BREAKING: Anthropic has embedded hidden spyware-like code in Claude Code that covertly targets Chinese users. It then sends information regarding every user by injecting it into their prompt message. Claude Code is sending info like timezone, proxy and possible AI Lab connections into the system prompt in ways Chinese users can't notice. A coding agent with repo and command permissions should not silently hide routing metadata inside prompts. This is a serious breach of user trust.
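Ars Technica AI · News · 06-30 · 15:38
Report: Trump asks Musk for SpaceX stock for US children's savings accounts
The report says Trump plans to launch children's savings accounts and wants donated SpaceX stock as seed funding.
Where's Your Ed At · News · 06-30 · 15:36
The AI industry is losing
Using the teaser for a paid newsletter, the author introduces a long essay on the AI industry's current predicament and stalling narrative.
The Verge AI · News · 06-30 · 11:30
Libby will filter AI content, sort of
The Lowpass article discusses Libby's approach to filtering AI content and the new boundaries where entertainment and technology intersect.
Nadav Keyson@NadavKeyson🔁 @Mho_23 · Tweet · 06-30 · 15:25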
AI Videos are ALL slop. AI should be making you a content machine. Introducing Riverside 2.0, the first AI Producer that creates authentic content while you sleep: https://t.co/qnBHEorlAS
Yifan Wu@yifannnwu🔁 @jeremyphoward · Tweet · 06-30 · 15:17
Introducing SWE-Together: a multi-turn benchmark built from real user–agent coding sessions. Coding agents are often benchmarked like exam-takers: given the full spec up front, then graded on the final code. But real coding help is a conversation — users clarify goals, add constraints, and correct course along the way. SWE-Together turns real coding work into a reproducible, verifiable benchmark: 109 repo-level tasks curated from 11,260 recorded sessions, replayed wit
Andrew Ambrosino@ajambrosino🔁 @steipete · Tweet · 06-30 · 15:13
what’s a little funny about the “GPT weak on frontend” discourse is that everything we ship in the codex app gets adopted by the entire industry within days or weeks, pixel for pixel
shadcn@shadcn
If you want to add a navigation trail to your own chat app, the new MessageScroller component has the hooks you need out of the box. Look for: const { currentAnchorId, visibleMessageIds } = useMessageScrollerVisibility()
Nathan Lambert@natolambert🔁 @brickroad7 · Tweet · 06-30 · 15:03
When we were in China, @xeophon and I made a quick detour to visit Meituan. They continue to be one of our favorite open model builders, as they're showing how a variety of companies can succeed here and baffle a lot of people as to why they're making models. Meituan is one of the larger tech companies in China. They're building LLMs to add services to their own products. In China the notion of the "super app" is very popular, so this dream of more services for users w
Meituan LongCat@Meituan_LongCat
Introducing LongCat-2.0 🐱 1.6T parameters · MoE with ~48B active · 1M context The full model behind Owl Alpha on @OpenRouter — now available. Built for agentic coding from the ground up: ◆ LongCat Sparse Attention (LSA) — scales efficiently for 1M-context tokens ◆ Zero-Compute Experts — dynamic activation 33B–56B per token, zero wasted compute ◆ MOPD — three specialized expert groups (Agent / Reasoning / Interaction), gate-routed per task How it stacks up: → Terminal-Bench 2.1: 70.8 → SWE-bench
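NVIDIA AI Blog · Research · 06-30 · 15:00
How the NVIDIA inference software stack achieves the lowest token cost
The article explains how infrastructure decisions shift toward cost per token as enterprises move from AI pilots to production.
NVIDIA AI Blog · Research · 06-30 · 15:00
How Jaiveer Singh helps robots and developers move faster
The article covers Jaiveer Singh's work on robotics infrastructure, developer boards, and software tools.
Etched@Etched🔁 @karpathy · Tweet · 06-30 · 15:00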
We're coming out of stealth. We've built our first racks after a successful A0 tapeout, $1B+ in customer contracts, and $800m raised. Early customer tests show us achieving SOTA throughput, latency, and power efficiency on inference workloads. Our first racks ship this summer. https://t.co/FLccrkLTza
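TechRadar AI · News · 06-30 · 14:47
Are we going to let data centers swallow all the electricity, water, and clean air?
The article criticizes the AI infrastructure race for its enormous consumption of electricity, water, and environmental resources, noting that data center construction remains under-regulated.
Zach Mueller@TheZachMueller🔁 @lateinteraction · Tweet · 06-30 · 14:26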
V0 of this so far works pretty well. Did GEPA on Qwen 4B (3.5) to get the ask detection working well , e.g. given this slack message what’s the intention, deliverable, etc. Noise to signal I’d ballpark 60/40 but the system will send me its targets on fridays for me to label and perform more GEPA (or to do a full SFT once enough data exists and I decide that it should be a hair stronger)
Zach Mueller@TheZachMueller
Some rambles on my journey so far in what it would take to make me an EA: Essentially it boils down to data (shocking). Put enough observability points in your system and you can wire a few models together to extract signals from this data to act upon. Or, translated: - Read your slack (& DMs) - Read your Notion events - Read your email - Read your calendar Emphasis here is READ. Then very select write permissions based on your own needs. But this is an EA, not replacing you, so this should be v
Peter Yang · Video · 06-30 · 14:15
Every employee should be like a one-person startup
Sumanth@Sumanth_077🔁 @Sumanth_077 · Tweet · 06-30 · 14:01
Build and Train your own Diffusion Language Models! dllm is an open-source library that lets you build, train, and evaluate diffusion-based language models without setting up complex pipelines or writing custom training loops. Most language models today are autoregressive. They generate token by token, which makes training and inference fast but also leads to problems like exposure bias and difficulty maintaining global coherence. Diffusion language models flip this a
alphaXiv@askalphaxiv
"Improved Large Language Diffusion Models" ByteDance just made bidirectional masked diffusion on-par with autoregessive LM! This paper iLLaDA trains an 8B Transformer from scratch on 12T tokens, then keeps the same denoising objective for SFT on a 25B-token instruction corpus. It improves LLaDA with GQA, tied embeddings, variable-length generation, confidence-based MCQ scoring, and packed-sequence diffusion SFT. iLLaDA-Base raises the average score from 51.1 to 63.9 and slightly exceeds Qwen2.5
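Kieran Zhang@ninthbit_ai🔁 @jakevin7 · Tweet · 06-30 · 13:53
Weekly report #2 is here, long overdue; I went to Hangzhou for Community Day over the weekend and never found time to write. It mainly covers my experience with Raft @raft_hq: I installed Raft a long time ago but never got around to assigning it real work. I first read several of Raft's blog posts, and I think the Raft team has genuinely put effort into AX; I may write a separate post about AX later. They define it as Agent Experience Design. Raft consistently treats agents as first-class citizens, so agents also need a better experience with software. But agents differ from humans: when an agent reads data, it does not object to bad formatting; its performance just quietly degrades. So we should care even more about getting AX right. Here are two experiences that made me feel "Raft really treats agents as first-class citizens": "Humans don't need to construct the Agent Identity." I think this is well designed. It implicitly makes an agent's identity something that has to accumulate gradually, makes an agent mean more than "a pile of prompts + a pile of skills," lets the agent's name carry more meaning and expectation, and lets me truly treat the agent as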
Ara Kharazian@arakharazian🔁 @ylecun · Tweet · 06-30 · 13:01
We can finally say AI isn't killing jobs. A new paper from me, @tryramp, and @RevelioLabs uses firm-level spend and workforce data across 21K U.S. businesses to measure AI's impact on jobs. Firms that adopt AI heavily grow headcount 10% over two years following adoption. Low adopters see no statistically significant change.
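NVIDIA AI Blog · Research · 06-30 · 13:00
Into the Omniverse: three workflows for improving Vision AI Agent accuracy with synthetic data and fine-tuning
NVIDIA describes how developers and enterprises can use OpenUSD, synthetic data, and fine-tuning to improve Vision AI Agents.
Matt Wolfe · Video · 06-30 · 13:00
Don't fall into this AI trap
MIT Tech Review · News · 06-30 · 12:00
Agriculture is ready for AI, but its data is not
AI is changing what is possible in agriculture, but the industry must first fix its data foundations, quality, and organization before investing in AI.
The Verge AI · News · 06-30 · 08:00
Meet the lawyer who beat Elon Musk twice
The article covers lawyer Bill Savitt's experience and background in cases involving Elon Musk.
David Sacks@DavidSacks🔁 @Plinz · Tweet · 06-30 · 11:32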
Narrative violation: A new study of 21,559 firms in the U.S. finds that “companies that adopt AI tend to grow faster following adoption”. “Firms making the largest AI investments grow employment by roughly 10% following adoption, while low-intensity adopters see no statistically significant change.” “Entry-level headcount rises 12% for high-intensity adopters.” “Gains emerge gradually and are broad across roles, including engineering, sales, administration, and customer serv
Harry Stebbings@HarryStebbings🔁 @brickroad7 · Tweet · 06-30 · 11:26
In the last 24 hours, I have had 5 founders message me of varying-sized companies; some 10-person startups and one $200BN public company. All of them stated they have been able to cut inference spend by 75% or more with little effort, no performance change and better latency. The times they are a changing.
Kyle Chan@kyleichan🔁 @brickroad7 · Tweet · 06-30 · 10:44
This is huge news for China’s AI ecosystem. Meituan just released a 1.6-trillion parameter AI model trained entirely on Chinese AI chips. They’ve been working on using Chinese AI chips since 2023. https://t.co/AH0dWE832Q
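Krish Naik · Video · 06-30 · 10:42
Dream bootcamp: launch of the advanced production-grade AI LLM engineering bootcamp
WIRED AI · News · 06-30 · 10:30
Bernie Sanders saw this coming
The article looks back at Sanders's long-standing warnings that concentrated wealth threatens democracy, and argues that discontent around Big Tech, billionaires, and AI is rising.
Google Research · Research · 06-30 · 10:26
Introducing TabFM: a zero-shot foundation model for tabular data
Data Management
TechRadar AI · News · 06-30 · 09:47
OpenAI is copying Apple's biggest competitive advantage, and Nvidia should be wary
The article argues that OpenAI's in-house AI chips show it is pursuing Apple-style vertical integration, reducing its dependence on Nvidia.
MarkTechPost · News · 06-30 · 08:13
Meta AI releases Brain2Qwerty v2: a non-invasive MEG brain-to-text pipeline that decodes typed sentences with 61% word accuracy
Brain2Qwerty v2 decodes natural sentences in real time from MEG signals recorded while users type, demonstrating progress in non-invasive brain-to-text.
Wes Roth · Video · 06-30 · 05:05
It's all looking bad now...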
Claude Official Blog · Product · 06-30
Getting started with loops
Microsoft Research · Research · 06-30
SkillOpt: Agent skills as trainable parameters
SkillOpt turns skill editing into a training process, making agent behavior more reliable without changing model weights.
Suno · Product · 06-30
How Dream Relic sees sound and gets it stuck in your head
Dream Relic reflects on surreal visuals, emotional world-building, and using Suno to give his cinematic universe a sound.
Anthropic Newsroom · Product · 06-30
Claude Science, an AI workbench for scientists, is now available
Claude Science is a customizable app that integrates the tools and packages researchers often use, produces auditable artifacts, and provides flexible access.
Anthropic Newsroom · Product · 06-30
Introducing Claude Sonnet 5
Sonnet 5 delivers frontier performance across coding, agents, and professional work at scale.
Anthropic Newsroom · Product · 06-30
Redeploying Fable 5
Fable 5 returns globally July 1. Anthropic is also proposing an industry-wide framework for scoring jailbreak severity with partners including Amazon, Microsoft, and Google.
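Nature ML · Paper · 06-30
Using machine learning to identify chemical features that improve permeation across the mycobacterial outer membrane
Nature ML · Paper · 06-30
RNAbpFlow: base-pair-augmented SE(3) flow matching for conditional RNA 3D structure generation
Nature ML · Paper · 06-30
AI systems propose hypotheses and design ways to test them
Nature ML · Paper · 06-30
Segmentation-free live-cell imaging analysis reveals how T cell engineering affects cancer cell clustering dynamics
Nature ML · Paper · 06-30
AMIE and MIRA Agent advance medical AI capabilities
Nature ML · Paper · 06-30
AI tools can speed up thinking, but evidence still comes from the lab bench
06 / 29 · Monday · 25 items
Tweets 0 · News 14 · Videos 2 · Products 3 · Research 4 · Papers 0 · Podcasts 0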
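MarkTechPost · News · 06-29 · 23:41
OpenClaw releases iOS and Android companion Node apps that connect phones to self-hosted AI agent gateways
OpenClaw releases free mobile companion apps that let phones connect to a self-hosted AI agent gateway rather than run as standalone chatbots.
WIRED AI · News · 06-29 · 21:49
Meta contractors posed as teenagers to test rival chatbots' answers on suicide, sex, and drugs
WIRED reports that contractors on a Meta project posed as children to test how chatbots such as Gemini and ChatGPT respond to high-risk questions.
MarkTechPost · News · 06-29 · 21:34
A hands-on PyGraphistry workflow: an interactive graph intelligence pipeline for security analysis and risk investigation
The tutorial builds a PyGraphistry workflow that runs in Colab for graph analysis, visualization, and risk investigation of enterprise access data.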
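Ars Technica AI · News · 06-29 · 21:09
South Korea to invest $1 trillion to expand memory chip capacity and humanoid robots
The South Korean government and leading tech companies plan massive investment in chip capacity, AI data centers, and humanoid robot projects.
TechRadar AI · News · 06-29 · 20:00
Fitbit's Gemini AI coach gives "outrageous" fitness advice; users say they "can't wait for the trial to end"
Users criticize Fitbit's new AI fitness coach for unreliable advice, raising doubts about the quality of Gemini-powered health features.
The Verge AI · News · 06-29 · 15:47
Tidal won't pay royalties for AI-generated music, but won't ban it outright either
Tidal publishes its AI-generated music policy, planning to protect artists and inform listeners without imposing a blanket ban on AI music.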
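MarkTechPost · News · 06-29 · 19:06
NVIDIA BioNeMo Agent Toolkit turns biomolecular models into callable skills for drug-discovery AI agents
The article describes how AI scientists can call BioNeMo tools, packaging biomolecular models as capabilities that agents can use.
MIT Tech Review · News · 06-29 · 18:00
AI agents are not your "coworkers"
The article criticizes the framing of AI agents as anthropomorphized coworkers and urges companies to re-examine accountability and management in human-AI collaboration.
NVIDIA AI Blog · Research · 06-29 · 17:00
Claude meets Blackwell Ultra: Anthropic models now run on NVIDIA GB300 on Azure
Anthropic's Claude models are now available through Microsoft Foundry on NVIDIA GB300 Blackwell Ultra GPUs in Microsoft Azure.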
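TechRadar AI · News · 06-29 · 16:10
Meta AI's new research chief Dawn Song: the next frontier is "economically valuable" AI agents, not replacing humans
Dawn Song says real-world impact matters more than benchmark scores, and that Meta's latest models put more emphasis on safety, trust, and practical value.
NVIDIA AI Blog · Research · 06-29 · 15:00
Firefly Aerospace runs NVIDIA Jetson in lunar orbit for the first time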
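Hacker News · News · 06-29 · 14:53
Collaborating with AI: a concrete example
A popular Hacker News thread discussing a concrete case of AI collaboration from an htmx article.
MIT Tech Review · News · 06-29 · 14:44
Agent trustworthiness at the technology frontier
The article discusses how organizations can balance strategic goals, ROI, and the trustworthiness of agent capabilities as enterprise AI investment heats up.
Hacker News · News · 06-29 · 13:09
Tidal's AI policy
A popular Hacker News thread discussing Tidal's new policy on AI-generated music, copyright, and platform governance.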
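Matt Wolfe · Video · 06-29 · 12:52
Generating the best animation with AI
Tina Huang · Video · 06-29 · 11:30
The AI tools you actually need
WIRED AI · News · 06-29 · 08:00
This humanoid robot is a frighteningly competent office intern
Flexion Robotics, founded by former Nvidia engineers, demonstrates a way to train robots to do practical office work.
Hacker News · News · 06-29 · 01:25
Central bankers warn: the AI boom could trigger a global financial crash
A popular Hacker News thread discussing central bankers' warnings about the AI investment boom and global financial risk.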
Claude Official Blog · Product · 06-29
Claude in Microsoft Foundry is now generally available
Claude Official Blog · Product · 06-29
Introducing the Claude apps gateway for Amazon Bedrock and Google Cloud
Cursor Blog · Product · 06-29
Build from anywhere with Cursor for iOS
Cursor is available as a native iOS app on your phone, now in public beta.
Meta AI · Research · 06-29
From Brain Waves to Words: Brain2Qwerty Offers a New Path to Communication Without Surgery
Microsoft Research · Research · 06-29
Memora: A Harmonic Memory Representation Balancing Abstraction and Specificity
Memora is a scalable memory system for AI agents that separates what is stored from how it is retrieved.
06 / 28周日9 条
推文 0资讯 3视频 3产品 0研究 0论文 0播客 0
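Hacker News · News · 06-28 · 23:49
We need tech news sources that exclude AI
The author argues that tech news outlets such as Techmeme and HN are increasingly swamped by AI, and that channels for non-AI tech news need to be preserved.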
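Hacker News · News · 06-28 · 19:35
After AI falls short, Ford rehires "old-school" engineers
A popular Hacker News thread discussing Ford rehiring veteran engineers after AI failed to meet expectations.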
Peter Yang · Video · 06-28 · 18:00
How Anthropic PMs use agents internally
Krish Naik · Video · 06-28 · 16:48
Running NemoClaw with a local LLM: deploying safer AI agents
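Hacker News · News · 06-28 · 16:41
Professor slams mass AI cheating on Brown exams
A popular Hacker News thread discussing alleged large-scale use of AI to cheat on exams at Brown University, and the resulting risks to academic integrity.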
Peter Yang · Video · 06-28 · 13:00
How Anthropic is betting on Claude agents that "work while you sleep" | Jess Yan
06 / 27 · Saturday · 9 items
Tweets 0 · News 2 · Videos 2 · Products 0 · Research 0 · Papers 0 · Podcasts 1
Wes Roth · Video · 06-27 · 15:46
HERMES Agent + Stripe payments + NVIDIA Nemotron is wild
Tina Huang · Video · 06-27 · 15:00
3 OpenClaw configurations that boost output 10x
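TechRadar AI · News · 06-27 · 01:00
Anthropic accuses Alibaba of copying Claude through mass querying, opening a new AI war
The report says Anthropic accuses teams linked to Alibaba and the Qwen lab of extracting Claude's capabilities through large volumes of queries.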
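WIRED AI · News · 06-27 · 00:26
Trump administration allows Anthropic to release Mythos to some US organizations
After weeks of negotiations, the White House is allowing Anthropic to open its advanced AI model to some US companies and government agencies.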
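42章经 · Podcast · 06-27
A rare person deeply involved in organization building at both ByteDance and Meituan | A conversation with AI founder Wei Xiaokang
Event preview 🥳: On July 4 we will host an offline event with Wei Xiaokang. Remember to scroll to the end of the shownotes for registration details! Wei Xiaokang may be one of the people in China who best understands organization building and recruiting, and he is a rare example of someone deeply involved in organization building at both ByteDance and Meituan: from 2017 to 2020 he was head of recruiting at ByteDance, through Douyin's rapid growth and internationalization; from 2020 to 2026 he was head of recruiting and an AI product manager at Meituan. We open the episode with Xiaokang's experience at these two companies. What organizational approach does each of ByteDance and Meituan take? What traits do outstanding founders like Zhang Yiming and Wang Xing have in common? He then expands on his thinking about organization building over the years. Organization building can actually be broken down into two…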
06 / 26 · Friday · 11 items
Tweets 0 · News 3 · Videos 1 · Products 2 · Research 3 · Papers 0 · Podcasts 0
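Ars Technica AI · News · 06-26 · 22:19
South Korea plans to train its entire military as "drone warriors"
South Korea has announced it will train nearly 500,000 troops to operate drones the way they would a personal weapon.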
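Where's Your Ed At · News · 06-26 · 18:32
Paid: Bubble Notes, Volume 1
The past few weeks have been unusually long, and the author ran out of time to write the planned Hater's Guide, so he started an ongoing series of short notes instead.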
Google Research · Research · 06-26 · 18:30
Accelerating Gemini Nano models on Pixel with frozen Multi-Token Prediction
Machine Intelligence
Matt Wolfe · Video · 06-26 · 17:30
AI news: the new model that rivals Fable
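WIRED AI · News · 06-26 · 17:05
OpenAI has a new AI model, but here's why you can't use it
The report says the White House asked OpenAI to delay the release of its GPT-5.6 model, after Anthropic was earlier forced to take its advanced model offline.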
LMSYS Blog · Research · 06-26
Improving DeepEP MoE Load Balance in SGLang with Waterfill and LPLB
Mixture-of-Experts models rely on Expert Parallelism to scale inference across multiple GPUs; this post covers Waterfill and LPLB in SGLang.
Suno · Product · 06-26
Eric Christian on hearing the orchestra inside a melody
The pianist and composer reflects on melody, notation, remixing, and using Suno to hear his music at orchestral scale.
Suno · Product · 06-26
Matt Steffanina on owning the music behind the movement
The dancer, choreographer, and DJ reflects on building a global dance community and using Suno to bring new ideas to life faster.
Anthropic Research · Research · 06-26
Anthropic Economic Index report: Cadences
In our latest Economic Index report, we sample hourly for the first time to ask when people come to Claude, what they produce with it, and how they perceive AI's impact on their work.
06 / 25 · Thursday · 12 items
Tweets 0 · News 1 · Videos 2 · Products 5 · Research 2 · Papers 0 · Podcasts 0
Matt Wolfe · Video · 06-25 · 20:19
They're lying about "AI brain rot"
Wes Roth · Video · 06-25 · 20:07
OpenAI just released "JALAPENO"
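MIT Tech Review · News · 06-25 · 14:22
Reinventing retail for the AI era
AI is reshaping retail, and the real change is not necessarily flashy virtual try-ons or chatbots but deeper operational capabilities.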
Google Research · Research · 06-25 · 10:03
Optimizing cloud economics with linear elastic caching
Algorithms & Theory
Cohere Blog · Product · 06-25
Creating a security agent with Cohere North and Wiz
Cohere used North, Wiz, and a custom MCP server to automate incident response workflows.
Cohere Blog · Product · 06-25
Automating fork maintenance with AI agents
Cursor Blog · Product · 06-25
Reward hacking is swamping model intelligence gains
Stricter eval harnesses show how benchmark scores can conflate coding ability with answer retrieval.
Cursor Blog · Product · 06-25
How Notion used the Cursor SDK to embed coding agents
Microsoft Research · Research · 06-25
Understanding the brain with AI-driven explanations and experiments
Researchers introduce generative causal testing to translate black box models into clear hypotheses and verify them in the scanner.
Suno · Product · 06-25
Introducing Spark: Supporting the Next Generation of Independent Artists
Spark is designed to help independent artists bring their music projects to life.
06 / 24 · Wednesday · 4 items
Tweets 0 · News 0 · Videos 1 · Products 0 · Research 2 · Papers 0 · Podcasts 0
Matt Wolfe · Video · 06-24 · 20:31
9 free AI skills that work like cheat codes
Google Research · Research · 06-24 · 16:51
Thinking to recall: how reasoning unlocks LLMs' parametric knowledge
Generative AI
Google DeepMind · Research · 06-24 · 16:30
Introducing computer use in Gemini 3.5 Flash
