Newsletter: The Batch · 06-14 · 07:17

Seedance Makes A Splash, Nvidia's AI-Guided Chip Designs, Helping Robots Not Forget

Dear friends,

We’ve been working on AI Andrew, an AI companion shaped by my personality. I invite you to try it out!

Many people are trying to understand what AI means for their work, learning, and careers. I’ve frequently enjoyed conversations with people on these topics. If you’d like to have a conversation on this, you might find AI Andrew can be a helpful thought partner and maybe even a friend — someone you can speak with about AI concepts, project ideas, career decisions, and whatever else is on your mind.

My team has been iterating on AI Andrew for many months, using an error analysis process to find circumstances where it says things that I would not say and debug our agentic harness to try to close the gap. My communication style has been shaped over the years by thousands of interactions. I’d never before tried to codify this in an agentic workflow. This turned out to be hard and is still a work in progress.

Reflecting on my beliefs about how to communicate has been an interesting exercise. I believe in:

I am still learning how to have better conversations that support others in pursuit of their goals. We used a large mix of techniques in our harness, including RAG and many other tools, a mix of small and large models, guardrails, extensive evals, short- and long-term memory, and offline agentic loops that automatically propose improvements to the system.

To be clear, AI Andrew still has gaps! For example, an internal tester recently got it to hallucinate having climbed mountains that, sadly, I have not climbed, and it also occasionally gives advice that I question. Nonetheless, many users have reported gaining insights from talking to AI Andrew, and I hope you will find it (him?) a friendly companion that you can speak with about both personal and professional matters.

If you want to try it out, please tell me (in avatar form) what’s on your mind!

Keep building,

Andrew


A MESSAGE FROM DEEPLEARNING.AI

Go beyond using LLMs to understanding how they work! In _Transformers in Practice_, you’ll learn how transformers generate text, process context, and run efficiently using attention, KV caching, and quantization. Earn a certificate as a DeepLearning.AI Pro member. Enroll Now

News

U.S. to Evaluate Upcoming Models

The U.S. government said it will evaluate cutting-edge models before they’re available to the public, a sharp reversal of the White House’s earlier hands-off policy.

What’s new: The National Institute of Standards and Technology (NIST), an office of the U.S. Department of Commerce, announced that a new multi-agency task force will assess national-security risks posed by AI models prior to their deployment. Leading U.S. AI companies agreed to submit models for evaluation prior to release. In addition, the White House is considering an executive order that would require AI models to gain approval before they can be deployed.

How it works: NIST said the tests will focus on demonstrable risks related to cybersecurity, biosecurity, and chemical weapons. The administration did not disclose details of its agreements with AI companies or any controls it expects to impose on models in light of test results.

Behind the news: The abrupt policy change marks a major departure from the Trump Administration’s focus on removing Biden-era regulatory barriers to AI innovation. It comes roughly one month after Anthropic attracted the government’s attention by announcing that its Claude Mythos Preview model, which is not yet widely available, could exploit vulnerabilities in widely used software.

Why it matters: The White House’s shift from laissez-faire to pre-release scrutiny of AI models reflects a dawning reality that AI models have become powerful enough to pose immediate risks to national security. Requiring AI developers to test advanced models prior to public availability could give the government advance warning of potential issues and motivate AI developers to manage them proactively. It would also enable the government to decide which models are fit for wider distribution, and which must be withheld or altered (for reasons that may not be transparent). AI companies aren’t yet required to submit new models for government testing, and those who have agreed to do so have agreed voluntarily. However, officials are considering an executive order that would make such testing mandatory.

We’re thinking: A standardized battery of benchmark tests, applied comprehensively and according to consistent procedures, would be beneficial to the AI industry, but we think the right way to come up with these tests would be via the free market, rather than having them imposed by government. Further, requiring government tests ahead of release would slow down U.S. developers, putting them at a competitive disadvantage relative to their peers in other countries, and it could help them thwart open-source competitors through regulatory capture.


OpenAI Challenges Speech-to-Speech Leaders

An update of OpenAI’s speech-to-speech model lets developers tune the tradeoff between speed and reasoning.

What’s new: OpenAI introduced three new audio models in its Realtime API. GPT-Realtime-2 is a speech-to-speech model with configurable reasoning effort. GPT-Realtime-Translate translates speech from more than 70 input languages into 13 output languages, and GPT-Realtime-Whisper transcribes speech into text.

How GPT-Realtime-2 works: GPT-Realtime-2 handles audio in and audio out as an end-to-end process, including reasoning, rather than as separate speech-to-text, text-generation, and text-to-speech steps.

GPT-Realtime-2 performance: GPT-Realtime-2 led some independent benchmarks for conversational dynamics and multi-turn instruction following, but it trailed on the Artificial Analysis Speech Reasoning leaderboard. The time required to generate audio ranged from 1.12 seconds at minimal effort to 2.33 seconds at high effort, the setting that yields the model’s best reasoning scores. Both figures are slow for real-time interactions, which benefit from latency lower than 500 milliseconds.
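
To put those latency figures in perspective, here is a minimal Python sketch that compares the reported times with the 500-millisecond guideline mentioned above. The helper `budget_ratio` and the dictionary layout are illustrative assumptions, not part of any OpenAI API.

```python
# Reported time-to-audio figures from the text, in seconds.
REPORTED_LATENCY_S = {"minimal": 1.12, "high": 2.33}

# Rough guideline from the text for real-time conversation, in seconds.
REALTIME_BUDGET_S = 0.5


def budget_ratio(latency_s: float, budget_s: float = REALTIME_BUDGET_S) -> float:
    """Return how many times a latency exceeds the budget (hypothetical helper)."""
    return latency_s / budget_s


for effort, latency in REPORTED_LATENCY_S.items():
    print(f"{effort}: {latency:.2f} s is {budget_ratio(latency):.2f}x the budget")
```

By this rough measure, even the fastest setting takes more than twice as long as the guideline allows.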

Yes, but: The two models ahead of GPT-Realtime-2 on the Artificial Analysis Speech Reasoning leaderboard are also faster.

Why it matters: Voice agents generally have focused on relatively simple interactions because reasoning often comes at the cost of a snappy response. GPT-Realtime-2 offers not only high performance but also control over that tradeoff (minimal reasoning for faster turn-taking, xhigh for interactions that can wait). This flexibility expands the range of tasks voice agents can handle without resorting to text processing.

We’re thinking: It's exciting to see that GPT-Realtime-2 implements preambles similar to the pre-responses we described here!


China shut down Meta’s attempt to acquire agentic technology that originated within its borders, a blow to further technical interchange and investment between China and the U.S.

What’s new: China’s cabinet-level regulator in charge of economic planning and development blocked Meta’s proposed acquisition of Manus, a Singapore-based startup that was founded in China and offers a popular AI agent. Meta and Manus unwound the deal, which was worth as much as $2.5 billion. Beyond quashing Meta’s plans to offer agentic products and features, the action upended an emerging strategy for launching AI startups built in China.

How it works: Meta’s purchase of Manus was viewed as a sign that Manus, having relocated to Singapore and closed its business in China, had maneuvered itself successfully beyond Beijing’s purview. But the government asserted its authority over strategically important technology developed in China by Chinese engineers. Startups founded in China responded by rolling back plans to move elsewhere to seek international investments or partnerships.

Behind the news: For more than a decade, the U.S. and China have viewed advanced technology as a strategic arena tied to economic influence, military power, and national security. Earlier disputes over espionage, intellectual property, and technology transfer escalated into sweeping government intervention. The U.S. blacklisted the Chinese communications-technology company Huawei as a security risk in 2019 and imposed increasingly stringent export controls on semiconductors beginning in 2022. Meanwhile, Beijing set conditions on foreign companies seeking access to the Chinese market and imposed rules to reduce its reliance on Western technology. Numerous Chinese startups have attempted to sidestep the superpower rivalry by incorporating in Singapore and elsewhere. China’s decision to block the Meta-Manus deal strikes a blow to that strategy.

Why it matters: The tightening of China’s control over AI startups adds strain to an already tense relationship between China and the U.S. This week, leaders of the two countries will meet to discuss geopolitical concerns, including AI. An agreement may permit technology and ideas to flow more easily between the two countries (and from China to Singapore and elsewhere in the region). But an ongoing stalemate could drive both countries to withdraw further from free exchange and harden defenses of their own national security and economic interests.

We’re thinking: Beijing’s regulators appear to be asserting authority over any strategically important company whose technology, talent, or operations originated in China. That would sharply narrow the path for founders and investors who hope to attract Western capital or pursue international acquisitions.


AI Mammogram Diagnosis Under Real-World Conditions

Introduced in 2020, Google’s AI system for detecting breast cancer in mammograms still hasn't been used to diagnose current patients. Two studies evaluated how well it would integrate with protocols at UK clinics.

What’s new: In a test on real-world data, Google’s breast-cancer detection system identified slightly more cancers with fewer false positives than examinations by the first of two expert doctors. More significantly, it identified a quarter of the cancers that human doctors missed but that became apparent later. In a companion study, the system performed about as well as a second expert (who considered the first’s opinion). However, some doctors reported distrust in the system’s output. The studies were conducted by Christopher J. Kelly, Marc Wilson, and colleagues at Google, Imperial College London, University of Surrey, Royal Surrey National Health Service Foundation Trust, and several National Health Service Breast Screening Centres.

How it works: Google’s system uses three convolutional neural networks that were trained on a mammography database to produce embeddings, determine potential cancerous regions, and classify the probability of cancer.

Tests and results: In the two studies, the AI system helped to identify more cancers, and to identify them faster and earlier, in a typical UK diagnostic process.

Behind the news: Efforts to use AI for breast cancer detection began with earlier computer-aided detection (CAD) systems in the 1990s and 2000s, but the field accelerated in the mid-2010s as deep-learning models trained on large mammography datasets began outperforming older methods. In 2020, researchers at Google showed that an AI system could match or exceed expert radiologists in screening mammograms while reducing both false positives and false negatives. In late 2022, Google licensed the system to iCAD, which offers a breast-imaging platform, for deployment in real-world clinics. In 2023, Google and iCAD expanded their partnership into a 20-year worldwide commercialization agreement aimed at using Google’s AI as an independent “second reader” of 2D mammography. The partnership currently aims to secure regulatory approval for potential deployment in breast-cancer screening systems that use double-reading workflows.

Why it matters: Around 2.3 million women are diagnosed with breast cancer annually worldwide, and 760,000 don’t survive. Early diagnosis is critical. Yet the diagnostic system is overburdened. In the UK, for instance, a consultant breast radiologist has only four hours available weekly to look at the 5,000 scans they must read annually to maintain their certification. These studies show that AI can ease diagnostic workloads and improve outcomes by helping to prioritize scans or serving as a default co-reader. But they also highlight a need to build trust in the technology among doctors. This may require educating physicians in how AI systems work and making the systems’ output more explainable.
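
The workload figures above imply a tight time budget per scan. The following back-of-the-envelope Python sketch works it out. It assumes the four weekly hours are available all 52 weeks of the year, which is not stated in the text and probably overstates the time available. The function name `seconds_per_scan` is hypothetical.

```python
SCANS_PER_YEAR = 5_000   # from the text
HOURS_PER_WEEK = 4       # from the text
WEEKS_PER_YEAR = 52      # assumption: no holidays or leave


def seconds_per_scan(scans_per_year: int, hours_per_week: float,
                     weeks_per_year: int = WEEKS_PER_YEAR) -> float:
    """Average reading time per scan in seconds (hypothetical helper)."""
    total_seconds = hours_per_week * weeks_per_year * 3600
    return total_seconds / scans_per_year


per_scan = seconds_per_scan(SCANS_PER_YEAR, HOURS_PER_WEEK)
print(f"About {per_scan:.0f} seconds per scan")  # About 150 seconds per scan
```

Under these assumptions, a radiologist has roughly two and a half minutes per scan, and fewer working weeks would shrink that further.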

We’re thinking: As AI systems find their way into medicine, they raise important questions about the steps needed to build trust in the technology, and what checks and balances will yield the best outcomes. Developers can talk directly with doctors about what they need to gain trust in an AI system's output.
