Product · Cohere Blog · 06-13 · 16:17

The future of work debate has an evidence problem

You’ve probably heard that AI is coming for your job. The reality is more complicated than any single number suggests.

When researchers talk about AI’s impact on the labor market, one of the most discussed concepts is “exposure.” An occupation’s exposure score is an estimate of how many of its core work tasks could plausibly be performed by an AI system faster, cheaper, or just as well. The idea is straightforward: take a list of occupations, list the tasks each one involves, and ask how many of those tasks an AI _could_ handle.
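As a minimal sketch of that idea, an exposure score can be computed as the fraction of an occupation's tasks that a rater judges an AI could handle. The `Task` class, the `ai_could_handle` flag, and the example task list are all hypothetical and made up for illustration. Published exposure studies use more detailed rating rubrics than a single yes/no flag, so treat this as a simplification and not as any paper's method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    ai_could_handle: bool  # hypothetical rater judgment, not real data


def exposure_score(tasks: list[Task]) -> float:
    """Return the fraction of tasks judged feasible for an AI system."""
    if not tasks:
        raise ValueError("an occupation needs at least one task")
    return sum(t.ai_could_handle for t in tasks) / len(tasks)


# Made-up task list for an imaginary occupation (not taken from O*NET).
example_tasks = [
    Task("Summarize case documents", True),
    Task("Draft routine correspondence", True),
    Task("Interview clients in person", False),
    Task("Deliver paperwork to an office", False),
]

print(exposure_score(example_tasks))  # 0.5
```

Even in this toy form, the score depends entirely on how the task list is drawn up and how each task is judged, which is why the choice of taxonomy and rating criteria matters for everything built on top of it.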

Exposure scoring exists because empirical evidence about AI's actual labor market impact is inconclusive. The adoption of AI is moving faster than our current labor statistics systems can capture – historically, the effects of new technologies take years to show up clearly in employment and wage data. In the meantime, researchers turn to theoretical measurement: estimating, on the basis of what AI can plausibly do, which jobs are most likely to be affected. Exposure scores are one way to generate a working signal in the absence of harder evidence.

The most widely cited version of this estimate comes from a 2023 paper, “_GPTs are GPTs_,” by Eloundou et al. Their headline findings: 80% of the U.S. workforce has at least 10% of their occupational tasks exposed to large language models, and 19% have 50% or more. These numbers have traveled widely: they have been cited by the IMF and the OECD, referenced in U.S. Senate proposals, and built upon by research institutions across multiple countries. The figure below maps this distance across recent AI labor market research:

Like all empirical tools, the _GPTs are GPTs_ scores are a bounded instrument, and the distance between what they were designed to answer and what they are being asked to support deserves attention.
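Headline figures of the form "X% of the workforce has at least Y% of tasks exposed" are thresholded, employment-weighted shares. The sketch below shows that arithmetic on invented numbers. The function name `share_of_workforce_exposed` and the three toy occupations are assumptions for illustration only, and the results do not reproduce any published estimate.

```python
def share_of_workforce_exposed(
    occupations: list[tuple[int, float]], threshold: float
) -> float:
    """Share of workers in occupations whose exposure score meets the threshold.

    `occupations` holds (employment, exposure_score) pairs.
    """
    total = sum(employment for employment, _ in occupations)
    if total == 0:
        raise ValueError("total employment must be positive")
    exposed = sum(
        employment for employment, score in occupations if score >= threshold
    )
    return exposed / total


# Invented toy labor market: (number of workers, exposure score).
toy_occupations = [(500, 0.05), (300, 0.20), (200, 0.60)]

print(share_of_workforce_exposed(toy_occupations, 0.10))  # 0.5
print(share_of_workforce_exposed(toy_occupations, 0.50))  # 0.2
```

The same underlying scores yield very different headline numbers depending on the threshold chosen, which is one reason a single figure can travel further than the assumptions behind it.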

What do these scores measure?

The _GPTs are GPTs_ scores measure technical feasibility for a GPT-4-era model, evaluated against the U.S. Department of Labor’s occupational taxonomy, for tasks with verifiable outputs that can be completed faster with AI assistance. That is a specific and answerable question, and the paper addresses it carefully.

But that specificity matters for three main reasons:

  1. The scores reflect a model from early 2023. Since then, frontier AI capabilities have improved substantially, with one index estimating a gap of roughly 26 percentage points between the model represented by the _GPTs are GPTs_ scores and current AI capabilities.
  2. The scores are built on an American occupational taxonomy that does not transfer cleanly onto other labor markets, even with translation.
  3. The scores model work as a bundle of discrete, scorable tasks, which captures what can be itemized but not the judgment, relationships, and context that often constitute the most consequential parts of a job.

These are all limitations that the original authors acknowledged and that recent work has begun to formalize. But what happens when the scores travel beyond those boundaries?

What happens when these scores drive policy?

Policymakers are under pressure to act. They need to know which workers will need support, which industries are at risk, what kinds of interventions are justified and when to enact them. The _GPTs are GPTs_ scores have become a primary input to those discussions, appearing in government reports, think tank policy briefs, and U.S. Senate proposals.

But the questions policymakers are asking require more information than static exposure scores alone can provide. A score calculated against a 2023 model, using an American taxonomy, treating work as a bundle of discrete tasks, is being used to inform decisions about workers in 2026 and beyond, in labor markets outside the U.S., doing jobs that involve far more than itemizable tasks. The limitations don’t disappear when the scores cross those boundaries. They compound.

The scores driving the future of work debate reveal which work appears to be automatable. However, this is a representation of one possible future under one set of assumptions. When work is viewed through a different lens, different definitions of exposure emerge.

This should also prompt us to ask, _what’s missing?_ Which workers, regions, and futures of work are not represented by these figures? Notably, the dataset contains no discrete categories for data workers—the actual labor that powers AI systems. The labor of these workers is structurally embedded in every LLM whose capabilities O*NET tasks are being evaluated against, yet they remain outside the policy debate those scores enable.

A recent ILO review of AI exposure research concluded plainly: the most widely used indicators tell us something meaningful, but not everything we need to know about who is at risk and why.

The research community is responding

Researchers have not been standing still. A growing body of work directly addresses the gaps in static exposure scoring.

Together, these approaches point toward a richer evidence base, one that uses the original scores as a starting point rather than a final answer.

What we’re calling for

The future of work debate is asking three distinct questions:

  1. whether AI capabilities will advance meaningfully,
  2. what that means for economic outcomes, and
  3. what the optimal policy responses are.

These questions are related, but not equivalent. We have the most evidence for answering the first (what AI can do). The questions about what happens when capable AI meets a working economy, and what we decide to do about it, remain open. Two groups in particular can take steps toward broadening this debate:

Policymakers should treat exposure scores as one signal among several. The most robust policy interventions are those whose value does not depend on any single forecast being correct: strengthening worker protections, investing in reskilling infrastructure, and building the institutional capacity to respond as impacts materialize. Workers should be engaged as partners in that process. They have direct knowledge of how their work is changing, and what they want the future of work to look like.

Researchers should prioritize building the evidence base that policy decisions actually require. That means measurement tools that update alongside AI capabilities, that extend beyond U.S. labor markets, and that connect directly to the questions policymakers are trying to answer. It also means treating workers as epistemic partners throughout the research process, not just as data sources. The goal is research that closes the distance between what we can measure and what policymakers can act on.

The 80% exposure figure describes technical feasibility under a particular set of assumptions at a particular moment in time. It is not a forecast, and it is not a mandate. The future of work will be shaped by decisions that researchers, policymakers, and workers make together. The evidence base those decisions rest on should be up to the task.

_For the full analysis and recommendations, read our complete report._

Your occupation is in this data. The U.S. Department of Labor maintains O*NET, a public database that catalogs every U.S. occupation as a list of tasks—the same task list that underpins most AI exposure estimates, including _GPTs are GPTs_. See how O*NET tasks have been re-rated, dropped, or rewritten since GPT-4 and tell us whether any of it reflects how your work has actually changed.