
Self-replicating and improving thinking machines are a familiar trope in science fiction. From The Matrix’s “programs” that digitally cage humans, to The Terminator’s Skynet sending cyborgs to hunt the last survivors, artificial intelligence that can build itself has long been portrayed as an apocalyptic threat.
Those pyrotechnic narratives may be science fiction, but the underlying mechanism — AI autonomously developing new, better AI — is considered one of the greatest risks inherent in our headlong rush to build more powerful models.
The White House’s National Security Memorandum on AI, the EU AI Act, and public statements from the leading frontier AI companies highlight it as a key concern. Prominent researchers such as Yoshua Bengio, Geoffrey Hinton, and Andrew Yao have all warned that autonomous replication and improvement could pose an existential threat to humanity within our lifetimes. Despite all the warnings, it’s a goal that the AI industry is pursuing with increasing verve.
Why are serious people worried about automated AI research?
AI R&D today is labor-intensive. Much of the work is painstaking and iterative — adjusting hyperparameters, running thousands of tests, comparing model outputs, and finding clever ways to get more performance out of the same hardware. The frontier labs are locked in a race to stay ahead, offering incredible salaries to attract top talent. Meta has reportedly dangled $100m signing bonuses to lure key researchers away from rivals.
But what if companies didn’t have to compete for human researchers at all? What if they could run researchers — millions of them, in parallel?
That’s the scenario Tom Davidson, a senior fellow at the non-profit Forethought Research, spends his time analyzing. Suppose AI companies create AI systems that can do research at the level of a top human scientist. They could then run thousands — or millions — of copies to accelerate their own R&D. For comparison, Davidson says there might be only “double digits” of top human AI researchers driving the biggest innovations today. It would be like jumping from a single family doing all AI research, to an entire city the size of New York focusing on improving models 24/7. In theory, each generation of AI could design the next, in a compounding loop of improvements. Capabilities could soar.
This kind of compounding progress is called an intelligence explosion. OpenAI’s early Preparedness Framework from 2023 set out how it assesses the risks of each new model, explicitly warning of a scenario in which a sufficiently capable AI “improves itself, which makes the system more capable of more improvements,” triggering a runaway loop. If that happens the resulting “concentrated burst of capability gains could outstrip our ability to anticipate and react.”
The danger, then, isn’t just that things move faster. It’s that they move too fast — so fast that regulatory guardrails, safety checks, and basic societal adaptation can’t keep up. Research based on interviews with AI researchers estimates that the pace of progress could jump by a factor of two to 20. Other estimates, such as Davidson’s own, suggest even more dramatic possibilities — such as a thousand-fold leap in capabilities over a single year.
A burst of progress on that scale would compress the timeline for addressing every other AI risk. Whether your primary concern is job displacement, authoritarian misuse, misalignment, or something else entirely, the time available to understand and mitigate those risks could vanish. Davidson puts it bluntly: “This crucial period — when the most extreme risks from AI are emerging — [is] where it’s most important to go slowly and cautiously.” Speeding up so much would be a “big game changer in the bad direction.”
And speed isn’t the only issue. If AI systems take over the work of designing their successors, they could theoretically evolve in ways that are increasingly difficult for humans to track. Today’s models are already difficult to interpret — as superforecaster Jared Leibowich puts it, “on a fundamental level, we see that [AI] works, but we don’t know how it works,” increasing the risk if we get self-improving AI before gaining insight into its inner workings.
Researchers rely on workarounds like “chain of thought” reasoning — internal scratchpads where the model writes out its logic in plain English — to get a glimpse of what an AI is doing. But there are already signs that models can hide their intentions or pursue goals different from those they were trained for. If future AI systems start communicating primarily with each other, in machine-optimised code or entirely new languages, the odds of humans being able to maintain meaningful oversight shrink even further.
And researchers and forecasters think this scenario is plausible in the near future. A 2023 survey of almost 3,000 AI researchers gave a 50% chance that AI systems could accomplish every job better and more cheaply than humans by 2047 — and that the AI could autonomously download and fine-tune an open-source LLM as soon as 2028.
In short, automating AI R&D might bring the field to its most pivotal moment — while simultaneously making it harder for humans to understand or control what happens next. And because companies stand to gain a decisive advantage by automating first, there are major incentives to take humans out of the loop.
But how close are we really to AI that could automate AI R&D?
Thankfully these future scenarios remain just that for the moment — more plausible than fictional doomsdays starring Keanu Reeves or Arnold Schwarzenegger, but not yet that close to reality.
“We are very far from even having AI models that can do fairly simple software engineering tasks autonomously,” says Ege Erdil, co-founder of Mechanize, a startup attempting to create training datasets that will enable the complete automation of software engineering. “Fully automating AI R&D would be dramatically, vastly more difficult than that.”
Today’s frontier models still struggle with the basics of multi-step reasoning, navigating complex environments, and producing reliably correct output. They’re less like elite coders and more like undergraduates learning to program — helpful in narrow domains, but in constant need of close supervision by more experienced human coders.
One of the most rigorous tests of current capabilities is METR’s RE-Bench, designed to evaluate AI systems on stand-alone R&D tasks. The idea is simple: pit AIs against humans on a range of hard problems, then compare performance over different amounts of time. Current models perform better than humans for the first two hours — but after that, they fall behind. None can sustain progress over the kind of long, open-ended problem-solving that real-world researchers face daily.
And even this test is generous to the AI. RE-Bench problems don’t demand deep familiarity with complex codebases or require stitching together large contexts. They’re easily checkable and self-contained — meaning the benchmark “is an early warning sign, because there’s a big gap between this benchmark and what would be needed for AI R&D,” according to Charles Foster, who works at METR.
Still, AIs can already complete tasks that would take a human a couple of hours — and they’re improving fast. AI companies are actively racing to develop more capable, tool-using agents with larger context windows and better coding performance — and the companies aren’t hiding their ambitions. In May, Google DeepMind announced AlphaEvolv, an AI coding agent designed to improve algorithms — including training the very models it runs on. (Disclosure: The author’s partner works at Google DeepMind.) During the recent launch, OpenAI’s Sebastien Bubeck said that GPT-5 “foreshadows a recursive self-improvement loop” where the previous model helps generate new data to train the next generation, while Anthropic CEO Dario Amodei predicted earlier this year that AI could be writing all code within a year, and eventually eliminate the need for human engineers entirely.
And the progress is already visible. Tools like ChatGPT agent and Claude Code are incremental but meaningful steps toward that goal. These agents aren’t built solely for AI R&D, but improvements in their general capabilities make them increasingly relevant — Google said last year that over 25% of new code was being generated by AI. At Microsoft, it’s between 20–30%. The meteoric growth of coding assistant startups like Cursor — one of the fastest growing startups ever — shows that not only do companies want AI coders, they are willing to pay for them. And it’s not limited to software: in pharmaceuticals, 75% of large biopharma companies are already using AI to accelerate research.
And even if the outward signs of model capabilities slow, there is a chance they still get better at improving themselves. “We’ve entered a new phase where progress in chatbots is starting to top out but progress in automating AI research is steadily improving,” tweeted then OpenAI safety researcher Stephen McAleer, who recently left for Anthropic.
One concern is that the most powerful models may not be publicly visible at all — a key premise of AI 2027, a fictional but supposedly realistic portrayal of how an AI intelligence explosion could lead to the end of humanity.
Could companies be hiding significantly more advanced internal systems? The researchers I spoke to thought that was unlikely — for now. Most believe that AI companies’ best internal models are only a single release ahead of what the public can access. Companies may quietly use a model while refining it for launch, but it’s still the best model they have. Market pressures make secrecy costly — since valuations depend on investor belief in holding a technical edge, companies can’t afford to hide their best work. As Erdil put it: “The reason they don’t release it is that they don’t have it.”
But that might not last. Davidson notes that if internal AI systems start driving the bulk of future R&D progress, incentives might shift. At that point, keeping newer and more capable models private — to maintain a strategic advantage over rivals — might become the norm. And the outside world could be left with little visibility into just how advanced AI systems have become.
“This phase is interesting because progress might be harder to track from the outside,” added McAleer in his post. “But when we get to the next phase where automated AI researchers start to automate the rest of the economy the progress will be obvious to everyone.”
What other bottlenecks could stop an intelligence explosion in its tracks?
Suppose AI companies do succeed in building systems that can handle most of the cognitive labor involved in AI R&D. Does that mean an intelligence explosion is inevitable?
Not necessarily. There are still “speed limits” that some researchers think will slow down that trajectory.
First, AI needs enough compute and energy to power that cognitive labor — to test research ideas, train new models, and run AI agents once they exist. Currently, both chips and concentrated energy sources are in short supply. “If you just automate the research effort…it’s useful, but by itself, it’s not going to drive this enormous capabilities progress,” says Erdil.
Davidson agreed, saying he thought limited compute was “the strongest objection that could hopefully stop the software intelligence explosion fairly early on.” The largest AI companies are currently locked in a global race to secure chips and power for massive data centres, but even with aggressive scaling, there are limits to how quickly compute can grow.
But what if AI finds a way around the compute bottleneck? That’s the scenario Davidson finds more concerning. A sufficiently capable AI might find ways to dramatically boost software efficiency — designing smarter training curricula, streamlining experiments, or compressing models without major performance loss. In theory, this kind of software-only explosion could light the fuse for rapid, recursive capability gains.
A second debate centres on data. Even with abundant compute and smart algorithms, AI systems still need information — especially if they’re trying to model the messy, unpredictable real world. Princeton professor Arvind Narayanan has argued that the need for real world data will prove to be a huge speed limit. A model might reason flawlessly through a hypothetical, yet make no meaningful progress if it’s reasoning from the wrong assumptions — simply because it lacks crucial information about how the world actually works. Erdil finds this concern persuasive. “I think there’s just this enormous amount of richness and detail in the real world that you just can’t reason about it. You need to see it,” he said on a recent podcast.
Davidson, however, is less sure this constraint will hold. In the specific context of AI R&D, he points out, labs control both sides of the equation: they build the models and produce relevant training data, primarily examples of how humans conduct AI research. That tight feedback loop could make it easier for AI systems to generate the kind of rich, iterative data they need to improve — sidestepping the bottleneck.
What if they succeed?
Transformational AI holds extraordinary promise. It could unlock cures for disease, better quality of life, and unprecedented economic growth. Used wisely, it might be one of the greatest tools humanity has ever created.
But navigating that transition safely will take time. That’s something an intelligence explosion would steal away. Once AI systems begin autonomously improving themselves, the world could change faster than institutions or even human understanding can keep up.
That’s what keeps superforecaster Leibowich up at night. “Companies are clearly stating that they want to [automate AI R&D],” he told Transformer. “And this is the thing that scares me the most, because I’m afraid they might succeed.”