GPT-5 is no slowdown
Transformer Weekly: GPT-OSS, Cruz on export controls, and the AGI-pilled UK
Welcome to Transformer, your weekly briefing of what matters in AI. If you’ve been forwarded this email, click here to subscribe and receive future editions.
Top stories
When OpenAI released GPT-5 yesterday, the general response was … underwhelming, at least for a model everyone’s been waiting for since GPT-4 wowed the world two years ago.
The new product — which OpenAI describes as a system that intelligently routes requests to different models depending on what’s required — got rave reviews from early testers.
But observers quickly noticed that despite the years of anticipation, GPT-5 isn’t much better than anything else already on the market.
It performs about the same as o3 and ChatGPT Agent on a bunch of coding benchmarks, barely beats Claude Opus 4.1 on another, and significantly underperforms Grok 4 on ARC-AGI-2, which focuses on tasks easy for humans but hard for AI.
GPT-5’s most impressive benchmark result was its ability to complete software tasks that typically take a human two hours and seventeen minutes in 50% of attempts, a new record for a language model. But even that fell short of some expectations.
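For context, that kind of “50% time horizon” figure is typically computed by fitting a curve to success rates against how long tasks take humans, then reading off the task length at which the model succeeds half the time. A minimal sketch of the idea, using made-up numbers rather than any real evaluation code or data:

```python
# Illustrative sketch of a "50% time horizon" calculation (hypothetical data):
# fit success probability against log task length, then solve for the length
# at which predicted success crosses 50%.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-task results: how long each task takes a human (minutes),
# and whether the model solved it.
human_minutes = np.array([5, 10, 30, 60, 120, 240, 480])
solved = np.array([1, 1, 1, 1, 0, 0, 0])

# Success tends to fall off with task length, so regress on log(minutes).
X = np.log(human_minutes).reshape(-1, 1)
clf = LogisticRegression().fit(X, solved)

# The 50% horizon is where the logit b0 + b1 * log(t) crosses zero.
b0, b1 = clf.intercept_[0], clf.coef_[0, 0]
print(f"Estimated 50% time horizon: {np.exp(-b0 / b1):.0f} minutes")
```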
Some immediately jumped on this underwhelming performance from the AI industry’s poster child as evidence that AI progress is slowing.
“We are seeing the plateau: just scaling up is coming to an end,” said Meta’s François Fleuret.
“We have reached a point of diminishing returns,” said Gary Marcus.
“I don’t wanna read anything about exponential progress anymore,” said one AI influencer.
Even I was firmly unimpressed, calling it a “surprisingly incremental release.”
But I think we were all too hasty. Yes, GPT-5 isn’t a big leap forward. But that doesn’t tell us AI progress is slowing down. Previous leaps forward in AI have come from huge scale-ups in compute, which simply didn’t happen here.
As Miles Brundage points out, “lots of takes on GPT-5 are implicitly based on the false belief that it is based on a much larger base model.”
Rohan Pandey, who worked on GPT-5 at OpenAI, confirmed this. “GPT-2 -> GPT-3 -> GPT-4 were all ~100x scaleups in pretraining compute,” he said. “GPT-5 is not.”
Sam Altman’s tweets about the launch suggest a model based on a massive increase in compute is in the works; it just wasn’t the focus of this product.
“The main thing we pushed for is real-world utility and mass accessibility/affordability,” Altman said. (The company certainly succeeded on that front — the new model is shockingly cheap.)
Yet despite aiming for efficiency, GPT-5 is still a good model, and its improvements, while arguably incremental, keep AI capabilities on an exponential course.
“Capability improvement trends are rapid, and GPT-5 is somewhat above-trend,” said METR CEO Beth Barnes.
In other words, we’re still on track for models able to do multi-day tasks within the next couple of years. But we are not on track to “safely handle” them, Barnes said.
So if you thought GPT-5’s underwhelming launch was a reason to relax about AI’s pace … think again.
The discourse
OpenAI gave external evaluators including METR a lot more access to test the model than usual, Barnes said:
“Due to increased access (plus improved evals science) we were able to do a more meaningful evaluation than with past models … They should get a bunch of credit for sharing sensitive information with us.”
Regarding safety, the company introduced “safe-completions,” which it says “teaches the model to give the most helpful answer where possible, while still maintaining safety boundaries.”
China summoned Nvidia over chip “backdoors” in an effort to signal it doesn’t like talk of chip location verification, analyst George Chen said:
“The recent summons of Nvidia serves as a warning for Nvidia’s future products rather than a sign that the Chinese government found any loophole in H20.”
Timothy Lee thinks “keeping AI agents under control doesn’t seem very hard”:
“In the AI safety community, discussions of AI risk typically focus on ‘alignment’ … In my view, a more promising approach is to just not cede that much power to AI agents in the first place.”
Sam Winter-Levy and Nikita Lalwani explored what AI might mean for nuclear deterrence:
“Even in the face of AI-driven technological change, nuclear deterrence should remain strong. This does not mean, however, that AI poses no risks to global nuclear stability.”
RAND researchers Michael S. Chase and William Marcellino said the US and China should cooperate on AI risk reduction:
“The emergence of AGI could also create incentives for risk reduction and cooperation. We argue that both will not only be possible but essential. The United States and China will both want to avoid miscalculation and misunderstandings that could lead to an unwanted war. Neither will be able to manage alone the risks of AGI misuse.”
Tyler Cowen thinks the UK government is AGI-pilled:
“They seemed highly intelligent and informed about AI and had good attitudes … at Downing Street, I didn't have to shake people.”
Policy
Michael Kratsios said the US was exploring “software or physical changes you could make to [AI] chips themselves to do better location-tracking.”
He said he hadn’t had personal conversations with Nvidia or AMD about it yet.
The Bureau of Industry and Security (BIS) reportedly has a huge backlog of license applications due to internal dysfunction, delaying shipments of H20s to China.
Sen. Ted Cruz said he’s “still listening and weighing the merits” of export controls on H20s.
“Chip makers argue that having the world use American chips benefits us. Others argue denying [China] American chips benefits us. And I think it’s a difficult issue. There’s a balance to be reached.”
Trump announced 100% chip tariffs, but said companies with manufacturing operations in the US are exempt — including TSMC.
Sens. Hickenlooper and Capito reintroduced a bill directing NIST to develop voluntary guidelines for AI system evaluators.
Sen. Todd Young thinks there’s a “very strong possibility” his Future of AI Innovation Act becomes law, noting that Sen. Cruz has promised to “work directly with committee members on AI-related legislation in coming weeks and months.”
The Act would formalize the Commerce Department's AI standards work.
Sen. Ted Budd — chair of the Commerce Committee's Science, Manufacturing, and Competitiveness panel — wrote to Howard Lutnick to ask about the risks of DeepSeek models, how he plans to use CAISI to identify these risks, and whether DeepSeek and other companies were using export-controlled chips.
Sens. Young, Cassidy, Cornyn, Blackburn, Husted and Curtis also signed the letter.
Two Chinese nationals were arrested for illegally shipping Nvidia H100 AI chips to China.
They bought over 200 H100s. Nvidia said the “case demonstrates that smuggling is a nonstarter,” which … okay!
Relatedly, Business Insider reported that the Chinese military has tried to buy H100s and H20s.
The General Services Administration added OpenAI, Google, and Anthropic to a list of approved AI vendors.
OpenAI said it’s making its tech available to government agencies for $1. Anthropic’s reportedly planning to do the same.
Gov. Ron DeSantis announced plans for new AI regulations in Florida, saying “I’m not one to say we should just turn over our humanity to AI.”
Illinois banned AI from providing mental health services.
A federal judge struck down California's law restricting AI-generated deepfakes during elections.
China is reportedly exploring merging various semiconductor equipment companies into a “single state-backed giant.”
The US and China both tried to woo Asian countries to join their side in the AI race at this week’s Asia-Pacific Economic Cooperation meeting.
APEC ministers adopted a very bland joint statement on AI at the meeting.
The US is trying to make sure an American remains in charge of the International Telecommunication Union, seemingly due to concerns about the ITU’s AI standards-setting work.
Influence
Former Trump national security adviser Robert O’Brien wrote an op-ed criticizing AI export controls without disclosing that Nvidia was his client.
Americans for Responsible Innovation called for Congress to hold hearings on “large-scale smuggling” of advanced AI chips.
Microsoft and Meta reportedly held separate dinners with China hawks to gain support on AI and copyright issues.
American and Chinese researchers once again gathered to call for “urgent international cooperation to ensure advanced AI systems remain controllable and aligned with human intentions and values.”
The Frontier Model Forum released a report on third-party assessments.
A coalition of organizations urged the FTC to investigate Meta’s investment in Scale AI as a “de facto vertical acquisition.”
An open letter to OpenAI called for transparency on its corporate restructuring.
The AI Whistleblower Initiative called on AI companies to publish their whistleblowing policies.
Nathan Lambert launched ATOM, a campaign to “regain [America’s] global leadership in open source AI.”
Wesley Hodges and Daniel Cochrane outlined a “federalist approach to AI policy,” calling for a “dual charter system for consumer-facing frontier AI companies and services.”
A Punchbowl News survey found that only 22% of lobbying leaders on Washington’s K Street expect Congress to pass a federal AI framework this year.
Industry
OpenAI released its first open-weight models since 2019. People are very impressed with their performance.
The company fine-tuned the models to “intentionally maximize their bio and cyber capabilities,” and concluded that the release “may contribute a small amount of net-new biorisk capabilities, but does not significantly advance frontier capabilities.”
As Casey Newton sensibly noted, “all these small accelerations may soon add up to something big.”
The models were quickly jailbroken, of course.
They’re hosted on lots of platforms, including AWS (the first time OpenAI models have been available there).
OpenAI is reportedly in talks to arrange a secondary stock sale which would value the company at around $500bn.
It also announced huge bonuses — up to “the mid, single-digit millions” — for around a third of its staff this week.
ChatGPT reached 700mn weekly active users.
Anthropic launched Opus 4.1, which is better at coding than its predecessor.
Meta’s reportedly got a new team called TBD Lab, focused on building the next version of Llama, an effort led by Jack Rae.
Meta has reportedly scraped revenge porn websites to train its AI models.
Microsoft is reportedly taking a “more careful approach” in its Grok 4 Azure rollout after the MechaHitler fiasco.
Google DeepMind announced Genie 3, an AI world model that creates interactive 3D environments in real time.
Apple has reportedly formed a team called “Answers, Knowledge and Information,” tasked with “creating a new ChatGPT-like search experience.”
Google agreed to pause non-essential AI workloads during power demand spikes to protect electrical grids.
Palantir topped $1bn in quarterly revenue for the first time.
Broadcom announced a new high-bandwidth networking chip which it says could help with decentralized training.
SoftBank has increased its stakes in Nvidia, TSMC, and Oracle.
AMD refused to give guidance on how many MI308 chips it would sell to China now that export controls on them might be lifted.
Foxconn reportedly plans to start making AI servers in Ohio.
Meta acquired AI audio startup WaveForms.
Pimco and Blue Owl will reportedly lead $29bn in debt financing for Meta’s data center projects.
Apollo Global said it’s buying a majority stake in Stream Data Centers.
CoreWeave’s $9bn takeover of Core Scientific is facing a revolt from some Core Scientific shareholders.
Interconnect provider Amphenol plans to buy the connectivity and cable unit of CommScope for $10.5bn.
Reflection AI is reportedly in talks to raise $1bn to build open-source AI models.
Runway is reportedly in talks to raise $500mn at a $5bn+ valuation. Luma AI is reportedly aiming to raise $1.1bn at a $3.2bn valuation.
German AI company n8n is reportedly raising at a $2.3bn valuation.
AI-driven drug discovery company Chai raised $70mn from OpenAI and others.
Moves
Stephanie Mertz Patel, formerly of the Senate Commerce Committee, is joining the Information Technology Industry Council to work on AI.
Annika Olson joined Americans for Responsible Innovation as director of government affairs.
Mustafa Suleyman is reportedly poaching Google employees for Microsoft. Over two dozen have joined.
But Anthropic has a “quiet edge” in the recruiting war, the WSJ reports: it’s hiring engineers 2.68 times as quickly as it’s losing them, well ahead of OpenAI, Meta and Google. The company attributes this success to its safety focus.
Cognition offered buyouts to around 200 employees who joined from Windsurf last month.
Two former TSMC staff were arrested for allegedly stealing its technology. Tokyo Electron also fired an employee in Taiwan in connection with the case.
Derek Robertson is leaving Politico.
Best of the rest
Greg Burnham of Epoch AI said we shouldn’t update much on AI systems winning gold medals at the IMO, because the questions this year were fairly easy.
Wired has a piece on a NIST AI red-teaming study from last year whose results were never published “for fear of clashing with the incoming administration.”
Ryan Hauser’s interview with Lauren Wagner and Matt Mittelsteadt on the failed moratorium on state-level AI regulation is worth a read.
Google’s AI-based bug hunter “Big Sleep” found 20 security vulnerabilities in open source software.
Google DeepMind researchers said we need “a new ethics for a world of AI agents.”
“Capital expenditures on AI data centers [are] likely around 20% of the peak spending on railroads” in the 19th century, as a share of GDP, according to an analysis from Paul Kedrosky.
Noah Smith notes that with a lot of the funding for such capex coming from private credit funds, an AI bust could “cause a financial crisis.”
Reddit claimed it was the most cited domain in AI-generated answers.
Perplexity launched an integration with Truth Social.
ElevenLabs launched an AI music generation service with commercial-use rights.
The WSJ’s got a big piece on how Disney’s thinking about AI (in short, it’s deeply uncertain).
Grok’s new “spicy” video setting instantly generates nude deepfakes of Taylor Swift — even without explicit prompting.
A new survey found that experts believe it’s at least “50% likely that computers capable of subjective experience will exist by 2050.”
Thanks for reading; have a great weekend.