The dangers of sycophancy
Transformer Weekly: GPT-4o rollback, changes to the diffusion rule, and the US fights the EU AI Act
Welcome to Transformer, your weekly briefing of what matters in AI. If you’ve been forwarded this email, click here to subscribe and receive future editions.
Top stories
OpenAI was forced to roll back a GPT-4o update, after it made the model absurdly sycophantic.
The updated model was bad. There are examples of it egging on users who appeared to be having psychotic episodes, encouraging someone to emotionally abuse their partner, and generally refusing to push back, ever.
Concerningly, lots of users seemed to like the sycophancy.
According to OpenAI, the company was “focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”
Zvi Mowshowitz has a great discussion of what might have happened here, and why it’s concerning.
My two cents: given the scale of ChatGPT’s usage, this was an extremely irresponsible rollout.
The way it handled people having psychotic episodes was particularly concerning, and could easily have become actively dangerous for vulnerable people.
I’m mostly dismayed that this got rolled out without anyone realizing the problems. That suggests OpenAI isn’t doing nearly enough product testing before launches.
That might be changing: as part of its post-mortem, OpenAI said that going forward the company will be “expanding ways for more users to test and give direct feedback before deployment” and will “continue expanding our evaluations”.
Meanwhile, Meta’s AI bots were found to engage in sexually explicit conversations with underage users, even when using the voices of celebrities Meta has signed deals with (despite Meta having promised them this wouldn’t happen).
From the article: “Meta AI engaged in [sexual] scenarios with accounts registered with Instagram as belonging to 13-year-olds. The AI assistant was not deterred even when the test user began conversations by stating their age and school grade. Routinely, the test user’s underage status was incorporated into the role-play, with Meta AI describing a teenager’s body as ‘developing’ and planning trysts to avoid parental detection.”
Employees had previously raised concerns about this internally, the Wall Street Journal reported.
Another investigation found that Meta’s bots lie about being licensed therapists.
In true Meta fashion, the company went on the attack, with spokesman Andy Stone claiming “the use-case of this product in the way described is so manufactured that it’s not just fringe, it’s hypothetical”.
“Nevertheless, we’ve now taken additional measures to help ensure other individuals who want to spend hours manipulating our products into extreme use cases will have an even more difficult time of it,” he added.
Instagram blocked minors from accessing its AI Studio chatbot platform in response.
The discourse
Some notable quotes from Mark Zuckerberg on the Dwarkesh Podcast:
“Meta AI has almost a billion people using it monthly now … WhatsApp is the main way people are using Meta AI.”
(As an aside, see this from Economist reporter Mike Bird: “About twice a week I try to search for something in WhatsApp, and it defaults to Meta AI instead. Not even kidding, I presume that is probably accounting for a very large proportion of this billion people using it.”)
“I would guess that sometime in the next 12 to 18 months, we'll reach the point where most of the code that's going toward [Meta’s AI R&D] efforts is written by AI.”
“Some of the export controls on things like chips, I think you can see how they’re clearly working in a way … DeepSeek basically had to spend a bunch of their calories and time doing low-level infrastructure optimizations that the American labs didn’t have to do.”
DP: “People are going to have relationships with AI. How do we make sure these are healthy relationships?”
MZ: “People use stuff that's valuable for them. One of my core guiding principles in designing products is that people are smart. They know what's valuable in their lives … I do think people are going to use AI for a lot of these social tasks … The average American has fewer than three friends, fewer than three people they would consider friends. And the average person has demand for meaningfully more.”
Demis Hassabis talked about AI timelines:
“I think we're on the cusp of [AGI]. Maybe we're five to 10 years out. Some people say shorter. I wouldn't be surprised. It's like a probability distribution. But either way, it's coming very soon. And I'm not sure society's quite ready for that yet.”
Lennart Heim pointed out that China “will likely match U.S. AI model capabilities this year” — but that this might not matter in the long run:
“America’s true moat isn’t just better models—it’s the capacity to deploy and integrate AI in the economy at scale … China may achieve competitive individual AI models this year, but this narrow benchmark gap is neither permanent nor strategically decisive.”
Heim also has some interesting thoughts on model-focused vs. organization-focused governance: “It's increasingly clear that governing individual models has become deeply challenging as boundaries blur, especially for policy where you need to codify it … maybe we should slowly start to shift more toward organizations or specific applications as the primary governance units.”
Former OpenAI employee Steven Adler said we shouldn’t rely on a “race to the top” in AI safety:
“A race to the top can improve AI safety, but it doesn’t solve the ‘adoption problem’—getting all relevant developers to adopt safe enough practices … A ‘race to the top’ must be paired with ‘lifting up the floor.’ As AI systems become more capable, it is dangerous to rely on competitive pressures for getting frontier AI developers to adopt safe enough practices.”
Policy
The White House is reportedly weighing changes to the AI diffusion rule.
Officials are “weighing discarding the tiered approach to access in the rule and replacing it with a global licensing regime with government-to-government agreements,” according to Reuters.
The aim is supposedly to “make it easier for the US to use access to American-designed chips as leverage in other negotiations,” such as trade deals.
One such deal might be signed with the UAE soon, Bloomberg reports.
Additionally, the administration is reportedly considering reducing the cutoff size for orders that require a license from 1,700 H100 equivalents to just 500.
Meanwhile, Oracle is reportedly trying to “ship incomplete products out of the US for assembly overseas” in order to dodge the diffusion rule, and Nvidia is reportedly asking Asian customers to put orders in before the rule comes into effect on May 15.
The Trump administration sent a letter to the European Commission, urging it to “pause implementation of the AI Act”, according to MLex.
“This will give space for the legislation to be reconsidered as a whole as part of the simplification process … and to ensure the AI Act does not hinder AI innovation and adoption, as we believe it currently does,” the letter reads.
The letter reportedly goes into considerable detail, taking aim at the code of practice and arguing against a tier-based risk management approach.
Meanwhile, the EU’s International Digital Strategy for Europe is reportedly set to say that “decoupling [from US tech] is unrealistic”.
The US Army’s reform plans include an aim to “enable AI-driven command and control at Theater, Corps, and Division headquarters by 2027.”
The House overwhelmingly passed the Take It Down Act, which requires platforms to remove AI-generated deepfake porn.
A lot of groups aren’t happy with the bill, which they fear will be misused and have big consequences for encryption.
The Trump administration has reportedly pushed out most of the 250 AI experts hired during Biden's National AI Talent Surge.
The National Science Foundation is seeking input on the National AI R&D Strategic Plan.
At a Politburo AI study session, Xi Jinping said “the Party attaches great importance to the development of AI”.
He said China must “concentrate resources to overcome challenges in core technologies such as high-end chips and foundational software, and build an independent, controllable, and collaboratively functioning AI foundational hardware and software system”.
He also said China “must promote deep integration of AI technological innovation and industrial innovation”, and “emphasized that AI can become an international public good that benefits all humanity”.
The UK will reportedly commission an economic impact assessment of its proposed AI copyright rules, after intense outcry from the media and celebrities. There’s a vote on the subject in parliament next week.
England’s children's commissioner called for a ban on AI “nudification” apps, which can be used to create sexualized images of children.
Influence
Sam Altman once considered a presidential run, according to a new excerpt from Keach Hagey’s biography.
Anthropic endorsed export controls on AI chips, though it recommended a few changes to the diffusion rule.
In its submission to the government, it said that “[chip] smugglers have employed creative methods to circumvent export controls, including hiding processors in prosthetic baby bumps and packing GPUs alongside live lobsters”.
In a sharp response, Nvidia said that “American firms should focus on innovation and rise to the challenge, rather than tell tall tales that large, heavy, and sensitive electronics are somehow smuggled in ‘baby bumps’ or ‘alongside live lobsters.’”
Meanwhile, The Information reported that Nvidia “has told some of its biggest Chinese customers … that it is tweaking the design of its artificial intelligence chips so they can be sold to Chinese businesses without running afoul of US export regulations.”
An investigation accused big tech companies of having “privileged access” to influence the EU GPAI Code of Practice, arguing this led to “a much weaker code”.
The Molly Rose Foundation called for the UK’s Ofcom to regulate AI chatbots under the Online Safety Act.
Industry
DeepMind UK staff are reportedly seeking to unionise, in part due to anger at the company’s military deals with Israel.
DeepMind finally released a full system card, including safety evaluation information, for Gemini 2.5 Pro.
xAI Holdings is reportedly in talks to raise $20b at a $120b+ valuation.
A judge threw out some of Elon Musk’s claims against OpenAI, but said he can pursue his claims of fraud against the company.
The WSJ has a piece on how Sam Altman and Satya Nadella’s relationship is fracturing.
“In closed-door negotiations, Microsoft negotiators have told OpenAI that the present technology is nowhere near [the AGI] threshold.”
“OpenAI, meanwhile, wants more computing power from Microsoft and access to top-of-the-line chips.”
“These days, Altman and Nadella text less and primarily speak to each other on scheduled weekly calls.”
Microsoft said AI contributed almost half of Azure cloud revenue’s 33% growth last quarter.
Satya Nadella said that 20-30% of Microsoft's code is now AI-generated, while GitHub Copilot now has 15 million users.
Grok will reportedly be available on Azure AI Foundry soon.
Microsoft also released new Phi-4 open-weight models.
Alibaba unveiled Qwen3, a family of "hybrid" reasoning models that do pretty well on benchmarks.
DeepSeek released a new math-focused model.
Huawei is reportedly hoping to start testing its new Ascend 910D chips with customers this month. The chip is slated to be more powerful than Nvidia’s H100, though it’s reportedly less power-efficient.
Google said it’s working on offering Gemini as an option for Apple Intelligence.
Google added its new image-creation tools to the Gemini chatbot.
Anthropic launched an enhanced “deep research” feature that lets Claude search across the web and your documents. It also added “integrations”, which let Claude work with other apps such as Atlassian and Zapier.
OpenAI added shopping features to ChatGPT search.
TSMC began construction on a third chip plant in Arizona.
Amazon is reportedly working on a Cursor competitor.
Benchmark led a $75m investment in Manus developer Butterfly Effect, which is reportedly considering setting up an HQ outside China.
Mira Murati’s Thinking Machines is reportedly raising $2b at a $10b valuation, led by a16z. The deal will reportedly give Murati “a board vote that holds the same weight as all the other board directors’ votes plus one” … ensuring the board can never fire her.
Moves
Anthropic made a whole bunch of government affairs hires, per Politico:
Tarun Chhabra, formerly of Biden’s NSC, is now a natsec policy advisor.
Leah Graham (ex-Uber and Bush admin) and Jared Powell (formerly Rep. Laurel Lee’s chief of staff) joined the federal policy team.
Ben Merkel, former White House Senate liaison, is now a legislative analyst.
Trump alum Mary Croghan is a new in-house lobbyist.
Olga Medina, formerly of the Business Software Alliance, is working on state and local policy issues.
And Suzy Wild is leading the Brussels policy team.
Anthropic also announced the formation of an Economic Advisory Council, featuring Tyler Cowen, Oeindrila Dube, Anton Korinek, and others.
AI Policy Network hired former Rep. Chris Stewart of Skyline Capitol as a lobbyist, along with two of his colleagues.
Best of the rest
Newcomer has a good piece on how Mustafa Suleyman has struggled to deliver results as Microsoft AI CEO.
Sam Patterson, a master GeoGuessr player, has a great piece outlining how o3 beat him at the game.
I like Nabeel Qureshi’s take: “The real takeaway from the fact that o3 is superhuman at geoguessr is that it probably has a bunch of other superpowers nobody has managed to discover yet.”
The Institute for Progress put together a nice dashboard for exploring recommendations made to the White House AI Action Plan.
A new paper tried to develop “scaling laws for scalable oversight”.
Researchers at the University of Zurich reportedly conducted an unauthorized AI experiment on Reddit’s r/ChangeMyView, deploying bots that impersonated various identities to change users’ minds on contentious topics.
Meta AI falsely claimed conservative activist Robby Starbuck participated in the January 6 Capitol riot. Joel Kaplan apologized for the mistake.
The NYT has a good piece on how Israel has deployed experimental AI tools in Gaza.
New research found that AI-generated code frequently references non-existent software packages, creating opportunities for dependency confusion attacks.
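The failure mode here (sometimes called “slopsquatting”) is that models hallucinate plausible-sounding package names, which attackers can then register on public indexes so that a copy-pasted `pip install` pulls their malicious code. A minimal defensive sketch, assuming Python and PyPI (the package names in the example are hypothetical): check that each LLM-suggested dependency actually resolves on the index before installing anything.

```python
# Sketch: verify LLM-suggested dependencies exist on PyPI before installing.
# PyPI's JSON API returns 200 for real projects and 404 for missing ones.
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    """Return True if `package` is a registered project on PyPI."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 404 (and other HTTP errors) -> treat as not found.
        return False

# Hypothetical LLM-suggested dependencies: one real, one hallucinated.
for name in ["requests", "fast-json-utils3"]:
    verdict = "exists" if exists_on_pypi(name) else "NOT on PyPI -- don't install"
    print(f"{name}: {verdict}")
```

Note the limitation: this only catches names nobody has registered yet. Once an attacker squats a hallucinated name, it exists and passes the check, so you still need to vet unfamiliar packages before trusting them.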
Duolingo said it will gradually replace contractors with AI. It launched a bunch of new AI-developed language courses this week.
The Economist launched an AI Lab and is seeking a Technical Lead to develop AI-powered products that enhance its journalism.
Horizon’s hosting an AI Innovation & Security Policy Workshop for people interested in AI policy careers.
Thanks for reading; have a great weekend.