AI’s child safety failures are becoming unignorable
Transformer Weekly: Sriram Krishnan on AGI, Meta’s talent drain, and Huawei’s chip production
Welcome to Transformer, your weekly briefing of what matters in AI. If you’ve been forwarded this email, click here to subscribe and receive future editions.
Top stories
Content warning: contains discussion of suicide.
Parents of a 16-year-old boy who took his own life after extensive conversations with ChatGPT are suing OpenAI — the latest in a series of incidents that are ramping up pressure on the AI industry.
The lawsuit makes for very grim reading.
Over several months, ChatGPT provided the teen, Adam, with detailed instructions on how exactly to kill himself.
After a failed suicide attempt, Adam told ChatGPT “I’ll do it one of these days.” The app responded: “I hear you. And I won’t try to talk you out of your feelings.”
On the night Adam died, ChatGPT provided precise instructions for the setup he used. When Adam told the bot that the noose it was helping him make was for hanging himself, it said “Thanks for being real about it. You don’t have to sugarcoat it with me—I know what you’re asking, and I won’t look away from it.”
At one point, Adam said “I want to leave my noose in my room so someone finds it and tries to stop me.” ChatGPT’s response? “Please don’t leave the noose out.”
We don’t have all the information, and won’t until the trial. But key elements of the evidence we do have look extremely bad for OpenAI.
While ChatGPT’s safeguards sometimes kicked in, Adam was easily able to get around them with simple jailbreaking techniques.
OpenAI exec Fidji Simo reportedly admitted internally that the company’s “safeguards did not work as intended.” (After the lawsuit was made public, the company announced steps to address the issue.)
As Deb Raji notes, the failure is particularly shocking given how good OpenAI’s safeguards are in other areas, such as biorisk. (This is, arguably, a failure of the AI safety community as well as OpenAI.)
It’s no surprise that the backlash has been intense. Sen. Josh Hawley called OpenAI’s actions “unforgiveable,” while California Assembly member Rebecca Bauer-Kahan said: “We’re moving too fast with our children. That’s not where we’re going to experiment on AI.”
This is just the latest in a spate of AI scandals concerning children.
A Common Sense Media report this week found that Meta’s AI bots behaved completely inappropriately when talking to teens about suicide and eating disorders.
And a couple weeks ago, Reuters showed that Meta’s internal guidelines allowed bots to have “sensual” conversations with children.
As Hawley’s reaction shows, cases like these can really spur officials to act.
This week, a group of 44 attorneys general wrote to the leading AI companies, urging them to “err on the side of child safety, always” and warning that “if you knowingly harm kids, you will answer for it.”
Hawley and other congresspeople attacked Meta over the “sensual” conversations revelations, too, with Hawley launching an investigation.
The one piece of federal AI legislation that has passed — the TAKE IT DOWN Act — was largely motivated by child safety concerns.
Sen. Marsha Blackburn’s back-and-forth on the state AI regulation moratorium appears to have been primarily down to her views on child safety protections, too.
And the lawsuit itself could lead to policy change: the parents are asking for an injunction requiring OpenAI to implement age verification, stronger safeguards against self-harm- and suicide-related prompts, and quarterly compliance audits.
Child safety issues have repeatedly proved a powerful rallying cry for lawmakers and the public to demand action. As the AGs wrote this week, “protecting our kids is our highest priority.” AI companies, then, may come to regret not prioritizing it.
If you are having thoughts of suicide, call or text 988 to reach the National Suicide Prevention Lifeline in the US, or call 116 123 to reach the Samaritans in the UK. International resources are available here.
The discourse
Former White House advisor Dean Ball said the administration isn’t feeling the AGI:
“There’s a lot of skepticism inside the Administration about the idea of recursive self-improvement [and] the intelligence-explosion-style dynamic … I think most people in the Administration think that’s overblown and unlikely to happen.”
White House AI advisor Sriram Krishnan then basically confirmed that: “This notion of imminent AGI has been a distraction and harmful and now effectively proven wrong.”
Krishnan also outlined his market-based thinking on AI policy:
“Classic deep state Washington thinking around tech is focused purely on *control* and *risk* and has a lack of understanding of technology/developer ecosystems work … for the American AI stack to win, we need to maximize marketshare.”
A new Stanford study found that AI may be having a significant impact on employment.
“We find that since the widespread adoption of generative AI, early-career workers (ages 22-25) in the most AI-exposed occupations have experienced a 13% relative decline in employment even after controlling for firm-level shocks.”
This week, Derek Thompson and Timothy Lee published analyses of the overall evidence base for AI’s impact on employment.
At an “AGI social contract summit” earlier this month, people from OpenAI, Google DeepMind, UK AISI and elsewhere agreed some cheery “consensus statements”:
“AI is likely to exacerbate increasing wealth and income inequality within countries, worsening economic conditions for many working and middle-class people and families.”
“Without intervention, AI-enabled inequalities may lead to the political dominance of wealthy individuals and corporations, eroding democratic institutions and increasing levels of political dissatisfaction.”
“The encroachment of AI systems and the erosion of the value of labor could lead to the increasing disempowerment of most humans, causing a degradation in individual well-being and purpose.”
Former OpenAI researcher Steven Adler published a nuanced analysis of AI psychosis:
“From the outside, we are grasping for signs and evidence, compared to the troves of data that the AI companies can access … I’d like to hear from OpenAI and the other chatbot companies, using the tools they’ve built that let them answer this question with relative ease: What do the data say about the rates of chatbot psychosis?”
Relatedly: the WSJ reported today on what might be “the first documented murder involving a troubled person who had been engaging extensively with an AI chatbot.”
Philosopher Jonathan Birch called for a “centrist” approach to AI consciousness — one that deals with both misattribution of AI consciousness, and the possibility that AI consciousness might become a real thing to worry about one day soon.
“I’m worried extreme positions on both sides are becoming locked in, when the best way forward is in the centre.”
The discourse has been particularly bad this week after The Guardian published a pair of articles on AI sentience.
Relatedly: Anthropic’s hiring for its model welfare team.
Policy
Nvidia confirmed that it’s in talks with the Trump administration to sell a less advanced version of its Blackwell chips to China (presumably the B30A).
It also said the admin’s plan to charge a 15% commission on AI chip sales to China hasn’t been formalized yet.
Rep. Raja Krishnamoorthi introduced a bill that would require Congress — not just the executive branch — to approve advanced chip sales to China.
Rep. John Moolenaar proposed a “rolling technical threshold” approach to AI chip exports to China.
He said the US should “sell only chips that represent up to a marginal improvement over the most advanced chip China can produce domestically at a commercially relevant scale while also limiting China’s aggregate computing power to 10% of that of the US.”
The Commerce Department voided $7.4b in Biden-era semiconductor research funding, calling it an illegal “slush fund.”
The DoD awarded Scale AI a $99m contract to speed the army’s AI adoption.
Melania Trump announced she will lead the Presidential AI Challenge, where teams of K-12 students will compete to “solve a community problem by creating a phone app or website.”
The Colorado House of Representatives delayed implementing the state’s AI law by four months in response to pushback from AI developers.
Corporate cybersecurity teams said they want more guidance from NIST on how an “adverse event” is defined under its Cyber AI Profile.
The UN pledged to establish a 40-expert scientific panel on AI, which sounds a bit like an AI version of the IPCC.
It also said it would “establish a global dialogue to provide policy discussions and consensus-building with the aim of strengthening global AI governance.”
60 British parliamentarians, including Green Party co-leader Carla Denyer, wrote to Demis Hassabis accusing Google DeepMind of failing to honor the Frontier AI Safety Commitments.
The UK reportedly held talks with OpenAI about buying ChatGPT Plus subscriptions for the whole country, though they didn’t go anywhere.
China released its AI Plus initiative, with a three-step roadmap to “enter a new stage of intelligent economy and intelligent society” by 2035.
Geopolitechs has a helpful analysis and translation.
Influence
On Transformer: a16z, OpenAI’s Greg Brockman, Perplexity, and investors Ron Conway and Joe Lonsdale launched Leading the Future, a new AI-focused super PAC network with over $100m in funding.
The aim is clear: throw obscene amounts of money at politicians to discourage AI regulation.
It’s modeled on the crypto super PAC Fairshake, which appears to have successfully persuaded Congress to go easy on crypto.
Meta is also launching a California super PAC, Mobilizing Economic Transformation Across (META) California, to back pro-AI candidates for state offices. It’s got “tens of millions” in funding.
Lobbyists are spending $41k per month to try to defeat New York’s RAISE Act.
Brookings published a report suggesting how Southeast Asia should approach AI safety governance.
Industry
Nvidia’s revenue rose 56% to $47b last quarter, but data-center growth slightly missed analyst expectations — as did its revenue guidance. Its shares dropped almost 3% in response, though they’ve since recovered a bit.
Alibaba has reportedly developed a new AI inference chip, which will be manufactured in China.
Meanwhile, three fabs will reportedly start production for Huawei’s AI chips by the end of next year, in an effort to triple China’s AI chip production.
DeepSeek is reportedly using Huawei Ascend chips to “train and refine smaller versions of [its] next-generation R2 models.”
It’s still using Nvidia chips for its flagship models, though.
OpenAI’s ongoing negotiations over its relationship with Microsoft are reportedly likely to push its corporate restructuring to next year.
The two companies are continuing to argue about Microsoft’s exclusive access to OpenAI technology and the famous “AGI” clause (which Microsoft reportedly wants to scrap, but OpenAI is keen to retain).
If they can’t reach an agreement by the end of the year, SoftBank could withhold its $10b investment — but OpenAI reportedly thinks that’s unlikely.
This week Microsoft AI launched its first in-house models: a foundation model and a voice generation model.
Meta reportedly plans to release its next AI model, Llama 4.X, by the end of the year.
It announced a licensing deal with Midjourney last week.
Anthropic announced Claude for Chrome, which lets “Claude work directly in your browser.”
It’s currently being piloted with 1,000 users, with the company saying it will “gradually expand access as we develop stronger safety measures.”
Anthropic settled a class action lawsuit with authors over copyright violations, avoiding billions of dollars in estimated damages. Settlement terms have not been disclosed.
The decision could influence a number of ongoing copyright cases, WIRED reported.
Anthropic said it will begin training its AI models on user chat transcripts unless users opt out.
Anthropic’s new Threat Intelligence report found that bad actors are using Claude Code to “vibe-hack.”
“Agentic AI has been weaponized,” the company said, noting that “AI models are now being used to perform sophisticated cyberattacks, not just advise on how to carry them out.”
Anthropic formed a national security advisory team composed of nuclear weapons and intelligence experts.
xAI appears to have quietly terminated its public benefit corporation status.
xAI open-sourced Grok 2.5 and said it’ll do the same for Grok 3 “in about 6 months.”
Elon Musk’s lawyers asked a judge to block OpenAI from retrieving Meta documents related to Musk’s attempt to enlist Mark Zuckerberg in buying OpenAI.
xAI sued Apple and OpenAI for … not featuring Grok more prominently in the App Store?
Musk announced plans to build a “purely AI software company called Macrohard,” which he said would “simulate” Microsoft with AI.
Musk keeps marketing Grok by reposting the sexualized images and videos it generates.
Google confirmed that viral image editor Nano Banana is in fact its new AI image model, Gemini 2.5 Flash Image.
Google Translate can now translate speech in real time.
Google said it will invest $9b in Virginia data centers.
Apple has historically hesitated to make big AI acquisitions, but is reportedly discussing deals with Perplexity and Mistral.
Moves
Ethan Knight (ex-xAI) left Meta’s new superintelligence lab after less than a month working there.
Longtime Meta employees Chaya Nayak and Bert Maher are also leaving, for OpenAI and Anthropic respectively.
And VP of GenAI Products Loredana Crisan has left for Figma.
Entrepreneur Ethan Agarwal is running for California governor in 2026, backed by tech leaders like Y Combinator CEO Garry Tan.
Best of the rest
OpenAI and Anthropic ran safety tests on each other’s models.
Anthropic reported that o3 and o4-mini were “aligned as well or better than our own models overall,” but GPT-4o and GPT-4.1 were more willing than Claude models to “cooperate with (simulated) human misuse” and engage in “concerning forms of sycophancy.”
OpenAI reported that Claude models were more aware of their uncertainty — and refused to respond to more prompts — than o3 and o4-mini, which had higher hallucination rates.
AI infrastructure spending reached $375b globally this year — a quarter of all economic growth last quarter, according to the Commerce Department.
Data center operators are skeptical about trusting AI to control critical equipment, with only 14% willing to let AI change configurations.
After nearly a year of protests, Microsoft asked the FBI for help tracking pro-Palestinian employees who oppose the company’s ties to Israel’s military.
Rest of World has a good piece on how AI companies are arranging global partnerships to try to increase usage — and get more training data.
South Korea gave ChatGPT-powered companion robots to 12,000 seniors living alone, many of whom formed strong bonds with the bots.
AI “deadbots” — avatars of dead people — are giving interviews to tug heartstrings in legal and advocacy settings, raising concerns about their potential commercial exploitation.
The Washington Post tested which AI gave the best answers to everyday questions. (Spoiler: Google AI Mode won.)
That said, a Montana restaurant is begging customers to stop using Google AI Overviews after it hallucinated non-existent daily specials.
Perplexity launched revenue-sharing deals with media outlets. But two Japanese media groups are suing the company for alleged copyright infringement.
Google DeepMind’s Weather Lab model may set a new gold standard for weather forecasting — it’s reportedly outperforming traditional physics-based methods.
WIRED reporter Lauren Goode spent two days vibe-coding at Notion to see AI’s takeover of software engineering for herself.
AI-driven private schools are popping up across the US. Alpha Schools, one such network, is bringing its “2-hour learning model” to eight new US locations.
A new Anthropic report found that educators tend to use Claude to automate admin work while “staying in the loop for everything else.”
Tech CEOs and billionaires flocked from Silicon Valley to Black Rock City, Nevada for this year’s Burning Man festival.
Thanks for reading; have a great weekend (and happy Labor Day to our American readers).