Anthropic’s piracy could make its copyright battle existential
Anthropic could have to pay billions in damages — and others may follow
For the companies developing the large language models that have captured the world’s attention, copyright suits go with the territory.
Outraged authors and other creatives are determined that copyright and IP law must offer some protection against their work being used to train AI models that may threaten their future livelihoods.
The most high-profile of these cases, The New York Times vs OpenAI and Microsoft, is still working its way through the American legal system. But a less prominent case involving Anthropic may be the first to produce a substantive ruling, potentially costing the company billions of dollars unless it can avoid a trial it claims poses an existential threat to its business.
Anthropic, whose pockets are less deep than those of the other AI giants, says that Bartz v. Anthropic PBC could cause “irreparable injury” if it is even allowed to come to trial, let alone if it loses. The claim that the case could be existential is a stark one, especially to Anthropic’s critics: if your business would collapse without ‘stealing’ copyrighted work, they argue, then what kind of business is it?
What is peculiar is that, on some of the most fundamental legal questions, Anthropic has scored huge wins for itself and the AI industry — and yet it seems alarmed that the relatively limited grounds on which the case continues are enough to sink it.
At its core, the case is simple: a small group of authors sued Anthropic for violating their copyright. Their argument was that training an AI on books is intrinsically a violation of copyright, even if the books were legitimately purchased and digitized — and it’s an argument they lost.
Anthropic pushed for summary judgment, a process in which a judge can bypass a jury trial by deciding the legal issues while viewing the disputed facts in the light most favorable to the other party. Because Anthropic sought summary judgment, the judge construed the facts as favorably to the authors as possible.
Despite this, Judge William Alsup found in Anthropic’s favor wherever a book had been legitimately acquired and processed. Buying a book, cutting off its covers, scanning and digitizing it for use in a company library, and using that copy to train an AI all qualify as “fair use,” he ruled earlier this year.
AIs, he reasoned, were ‘learning’ from books and using them to produce something distinct. “Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts,” he wrote. The authors said that what was permissible for humans shouldn’t be allowed for AIs, and he once again rejected that argument.
“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” Alsup wrote.
This is the fundamental issue in dispute between authors and AI executives, and it was decided thoroughly in favor of the latter. But Anthropic did not get everything its own way, because of how it obtained some of the books.
While Anthropic eventually bought print copies of books and digitized them, it previously downloaded pirated copies in their millions. In 2021, as the order details, its cofounders obtained seven million books this way, from the Books3 and LibGen datasets. As Alsup wrote, Anthropic “could have purchased books, but it preferred to steal them to avoid ‘legal/practice/business slog,’ as cofounder and chief executive officer Dario Amodei put it.”
Alsup said the issue of the stolen books would have to go to a full trial in front of a jury, which would decide on the nature of the violation and any damages due — a process that would also require Anthropic to disclose how many pirated books were used in training its models.
Anthropic is now trying to delay this trial — currently scheduled for December — while it attempts to appeal the summary judgment, having also requested a stay on the case while that appeal is considered. Alsup last week scathingly denied that motion, essentially arguing that Anthropic must go through a trial before it gets to appeal.
“[Anthropic] seeks a sweeping rule, as a matter of law, that Anthropic was entitled to pirate all the copyrighted works it wanted and to keep them indefinitely so long as any part of the trove was further copied and used to train an LLM,” Alsup notes, adding that this is fundamentally inconsistent with US copyright law. These are strident words from a judge who otherwise decided many aspects of the case in Anthropic’s favor.
Anthropic’s insistence that the case could prove fatal to its business may be a matter of legal tactics: it is typically necessary to show “irreparable” harm in order to secure a stay on a case. But outside experts and Alsup alike are skeptical on that front.
“The mere sending of a class notice and possible reputational harm does not seem like a strong argument,” says Professor Edward Lee of Santa Clara Law, who believes Anthropic’s reasoning for why the case must be delayed is overblown.
Lee, though, thinks that if Anthropic lost at trial, the damages could be truly huge. Somewhat cheekily, he asked Anthropic’s own Claude to help model how much the company might have to pay if it lost, running thousands of simulations of the case’s possible outcomes based in part on the range of likely damages and legal precedent. Lee’s results estimated a 95% probability of damages in excess of $1bn, and a 59% probability of damages above $10bn.
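Lee has not published the full model, but the shape of the exercise is easy to sketch. The snippet below is a purely illustrative Monte Carlo reconstruction, not Lee’s methodology: the number of infringed works and the shape of the per-work award distribution are assumptions, with the award range anchored to US statutory damages under 17 U.S.C. § 504(c), which run from $750 to $30,000 per work and up to $150,000 per work for willful infringement.

```python
import numpy as np

# Illustrative Monte Carlo sketch of copyright damages exposure.
# All parameters are assumptions for demonstration, not figures from
# Lee's model or from the court record.

rng = np.random.default_rng(42)
N_SIMS = 100_000

# Statutory damages per work (17 U.S.C. § 504(c)): $750 floor,
# up to $150,000 per work where infringement is willful.
FLOOR, WILLFUL_CAP = 750, 150_000

# Assumption: how many pirated works a jury might count, drawn per run.
works = rng.integers(100_000, 7_000_000, size=N_SIMS)

# Assumption: per-work award, log-uniform between the statutory floor
# and the willfulness cap, reflecting wide uncertainty over jury behavior.
per_work = np.exp(rng.uniform(np.log(FLOOR), np.log(WILLFUL_CAP), size=N_SIMS))

damages = works * per_work

for threshold in (1e9, 1e10):
    print(f"P(damages > ${threshold:,.0f}) = {(damages > threshold).mean():.0%}")
print(f"Median simulated award: ${np.median(damages):,.0f}")
```

Even the statutory floor does heavy lifting here: $750 across the roughly seven million pirated books described in the order would come to more than $5bn before any willfulness multiplier is applied.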
The higher end of those predictions would indeed amount to an existential threat to a company that as of March had raised just under $15bn, and expects to burn about $3bn this year through its regular operations.
Jury damages are often reduced significantly on appeal, even if the ruling itself is not overturned, but companies are typically required to post a bond covering the judgment while the appeal proceeds — which could itself be a significant cost.
“Many of the potential awards would put the company in a bind in coming up with the funds to post a bond in that amount even to appeal the judgment,” says Lee. “The worst-case scenario is they’d have to file for bankruptcy or seek an infusion of funding.”
It has been noted that Amodei told staff the company would accept investment from Gulf states just days after Alsup said the case would go to trial, a major reversal for the firm. Anthropic turned down money from Saudi Arabia as recently as last year, while Amodei had written that only democracies should be able to influence frontier AI.
It is possible, then, if unlikely, that this case could prove fatal for Anthropic — but because of its piracy, rather than because of the core premise of AI training. If Alsup’s interpretation of the law stands, the fundamental model of the AI companies is sound: buying a single copy of a book, or access to read an article, is enough for training purposes.
However, there is a twist. Many if not most of Anthropic’s rivals have allegedly made similar use of pirated material — meaning painful, though likely not fatal, damages could spread across the big frontier labs.
In his denial of Anthropic’s motion to stay, Judge Alsup noted that the company would face extraordinarily large damages only if a jury found its wrongdoing to be on a similarly large scale.
“If Anthropic loses big,” he concludes, “it will be because what it did wrong was also big.”
Disclosure: James Ball is the author of multiple books which appear in the LibGen dataset “stolen” by Anthropic. He is not party to any lawsuits against Anthropic or any other AI company.
James Ball is a journalist and author. He is soon to begin doctoral research in AI regulation at University College London.