The AI copyright question has no easy answers
Nobody can agree on how copyright should apply to AI — or if copyright is fit for purpose at all.
By Chris Dorrell
“Only one thing is impossible for God: To find any sense in any copyright law on the planet,” Mark Twain famously said. With the rise of AI, God’s life is only getting more difficult.
More than 50 copyright infringement lawsuits have been brought in the US against tech firms for scraping copyrighted material without the permission of creators, covering music, literature, photography, journalism, and other creative endeavors.
Adding to the law’s complexity, disputes about copyright almost inevitably get entangled with broader notions of creativity, authorship, and the nature of human expression.
Copyright exists to promote artistic and cultural production, but it pursues that goal through competing imperatives. On the one hand, it seeks to incentivize the creation of new work by effectively guaranteeing artists a reward for their labor through the exclusive right to distribute their own work. On the other, the law should not be so restrictive that it stands in the way of cultural dissemination and blocks new forms of creativity.
In the US, and the Anglosphere more broadly, the concern with public utility is the most important justification for copyright’s existence. Article I of the US Constitution, the basis for both copyright and patent law, guarantees authors and inventors the “exclusive right to their respective writings and discoveries” in an attempt to “promote the progress of science and useful arts.”
AI clearly raises questions about the technical aspects of copyright — but some scholars argue that it also undercuts the very foundations of the law itself. So can copyright cope with the age of AI?
The cases so far
Each area of creative endeavor has its own nuances in copyright law, but the essential dispute is the same across the board: Big Tech companies have used copyrighted material to train their AI models, which could then compete with the creatives whose work was used to train them.
There are broadly two areas where Big Tech firms could fall foul of copyright law: either in the training process, or as a result of what the AI model produces.
Regarding training, defendants across the board have appealed to the doctrine of ‘fair use’, an exception to copyright law that permits copyrighted material to be reproduced without permission in a range of circumstances. Classic examples include teaching, criticism, and research, but fair use is judged on a case-by-case basis (part of the reason why God finds copyright law such a challenge).
To determine whether something is fair use, judges consider four factors.
1. The purpose and character of the use, such as whether the new work is commercial or not-for-profit, and whether it is ‘transformative’ (that is, adding new meaning or expression).
2. The nature of the copyrighted work, such as whether it is fact or fiction. Creative works receive stronger protections than factual ones.
3. The amount and substantiality of the portion taken. Taking an entire work is more likely to infringe than taking only sections.
4. The effect of the use upon the potential market, particularly whether the use is likely to deprive the original creator of an income.
Judges apply the four factors at their own discretion. None of them is determinative.
Bartz v. Anthropic and Kadrey v. Meta are two of the most high-profile cases to have been decided so far. Both involve authors whose work had been used without permission to train large language models.
Anthropic and Meta argued that although copyrighted texts had been copied during training, the models did not retain or reproduce expressive content. Instead, they analyzed patterns in language to produce statistical weights used to generate new text.
In both cases, the judges mostly ruled in favor of the tech firms, albeit with a number of caveats. The judges applied transformative-use reasoning to the training process, concluding that the LLMs were doing something genuinely original with the copyrighted inputs, not simply copying or regurgitating them. Judge William Alsup in the Anthropic case called the training use “spectacularly” transformative.
But from here, the cases diverged. Anthropic had used pirated copies as well as legitimately acquired books to train Claude, and Alsup ruled that the pirated texts did not fall under the doctrine of fair use. (Anthropic later agreed to a $1.5 billion settlement with the authors.) But so long as the texts were acquired legitimately, Alsup put great weight on the transformative nature of the training.
“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” he wrote.
The analogy between AI training and human learning meant that Alsup did not put significant weight on the potential for LLMs to harm the market for the very products on which they had been trained, an important dimension of fair use cases. Complaining about such harm, he argued, would be like complaining that teaching schoolchildren to write might end up creating an explosion of books.
In contrast, Judge Vince Chhabria in the Meta case was clearly nervous about the possible market impact. He explicitly criticized Alsup’s schoolchildren analogy, arguing that “when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take.”
“No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books,” he wrote.
But the plaintiffs brought such weak evidence of possible market dilution, he said, that it didn’t affect his final judgment. “This ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”
“Market dilution will often cause plaintiffs to decisively win the fourth factor — and thus win the fair use question overall,” Chhabria said.
The ruling suggests that the most important part of future legal disputes will be proving that AI models pose a commercial threat to creatives. This will differ immensely between industries and even genres, making it difficult to draw firm conclusions about the overall impact. Nevertheless, the Copyright Alliance, an advocacy group that represents creatives, said the ruling “basically provided a roadmap” for plaintiffs looking to bring cases against Big Tech firms.
Andres Guadamuz, a reader in intellectual property law at the University of Sussex and editor-in-chief of the Journal of World Intellectual Property, wasn’t so sure.
“While competition and market impact are relevant to the fourth fair-use factor, copyright is not meant to insulate creators from competition or preserve markets in perpetuity,” he said. “Market harm may feature at the margins, but it’s unlikely to become the dominant subject of future litigation,” he predicted.
Guadamuz believes that courts will instead focus on a narrower question under the first fair-use factor: whether training constitutes genuinely transformative use, or whether it amounts to direct substitution. Inevitably this question has a bearing on the potential market impact, but it is conceptually distinct.
A smaller case, Thomson Reuters v. Ross Intelligence, provides an example. There, a legal-tech startup trained its AI on Thomson Reuters’ legal summaries, known as headnotes, to produce a competing legal-research search engine. The judge found that Ross’s use was not transformative precisely because of its aims. “Ross took the headnotes to make it easier to develop a competing legal research tool,” he wrote, “so Ross’s use is not transformative.”
The transformative-vs.-substitutive question will be central to the most prominent ongoing case, New York Times v. OpenAI. In addition to concerns about large-scale scraping of copyrighted material, the New York Times has evidence that ChatGPT can regurgitate its articles almost verbatim under specific circumstances, potentially drawing readers away from the newspaper.
“This shows that they were trained on those articles, and they didn’t just learn the facts in them or the general style of newspaper reporting. They actually learned the specific expression of the Times reporters who wrote them. That’s a stronger case for infringement than a lot of the other plaintiffs have had,” said James Grimmelmann, professor of digital and information law at Cornell Law School.
It is difficult to generalize from such a small number of cases, but a few points seem important. AI models that offer something genuinely new will find it much easier to win the first factor, which will put pressure on plaintiffs to find reliable evidence that the models will cause significant market harm. Given how little we know about how AI will be used, this could be a real challenge. But where models effectively replicate existing products or individual styles, plaintiffs will have a much stronger case on both the first and fourth factors.
It could be years before the technical questions are resolved in each industry. Rather than waiting for cases to be determined in lengthy legal battles, many firms, including the New York Times, have signed licensing deals with tech firms. But will copyright itself cope with the sheer pace of change?
The end of copyright?
Copyright emerged in the 18th century to address a specific market failure in the wake of the printing press: creative works are expensive to produce but cheap to copy, so without legal protection, authors couldn’t recoup their investment. If society wanted creative works to exist, copyright was necessary to incentivize it.
But AI inverts this logic by making creative works almost free to produce, not just to copy. Mark Lemley, director of Stanford Law School’s Program in Law, Science, and Technology, has argued that copyright could become obsolete in this context. “We need copyright only if we think we won’t get enough creation without it,” he wrote in the Science & Technology Law Review. “That may no longer be a worry in the world of generative AI.”
The new market failure, then, is how to get people to pay attention to human-made work of aesthetic value — and how to continue to incentivize its production.
“The thing that I’m worried about is flooding human attention with slop that distracts people from the more interesting and important things we can see,” said Grimmelmann.
It is possible that the market will take care of things. A 2023 paper found that people tend to rate AI-created artworks more highly when they think a human created them, suggesting a general bias against AI creation. Similarly, in a Harvard University survey, 62% of respondents said they would value an artwork less if it was produced by AI. Indeed, it is easy to imagine increasingly artificial artworks enhancing demand for authentic human creativity.
Artists may also find other, uniquely human, products to sell. This is already happening in the music industry: many artists get so little from streaming services that, in a strict commercial sense, their music basically acts as an advertisement for live shows and merchandise. A 2018 Billboard report showed that the top 50 acts in the world received around 80% of their revenue from touring, compared to just over 8% from streaming.
It is, however, possible — and plausible — that the market will fail. While people say they prefer human-generated products, the popularity of AI image and video generation tools suggests otherwise. And it is hard for an expensive human to compete with an infinite feed of free AI-generated content.
It is not obvious what the solution is, but it is doubtful that copyright can adequately address the challenge. The nature of creative production has fundamentally changed, and it is plausible that we would still be having similar debates about AI even if the models had been trained only on licensed content.
Copyright came into existence in a world still coming to grips with the mass reproduction of texts, largely replacing the patronage system through which artists had previously been remunerated. Grimmelmann suggested there was no reason why such a major shift couldn’t happen again.
“We now have AI technologies that could make creation and innovation much cheaper. We’ll see what the quality of those things is, but it’s certainly possible that we have AI systems that can enable creativity of the sort which copyright wants to encourage,” he said. “If that’s true, then what comes after copyright could be as different from copyright as copyright was from patronage.”
Chris Dorrell is a freelance journalist.