A federal judge has ruled that training Claude AI on copyrighted books—even without a license—was transformative and protected under fair use. But storing millions of pirated books in a permanent internal library? That crossed the line.
In this episode of The Briefing, Scott Hervey and Tara Sattler break down this nuanced opinion and what this ruling means for AI developers and copyright owners going forward.
Watch this episode on YouTube.
Show Notes:
Scott: What happens when an artificial intelligence company trains its models on millions of books? Some purchased, some pirated. In a closely watched ruling, a federal judge held that training the AI was fair use, likening the process to how a human learns by reading. But keeping pirated copies of those books in a permanent digital library, well, that crossed the line. I'm Scott Hervey, a partner with the law firm of Weintraub Tobin, and I'm joined today by my partner and frequent Briefing contributor, Tara Sattler.
We are going to break down the recent fair use ruling in the lawsuit over Claude AI, that's Anthropic’s AI, and explore what it means for the future of AI training on today's installment of The Briefing. Tara, welcome back to The Briefing. Good to have you.
Tara: Thanks, Scott. I always enjoy being here with you.
Scott: Always enjoy having you. This one is a much-awaited decision because we have a number of these cases swirling around, challenging the process by which AI companies train their large language models. One of these cases involves the Anthropic AI, Claude. Why don't we jump into this one? Tara, maybe you could give us some of the background of this particular case.
Tara: Absolutely. In 2021, Anthropic PBC, a startup founded by former OpenAI employees, set out to create a cutting-edge AI system, and that system would eventually become Claude. Like other large language models, Claude was trained on a vast amount of textual data: books, articles, websites, and more. But unlike many of its competitors, Anthropic took a controversial shortcut.
Scott: Right. Instead of licensing books or building a clean data set, Anthropic downloaded millions of copyrighted works from pirate sites like Books3, Library Genesis, and the Pirate Library Mirror. In total, Anthropic downloaded over seven million pirated books, including works by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. Anthropic also purchased millions of print books, scanned them, and then created a central digital library of searchable files.
Tara: So the plaintiffs sued, alleging that Anthropic infringed their copyrights by copying their works without permission: first, by downloading them from the pirate sites; then by using them to train Claude; and finally, by keeping digital copies of the books in its internal library for potential future use.
Scott: All right. So as we know, the lawsuit was filed, and Anthropic eventually moved for summary judgment based on fair use only. In its ruling on Anthropic's motion, Judge Alsup of the Northern District of California issued a very detailed and nuanced opinion. The opinion splits Anthropic's conduct into three key uses: first, using the books to train the AI, or the large language model; second, scanning and digitizing legally purchased print books; and third, downloading and keeping pirated books in a permanent digital library. Each of these uses was evaluated under the Copyright Act's four-factor fair use test.
Tara: Right. Let's walk through how the judge applied the four fair use factors to each use. For anyone who needs a refresher, here are the statutory factors for fair use under Section 107 of the Copyright Act.
Scott: If you need a refresher, you're not listening to this podcast often enough. Go ahead, Tara.
Tara: Okay, so we'll refresh anyway. First is the purpose and character of the use, including whether it is commercial and whether it's tran...