Crypto was a wonderful Wild West of anarchic financial innovation, absurd idiocy and scamming. Lots of scamming. Then regulators came along and made everything a lot more sensible and boring. Given how fast Generative AI has developed — from computer science theory to high school cheating scandals in but a few years — it was inevitable that the lawsuits would quickly follow.
On Friday, the comedian Sarah Silverman joined authors Christopher Golden and Richard Kadrey in class action copyright lawsuits, claiming OpenAI and Meta had stolen material from their books to train their Large Language Models (LLMs). They allege the LLMs were trained on their books through pirated online libraries, such as Library Genesis and Z-Library. (No, I haven’t used them for years, don’t ask.)
LLMs work by processing enormous quantities of written language, much of it trawled from the internet, and learning how to recreate it. Ask ChatGPT for a recipe, and it will assemble a response informed by reading other relevant information: namely, other online recipes. Sometimes, it’ll produce completely original instructions. Other times, though, it’ll be strikingly close to a recipe by Jamie Oliver. Similarly, as the lawsuit claims: “If a user prompts ChatGPT to summarize a copyrighted book, it will do so.” And the users don’t have to buy the book to get that summary.
The scope of the lawsuits extends beyond these individual books, though, and challenges the legality of LLMs generally. Any service that trawls the web necessarily uses, summarizes and reproduces copyrighted materials. But LLMs are uniquely reliant upon this, given the sheer quantity of information they need to train upon; it would be functionally impossible to sift through everything to ensure they are only reading material in the public domain.
Were the court to side with Silverman, it wouldn’t help artists, but it would kill much of the Western innovation in the space and guarantee Chinese technological superiority.
Thankfully, that outcome seems deeply unlikely. Silverman’s goal is likely a settlement, and I’m not convinced her underlying claims are sound.
Choose a random book on your shelf and ask ChatGPT to summarize it. If it produces a perfectly accurate summary, that does not mean it read the book. Rather, the internet is awash in book summaries: paid services like Blinkist, Reddit posts, Wikipedia pages, Google Books blurbs and Spectator book reviews. If you could produce a convincing summary based on secondary sources, so can an LLM, and far more convincingly.
This isn’t to say LLMs haven’t been fed copyrighted material (ChatGPT can produce full passages from Harry Potter books when prompted), but a celebrity biography is fairly predictable, and among the most quoted and referenced kinds of book online.
The AI fight over intellectual property, however, is only just starting. Artists want their credit. Writers want their cut. Publications want links. And AI companies have a huge amount of money. Ironically, it’s only a matter of time before AI companies start flinging copyright suits against each other, claiming one stole the code of the other, despite their models intrinsically relying on copyright violations.
There’s no consistent, coherent copyright framework for LLMs — and it’s only a matter of time before regulation tries to change that. Perhaps that will prove to be a good thing, but it probably won’t. In the meantime, the Wild West continues — and the LLMs will keep reading freely.