Anthropic, an artificial intelligence (AI) startup backed by major tech companies including Amazon, is facing a legal challenge that could have far-reaching implications for the AI industry. On Monday, the company was served with a class-action lawsuit filed in a California federal court, accusing it of massive copyright infringement. The suit, brought by high-profile authors, could reshape how AI companies handle copyrighted material in their training processes.
The Accusations Against Anthropic
The lawsuit was brought forward by three well-known authors: Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson. These authors allege that Anthropic has built its multibillion-dollar business by unlawfully using their copyrighted works, as well as those of countless other authors. Specifically, they accuse Anthropic of downloading pirated versions of their books, making unauthorized copies, and then using these copies to train its large language models (LLMs), including its flagship model, “Claude.”
Anthropic, which was founded by former executives from OpenAI, has received significant investment from tech giants like Google and Salesforce, further raising the stakes of this lawsuit. The plaintiffs allege that the infringement is not incidental but central to the company’s operations. The complaint states, “An essential component of Anthropic’s business model—and its flagship ‘Claude’ family of large language models—is the large-scale theft of copyrighted works.” According to the suit, Anthropic systematically downloaded pirated copies of books, many of them readily available on illegal websites, and used them without permission to train its AI models. This accusation is central to the plaintiffs’ case: they argue that the very foundation of Anthropic’s AI technology is built on the unlawful use of their copyrighted materials.
The Launch of Claude 3.5 and Rising Legal Troubles
This lawsuit comes on the heels of Anthropic’s launch of its most advanced AI model to date, Claude 3.5 Sonnet, which debuted in June. Claude has quickly become one of the most popular AI chatbots, rivaling other leading models like OpenAI’s ChatGPT and Google’s Gemini. The rapid rise of these models has been accompanied by growing concerns about the sources of their training data, particularly when that data includes copyrighted material.
Anthropic’s legal troubles are not new. In October of last year, the company was also sued by Universal Music, which accused it of “systematic and widespread infringement” of copyrighted song lyrics. This lawsuit, filed in a Tennessee federal court, was joined by other major music publishers, including Concord and ABKCO. The lawsuit highlighted several instances where Anthropic’s AI chatbot, Claude, generated lyrics that were nearly identical to copyrighted songs, including Katy Perry’s “Roar” and Gloria Gaynor’s “I Will Survive.”
According to Universal Music’s lawsuit, when a user asked Claude for the lyrics to “Roar,” the chatbot provided an almost verbatim copy of the lyrics, violating Concord’s copyright. Similarly, when asked for the lyrics to “I Will Survive,” Claude produced a close replica of the original, infringing on Universal’s rights. The lawsuit argued that in building and operating its AI models, Anthropic unlawfully copied and disseminated vast amounts of copyrighted works, contrasting the company’s conduct with that of earlier copying technologies, such as the printing press, photocopiers, and web crawlers, whose operators have had to comply with copyright law.
The lawsuit against Anthropic is a significant development in the ongoing debate over the legality of using copyrighted material in AI training. It raises important questions about the responsibilities of AI companies in sourcing their data and the potential consequences of failing to adhere to copyright laws.
The Broader Implications for the News and Publishing Industries
The suit is part of a broader trend within the news and publishing industries, which are increasingly concerned about the impact of AI-generated content on their business models. As AI technology advances and becomes more prevalent, many media outlets and news organizations are struggling to maintain sufficient revenue from advertising and subscriptions, which are essential for funding their news-gathering operations.
In response to these challenges, some media organizations have begun to take legal action to protect their intellectual property. For example, in June, The Center for Investigative Reporting, the oldest nonprofit newsroom in the United States, filed a lawsuit against OpenAI and its lead backer, Microsoft, accusing them of copyright infringement. This lawsuit followed similar actions by other major publications, including The New York Times, The Chicago Tribune, and The New York Daily News.
The New York Times, in particular, has been at the forefront of this legal battle. In December, the newspaper filed a lawsuit against Microsoft and OpenAI, alleging that its journalistic content was unlawfully used to train ChatGPT. The Times is seeking “billions of dollars in statutory and actual damages” for what it describes as the “unlawful copying and use” of its valuable content, making the case one of the most high-profile in the ongoing debate over copyrighted material in AI training.
Similarly, The Chicago Tribune, along with seven other newspapers, filed a lawsuit in April, accusing OpenAI of copyright infringement. These lawsuits highlight the growing concerns within the news industry about the impact of AI-generated content on their ability to generate revenue and protect their intellectual property.
Outside of the news industry, a group of prominent U.S. authors, including Jonathan Franzen, John Grisham, George R.R. Martin, and Jodi Picoult, filed a lawsuit against OpenAI last year, alleging copyright infringement. These authors argue that their works were used without permission to train AI models like ChatGPT, raising serious questions about the ethics and legality of using copyrighted material in AI development.
Partnerships Between AI Companies and Media Outlets
While some media organizations are pursuing legal action against AI companies, others are choosing to collaborate with them. On Tuesday, OpenAI announced a partnership with Condé Nast, one of the world’s leading media companies. This partnership will allow OpenAI’s AI models, ChatGPT and SearchGPT, to display content from Condé Nast’s prestigious publications, including Vogue, The New Yorker, Condé Nast Traveler, GQ, Architectural Digest, Vanity Fair, Wired, and Bon Appétit.
OpenAI has also been actively pursuing other media collaborations, and it is not alone in striking such deals. In July, Perplexity AI, another AI company, introduced a revenue-sharing model for publishers following a series of plagiarism accusations. The program, known as the “Publishers Program,” was quickly joined by several major media outlets, including Fortune, Time, Entrepreneur, The Texas Tribune, Der Spiegel, and WordPress.com.
In June, OpenAI announced a multi-year content deal with Time magazine, which will allow OpenAI to access Time’s current and archived articles, some dating back more than 100 years. This partnership will enable OpenAI to display Time’s content in response to user queries within ChatGPT and to use these articles to enhance its AI products.
Similarly, in May, OpenAI secured a partnership with News Corp., the parent company of several prominent publications, including The Wall Street Journal, MarketWatch, Barron’s, and the New York Post. This deal grants OpenAI access to current and archived articles from these publications, which will be used to train its AI models. In the same month, Reddit announced a partnership with OpenAI, allowing the company to use Reddit content for AI training.
These partnerships highlight the complex and evolving relationship between AI companies and the media industry. While some organizations are taking a hardline approach by pursuing legal action to protect their intellectual property, others are opting to collaborate with AI companies to find mutually beneficial solutions. As the use of AI-generated content continues to grow, the legal and ethical challenges surrounding copyright infringement are likely to become even more prominent.
The Future of AI and Copyright Law
The lawsuits against Anthropic, OpenAI, and other AI companies underscore the urgent need for clear legal guidelines regarding the use of copyrighted material in AI training. As AI technology continues to advance, it is becoming increasingly important for companies to navigate the complex legal landscape surrounding copyright law.
The outcome of these lawsuits could have significant implications for the future of AI development and the protection of intellectual property. If the courts rule in favor of the plaintiffs, AI companies may be forced to change their practices, potentially leading to stricter regulations on the use of copyrighted material in AI training. On the other hand, if the courts side with the AI companies, it could pave the way for more widespread use of copyrighted content in AI development, raising concerns about the future of copyright protection.
As the legal battle between AI companies and content creators continues to unfold, the intersection of AI and copyright law will remain a critical issue for the foreseeable future. Both sides will need to navigate this complex and rapidly evolving landscape to ensure that the rights of content creators are protected while allowing for the continued growth of AI technology.