NYTimes Calls on OpenAI and Microsoft to Compensate Data Sources

The New York Times has filed a lawsuit against OpenAI and its close collaborator, Microsoft, claiming that they have violated copyright law by using Times content to train generative AI models.

The lawsuit, filed in U.S. District Court in Manhattan, asserts that numerous Times articles were used to train AI models, including those behind OpenAI’s highly popular ChatGPT and Microsoft’s Copilot, without The Times’ consent. The Times asks that OpenAI and Microsoft be required to destroy the models and training data that contain the offending material, and that they be held liable for substantial damages arising from the unauthorized copying and use of The Times’ highly valuable works.

The Times’ complaint argues that news organizations must be able to produce and protect their independent journalism. If they cannot, it says, the result will be a vacuum that no computer or artificial intelligence can fill: less journalism will be produced, and the cost to society will be substantial.

Generative AI models learn from examples to create a wide range of content, including essays, code, emails, articles, and more. Companies like OpenAI gather a vast number of examples from the web to build their training sets. Some of those examples are in the public domain. Others are not, or come with restrictive licenses that require citation or specific forms of compensation.

Vendors assert that the fair use doctrine offers broad protection for their web scraping practices. Copyright holders disagree, and hundreds of news organizations have now added code to their websites to prevent OpenAI, Google, and others from scanning them for training data.
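The article does not say what this code looks like. One widely used mechanism is a robots.txt file that disallows specific crawler user agents. The sketch below assumes a hypothetical publisher's robots.txt that blocks the GPTBot and Google-Extended tokens, and uses Python's standard-library parser to check which crawlers would be allowed. The rules, the URL, and the `is_allowed` helper are illustrative assumptions, not any news organization's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the article does not specify what any
# publisher actually serves.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL only.
    url = "https://example.com/2023/12/27/some-article.html"
    for agent in ("GPTBot", "Google-Extended", "SomeSearchBot"):
        print(agent, is_allowed(agent, url))
```

Rules like these are advisory: they only work if the crawler chooses to honor them, which is part of why the dispute has moved into the courts.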

The conflict between AI vendors and content owners has led to a surge in legal disputes, with The Times’ being the most recent.

In July, actress Sarah Silverman joined two lawsuits that claim Meta and OpenAI used her memoir to train their AI models. In a separate case, numerous novelists, including Jonathan Franzen and John Grisham, allege that OpenAI acquired their work as training data without their consent or knowledge. There is also an ongoing case against Microsoft, OpenAI, and GitHub over Copilot, an AI-powered code generation tool, in which the plaintiffs claim that their IP-protected code was used in its development.

Although The Times is not the first to sue generative AI providers over alleged intellectual property violations involving written works, it is the largest publisher involved in such a lawsuit so far. It is also one of the first to emphasize the potential harm to its brand caused by “hallucinations,” or fabricated information generated by AI models.

The Times complaint highlights several instances where Microsoft’s Bing Chat (now called Copilot), which utilizes an OpenAI model, allegedly provided inaccurate information attributed to the Times. This includes the presentation of results for “the 15 most heart-healthy foods,” with 12 of them not being mentioned in any Times article.

The Times also argues that OpenAI and Microsoft are effectively building competitors to news publishers, including The Times, by offering access to information that would typically require a subscription. According to the complaint, that information is not always cited, is sometimes monetized, and is stripped of the affiliate links that The Times relies on to generate commissions.

As the Times complaint notes, generative AI models have a tendency to regurgitate the training data they were given, which means they can reproduce passages from articles almost word-for-word. Separately, OpenAI has inadvertently enabled ChatGPT users to get around news paywalls.

The lawsuit claims that OpenAI and Microsoft are exploiting the Times’ significant investment in journalism by utilizing its content without compensation. It alleges that these actions aim to develop products that can substitute for the Times and attract its audience.

A lawsuit that publishers filed against Google earlier this month makes a similar argument, focusing on the impact on the news subscription business and publishers’ web traffic. In that case the plaintiffs, like The Times, claim that Google’s AI experiments, such as its AI-powered Bard chatbot and Search Generative Experience, siphon off publishers’ content, readers, and ad revenue through anticompetitive means.

The publishers’ claims have some support. According to a recent model by The Atlantic, if a search engine like Google integrated AI into search, it would answer a user’s query 75% of the time without requiring a click-through to the publisher’s website. The publishers in the Google lawsuit estimate that they could lose up to 40% of their traffic.

Instead of engaging in legal battles, certain media outlets have opted to establish licensing agreements with providers. In July, the Associated Press reached an agreement with OpenAI, and this month, Axel Springer, the German publisher that owns Politico and Business Insider, also reached a similar deal.

The Times states that it tried to reach a licensing agreement with Microsoft and OpenAI in April, but the negotiations were unsuccessful.