OpenAI has formally challenged a U.S. court order compelling it to disclose 20 million anonymized ChatGPT conversations to The New York Times, escalating a high-stakes legal battle over copyright and user privacy. The AI company filed its objection this week in New York federal court, arguing the demand is a “speculative fishing expedition” that disregards long-standing privacy protections and is almost entirely irrelevant to the underlying copyright lawsuit.
Quick Take: The Legal Flashpoint
Here are the key facts in the escalating discovery dispute between OpenAI and The New York Times:
- The Order: A U.S. Magistrate Judge ordered OpenAI to produce 20 million randomly sampled, “de-identified” ChatGPT user conversations from December 2022 to November 2024.
- The Lawsuit: The New York Times (NYT) sued OpenAI and Microsoft in December 2023, alleging massive copyright infringement by using millions of its articles to train AI models like ChatGPT.
- OpenAI’s Objection: Filed on November 12, 2025, the motion argues the order is an “invasion of user privacy” and that 99.99% of the chats have no connection to the NYT’s claims.
- The NYT’s Rationale: The newspaper argues the chats are necessary to prove its case and to rebut OpenAI’s defense that the NYT “hacked” the chatbot to manufacture evidence of infringement.
- The Deadline: OpenAI faces a reported deadline of November 14, 2025, to comply with the magistrate’s order, though its new objection seeks to have a District Judge overturn it.
- The Original Demand: The NYT’s initial request was for 1.4 billion user chats, which was later negotiated down to the 20 million sample.
The Privacy Standoff: A ‘Baseless Lawsuit’ or ‘Crucial Evidence’?
The core of the legal showdown, now nearly two years old, has shifted from a theoretical debate over “fair use” to a tangible, granular fight over the private data of millions of users. OpenAI’s challenge to the court order marks a critical moment at which the competing interests of intellectual property and digital privacy have collided.
The New York Times, along with other publishers in the consolidated lawsuit (MDL No. 3143), alleges that OpenAI’s models were illegally trained on their content and that ChatGPT can “regurgitate” their articles verbatim, creating a direct competitor.
To prove this, the NYT convinced Magistrate Judge Ona Wang that it needs to see real-world user data. The newspaper’s lawyers argue these logs are the only way to determine how often ChatGPT reproduces copyrighted content and to counter OpenAI’s accusation that the NYT used deceptive “hacking” prompts to create the examples of infringement cited in its original complaint.
OpenAI’s public and legal response has been scathing. In a blog post published Wednesday, OpenAI’s Chief Information Security Officer (CISO), Dane Stuckey, framed the demand as a direct threat to its users.
OpenAI’s legal filing echoes this, stating that “anyone in the world who has used ChatGPT in the past three years must now face the possibility that their personal conversations will be handed over to The Times to sift through at will.”
The NYT, however, maintains that user privacy is not at risk. A spokesperson for the newspaper countered that the court’s order already provides sufficient protection.
Why 20 Million Chats? The ‘Hacking’ Defense
The dispute over the 20 million chats is intrinsically linked to one of OpenAI’s key legal defenses.
In February 2024, OpenAI filed a motion to dismiss parts of the lawsuit, claiming the NYT had “hacked” ChatGPT to generate its evidence. OpenAI alleged the NYT used “deceptive prompts” that violated the chatbot’s terms of service to “induce the model to regurgitate” content.
The NYT has fired back that it simply used the tool as any user would and that the ease with which it could produce infringing content was, in itself, proof of the underlying copyright violation.
The newspaper’s lawyers now argue that to defeat OpenAI’s “hacking” defense, they must be allowed to see a large, random sample of ordinary user chats. The goal is to show a jury that even without “deceptive prompts,” users are regularly encountering NYT-copyrighted material, proving the infringement is systemic and not a manufactured anomaly.
The Scale of the Data
OpenAI argues that the 20 million-chat sample is a statistical absurdity for finding relevant evidence.
- The Claim: OpenAI’s filing states that “99.99%” of the transcripts are entirely unrelated to the NYT’s claims.
- The Alternative: The company says it offered “several privacy-preserving options” to the NYT, such as running targeted searches for chats that might contain text from an NYT article. According to OpenAI, these offers were rejected.
- The Original Request: The battle began with the NYT seeking an order to preserve all user chats and API data, which it estimated at 1.4 billion conversations, before narrowing the request to the 20 million-chat sample.
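To make the scale concrete: taken at face value, OpenAI’s 99.99% figure would leave roughly 2,000 of the 20 million chats as potentially relevant. The filing does not describe how a “targeted search” would work, so the Python sketch below is only one plausible reading, not OpenAI’s method. It flags a chat if it shares any run of eight consecutive words with an article. The function names, the eight-word threshold, and the sample texts are all assumptions made for illustration.

```python
import re


def shingles(text, n=8):
    """Return the set of n-word runs in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_chats(chats, article, n=8):
    """Return the indices of chats that share at least one n-word run with article."""
    article_runs = shingles(article, n)
    return [i for i, chat in enumerate(chats) if shingles(chat, n) & article_runs]


# Hypothetical data: a stand-in "article" and three stand-in chats.
article = "The quick brown fox jumps over the lazy dog near the quiet river bank today"
chats = [
    "How do I bake bread?",
    "It said: the quick brown fox jumps over the lazy dog near the river",
    "fox dog",
]
flagged = flag_chats(chats, article)

# Arithmetic behind the 99.99% claim: 0.01% of the 20 million-chat sample.
sample_size = 20_000_000
possibly_relevant = sample_size // 10_000
```

A filter of this kind would hand over only the flagged chats rather than the full sample. The trade-off, which is essentially the NYT’s objection, is that the party running the search decides what counts as a match.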
The Court’s Rationale and the Precedent
In issuing the original order, Magistrate Judge Ona Wang was not persuaded by OpenAI’s privacy arguments. She ruled that the company’s “exhaustive de-identification” process, combined with a legal protective order limiting access to the NYT’s outside counsel and experts, would be sufficient to protect user privacy.
This issue is not unique to OpenAI. Legal experts note that in a similar copyright lawsuit brought by music publishers against AI company Anthropic, a judge ordered the production of 5 million chat conversations.
The challenge, however, lies in the definition of “de-identification.” Privacy advocates have long warned that even “anonymized” data can often be “re-identified” by cross-referencing other data points. OpenAI’s CISO, Dane Stuckey, alluded to this, noting that the chats contain “highly personal” information from users who “have no connection” to the lawsuit.
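The re-identification concern can be illustrated with a toy linkage example in Python. It is a hypothetical sketch with invented records and field names, and it does not reflect OpenAI’s actual de-identification process or the contents of any real chat. The idea is that details left in a “de-identified” record (here, a city and a job) can be matched against an outside dataset, and a record is exposed whenever exactly one person fits.

```python
def link_records(deidentified, public, keys):
    """Map each de-identified record index to a name when exactly one
    public record agrees with it on every field listed in keys."""
    matches = {}
    for i, record in enumerate(deidentified):
        candidates = [
            person["name"]
            for person in public
            if all(person.get(k) == record.get(k) for k in keys)
        ]
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches[i] = candidates[0]
    return matches


# Invented data for illustration only.
deidentified_chats = [
    {"city": "Albany", "job": "nurse", "text": "..."},
    {"city": "Boston", "job": "teacher", "text": "..."},
]
public_profiles = [
    {"name": "A. Example", "city": "Albany", "job": "nurse"},
    {"name": "B. Example", "city": "Boston", "job": "teacher"},
    {"name": "C. Example", "city": "Boston", "job": "teacher"},
]
linked = link_records(deidentified_chats, public_profiles, keys=["city", "job"])
```

In this toy data the first chat is linked to a single person, while the second stays ambiguous because two profiles fit. The more distinguishing details a conversation contains, the more likely a unique match becomes.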
What to Watch Next
The immediate flashpoint is the November 14, 2025, deadline. OpenAI’s new objection effectively appeals Magistrate Judge Wang’s order to the senior District Judge overseeing the case, who will now have to rule on whether the order stands.
- The District Judge’s Ruling: The senior judge can uphold, overturn, or modify the magistrate’s order. The decision could become an influential reference point for data discovery in other AI-related litigation.
- Compliance vs. Contempt: If the order is upheld and OpenAI continues to refuse, it could face sanctions for contempt of court.
- The ‘Fair Use’ Battle: This entire discovery fight is just a precursor to the main event: the summary judgment motions and potential trial over whether training AI on copyrighted data constitutes “fair use” under U.S. law.
This skirmish over 20 million private chats has transformed the NYT v. OpenAI case. It is no longer just a landmark copyright dispute; it is now a frontline battle over the privacy rights of every individual who interacts with generative AI.






