Google CEO Sundar Pichai has raised concerns that OpenAI may have breached YouTube’s terms and conditions while training its text-to-video model, Sora.
In an interview with Nilay Patel, editor-in-chief of The Verge, Pichai agreed that a violation may have occurred and expressed sympathy for content creators whose work may have been used without permission.
Potential Breach of Terms and Conditions
During the interview, Patel asked Pichai whether he believed OpenAI had violated YouTube’s terms. Pichai replied, “That’s right. Yes, that’s right.” He added that YouTube is still investigating the matter to understand how OpenAI trained Sora.
“We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that’s how I felt about it,” Pichai added.
OpenAI’s Sora: A Revolutionary but Controversial Model
In February 2024, OpenAI introduced Sora, a model capable of generating high-quality videos from simple text prompts. However, the company has been vague about the specific data sources used to train it.
OpenAI’s CTO, Mira Murati, told The Wall Street Journal that Sora was trained on “publicly available data and licensed data,” but she could not say whether that included data from platforms such as YouTube and Instagram.
“You know, if they were publicly available to use, there might be data. But I’m not sure. I’m not confident about it,” Murati said.
YouTube’s Stance on Data Usage
YouTube CEO Neal Mohan has also weighed in on the issue, telling Bloomberg’s Emily Chang that using YouTube videos to train Sora would be a “clear violation” of the platform’s terms of service. He stressed the importance of respecting creators’ rights and adhering to YouTube’s policies.
“It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service,” Mohan stated.
Broader Implications for AI Companies
OpenAI’s situation with YouTube highlights the challenges AI companies face in sourcing training data for their models. Amazon-backed AI startup Anthropic, for example, has reportedly turned to internally generated data to avoid similar issues.
This is not OpenAI’s first controversy involving content and creators. Recently, actress Scarlett Johansson expressed her dismay over “Sky,” one of the voices offered by ChatGPT’s new voice assistant, which she said sounded strikingly similar to her own. Johansson had previously declined an offer from OpenAI CEO Sam Altman to voice the system.
“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice — Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI clarified in a blog post.
Sundar Pichai’s comments and YouTube’s ongoing scrutiny illustrate how complex the question of data usage in AI development has become. As AI technologies continue to evolve, ethical data practices and respect for content creators’ rights will be crucial to maintaining trust and legal compliance across the industry.
The information in this article is taken from Business Insider and Live Mint.