A few weeks back, Wall Street Journal reporter Joanna Stern sat down with OpenAI CTO Mira Murati for a brief but wide-ranging conversation about the company’s text-to-video model, Sora.
Stern and Murati cover the basics, like how text-to-video AI models work and the big-picture concerns about putting people out of work or spreading misinformation. Yet the conversation still features a few intriguing, eye-opening exchanges.
One immediately notable observation, to which even Stern herself calls attention, is how frequently Murati uses the word “eventually” to describe her work at OpenAI.
Is there a way to fix obvious problems in Sora outputs? There will be, eventually.
Will users ever get more actual control over the specific elements of the image beyond just basic prompts? Sure, eventually.
When will Sora be released to the public? Eventually. But does that mean before the 2024 presidential election? Perhaps, maybe.
Does generating videos require exponentially more processing power than other kinds of AI outputs? Yes, but they’re hoping to bring this number down… wait for it… eventually.
Obviously, technology takes time to develop, and Sora is undeniably a cutting-edge and innovative product, far smoother and more naturalistic than other mainstream text-to-video applications. It’s unfair to expect every feature to work perfectly while it’s still in beta.
More intriguing than Murati’s answers in this roughly 10-minute sit-down, however, are the questions with which she refuses to engage. While she’s very open to discussing Sora’s shortcomings as an app, and even concerns about the potential dangers it poses to our society and political system, Murati declines to provide even basic insight into how Sora was trained and what specific collections of data it has been fed. She won’t even concede that photos and clips from Shutterstock were used to train Sora, even though the partnership between the stock photo library and OpenAI has been public knowledge since last summer.
Just about the only thing Murati will say regarding the materials used to train Sora is that they were either “publicly available” or “licensed.” But of course, anything on the internet is “publicly available,” if you want to get technical. Stern asks specifically about YouTube and Facebook videos – publicly available media that nonetheless represent the work of scores of individual human creators – but doesn’t get a clear answer. (Murati actually says she doesn’t know whether Sora was trained using YouTube videos, which strains credulity, considering her high-level position and experience. If she doesn’t know, WHO DOES?)
Stern’s video is certainly not the first time that issues around AI and copyright have been raised, not only by journalists but also by the legal system. In late 2023, The New York Times filed suit against OpenAI and primary investor Microsoft for training ChatGPT and other programs on its published news articles, which it identifies as copyright infringement. In a blog post responding to the suit, OpenAI argues that using copyrighted material to train a new AI model constitutes “fair use.” (Murati’s particular turn of phrase – that OpenAI relied on “publicly available internet materials” for training – echoes this blog post in a way that feels purposeful.)
Whether or not this actually constitutes “fair use” remains a very open question for the courts to decide. Numerous studies and surveys have already pointed out that AI apps have a pretty significant plagiarism problem, raising questions about the degree to which training material is repurposed or reinterpreted as opposed to simply being copy-and-pasted. One 2023 paper demonstrated that ChatGPT can be tricked into providing specific personal information that it was fed, like email addresses and phone numbers. That New York Times lawsuit includes evidence of entire stories being spat out by ChatGPT essentially verbatim.
This is not a minor “we’ll sort it out eventually” kind of issue. Whether or not training an AI app counts legally as “fair use” could become an existential question not just for OpenAI, but for all companies developing similar apps. Scooping up vast troves of information from the internet – regardless of its copyright or ownership status – is the only practical way to make Large Language Models (LLMs) and other generative AI tools like DALL-E and Midjourney work properly.
OpenAI conceded this themselves in a brief submitted to a select committee in the UK’s House of Lords. They note that “because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.” And because it would be financially impractical to pay every copyright holder on the internet to license their work, this essentially means OpenAI needs to either borrow or steal other people’s content in order to function.
As the world waits for the court system to render some verdicts on the actual legality of all this, OpenAI and other companies developing these apps essentially have free rein to market and promote their products. As Murati teases in the Wall Street Journal video, Sora’s going to be made available to the public pretty soon, hopefully by the end of this year. It’s an extended, multi-year grace period to figure things out on the fly as an entire ecosystem is built up around these apps and their outputs.
Actual human creators, though, receive nowhere near this level of compassionate understanding around their work. Even when independent creators really are working within the established rules regarding “fair use,” they could still be hit with a strike by YouTube’s onerous Content ID system, which frequently makes mistakes and heavily favors the rights of IP owners over the platform’s actual users. As the Electronic Frontier Foundation pointed out in 2020, Content ID is such a looming concern for YouTubers that it largely dictates what kinds of videos we see online, discouraging any and all forms of engagement with popular culture and identifiable brands.
Hardly a week goes by without a new headline about an internet creator having their work demonetized, removed, or disappeared from the internet entirely because of a copyright claim or strike. Just this week, it happened to independent journalist and Channel 5 mainstay Andrew Callaghan, who had a feature-length documentary about unhoused people living in tunnels under Las Vegas pulled from YouTube without warning.
Most pundits and experts agree that the future of internet content will, at least in some respects, become a competition between automated digital creators and their flesh-and-blood human counterparts. If that’s really the case, shouldn’t we at least insist on a level playing field, where all creators – AI or otherwise – have to follow the same set of rules? If the robots can hoover up any content they want in order to produce their new work, why shouldn’t people be allowed to do the same? And if human reactors and vloggers and essayists and other creators have to follow exacting, highly specific rules about when they can use third-party content, shouldn’t the bots have to follow those same principles?