Sora Studio

OpenAI’s video generation tool Sora took the AI community by storm in February with smooth, realistic videos that appear light years ahead of its competitors. But the carefully planned debut left out plenty of details, details that have since been filled in by a filmmaker given early access to create a short using Sora.

Shy Kids is a Toronto-based digital production team that was selected by OpenAI as one of a handful of groups to produce short films, primarily for OpenAI’s promotional purposes, though the team was given considerable creative freedom in creating “Air Head.” In an interview with visual effects news outlet fxguide, post-production artist Patrick Cederberg described “the actual use of Sora” as part of his work.

Perhaps the most important takeaway for most readers is simply this: while OpenAI’s post highlighting the short films lets the reader assume they more or less emerged fully formed from Sora, the reality is that these were professional productions, complete with solid storyboarding, editing, color correction and post work such as rotoscoping and visual effects. Just as Apple says “shot on iPhone” but doesn’t show the studio setup, professional lighting and color work that happen after the fact, the Sora post only talks about what the model lets people do, not how they actually did it.

Cederberg’s interview is interesting and fairly non-technical, so if you’re curious at all, head over to fxguide and read it. But there are some details here about Sora’s use that tell us that, as impressive as the model is, it is perhaps less of a leap forward than we thought.

As Cederberg put it: “Control remains the most desirable and also the most elusive thing right now. …The closest we could get was simply being hyper-descriptive in our prompts. Explaining the characters’ costumes, as well as the type of balloon, was our way of maintaining consistency, because from shot to shot / generation to generation, there isn’t yet a feature set in place for full control over consistency.”

In other words, matters that are simple in traditional filmmaking, such as choosing the color of a character’s clothing, require elaborate workarounds and checks in a generative system, because each shot is created independently of the others. That could obviously change, but it is certainly far more labor-intensive at the moment.

Sora’s output also had to be cleaned up to remove unwanted elements: Cederberg described how the model routinely generated a face on the balloon the main character has for a head, or a string hanging down from it. These had to be removed in post, another time-consuming process, whenever they couldn’t get the prompt to exclude them.

Precise control of timing and of character or camera movements isn’t really possible: “There is a bit of temporal control over where these different actions occur in the actual generation, but it’s not precise… it’s kind of a shot in the dark,” Cederberg said.

For example, the timing of a gesture like a wave is a very rough, suggestion-driven process, unlike in manual animation. And a shot like a pan up the character’s body may or may not reflect what the filmmaker wants, so in this case the team rendered a shot composed in vertical orientation and then did a crop pan in post-production. The generated clips were also often in slow motion for no particular reason.

In fact, the use of everyday filmmaking language, such as “pan right” or “tracking shot,” was inconsistent overall, Cederberg said, which the team found quite surprising.

“The researchers, before approaching artists to play with the tool, weren’t really thinking like filmmakers,” he said.

As a result, the team performed hundreds of generations, each lasting 10 to 20 seconds, and ended up using only a handful. Cederberg estimated the ratio at 300:1, but of course, we would probably all be surprised by the ratio in an ordinary film shoot.

The team actually made a short behind-the-scenes video explaining some of the issues they ran into, if you’re curious. As with a lot of AI-related content, the comments are quite critical of the entire effort, though not as vituperative as those on the AI-assisted ad we saw criticized recently.

The last interesting complication concerns copyright: if you ask Sora to give you a clip from “Star Wars,” it will refuse. And if you try to get around that with “robed man with a lightsaber on a retro-futuristic spaceship,” it will also refuse, since by some mechanism it recognizes what you’re trying to do. It also refused to do an “Aronofsky-type” shot or a “Hitchcock zoom.”

On the one hand, that makes sense. But it raises the question: if Sora knows what these things are, does that mean the model was trained on that content, the better to recognize what is infringing? OpenAI, which plays its training-data cards close to the vest, to the point of absurdity, as with CTO Mira Murati’s interview with Joanna Stern, will almost certainly never tell us.

As for Sora and its use in filmmaking, it is clearly a powerful and useful tool in its place, but its place is not “creating movies out of thin air.” Yet. As another villain once famously said, “that will come later.”
