
Stable Diffusion can now create videos, but not for everyone

As with the passage from photography to cinema, it is now clear that the next goal for companies working in generative artificial intelligence is the creation of video content.

But while the leap from photography to moving pictures took more than fifty years (from 1826, the birth of photography, to 1878 and the first motion picture), with AI the timescale is measured in months, and the latest company to announce a solution of this kind is Stability AI, creator of the popular Stable Diffusion image model.

Launched a couple of days ago, Stable Video Diffusion is an open-source AI tool based on the image model of the same name, and was released in the form of two models, SVD and SVD-XT. SVD turns still images into 576×1024-pixel videos of 14 frames, while SVD-XT, which uses the same architecture, increases the frame count to 24. Both can generate video at between 3 and 30 frames per second.
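Since the models are open source, a setup along these lines could be used to try them locally. This is a minimal sketch, not an official recipe: it assumes the model is fetched through Hugging Face's `diffusers` library under the id `stabilityai/stable-video-diffusion-img2vid-xt`, that a CUDA GPU with enough memory is available, and that an `input.jpg` exists on disk.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a generated clip: frame count divided by frame rate.
    E.g. SVD-XT's 24 frames played back at 6 fps yield a 4-second clip."""
    return num_frames / fps


if __name__ == "__main__":
    # Heavy, GPU-bound dependencies are imported here so the helper above
    # stays importable without diffusers/torch installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    # Assumed model id on the Hugging Face Hub (SVD-XT, the 24-frame variant).
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    pipe.to("cuda")

    # SVD expects a 1024x576 conditioning image (width x height).
    image = load_image("input.jpg").resize((1024, 576))

    # decode_chunk_size trades VRAM for speed when decoding latent frames.
    frames = pipe(image, decode_chunk_size=8).frames[0]
    export_to_video(frames, "output.mp4", fps=7)
    print(f"wrote {clip_duration_seconds(len(frames), 7):.1f} s clip")
```

Note that this generates video from an image, not from text: as the article says, SVD animates a still picture, and the frame count and frame rate together determine the clip's short duration.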

According to a paper published to accompany the announcement, SVD and SVD-XT were initially trained on a dataset of millions of videos and then fine-tuned on a much smaller set of between hundreds of thousands and about a million clips.

It is unclear where those videos came from (according to the paper, from public research datasets), but Stability already faces a number of lawsuits over the allegedly illicit use of images to train its models, so we hope it took the necessary precautions.

Source: Stability AI

But how are these videos generated, and what are they like? Generating them simply requires a prompt, and, like the first films in history, the clips (like all those currently created by AI) are brief, around four seconds, but of fairly high quality, or at least comparable to those of Meta, Google and Runway.

The limits lie in the content: motion is largely confined to movement of the "camera", or of the shot in general; the videos cannot render legible text; and faces may show distortions.

But if you want to try it, prepare to be disappointed: Stability states that, as happened with the first version of Stable Diffusion, Stable Video Diffusion is available for research purposes only.

This means you can access the model only through a waiting list, reached by filling out a form in which you declare your affiliation with a certain type of institution and your intention to create content for "educational or creative tools", "design and other artistic processes" and the like, and above all not to intentionally create "factual or true representations of people or events".

The technology is certainly exciting, and Stability AI intends to apply it to use cases such as generating 360-degree visualizations of objects, as well as developing further models and a text-to-video tool that feeds text prompts to models on the web. The ultimate goal appears to be commercial: Stability, which is burning through millions of dollars, plans to apply the tool to advertising, entertainment, education and more.

But there are also concerns. Copyright issues aside, history teaches us that these models will soon appear on the dark web, and it is not hard to imagine the tool being used to create deepfakes, since it does not appear to include built-in content filters.

The near future will give us an answer; for the moment, we will limit ourselves to showing you the presentation video.

