Google's Imagen Video neural network was taught to create HD video from plain text
The field of AI systems that create artwork from abstract text descriptions is advancing at a pace that surprises even their creators. New models appear every few months, demonstrating ever higher quality – but also becoming harder and harder to control. The troubled fate of the Imagen Video project is a case in point.
Google unveiled the first results of its new AI, Imagen Video, which can synthesize HD video at 1280×768 resolution and 24 frames per second from simple text descriptions. It can build full 3D models of arbitrary objects to compose complex scenes and show them from different angles, imitate the painting styles of famous artists, understand abstract concepts, and generate video of objects that do not exist – for example, a probe flying through a battlefield where space pirates clash with unknown aliens.
Imagen Video is built as a cascade of seven models: the first produces the simplest possible low-resolution visualization of the text prompt, and the subsequent stages refine the picture step by step, adding detail, animation, and so on. The AI was trained on roughly 14 million "video + description" pairs and 60 million "text + image" pairs, together with the publicly available LAION-400M image-text dataset – and that dataset is precisely its main vulnerability.
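The cascade idea – one coarse base stage followed by alternating refinement stages – can be illustrated with a minimal sketch. The function names and the toy resolutions below are assumptions for illustration only; the real stages are large diffusion models, stood in for here by simple array upsampling.

```python
import numpy as np

def base_model(prompt: str, frames=16, h=24, w=48):
    # Placeholder for the base text-to-video stage: deterministic
    # noise seeded from the prompt stands in for diffusion sampling.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.random((frames, h, w, 3))

def spatial_sr(video):
    # Placeholder spatial super-resolution: double height and width.
    return video.repeat(2, axis=1).repeat(2, axis=2)

def temporal_sr(video):
    # Placeholder temporal super-resolution: double the frame count.
    return video.repeat(2, axis=0)

def generate(prompt: str):
    # Cascade: one base stage plus six refinement stages, loosely
    # mirroring the seven-stage pipeline described above.
    video = base_model(prompt)
    for stage in (spatial_sr, temporal_sr, spatial_sr,
                  temporal_sr, spatial_sr, temporal_sr):
        video = stage(video)
    return video

clip = generate("a probe flying past space pirates")
print(clip.shape)  # (frames, height, width, channels)
```

The key design point this sketch captures is that each stage only has to solve a small sub-problem (one doubling of resolution or frame rate), which is what makes training the full pipeline tractable.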
To cover the full cultural base of humanity, LAION-400M was assembled as an open database, so it contains a great deal of content that should not be shown for censorship reasons. Nor can that content simply be excluded, or the AI would lose the ability to interpret requests adequately: to depict compassion, for example, it must understand what pain and suffering are. Google has not yet found a solution to this difficult problem, so it has postponed publishing the Imagen Video code and opening access to the AI indefinitely.