According to Decrypt, Midjourney, the generative image creation tool, plans to introduce a text-to-video model within the next few months. CEO David Holz said during an 'Office Hour' Discord session that the company will begin training its video models in January. The move is a natural progression for the platform: it builds on a mature image model and could shake up the competitive dynamics of the generative video industry.
The Discord session notes included planned tweaks for V6 Niji, Midjourney's manga/anime generator model, and consistency fixes for the upcoming official release of Midjourney V6. The notes also state that the company's to-do list calls for 'training for new video models to commence,' with a model that could be ready 'in a few months.' Neither Holz nor the Midjourney team shared further details about the model.
Midjourney's venture into video follows releases from competitors such as Stability AI's Stable Video Diffusion and Meta's Emu video generator, as well as established models like Pika and Runway ML. Other image generators, such as Leonardo AI, have already added video generation capabilities, further intensifying the race. Midjourney's recent V6 update, boasting improved prompt following and more realistic images, is the company's latest effort to stay relevant and competitive. If its video models show even modest coherence, they could gain solid ground in such a nascent field, where every offering is still far from perfect.

The implications of these developments extend far beyond a corporate race for supremacy, as the creative and media industries stand on the brink of a transformative era. The ability to generate, manipulate, and interact with video content through AI opens up many possibilities, from streamlining work for entertainers and advertisers to potentially reshaping how we perceive reality.