Google announced Gemini Omni, a new family of multimodal AI models. The first release, Gemini Omni Flash, creates high-quality videos from any combination of image, audio, video, and text inputs. Users can edit videos through natural-language conversation, and characters, physics, and scene context remain consistent across edits. Features include accurate physics simulation, the ability to blend multiple reference inputs into a single cohesive output, and Avatars, a digital version of the user that can be used to generate videos featuring the user's own voice. Every video carries Google's imperceptible SynthID digital watermark for content transparency. Gemini Omni Flash is rolling out today to all Google AI Plus, Pro, and Ultra subscribers globally through the Gemini app and Google Flow, and at no cost to users of YouTube Shorts and the YouTube Create app starting this week. Google plans to add image and audio output modalities and to extend the model to developers and enterprise customers via APIs.






