Architectural Prompting: Google Omni, when AI stops generating and starts understanding space

Google Omni, what it is

Omni is not just another image generator: it is a natively multimodal, real-time model. In simple terms, Omni can process, understand and connect text, audio, video and images at the same time, with no intermediate steps.

For those of us who work with space, architecture and visual storytelling, this is more than a technical evolution: it is a leap forward. It means being able to talk to the model while it analyzes one of our freehand sketches, or to ask it to change the atmosphere of a video rendered in real time while we explain the project idea. We are no longer just giving commands to a machine; we are collaborating with an assistant that understands the visual and spatial context as we see it.

It is a remarkable paradigm shift, one that opens up previously unthinkable scenarios for how we handle our concepts and photomontages.

The test: transforming a single shot into a navigable space

I wanted to put the new Omni through its paces for you right away. Google’s promises are always ambitious, so I wanted to see how it fares on the test that matters to us: understanding and navigating architectural space.

I ran an experiment that is conceptually simple but technically very difficult for an artificial intelligence. I started from a static image (a previously generated interior) and, without using 3D software or setting up virtual cameras, I simply traced a winding path freehand, in red, directly on the photo.
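For anyone who would rather script the red trail than draw it by hand, here is a minimal sketch in Python. It is not what I did in the test (I drew freehand): it assumes you pick a few waypoints in pixel coordinates, joins them with a uniform Catmull-Rom spline, and outputs an SVG overlay you can place on top of the image in any editor. The function names, waypoints and image size are all hypothetical.

```python
# Illustrative sketch only: names, waypoints and sizes are assumptions,
# not part of the original experiment.

def catmull_rom(p0, p1, p2, p3, t):
    """Point at parameter t (0..1) on the uniform Catmull-Rom segment p1 -> p2."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def smooth_path(waypoints, samples_per_segment=16):
    """Sample a smooth curve that passes through every waypoint in order."""
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    # Duplicate the endpoints so the first and last segments are defined.
    pts = [waypoints[0], *waypoints, waypoints[-1]]
    path = []
    for i in range(len(pts) - 3):
        for s in range(samples_per_segment):
            path.append(catmull_rom(*pts[i:i + 4], s / samples_per_segment))
    # Close the curve exactly on the final waypoint.
    path.append(tuple(float(v) for v in waypoints[-1]))
    return path


def svg_overlay(path, width, height, stroke_width=6):
    """Return an SVG document containing the path as a red polyline."""
    points = " ".join(f"{x:.1f},{y:.1f}" for x, y in path)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
        f'<polyline points="{points}" fill="none" stroke="red" '
        f'stroke-width="{stroke_width}" stroke-linecap="round" '
        f'stroke-linejoin="round"/></svg>'
    )


if __name__ == "__main__":
    # Hypothetical waypoints winding from the foreground to the back of the room.
    waypoints = [(100, 900), (400, 650), (250, 450), (700, 300), (900, 120)]
    path = smooth_path(waypoints)
    print(len(path), "points;", len(svg_overlay(path, 1024, 1024)), "characters of SVG")
```

The curve passes through every waypoint, so you can steer the trail around the sofa or the table simply by moving a point. You would still need to flatten the overlay onto the photo before uploading it.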

I uploaded the image to Gemini and asked for one very specific thing: “Generate me a video of an FPV drone that follows this red trail exactly within my space.”

The result? Judge for yourself:

Considering that the AI had to work exclusively from that single starting shot, the result is nothing short of surprising. Omni had no three-dimensional model or depth map at its disposal: it had to infer the volumes, estimate the distances between the sofa and the table, deduce the perspective of the room and animate a camera that smoothly follows a simple hand-drawn line.

Conclusions: beyond static rendering

This test shows us something fundamental. We are moving from the generation of “perfect photographs” to the generation of truly “navigable environments”. Of course, the result is still not flawless. If you watch the video carefully, you will notice some uncertainty in the perspective: the sofa, for example, seems to rotate slightly and unnaturally as the camera passes.

But consider the context: this video is a first take, the very first result, with no refinement of the prompt. Moreover, we are asking the AI for a great interpretative effort: to invent a fluid, three-dimensional flight path through a space with nothing but a flat 2D image to work from. A limit that until yesterday seemed insurmountable.

For us architects and designers, this means we can start to imagine showing clients not only the aesthetics of a project but the experience of moving through it, perhaps starting from a simple sketch or a static rendering. I will keep pushing this tool to its limits in the coming weeks to discover its full potential in our workflow. Until next time!

The weekly column “Architectural Prompting” is edited by experts Luciana Mastrolia, Giovanna Panucci and Andrea Tinazzo
