Google is continuing to push the boundaries of how we interact with technology. Building on AI Mode's success among Google One AI Premium subscribers, Google is now expanding the feature, bringing its new multimodal capabilities to millions more users through Google Labs. This marks a major step forward for search: pairing visual search with AI lets it deliver more nuanced, contextually relevant responses than before.
Since AI Mode launched, early users have praised its clean design, fast responses, and its ability to tackle complex, open-ended questions. AI Mode isn't just for quick queries: people are using it for in-depth tasks like comparing products, exploring how-tos, and even planning trips. Notably, AI Mode queries are on average twice as long as traditional search queries. The feedback has been overwhelmingly positive, and Google is now making the feature available to millions more users, starting with those in the U.S. who have opted into Google Labs.
With this expansion, Google is also rolling out multimodal capabilities in AI Mode, integrating Google Lens's visual search into the AI Mode experience. Combining Gemini's multimodal understanding with Lens's advanced visual search means users can ask complex questions about what they see, transforming how we engage with images and information.
With AI Mode’s new multimodal understanding, users can now snap a photo or upload an image and immediately ask detailed questions about it. Whether you're curious about a specific object in a picture or looking for more information about an entire scene, AI Mode can deliver comprehensive responses, providing deep insights with relevant links for further exploration.
The experience is powered by Gemini’s multimodal capabilities, allowing AI Mode to not only recognize individual objects in an image but also understand the relationships between them. This advanced understanding takes visual search to a new level, offering answers with greater nuance and context.
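To make the idea concrete, here is a minimal sketch of multimodal question answering using the publicly available Gemini API (the google-generativeai Python package). It is an analogy for experimentation, not AI Mode's actual implementation, and the model name, API key, and image file are placeholders.

```python
# A minimal sketch of multimodal Q&A with the public Gemini API.
# Illustrative only: this is not how AI Mode is built, and the model
# name, API key, and image path below are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

image = Image.open("bookshelf.jpg")
# The image and the question go into a single request; the model
# reasons over the pixels and the text together.
response = model.generate_content(
    [image, "What do these objects have in common, and how are they arranged?"]
)
print(response.text)
```

Because the model sees the whole image rather than a single cropped object, it can answer questions about relationships between items, not just their identities.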
When you upload an image, AI Mode uses Lens to identify the objects within it. But it doesn't stop there: using query fan-out techniques, AI Mode issues multiple queries about the image as a whole as well as about each object in it. The result is broader and deeper information than a standard Google search would surface, making it possible to dive into the full context of what you see.
For example, if you upload a picture of a bookshelf, AI Mode can identify each book on the shelf and issue queries to gather information about those specific books. It can even recommend similar books that are highly rated, providing you with a personalized, comprehensive list of suggestions. This is a major step forward from traditional search methods, as AI Mode isn’t just finding individual results—it’s understanding the entire scene and offering contextually rich recommendations.
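The fan-out pattern itself is straightforward to sketch. Below is a hypothetical illustration of the idea in Python; detect_objects and web_search are stand-ins for whatever vision and search backends are available, and none of this reflects Google's actual implementation.

```python
# A hypothetical sketch of the "query fan-out" pattern described above.
# detect_objects() and web_search() are stand-ins for real vision and
# search backends; this is not Google's implementation.
from concurrent.futures import ThreadPoolExecutor


def detect_objects(image_path: str) -> list[str]:
    """Stand-in for a Lens-style detector: returns labels for objects it finds."""
    return ["The Pragmatic Programmer", "Clean Code", "Refactoring"]  # placeholder


def web_search(query: str) -> str:
    """Stand-in for a search backend: returns a result snippet."""
    return f"top result for {query!r}"


def fan_out(image_path: str, question: str) -> dict[str, str]:
    """Issue one query about the whole image plus one per detected object."""
    objects = detect_objects(image_path)
    queries = [question] + [f"{question}: {obj}" for obj in objects]

    # Run the queries concurrently and pair each with its result, so a
    # final answer can be composed from all of them at once.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(web_search, queries))
    return dict(zip(queries, results))


if __name__ == "__main__":
    answers = fan_out("bookshelf.jpg", "highly rated books similar to these")
    for query, snippet in answers.items():
        print(f"{query} -> {snippet}")
```

In this sketch the per-object queries run concurrently, so the overall latency stays close to that of a single search even as the number of detected objects grows.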
As Google Labs continues to test and improve AI Mode, the team is relying on user feedback to fine-tune the experience. By signing up for Labs, users can try out this new multimodal search capability in the Google app (available for both Android and iOS). Google encourages users to provide feedback, helping to shape the future of this transformative technology.
With the integration of multimodal search, AI Mode is set to revolutionize how we interact with information online. It’s not just about finding answers—it’s about understanding them on a deeper level. The fusion of visual search and AI will open up a new realm of possibilities for exploring the world around us in ways that were once unimaginable.