
Visual search has long been a sleeping giant among potential AR killer apps. Using your smartphone camera – or, increasingly, smart glasses – visual search can identify and contextualize real-world objects. The UX often involves visual overlays or audible intelligence.
At the heart of visual search is machine learning. Google Lens, for instance, taps into decades of image and knowledge-graph data as a training set for object recognition. Among other things, this positions visual search to ride the tailwinds of continued investment and advancement in AI.
One way this has materialized is Google Search Live. Users can do things like point it at a fashion item that they see someone wearing on the street to find out who makes it and where to buy it. This gets closer to one of AI’s most valuable and expansive endpoints: physical AI.
In fairness, Google Lens already did all this. What differentiates Search Live is multimodal capability. Users can ask follow-up questions, a la AI Mode – for example, whether the jacket comes in blue or in kids’ sizes. Search Live is what you’d get if Google Lens and Gemini had a baby.
Jackets, Restaurants & Shelves
To use Search Live, tap the Live icon under the search bar in Google’s iOS and Android apps. It’s also accessible from Google Lens, where a Live option sits at the bottom of the screen. Other access points will likely emerge as Google looks to boost Search Live’s exposure and adoption.
To that end, there’s a broad range of potential use cases beyond the fashion example above. Pointing their phones at a restaurant, users can ask about cuisine, reviews, and reservations. Google also provides the example of intelligent, step-by-step instructions for building a shelf (see video).
The point of all this is to search in intuitive ways, using whatever input – or combination of inputs – is most natural. For jackets, restaurants, and shelves, it may be easier to point your phone than to describe them. But for additional context or follow-up questions, it may be easier to simply talk.
Meanwhile, Google is motivated because follow-up questions boost query volume – a precursor to monetization. And beyond quantity, it’s a matter of quality: natural-language and media-rich inputs give Google more signals to discern user intent – a linchpin of search.
The reason this is relevant now is that Google is scaling up Search Live. Last week, the company announced a global rollout: previously limited to the U.S. and India, Search Live will be available in more than 200 countries. This should help it scale, get smarter, and gain global traction.
Natural & Ambient
Stepping back, moves like Search Live make us double down on a longstanding prediction that visual search will be AR’s killer app. It’s already well on its way, given that Google Lens alone sees 20 billion+ visual searches per month. Another point of validation is Ray-Ban Meta sales.
Specifically, Meta’s non-display AI glasses have sold an estimated 10 million+ lifetime units. And a big selling point is multimodal AI – a flavor of visual search. Much like Search Live, users look at objects and ask questions to get audible answers. That mix of audio and visual is what makes it multimodal.
In fact, Ray-Ban Metas and other smart glasses are the key to unlocking visual search’s wide-scale traction. Worn on the face, visual search can find its footing as a natural and ambient tool that doesn’t require holding up a phone. So as smart glasses scale up, so could visual search.
Another accelerant could be Apple. Though its Visual Intelligence feature has been held back by Apple’s broader stumbles in AI, that could change. Now that the company has finally conceded to outsourcing AI and playing the role of high-scale consumer touchpoint, Visual Intelligence could ratchet up.
Meanwhile, Search Live is visual search’s latest advancement, bringing the technology closer to its foundations in AI. Coupling visual inputs with audible ones, it could find a sweet spot in intuitive, situational intelligence – and a high-frequency utility that exceeds web search itself.
