
Apple has stumbled in the AI race over the past two years. After a splashy demo of Apple Intelligence at WWDC 2024, it has since stalled in rolling out the product in full. Along the way came the gradual realization that it may not be as primed for AI as some of its competitors.
That inferior position – a role Apple isn’t used to – stems mostly from the fact that it has neither the means nor the competitive positioning for AI model training. Contrast that with Google’s knowledge graph (which Apple now utilizes) and OpenAI’s LLMs, and Apple is left without a native edge in AI.
But there’s one place it could gain that edge back: physical AI. This higher-stakes flavor of AI will develop as the technology expands from its current confines of the web to the more expansive canvas of the physical world. And Apple’s means to that end is Visual Intelligence.
In short, though Apple doesn’t have an edge in LLM training, it does have the capacity to train Visual Intelligence models. These could tap into Apple’s native strengths – cameras and sensors across several devices – to ingest the content and contours of the physical world at scale.
Big Bets
These thoughts emerged in light of Mark Gurman’s reliable Power On newsletter from Bloomberg. After his previous issue focused on signs of a trio of Apple AI wearables (AirPods, a pin, and glasses), the latest issue examines clues that Apple is placing big bets on Visual Intelligence.
Stepping back, what is Visual Intelligence? It’s Apple’s flavor of visual search – a technology that lets you point your phone (or glasses) at physical-world objects to contextualize them. The same idea takes shape in other tools such as Google Lens and Meta’s multimodal AI in its smart glasses.
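For a concrete sense of the mechanics, here’s a minimal Swift sketch of the underlying idea, using Apple’s public Vision framework. To be clear, this is a generic on-device image classifier – not Apple’s actual Visual Intelligence pipeline, which isn’t exposed as a public API – and the function name and confidence threshold are illustrative.

```swift
import UIKit
import Vision

// A minimal sketch of visual search's core step: turning pixels into labels
// that can then be contextualized (searched, described, acted on).
// Uses Vision's public VNClassifyImageRequest; everything else about
// Apple's Visual Intelligence stack is private, so treat this as a toy.
func identifyObjects(in image: UIImage, completion: @escaping ([String]) -> Void) {
    guard let cgImage = image.cgImage else { return completion([]) }

    let request = VNClassifyImageRequest { request, _ in
        // Keep only reasonably confident labels (threshold is arbitrary).
        let labels = (request.results as? [VNClassificationObservation])?
            .filter { $0.confidence > 0.3 }
            .map { $0.identifier } ?? []
        completion(labels)
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```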
Both of those existing visual search tools validate a real market. Google Lens handles 20 billion visual searches per month, while Meta’s multimodal AI – which pairs visual inputs with audible outputs, hence multimodal – is one of the most popular functions on a breakout device.
In fact, visual search is one of our longstanding picks for AR’s killer app. Though it has taken a while to ramp up – mostly due to its confinement to handheld form factors – it carries killer-app traits, including wide-scale applicability, utility, and high frequency… just like web search.
Back to Apple, it’s also seeing noteworthy traction. Tim Cook claimed on Apple’s Q1 earnings call that Visual Intelligence is one of the iPhone’s most popular features. These signals of scale align visual search with Apple’s typical mass-market entry criteria.
The Why and How
Beyond the “why” of visual search for Apple is the “how.” That brings us back to its ability to utilize all its touchpoints with the physical world: billions of iPhones, plus the rumored trio of first-person, always-on AI wearables – again, smart AirPods, a pin device, and glasses.
That last part is a key component of any physical-AI master plan Apple could be developing. Though the iPhone provides scale today, better human-centric world models will result from ambient sensing captured from a first-person perspective. That could prime Apple for high-impact AI models.
And that would likely be Apple’s next step with Visual Intelligence. The tool is currently powered by a combination of Google Lens and ChatGPT, just as Apple Intelligence is now powered by Gemini. But Apple’s real play will be to develop its own visual models and visual-AI stack.
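To make that architectural point concrete, here’s a hedged sketch of the routing pattern described above: a captured frame is handed to pluggable backends – an image-search provider or a multimodal assistant – which could later be swapped for first-party models. All type and property names here are hypothetical; Apple’s internal design is not public.

```swift
import Foundation

// Illustrative only: models the article's claim that today's Visual
// Intelligence delegates to third-party backends (Google for search,
// ChatGPT for questions), with the backends as swappable dependencies.
enum VisualQuery {
    case search          // "Search" action -> image-search provider
    case ask(String)     // "Ask" action -> multimodal LLM with a prompt
}

protocol VisualBackend {
    func handle(imageData: Data, query: VisualQuery) async throws -> String
}

struct VisualIntelligenceRouter {
    let searchProvider: VisualBackend   // e.g., a Google-backed image search
    let assistant: VisualBackend        // e.g., a ChatGPT-backed Q&A service

    // Because the backends sit behind a protocol, replacing them with
    // first-party visual models (the longer-term play the article
    // speculates about) wouldn't change this routing layer.
    func resolve(imageData: Data, query: VisualQuery) async throws -> String {
        switch query {
        case .search:
            return try await searchProvider.handle(imageData: imageData, query: query)
        case .ask:
            return try await assistant.handle(imageData: imageData, query: query)
        }
    }
}
```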
Put another way, Google’s success in the internet age has been to synthesize all the world’s information. But that was limited to digital domains. If Apple can cultivate all the world’s information in a physical sense, it can unlock its potential in the high-stakes game of physical AI.
And if it can do that, it has a good chance of redeeming its AI failings to date. Heck, it could even make up for years of Siri’s ineptitude. That would evoke another Apple M.O.: arrive late to a market… then dominate it for years. We’ll see if history repeats in the AI era.
