Major tech companies have been racing to build data centers around the world, the facilities where AI models are trained and run, and where Nvidia’s AI chips dominate. But as AI technology rapidly develops, the data center, and in turn Nvidia, faces new competition.
It all comes down to where the process of generating answers from AI models, known as “inference,” takes place. Right now, inference is largely happening in data centers.
But in the future, powerful chips from companies such as mobile-chip specialist Qualcomm could move inference out of data centers and onto smartphones and personal computers.
The stakes for the booming AI inference market are high. Inference already makes up around 40% of Nvidia’s data-center revenue and is growing fast, and it won’t be long before it overtakes model training as the principal source of AI revenue for chip players.
“There’s a higher-level battle ongoing between Nvidia and Qualcomm here,” wrote technology analyst Dean Bubley, founder of Disruptive Analysis.
The biggest change in AI this year has been the introduction of so-called reasoning models, a new technique for inference. These models break down problems step-by-step, using much more computing resources than earlier AI models—as much as 100 times more, according to Nvidia CEO Jensen Huang.
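As a rough illustration of that jump, the arithmetic can be sketched in a few lines of Python. The model size and token counts below are assumptions chosen for illustration, not figures from Nvidia; the point is simply that inference compute scales with how many tokens a model generates, and reasoning models generate far more of them.

```python
# Back-of-envelope sketch (illustrative assumptions, not vendor figures):
# decoder inference compute scales roughly with parameters x tokens generated,
# so a reasoning model that "thinks" through thousands of intermediate tokens
# before answering uses far more compute than one that replies directly.

PARAMS = 70e9                   # assumed model size: 70 billion parameters
FLOPS_PER_TOKEN = 2 * PARAMS    # ~2 floating-point operations per parameter per token

def inference_flops(tokens_generated: int) -> float:
    """Approximate total compute to generate a given number of tokens."""
    return FLOPS_PER_TOKEN * tokens_generated

direct_answer = inference_flops(200)        # short, direct reply
reasoning_trace = inference_flops(20_000)   # long step-by-step "thinking" trace

print(f"Direct answer:   {direct_answer:.2e} FLOPs")
print(f"Reasoning trace: {reasoning_trace:.2e} FLOPs")
print(f"Ratio: {reasoning_trace / direct_answer:.0f}x")  # ~100x in this toy example
```

With these assumed numbers, the reasoning trace works out to roughly 100 times the compute of the direct answer, which is the order of magnitude Huang has cited.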
At this month’s Mobile World Congress in Barcelona, the world’s largest mobile and communications trade show, Qualcomm and Micron made their case for where inference goes from here—and Nvidia showcased how it plans to maintain its lead.
Qualcomm says the need for continuous availability, quick response times, privacy and security, and lower costs means that AI inference will inevitably shift to users’ devices.
“It makes sense to run it on the device for a lot of different reasons, and it’s going to happen,” Qualcomm Chief Financial Officer Akash Palkhiwala told Barron’s in an interview at MWC.
Carrying out inference on a mobile device, however, can drain battery life quickly. Another issue is that AI chips can process data faster than memory systems can deliver it, limiting inference performance—the so-called “memory wall”—which can create a frustrating lag for users.
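The gap behind the “memory wall” can be sketched with a rough roofline-style calculation. All the numbers below are assumptions for illustration, not specifications of any particular phone or chip; they show how decoding speed can be capped by memory bandwidth long before the processor runs out of compute.

```python
# Toy "memory wall" estimate (illustrative numbers, not any specific device):
# generating each token requires streaming the model's weights from memory,
# so decoding speed is often limited by memory bandwidth, not raw compute.

MODEL_BYTES = 3e9            # assumed on-device model: ~3B parameters at 8-bit (~3 GB)
MEM_BANDWIDTH = 60e9         # assumed phone memory bandwidth: ~60 GB/s
NPU_FLOPS = 40e12            # assumed AI-processor peak: ~40 TFLOPS
FLOPS_PER_TOKEN = 2 * 3e9    # ~2 FLOPs per parameter per generated token

compute_limit = NPU_FLOPS / FLOPS_PER_TOKEN      # tokens/sec if compute-bound
bandwidth_limit = MEM_BANDWIDTH / MODEL_BYTES    # tokens/sec if memory-bound

print(f"Compute-bound limit:   {compute_limit:,.0f} tokens/sec")
print(f"Bandwidth-bound limit: {bandwidth_limit:,.0f} tokens/sec")
print(f"Effective ceiling:     {min(compute_limit, bandwidth_limit):,.0f} tokens/sec")
```

Under these assumptions the processor could, in principle, sustain thousands of tokens per second, but memory bandwidth caps it at a few dozen, which is why faster, more efficient memory matters so much for on-device AI.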
Memory-chip specialist Micron Technology is working on fixes. Its newest chips for high-end smartphones allow for up to 15% power savings compared with previous generations. Micron’s potential answer to the “memory wall” is a semiconductor architecture that runs some inference operations directly inside memory chips, known as processing-in-memory.
Qualcomm and Micron hope that consumers will buy premium smartphones equipped with their chips to handle AI inference. But the uptake could be slow—worldwide smartphone shipments are only forecast to grow 2.3% this year, according to the International Data Corporation.
Meanwhile, Nvidia says its ready-made solution can be implemented immediately. It aims to sell its chips to telecommunications companies, arguing that local wireless infrastructure is the right place to carry out AI inference. It’s close enough to the user to reduce lag, while also taking advantage of existing power supply, the company says.
It’s a hard sell. Telecommunication companies are wary of making significant investments after spending heavily on 5G infrastructure for paltry returns. However, Nvidia has multiple takers for its plan.
Last week, South Korea’s Samsung Electronics said it’s integrating Nvidia’s hardware to allow its wireless network to act like a data center. Verizon Communications earlier this year announced its “AI Connect” suite of products for businesses, which includes a partnership with Nvidia.
Most notably, Nvidia formed a close relationship with Japanese telecom-and-internet company SoftBank, which is championing the concept. The two companies have run technology trials together and jointly estimate that telco operators can earn roughly $5 in inference revenue for every $1 they invest in combined AI and wireless infrastructure.
The move beyond text-based AI to other media, such as sound and video, will be key to illustrating the advantages of carrying out inference on wireless networks, including preserving device battery life, Mauro Filho, director of AI-RAN America at SoftBank, told Barron’s.
“Having the ability to transfer some of those workloads to the network enables low latency and the convenience of on-the-device [AI] simultaneously, instead of having to choose,” Filho said.
However, there are skeptics of the idea that wireless networks can ever play a major part in AI processing.
“The majority of overall compute for AI, whether training or inference, will be concentrated in large data centers, or in devices. The bit ‘in the middle’ has less power supply,” Disruptive Analysis’s Bubley wrote.
There will inevitably be a mix of inference happening in data centers, via wireless infrastructure, and on devices, but the exact proportions will determine which companies emerge as the winners.
At the moment, Nvidia still has a clear advantage.
Big technology companies are locked into using its chips for their huge investments in data centers and will want to use the same hardware for inference as they did for training AI models. Chip sales to telecoms operators would cement Nvidia’s position further against Qualcomm and other would-be challengers trying to bring AI out of the data center.