By Menglin Cao
The rapid expansion of generative Artificial Intelligence (GenAI) and large language models (LLMs) has led to an unprecedented demand for computing power. As these AI models grow in complexity, the energy required to support them significantly strains data centres. Gartner estimates that the power required for data centres to run incremental AI-optimised servers will reach 500 terawatt-hours (TWh) per year in 2027, more than double the levels seen in 2023. This escalating GenAI power demand poses substantial challenges for data centre operations, affecting cost, performance, and sustainability.
Gartner forecasts that by 2027, nearly 40% of existing AI data centres will face operational constraints due to power availability. This situation impacts the data centres themselves and has downstream effects on their customers and end users, who may experience increased costs and reduced performance.
The increasing power requirements for GenAI are also becoming a critical constraint for IT organisations, limiting their ability to deploy GenAI-related products and applications.
Shift to On-Device GenAI Processing
The operational risks associated with data centres’ rising power consumption will force product leaders to consider offloading more AI inference workloads to endpoint devices. Two strong motivators already exist for on-device GenAI processing: increased responsiveness and data privacy. With the added pressure of data centre power limitations, on-device GenAI processing is becoming an even more attractive option.
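To make this concrete, the sketch below shows one way such offloading might be orchestrated: a simple router that prefers the on-device path and falls back to the cloud when the device is power- or capacity-constrained. This is a minimal sketch under assumed conditions; the run_local and run_cloud helpers, the DeviceState fields, and the thresholds are hypothetical illustrations, not a prescribed design.

```python
# A minimal sketch of hybrid GenAI inference routing. run_local and
# run_cloud are hypothetical placeholders; the thresholds are
# illustrative assumptions, not a reference design.
from dataclasses import dataclass


@dataclass
class DeviceState:
    battery_pct: float   # remaining battery charge, 0-100
    npu_available: bool  # device has an on-board NPU or similar accelerator


def run_local(prompt: str) -> str:
    """Placeholder for an on-device runtime call (e.g. a quantised light LLM)."""
    return f"[on-device] {prompt[:40]}"


def run_cloud(prompt: str) -> str:
    """Placeholder for a cloud inference API call."""
    return f"[cloud] {prompt[:40]}"


def route(prompt: str, state: DeviceState, max_local_tokens: int = 512) -> str:
    """Prefer on-device inference; fall back to the cloud when the device
    is power-constrained or the request exceeds local capacity."""
    rough_tokens = len(prompt.split()) * 2  # crude token estimate
    if (state.npu_available
            and state.battery_pct > 20          # protect operation time
            and rough_tokens <= max_local_tokens):
        return run_local(prompt)
    return run_cloud(prompt)


if __name__ == "__main__":
    device = DeviceState(battery_pct=63.0, npu_available=True)
    print(route("Summarise this meeting transcript for me.", device))
```

A production router would weigh more signals, such as network quality, model availability, and thermal state, but the core trade-off stays the same: local processing wins on latency and privacy, while the cloud absorbs requests the device cannot serve efficiently.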
Gartner anticipates that by 2026, more GenAI queries will be processed on-device than in the cloud, signalling a significant shift in AI strategy.
Need for Redesigning Key Technologies
As the landscape evolves, product leaders must reassess their AI strategies to accommodate this shift. Evaluating the best inference approach for distributing GenAI processing workloads on-device is crucial. By embracing on-device GenAI processing, organisations can mitigate the risks associated with data centre power constraints while enhancing the overall user experience.
This strategic pivot addresses current power challenges and positions organisations better to meet future demands amid a rapidly advancing AI landscape.
The trend toward GenAI processing on endpoint devices, including smartphones, PCs, tablets, XR headsets, wearables, vehicles, robotics, and Internet of Things devices, is gaining momentum, driven by the need for improved user experiences such as enhanced data privacy, lower latency, and faster response times. On-device GenAI processing demands very high energy efficiency because of the limited form factor of endpoint devices, and a device’s operation time and battery life should not be compromised by additional GenAI features. Achieving this will require combined, significant improvements in semiconductors, batteries, and AI model development.
Semiconductors: Energy-efficient chips are essential for real-time processing and lower latency. Specialised AI processors, low-power memory chips, and application processors and microcontroller units (MCUs) with integrated neural processing units (NPUs) are preferred for on-device GenAI.
Wide-bandgap semiconductors, such as gallium nitride, play a crucial role in power conversion for fast chargers. Because local GenAI processing can rapidly deplete battery life, fast charging is a key part of the user experience for GenAI on smartphones, PCs, and other battery-powered personal devices.
Batteries: Most endpoint devices, such as smartphones, PCs, XR headsets and even wearables, are battery-powered, and on-device GenAI processing will draw more energy from them (see the sketch after this list). Batteries with higher energy density, such as solid-state lithium-ion, will be critical for supporting longer operation times.
AI models: Tailored AI models with smaller parameter sets are needed for local processing on endpoint devices. Light LLMs with fewer parameters suit specific tasks and sectors, reducing computational requirements and making them practical for endpoint devices where a standard, “heavy” LLM is infeasible.
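The back-of-the-envelope sketch below shows why these constraints interact: it estimates the memory footprint of a quantised light LLM against a heavy one, and how long an assumed smartphone battery could sustain local inference. Every figure here (model size, quantisation width, power draw, battery capacity) is an illustrative assumption, not a measurement.

```python
# Back-of-the-envelope arithmetic for on-device GenAI. All figures
# (model sizes, quantisation widths, power draw, battery capacity)
# are illustrative assumptions, not measurements.

def model_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight-storage footprint of an LLM in gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

def runtime_hours(battery_wh: float, inference_watts: float) -> float:
    """Hours of sustained inference an ideal battery could support."""
    return battery_wh / inference_watts

# An assumed 3B-parameter light LLM quantised to 4 bits fits in ~1.5 GB,
# versus ~140 GB for an assumed 70B model at 16 bits; this gap is what
# makes heavy LLMs infeasible on endpoint devices.
print(f"3B @ 4-bit : {model_memory_gb(3, 4):.1f} GB")   # 1.5 GB
print(f"70B @ 16-bit: {model_memory_gb(70, 16):.1f} GB")  # 140.0 GB

# An assumed ~15 Wh smartphone battery driving sustained inference at
# an assumed ~4 W lasts under four hours; hence the pressure on both
# chip efficiency and battery energy density.
print(f"Sustained inference: {runtime_hours(15, 4):.1f} h")  # 3.8 h
```

The exact numbers will vary by device and workload, but the shape of the arithmetic holds: smaller, quantised models shrink the memory and energy budget enough for endpoint hardware, while battery capacity caps how long that budget can be spent.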
As GenAI continues to evolve, its soaring power demands will redefine AI deployment strategies. The shift toward on-device processing is not just a necessity but an opportunity to enhance efficiency, reduce latency, and improve data privacy. However, this transition requires breakthroughs in semiconductors, battery technology, and AI model design to ensure optimal performance without compromising energy efficiency. Organisations that proactively adapt to this shift will be better positioned to navigate the AI-powered future, balancing innovation with sustainability in an increasingly power-constrained world.
The author, Menglin Cao, is a Director Analyst at Gartner.