AI’s Data Center Power Use Needs Addressing

By Paul Schell | 2Q 2024 | IN-7355

Artificial Intelligence (AI)/Machine Learning (ML) workloads have emerged as a major driver of data center energy use, with generative AI in particular growing faster than "traditional" AI. Concurrently, hyperscalers are set to increase Capital Expenditure (CAPEX) on infrastructure over the coming years to support this demand. For AI to scale across enterprise and consumer markets, these burgeoning energy requirements must be tamed.


AI Growth Drives Power Demands

NEWS


As Artificial Intelligence (AI) advances in complexity, adoption, and scale, the computational resources required to train models and run inferencing workloads grow immensely. AI data centers that house the chipsets powering AI systems, like those built by hyperscalers Microsoft and Google, are emerging as significant consumers of energy. The Graphics Processing Units (GPUs) used to power generative AI systems consume more power than their predecessors, and the processor efficiency gains of the past decade, which allowed data center power consumption to plateau even as cloud-based applications proliferated, have been wiped out by the demanding nature of these novel workloads. As generative AI applications scale and inferencing grows exponentially, power demands will increase commensurately. Evidence of this trend is already emerging from cloud providers' own reporting:

  • Microsoft’s May 2024 Sustainability Report measures its FY 2023 performance against 2020 and reveals a 29.1% increase across Scope 1 through Scope 3 emissions, which the company links to the electricity needs of new technologies, “including generative AI,” and the construction of new data centers replete with servers, racks, and semiconductors.
  • Google’s 2023 report discloses that Scope 2 emissions, which stem primarily from its data center electricity consumption, increased by 37% over the previous year.

Meanwhile, electricity grid operators have responded by sounding the alarm over the stability of national supply at a time when many Western countries are transitioning to more intermittent energy sources like wind. We are only at the beginning of the AI journey, and many of the proposed use cases and implementations of the technology are still in their early stages or only being trialed by enterprises. All hyperscalers, including Amazon, envision significant increases in their Capital Expenditure (CAPEX) over the coming years to meet the computing needs of the AI boom, with most of that spending going toward the infrastructure and silicon needed for AI compute.


Demand Will Continue to Increase as AI Scales; Innovation Follows

IMPACT


Adoption at scale will create significantly more electricity demand than today's level, and a simple comparison with search engine energy use demonstrates the challenge ahead: the International Energy Agency (IEA) estimates that a Google search consumes 0.3 Watt-Hours (Wh) of electricity, which is dwarfed by the 2.9 Wh required by a ChatGPT prompt. The integration of generative AI into search queries, which has already started, and into "behind the scenes" enterprise data pipelines will continue to push inferencing demands. Research suggests that data center electricity use could more than double by 2030, growing to a 3% to 4% share of global power consumption, largely due to the exponential energy requirements of inferencing at scale.
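
To put those per-query figures in perspective, the rough sketch below applies the IEA estimates cited above to a purely hypothetical daily query volume; the volume is an illustrative assumption, not a reported figure, and the point is simply how quickly a roughly 10X per-query gap compounds at scale.

# Back-of-the-envelope comparison of per-query energy at scale.
# The 0.3 Wh (search) and 2.9 Wh (generative prompt) figures are the IEA
# estimates cited above; the daily query volume is a HYPOTHETICAL assumption
# chosen only to illustrate the scaling effect.

SEARCH_WH_PER_QUERY = 0.3       # Wh per conventional search (IEA estimate)
GENAI_WH_PER_QUERY = 2.9        # Wh per generative AI prompt (IEA estimate)
HYPOTHETICAL_QUERIES_PER_DAY = 1_000_000_000  # illustrative volume only

def daily_energy_gwh(wh_per_query: float, queries_per_day: int) -> float:
    """Convert per-query energy (Wh) into total daily energy in GWh."""
    return wh_per_query * queries_per_day / 1e9

search_gwh = daily_energy_gwh(SEARCH_WH_PER_QUERY, HYPOTHETICAL_QUERIES_PER_DAY)
genai_gwh = daily_energy_gwh(GENAI_WH_PER_QUERY, HYPOTHETICAL_QUERIES_PER_DAY)

print(f"Search-style serving: {search_gwh:.2f} GWh/day")
print(f"Generative serving:   {genai_gwh:.2f} GWh/day")
print(f"Roughly {genai_gwh / search_gwh:.1f}X more energy at the same volume")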

A countervailing force to this trend is hyperscalers' development of custom silicon for AI/Machine Learning (ML) workloads designed with energy efficiency front of mind. On the training side, this includes Google's recently announced sixth-generation Tensor Processing Unit (TPU), Trillium, for training foundation models; AWS's Trainium for training models with 100 billion or more parameters; and Microsoft's Maia 100 accelerator, announced in November 2023. On the inferencing side, Amazon offers its Inferentia accelerators, while Google and Microsoft recently announced their Arm Neoverse-based processors, Axion and Cobalt, respectively. Meta also has its own MTIA family of accelerators to power recommendation models and transformer frameworks. As well as dampening the burgeoning energy needs of generative AI through increased efficiency, custom silicon fulfills AI companies' desire to find alternatives to NVIDIA's expensive and power-hungry GPUs, which often face long lead times, as well as to AMD's data center Instinct GPUs.

A second positive trend is the increasing use of software optimization techniques to improve the power efficiency of inferencing. Software optimization is now table stakes for hardware vendors, and it applies to all form factors, from data centers to devices. Quantization reduces the computational and memory requirements of AI workloads by using lower-precision data types while maintaining most of the model's accuracy. Pruning tools, such as those from Neural Magic, produce inference-optimized "sparse" models that can be moved from GPUs to less energy-consumptive Central Processing Units (CPUs). Smaller models with fewer parameters, built by compressing application-specific parameters from Large Language Models (LLMs), have emerged with lower memory and power requirements; they improve inferencing efficiency and are central to chip vendor Intel's enterprise AI proposition, although there is a trade-off in range and accuracy.
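
As an illustration of the quantization technique described above, the following minimal sketch shows generic symmetric 8-bit quantization of a weight matrix, cutting storage per value by 4X while introducing only a small rounding error; it is a simplified example, not the implementation used by any particular vendor or toolchain.

import numpy as np

# Minimal sketch of symmetric 8-bit quantization: FP32 weights are mapped to
# int8 plus a single per-tensor scale factor, cutting memory per value by 4x.
# Generic illustration only, not any specific vendor's or framework's method.

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map FP32 weights to int8 with a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(weights)

print(f"FP32 size: {weights.nbytes / 1e6:.1f} MB, int8 size: {q.nbytes / 1e6:.1f} MB")
print(f"Mean absolute round-trip error: {np.abs(weights - dequantize(q, scale)).mean():.5f}")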

Exponential Growth in AI Power Consumption Needs Addressing

RECOMMENDATIONS


The entire AI ecosystem must continue to drive efficiency innovation, staying ahead of government regulation to avoid being caught off guard and demonstrating leadership in the space to attract increasingly Environmental, Social, and Governance (ESG)-conscious investors. Chip vendors must continue to improve architectures and systems to realize efficiency gains that specifically address the bottlenecks introduced by generative AI workloads, today's fastest-growing category. This includes support for smaller data types, such as FP4, which is particularly useful for inferencing, and improvements in memory bandwidth and capacity to handle the vast quantity of data movement that AI/ML requires. NVIDIA's latest Blackwell chip exemplifies both of these innovations, and the energy efficiency of its NVLink interconnect over competing interconnects also matters. Optimization techniques that have recently led to breakthroughs such as LLMs running on smartphones and Personal Computers (PCs) should be pushed for data center applications as much as for smaller form factors.
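
To illustrate why smaller data types matter, the sketch below works through the weight-storage arithmetic for a hypothetical 70 billion-parameter model at different precisions; the model size is an assumption for illustration, and real deployments also need memory for activations and other runtime state.

# Illustrative arithmetic: how lower-precision data types shrink the memory
# needed just to hold model weights. The 70B-parameter size is a hypothetical
# example, not a reference to any specific model.

PARAMS = 70e9  # hypothetical model size (parameters)

BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "FP8": 8, "FP4": 4}

for dtype, bits in BITS_PER_WEIGHT.items():
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{dtype}: ~{gigabytes:,.0f} GB of weight storage")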

AI market leader NVIDIA has brought energy efficiency to center stage and touts inference workload efficiency improvements of up to 25X. The company has also suggested that a new metric for data center energy use is needed to replace the outdated Power Usage Effectiveness (PUE) measure, which compares a facility's total energy draw to the energy consumed by its computing equipment, effectively capturing how much is lost to cooling and other overheads not directly related to processing. A measurement of "useful" computational output per unit of energy would be better suited to today's workloads. The entire hardware industry should coalesce around a standard or metric that reveals the effectiveness of today's AI systems. Tokens per joule, as mooted by NVIDIA, is overly focused on generative AI; "traditional" AI workloads will remain the dominant form of AI for some time to come. Alongside hardware and software innovation, effective benchmarks will drive the efficiency gains needed to meet climate targets and ensure the stability of electricity grids in the future.
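
For clarity, the sketch below contrasts the two metrics discussed here, PUE and tokens per joule, using purely hypothetical inputs; the formulas follow the definitions above, while the numbers are illustrative assumptions only.

# Sketch of the two metrics discussed above, with purely hypothetical inputs.
# PUE (Power Usage Effectiveness) = total facility energy / IT equipment energy;
# a value near 1.0 means little energy is lost to cooling and other overheads.
# Tokens per joule is the workload-level efficiency measure mooted by NVIDIA.

facility_energy_kwh = 1_200.0     # hypothetical total facility draw over a period
it_energy_kwh = 1_000.0           # hypothetical energy delivered to IT equipment
tokens_generated = 5_000_000_000  # hypothetical tokens served in the same period

pue = facility_energy_kwh / it_energy_kwh

# 1 kWh = 3.6e6 joules
tokens_per_joule = tokens_generated / (facility_energy_kwh * 3.6e6)

print(f"PUE: {pue:.2f}")
print(f"Tokens per joule (facility-level): {tokens_per_joule:.2f}")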
