Operating as One—Scale-Up in the Modern AI Data Center
NEWS
Frontier data center AI implementations, including the training and large-scale inference of Large Language Models (LLMs), require large clusters of accelerators that communicate at low latency, improving training and inference performance by increasing throughput and utilization. For example, Meta, with its widely deployed Llama family of LLMs, shared details of two 24,000 Graphics Processing Unit (GPU) clusters used for training Llama 3 (requiring over 30 million GPU hours), part of its broader portfolio of 350,000 NVIDIA H100 GPUs. NVIDIA, through a series of strategic acquisitions and internal Research and Development (R&D), has developed a scalable solution for workload acceleration. One of its key elements is NVLink, which supports Central Processing Unit (CPU)-to-GPU and GPU-to-GPU communications.
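As a rough sense of scale, the sketch below turns the figures quoted above into a wall-clock estimate. It is a back-of-envelope illustration only: the 100% utilization assumption is ours, so real training runs would take longer.

```python
# Back-of-envelope training-time estimate from the figures cited above:
# a 24,000-GPU cluster and 30 million GPU hours for Llama 3.
# Assumes a single cluster at 100% utilization (an idealization).
GPUS_PER_CLUSTER = 24_000
TOTAL_GPU_HOURS = 30_000_000

wall_clock_hours = TOTAL_GPU_HOURS / GPUS_PER_CLUSTER
print(f"{wall_clock_hours:,.0f} hours (~{wall_clock_hours / 24:.0f} days)")
# -> 1,250 hours (~52 days)
```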
UALink, through a newly formed consortium of semiconductor vendors from across the ecosystem, looks to challenge NVIDIA's hegemony by standardizing interconnect hardware across numerous vendors, providing an open alternative to NVLink's single-vendor proprietary solution. The group presents a single front against NVIDIA: a standard implemented by all of its members, built on established Ethernet standards and the shared memory protocol of AMD's Infinity Fabric for memory sharing across accelerators. The Promoter Group consists of accelerator vendors; cloud providers Google and Microsoft (also captive vendors that design their own data center accelerators); and, crucially, networking and switching vendors Broadcom and Cisco. These networking players, along with the Ultra Ethernet Consortium (UEC), which is also part of UALink, complete the picture needed to build supercomputers rivaling NVIDIA's scalable DGX and GB200 exascale platforms, covering both scale-up with linked memory within a pod (of up to 1,024 GPUs in specification 1.0) and scale-out between clusters or pods.
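To make the shared memory idea concrete, the toy model below sketches what load/store semantics across a scale-up pod mean in practice: every accelerator's memory sits in one flat address space, so a read or write is routed to whichever device owns the address. The class and method names are invented for illustration; this is not an actual UALink or Infinity Fabric API.

```python
# Toy model of a load/store shared-memory domain across accelerators in a
# scale-up pod. Purely illustrative; not a real UALink/Infinity Fabric API.

class Accelerator:
    def __init__(self, rank: int, mem_size: int):
        self.rank = rank
        self.hbm = [0] * mem_size  # stands in for local high-bandwidth memory

class ScaleUpPod:
    """One flat address space spanning every accelerator in the pod."""
    def __init__(self, accelerators):
        self.accelerators = accelerators
        self.mem_size = len(accelerators[0].hbm)

    def load(self, addr: int) -> int:
        # The interconnect routes the access to the device owning the address.
        owner, offset = divmod(addr, self.mem_size)
        return self.accelerators[owner].hbm[offset]

    def store(self, addr: int, value: int) -> None:
        owner, offset = divmod(addr, self.mem_size)
        self.accelerators[owner].hbm[offset] = value

pod = ScaleUpPod([Accelerator(r, mem_size=1_024) for r in range(8)])
pod.store(5_000, 42)    # lands in accelerator 4's memory (5000 // 1024 == 4)
print(pod.load(5_000))  # any device in the pod reads the same value: 42
```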
A Unified Proposition with UEC?
IMPACT
NVIDIA's high-performance GPUs are but one element of its successful proposition. Its dominance in the Artificial Intelligence (AI)/Machine Learning (ML) space is best analyzed at the systems level, from single accelerator cards all the way up to supercomputers made up of thousands of GPUs scaled up within one computing node via NVLink. Equally important are "scale-outs" of several such nodes connected via NVIDIA's proprietary InfiniBand and Ethernet-based networking equipment. The inclusion of the UEC, which focuses on scale-out, in UALink creates a more holistic, complementary front from which to challenge NVIDIA, as both scale-up (numerous accelerators operating in lockstep) and scale-out (between such nodes) are needed to deliver a solution like that enabled by NVIDIA's proprietary hardware. Moreover, Broadcom announced ahead of UALink's formation that its switches will be compatible with Infinity Fabric, addressing the switching component of the system (the role NVSwitch plays for NVIDIA).
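The division of labor between scale-up and scale-out can be put in numbers using figures already cited in this insight: a 24,000-GPU cluster split into UALink specification 1.0 pods of up to 1,024 accelerators, with scale-out networking linking the pods. The arithmetic below is illustrative only.

```python
import math

# Illustrative pod math: how many UALink spec 1.0 scale-up pods (up to
# 1,024 accelerators each) a 24,000-GPU cluster would decompose into.
POD_LIMIT = 1_024      # scale-up domain per pod, UALink specification 1.0
CLUSTER_GPUS = 24_000  # cluster size of the kind Meta described

pods = math.ceil(CLUSTER_GPUS / POD_LIMIT)
print(f"{pods} pods")  # -> 24 pods, stitched together by scale-out networking
```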
As with other open consortia, such as the Unified Acceleration Foundation (UXL) and Universal Chiplet Interconnect Express (UCIe), UALink's open standard should spur innovation and collaboration, resulting in more options for AI server nodes and rack-scale systems at diverse price points. Cross-vendor compatibility is baked into such open efforts, and innovation will not be stifled by proprietary licensing fees. Nonetheless, just as open networking challengers trail NVIDIA's networking solutions, UALink is years behind NVLink. The first specification, expected this quarter, supports bandwidths of 128 Gigabits per Second (Gbps), roughly doubling with an upgrade expected shortly thereafter in the fourth quarter. NVLink's fourth generation, by comparison, supports up to 900 Gigabytes per Second (GB/s) per GPU via 18 links per accelerator.
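Mind the units when comparing those figures: NVIDIA quotes fourth-generation NVLink in GB/s per GPU, while the UALink figure is in Gbps and, as we read the announcement, applies per lane rather than per accelerator; treating it that way is our assumption. The conversion below shows the size of the gap on that reading.

```python
# Unit sanity check: fourth-generation NVLink at 900 GB/s per GPU over 18
# links, versus UALink's initial 128 Gbps (read here as per lane, which is
# our assumption, not a confirmed spec detail).
NVLINK4_PER_GPU_GBPS = 900 * 8  # 900 GB/s -> 7,200 Gbps aggregate
NVLINK4_LINKS = 18
UALINK_LANE_GBPS = 128

per_link_gbps = NVLINK4_PER_GPU_GBPS / NVLINK4_LINKS
print(f"NVLink: {per_link_gbps:.0f} Gbps per link")                # -> 400 Gbps
print(f"Ratio per link: {per_link_gbps / UALINK_LANE_GBPS:.1f}x")  # -> 3.1x
```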
Convergence and Competition
RECOMMENDATIONS
The industry expects the first commercial implementations of UALink hardware in 2026, a fast turnaround for such a far-reaching standard that reflects the demand for alternatives to NVIDIA's monopoly. That monopoly will not be easy to break, but UALink-compliant hardware for operating large systems of 1,000+ accelerators will go some way toward doing so. ABI Research makes the following predictions and recommendations.
Predictions:
Recommendations:
By any measure of success, commercial deployments by 2026 would be a huge achievement for a new standard in the increasingly complex AI semiconductor industry. Whenever they arrive, the fruits of UALink's labors will challenge NVIDIA's proprietary solutions for scaling individual processing units both up and out.