NVIDIA's market capitalization continues its meteoric rise, largely driven by ever-growing data center revenue that surprises financial analysts time and again. NVIDIA's early bets on Artificial Intelligence (AI), followed by generative AI (via its cooperation with OpenAI), were winners, and this lends its charismatic CEO the confidence and, in many people's eyes, the legitimacy to define the future of computing. During his keynote at COMPUTEX, we heard again that the future will be “generative,” reducing the need to retrieve data stored in data warehouses around the world.
According to Jensen Huang, data and content retrieval could lead to massive power waste, and content generation, aided by generative AI, could reconstruct the same content far more cheaply and energy-efficiently. Not all industry observers agree with this statement. Indeed, the claim that this next era of computing will be “green” thanks to savings on networking costs is disingenuous: yes, accelerating generative AI workloads consumes less power than calculating them sequentially on “general-purpose” compute, but it still consumes more power than retrieving previously generated content the old-fashioned way. ABI Research will zoom in on this specific topic in a separate communication, so watch this space.
NVIDIA’s Competitive Strengths
NVIDIA's partner ecosystem is vast, and many of its partners are either headquartered in Taiwan or maintain a sizable presence on the island at the heart of Asia's advanced semiconductor supply chain. These plentiful channels to market enable NVIDIA to shift as many reference designs and HGX systems as TSMC can supply GPUs for, and lead times appear to be decreasing, if server sellers and Supermicro's CEO are to be believed.
There's every chance that CUDA has now reached critical mass, and that all others, including chip giants Intel and AMD, will be playing catch-up for years to come. Having spoken to all the major server Original Design Manufacturers (ODMs), we found that the word on the show floor at the Nangang Exhibition Center certainly corroborates that view; marketing drives and product placement by the likes of Supermicro, Ingrasys, Acer, and ASRock, among many others, reveal the favorable real estate afforded to NVIDIA GPUs at the expense of Intel's and AMD's accelerators.
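To give a flavor of the switching cost behind that ecosystem lock-in, here is a minimal, generic CUDA sketch (a simple SAXPY kernel, not tied to any vendor demo at the show). Constructs like the __global__ qualifier, the <<<grid, block>>> launch syntax, and the cudaMalloc*/cudaMemcpy family are CUDA-specific and would need porting, for example to AMD's HIP or Intel's SYCL/oneAPI, before running on competing accelerators.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative CUDA-only kernel: the __global__ qualifier and the
// <<<grid, block>>> launch below are NVIDIA-specific constructs that
// would need porting (e.g., to HIP or SYCL) to run on other accelerators.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;                 // 1M elements, purely illustrative
    float *x, *y;
    // Unified memory keeps the sketch short; production code often manages copies explicitly.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);  // CUDA launch syntax
    cudaDeviceSynchronize();

    printf("y[0] = %.1f\n", y[0]);         // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Multiply that porting effort across years of accumulated kernels, libraries, and tooling, and the scale of the catch-up problem facing rival ecosystems becomes clearer.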
Overall, ABI Research maintains that NVIDIA's leadership is best analyzed at the systems level, where the performance of the company's proprietary chip interconnects, switches, and networking (NVLink) reduces potential bottlenecks for AI workloads. These interconnects enable every single GPU to talk to any other GPU in the system, making the entire system behave like one mega computer, as if powered by a single giant GPU under a single Operating System (OS). This is one of NVIDIA's greatest advantages over competitors, including Intel and AMD, which have lacked an adequate interconnect for scaling their cloud processing power to the colossal workloads of Large Language Models (LLMs) and generative AI; it is an area where NVIDIA has excelled and where competitors' propositions remain weak. NVIDIA first introduced NVLink in early 2016 and is currently on its fifth generation, delivering up to 200 Gigabytes per Second (GB/s) of chip-to-chip data transfer, almost double the rate of the previous generation.
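As a rough illustration of what that any-to-any GPU communication looks like from the software side, the sketch below uses the CUDA runtime's standard peer-access calls (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer). It is a generic two-GPU example, not anything NVIDIA showed at COMPUTEX: the device indices and buffer size are illustrative, and whether a given transfer actually traverses NVLink or falls back to PCIe depends entirely on the system's topology.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: enable direct GPU-to-GPU access between device 0 and device 1
// and copy a buffer between them. On NVLink-connected GPUs the copy can
// bypass host memory; on PCIe-only systems it still works, just slower.
int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) { printf("Need at least two GPUs.\n"); return 0; }

    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("Peer access 0->1: %d, 1->0: %d\n", canAccess01, canAccess10);

    const size_t bytes = 256ull * 1024 * 1024;   // 256 MiB, illustrative only
    float *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    if (canAccess01) cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    if (canAccess10) cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    // Direct device-to-device copy; the driver routes it over NVLink when available.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```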
Open standards like Peripheral Component Interconnect Express (PCIe), Ultra Ethernet, and UALink (also Ethernet-based) are playing catch-up here, too. Just before COMPUTEX, a group of industry leaders, including Intel, AMD, Google, and Microsoft, introduced UALink, a new open interconnect standard aimed at competing with NVLink. The new standard has almost eight years of Research and Development (R&D) to make up before it can push the envelope against NVLink at mega-cluster scale, as even more performant hardware will lose out if individual accelerators cannot communicate effectively.
Sizing Up NVIDIA’s Competitors
In terms of NVIDIA's competitors in the AI cloud environment, Intel has responded with an eight-accelerator Gaudi 3 Universal Baseboard (UBB) offering priced highly competitively at US$125,000, notably less than the equivalent NVIDIA system. Couple that with Intel's performance-per-dollar claims, which point to a 2.3X improvement over NVIDIA's H100 for inference (1.9X for training), and you have a compelling Total Cost of Ownership (TCO) proposition. The pricing of cloud instances on Intel's latest accelerators will reveal more in 2H 2024, and continued oversubscription of capacity on Intel's Developer Cloud demonstrates a positive response from clients. Intel's deep understanding of enterprise customers' needs, as well as Gaudi 3's performance on Mixture of Experts (MoE) workloads, which differ from dense LLM workloads and are expected to capture a growing share going forward, bodes well for the future. Performance at massive scale, in an 8,192-accelerator cluster, was also demonstrated, showing that the aforementioned benchmarked gains on common LLM workloads were not wiped out by Intel's Ethernet-based open networking hardware.
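One way to read a performance-per-dollar claim like this is to separate the throughput ratio from the price ratio; with T denoting measured throughput and P the system price, and using the US$125,000 UBB price above (NVIDIA's system price is left symbolic, since list prices are not published):

\[
\frac{(\mathrm{perf}/\$)_{\mathrm{Gaudi\,3}}}{(\mathrm{perf}/\$)_{\mathrm{H100}}}
  \;=\; \frac{T_{\mathrm{Gaudi\,3}}}{T_{\mathrm{H100}}}
  \times \frac{P_{\mathrm{H100\ system}}}{\mathrm{US}\$125{,}000}
\]

On this reading, the quoted 2.3X (inference) and 1.9X (training) figures bundle both terms, so the realized TCO advantage will shift with whatever street price a buyer can actually negotiate for either system.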
AMD's AI data center offering is spearheaded by its MI300X accelerator, which the company continues to promote as its fastest-ramping product, and which was available from most server OEMs demoing at COMPUTEX, competing with NVIDIA's H100 for both inference and training workloads. As with Intel's Gaudi platform, the real estate devoted to AMD's accelerators was dwarfed by NVIDIA's presence. AMD has taken inspiration from NVIDIA's release cadence: later this year, the 300 series will be upgraded with the MI325X, adding memory capacity and bandwidth on the same CDNA 3 architecture, with CDNA 4 to follow in 2025 and another architecture in 2026. Such frequent accelerator upgrades will likely only apply to Cloud Service Providers (CSPs), however, as AI accelerators represent significant capital investments for most enterprises and are expected to last for several years.
This was the second of two articles I wrote after my experience at COMPUTEX 2024. To learn about the PC AI developments at the show, read my previous piece.