All (AI) Systems Go: AMD Edges Closer to Market Leader NVIDIA with Purchase of ZT Systems

By Paul Schell | 3Q 2024 | IN-7509

NVIDIA’s successful Artificial Intelligence (AI) data center strategy is built on a systems-level proposition scaling from individual Graphics Processing Units (GPUs) up to clusters of thousands of accelerators operating as one. ZT Systems’ legacy and expertise in building AI infrastructure for hyperscalers will bring this know-how in-house to AMD. The hope is that this acquisition will create a faster, more streamlined route to market for AMD’s most performant hardware.


ZT Systems Comes Onboard

NEWS


AMD plans to acquire privately held ZT Systems for US$4.9 billion as part of its Artificial Intelligence (AI) systems strategy to better serve the largest customers of AI hardware, which include hyperscalers like Microsoft and Google Cloud. ZT Systems’ hefty valuation and more than 15 years of experience in designing and building data center infrastructure at scale for the largest cloud companies mark it as one of the leading providers of AI inference and training systems. Its services encompass the optimization of systems for performance, power, and cooling, all of which have a major impact on the viability and Total Cost of Ownership (TCO) of large-scale AI deployments.

ZT’s capabilities are underpinned by a 1,000-strong team of design engineers with global reach and know-how in vital areas of large-scale system deployments, including the testing, debugging, and integration of chips, as well as the design, validation, and management of networks and large clusters of accelerators working in unison. These clusters enable the training and development of frontier, ultra-large Generative Artificial Intelligence (Gen AI) models such as Meta’s Llama 3, as well as the inferencing for scaled end-user applications. This move comes alongside AMD’s recent acquisitions of three AI software companies: Silo AI, for open-source, vertical-specific Large Language Models (LLMs); Nod.ai, for compiling AI models across AMD’s Central Processing Unit (CPU) and Graphics Processing Unit (GPU) portfolio; and Mipsology, for its AI computer vision solutions.

An Industry-Wide Aspiration: Systems Vertical Player

IMPACT


AMD’s acquisitions position the company more as a systems-vertical player than a horizontal silicon vendor. Even without ZT, AMD already possesses a certain level of expertise in systems design, but the company aims to aggressively capture more of the data center AI accelerator Total Addressable Market (TAM), which it projects to be US$400 billion by 2027. The company states that its Instinct data center GPUs have ramped up faster than any of its previous silicon to bring in an estimated US$4.5 billion of revenue for 2024. Its NVIDIA-esque roadmap, announced at COMPUTEX 2024, includes a memory-expanded version of its flagship Instinct accelerator and a yearly cadence of fundamental architecture upgrades to the accelerator portfolio.

ZT’s addition to the portfolio should enable faster, larger deployments of AMD’s data center GPUs with the simple goal of addressing more of the burgeoning AI data center market. AMD’s most significant customers consume clusters of GPUs, and these deployments will include more customized, domain-specific infrastructure targeting highly optimized systems for enterprise customers, complemented by the now internalized team at Silo AI, which has experience working with the Instinct platform.

Accelerating AMD's Strategy for AI Computing

RECOMMENDATIONS


AMD’s purchase of ZT Systems, along with its growing portfolio of software assets, will enable it to optimize its proposition across silicon, software, and entire systems. To further close the gap with NVIDIA, several areas of AMD’s strategy deserve more attention:

  • The performant scale-up of chips within larger systems (beyond 8-GPU servers) is a major area in which NVIDIA differentiates itself with NVLink. AMD’s response is the open UALink project, based on its proprietary and established Infinity Fabric. AMD must bring all major server builders onboard, including players like IBM that are not publicly in the consortium. Leveraging large infrastructure customers’ desire for interoperability and openness, which UALink sets out to achieve, will tempt customers to move away from NVIDIA’s closed ecosystem.
  • Early rumors suggest AMD will not sell unified solutions à la NVIDIA’s DGX so as not to compete with its largest AI data center customers (i.e., server Original Equipment Manufacturers (OEMs)). Leveraging ZT’s expertise to build a highly optimized and easily scalable solution, especially once UALink hardware becomes available in 2026, will offer a more “turnkey” solution that can support efficient compute for leading-edge workloads.
  • Further strategic and focused software acquisitions, together with using the recently acquired talent from Silo AI, for example, to drive optimizations for diverse AI frameworks, are a vital source of performance differentiation. This is an area where NVIDIA has excelled.
  • The networking side of AI data centers for scaling out is equally important, and innovations around Network Interface Cards (NICs) are a vital part of this fabric. Collaborations with innovative new companies, such as Enfabrica with its multi-GPU NIC that connects GPUs to switches, will accelerate network traffic and improve the efficiency of AI systems.
