Going Beyond Acceptance? Arm Neoverse Server CPUs Stand a Chance in the Market

By Paul Schell | 2Q 2024 | IN-7282

Arm’s data center Central Processing Unit (CPU) portfolio was recently expanded with its third generation of Neoverse Intellectual Property (IP), including pre-integrated and validated configurations. The industry is responding: design wins like Microsoft’s Cobalt and the growth of Arm Total Design to 21 partners bode well for Arm’s infrastructure business as it moves into a more influential market position.

Next-Gen Data Center Designs

NEWS


Arm has expanded its Neoverse Compute Subsystems (CSS) with the February announcement of its third-generation V- and N-series Intellectual Property (IP) for Systems-on-Chip (SoCs) to address Artificial Intelligence (AI) workloads in data centers. Both come with significant performance improvements, in particular for AI data analytics workloads. The CSS portfolio shortens time to market and lowers development costs for custom silicon because the subsystems come preconfigured, verified, and validated. The offering is bolstered by the complementary Arm Total Design, a one-stop shop for Neoverse designs, which now features 21 ecosystem partners helping to implement CSS IP on custom SoCs, including Siemens’ Electronic Design Automation (EDA) software and Samsung’s foundry services.

Neoverse IP includes the Scalable Vector Extension (SVE) for AI/Machine Learning (ML) workloads to increase per-core performance and provide implementers with the flexibility to scale their designs. Data center technology giants like NVIDIA, Microsoft, and Amazon Web Services (AWS) use Arm’s designs and customize them to optimize their entire stack around AI workloads, improving on the Total Cost of Ownership (TCO) of general-purpose Central Processing Units (CPUs). Although the AI performance of CPU cores is dwarfed by that of pure accelerators like NVIDIA’s data center Graphics Processing Units (GPUs), it matters for applications that have not been ported to accelerators (e.g., data management, orchestration, scheduling, and aggregation) for technical or economic reasons, such as the cost of porting and the variety of AI used in today’s workloads.

Design Wins: Going from Acceptance to Predominance?

IMPACT


Arm’s Neoverse IP is typically implemented alongside accelerators, as in the NVIDIA GH200 Grace Hopper Superchip listed below. The appeal of Arm IP has led to its deployment in a host of supercomputers and cloud platforms, demonstrating its viability in servers as well as the flexibility and scalability of the IP. High-profile silicon design wins using Neoverse IP include the following:

  • Fujitsu’s 48-core A64FX SoC powers the Fugaku supercomputer located in Japan and the United Kingdom’s Isambard 2 supercomputer. The next-generation CPU code-named Monaka will be a 144-core chiplet design with Arm’s proprietary SVE2 instruction set for enhanced AI and High-Performance Computing (HPC) coverage.
  • SiPearl’s Rhea-1 microprocessor powered by Arm Neoverse V1 cores is designed to work alongside accelerators and powers the Jupiter supercomputer in Germany.
  • Microsoft’s Azure Cobalt 100 is a 128-core CPU based on Arm Neoverse CSS powering common cloud workloads alongside the Maia 100 accelerator.
  • Alibaba Cloud’s Yitian 710 is powered by 128 Arm cores and is custom built by T-Head, its in-house chip development unit, to be implemented in servers optimized for both general and specialized AI workloads.
  • NVIDIA’s Grace CPU is implemented in its GH200 and now flagship GB200 Superchip SoCs, each containing 72 Arm Neoverse V2 cores and providing industry-leading AI acceleration.
  • AWS’s Graviton4 is a 96-core Neoverse V2 processor for powering diverse cloud instances on Amazon Elastic Compute Cloud (EC2) alongside its Trainium accelerators.

Server CPUs optimized for area, power, and performance free up engineers’ time to customize and innovate for their own requirements. By bringing together many complex elements of the semiconductor supply chain, Arm Total Design speeds up tape-outs and saves costly engineering hours over an individually curated design. The effectiveness of this solution is demonstrated by the variety of players using Arm’s IP for custom CPUs.

Arm's Infrastructure Offering Must Continue to Expand

RECOMMENDATIONS


The advantages of leveraging Arm’s designs are clearly outlined by its implementers: the scalability and performance-per-Watt (W) improvements over x86 processors from companies like Intel and AMD, as well as the scalable vector extensions, address the needs of those targeting modern HPC workloads, including AI/ML. Arm’s solid AI roadmap and the software and security maturity of the ecosystem portend more design wins and future implementations of its IP. One example is Fujitsu’s commitment to use Arm’s designs in its next-generation Monaka processors, which will likely feature in supercomputers and (telco) edge AI deployments toward the end of the decade. To go from acceptance to predominance in the AI data center, Arm should consider the following industry trends and AI requisites:

  • The industry’s move to chiplets to improve production yields and overcome the performance limitations of SoC designs means more customers will look to Arm for its progress here, especially as a founding member of Universal Chiplet Interconnect Express (an open interconnect specification between chiplets on a package). Arm should continue to invest in collaborative efforts like its Chiplet System Architecture to encourage greater reuse of components like physical and soft IP between vendors.
  • Arm should account for the rapid proliferation of generative AI workloads and create designs that complement them, even if the majority of AI calculations will be offloaded to accelerators like Application-Specific Integrated Circuits (ASICs) and GPUs.
  • The more complex matrix calculations important to some AI/ML workloads are already supported in Neoverse IP and should be promoted to the same level as the vector extensions, as these AI workloads will continue to proliferate.
  • Arm should continue to bolster Arm Total Design and its ecosystem of design services, advanced analytics, complementary IP, EDA, and foundries to cover the breadth of data center chip designs. Consideration should be given to smaller players and startups that lack the budgets of hyperscalers for these essential services.
  • The Emerging Business Initiative between Arm and Intel should place as much focus on data center SoCs as it does on mobile to help provide essential IP, manufacturing support, and financial assistance for fabless startups, thereby spurring innovation in the area.

The recent block by Chinese regulators of Intel and AMD chips in government systems points to an opportunity for Arm-based SoC designs in the country, which should avoid U.S. export controls at their current level. Arm’s IP is considered highly important to China’s domestic chip industry, especially in mobile and automotive SoCs, and will probably avoid sanctions from Chinese regulators for some time.