ISC21 Trends: Workload Convergence, HPC for the Masses
NEWS
Workload convergence was the theme permeating many of the new technology discussions at International Supercomputing 2021 (ISC21), the virtual conference that took place last week. But are different workloads actually coming together, or are more heterogeneous compute solutions simply making it possible to process a more diverse set of workloads?
High Performance Computing (HPC), in its never-ending quest for productivity, has always been quick to adopt and adapt leading-edge technology for its purposes. For this reason, many observers look to this specialist industry for insights into future technology adoption. The early years of HPC saw large-scale deployments of specialized and expensive hardware at well-funded research facilities or government-backed institutions. Often these deployments were built for a very discrete set of tasks with well-defined technology boundaries, and the central processing units (CPUs) they used were optimized for those purposes. As technology evolved, HPC systems started to move toward lower-cost commodity CPUs, manufactured in much higher volumes, to meet their processing needs. Since the early to mid-2000s, x86-based CPUs have dominated the HPC data center, and they still do today. The emergence of graphics processing units (GPUs) and other acceleration technologies has made HPC systems even more capable and flexible, expanding the range of workloads that benefit from being run on them.
Cost-effective and capable HPC clusters have brought high performance computing to new enterprise verticals: oil and gas companies used them to perform reservoir simulations to maximize the return from their reserves, engineering companies used them to run CAE and CFD workloads that sped up their design processes, the biomedical research community used them to speed up genomic sequencing, and the finance industry used them to calculate its exposure to fast-moving financial markets more accurately. The evolution of GPU and acceleration technologies, and the ecosystem that grew around them, brought workloads that thrive on vector processors. Artificial Intelligence (AI) and Machine Learning (ML) workloads moved out of the intellectual domain and into the enterprise domain as real-world use cases flourished and needed to be serviced by modern HPC clusters. Whether or not the new AI/ML workloads have driven the new architectures we are seeing in HPC today is largely academic, since standalone CPU computing has had its time in the sun.
Whatever You're Selling, It Had Better be Good at More Than One Thing
IMPACT
Heterogeneous compute systems are fast becoming the standard, and most HPC system providers offer fully integrated, flexible, modular solutions that can be tailored to differing workloads and altered as demands change. The convergence of HPC, AI, and ML workloads is tightly coupled to the availability of these flexible and modular heterogeneous hardware solutions. HPC systems are evolving rapidly, and as they become more capable and the software that interacts with them improves, the need for expensive specialist tuning and optimization declines. This has made HPC hardware more accessible. While its evolution toward heterogeneous compute is being driven by AI, all workload types benefit.
AI and ML models are being improved to deliver greater accuracy and precision, and the emergence of large-scale natural language models and vision processing models is pushing the boundaries of what we will expect enterprise-level HPC systems to achieve. These demands from AI and ML will increase exponentially in the coming years, and innovation in the data center will need to rise to meet them.
Tomorrow's HPC Data Center Will Look Different: CPU Requires Assistance
RECOMMENDATIONS
The role of the CPU is changing. In next-generation HPC systems, data centers will be filled with different types of processing units and accelerators that collaborate at the hardware layer and share resources to meet the rise in demand.
Specialized HPC clusters designed for specific purposes will always exist, but for more general workloads, and to accommodate AI and ML applications, the CPU’s role in the HPC data center will need to be augmented with complementary processing technologies. Processor improvements that increase the speed and density of CPU cores continue to advance, but they cannot keep up with the complexity and demand of these new AI workloads. The next generation of compute demands will be met by increasing efficiency, combining processing technologies into one unit with shared resources. With the Grace CPU, NVIDIA has combined memory and GPU with CPU to increase compute throughput for AI and HPC workloads. NVIDIA also offers the BlueField-3 Data Processing Unit (DPU), which combines high-speed networking, compute, and memory with hardware accelerators to decouple the underlying infrastructure from the applications that use it, freeing up the CPU for non-trivial compute. Intel recently announced its Infrastructure Processing Unit (IPU), which similarly aims to free up CPU cycles by shifting networking, storage, and cryptography tasks to more appropriate hardware.
These heterogeneous systems will become more prevalent, improving performance through acceleration and by ensuring that the optimal component is used for each purpose. They will also require a new approach to management. That approach will not demand a monumental shift in thinking, but the advantage is likely to go to the system builder that considers not only the hardware but also the integration and secure management of that hardware.