NIMs Are Not Revolutionary, but Certainly Speak to Enterprise Challenges
NEWS
NVIDIA Inference Microservices (NIMs), announced at NVIDIA GTC, are a new part of the NVIDIA AI Enterprise offering. NIMs are pre-built containers that enable enterprises to deploy Artificial Intelligence (AI) models and applications with optimized inference on any enterprise platform with CUDA-accelerated hardware; NVIDIA claims they cut deployment times from weeks to minutes. These microservices are built on top of the CUDA framework using NVIDIA inference software (including Triton Inference Server and TensorRT-LLM) and form part of the NeMo platform, which means other generative AI services can be used in parallel to support application deployment. The microservices include industry-standard Application Programming Interfaces (APIs) for multiple domains (including language, speech, and drug discovery) to further accelerate AI application development. Alongside this announcement, NVIDIA unveiled a raft of cloud, Independent Software Vendor (ISV), and AI partners that are already building with NIMs or making their offerings compatible with them.
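To make the deployment model concrete, the minimal sketch below shows how an application might query a NIM once the container is running; it assumes a locally deployed microservice serving an OpenAI-compatible chat endpoint on port 8000, and the model identifier and prompt are illustrative assumptions rather than details from the announcement.

```python
# Minimal sketch: calling a locally deployed NIM through its
# OpenAI-compatible chat completions endpoint. Assumes the container
# is already running and serving on localhost:8000; the model ID is
# illustrative.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # illustrative model ID
        "messages": [
            {"role": "user", "content": "Summarize NVIDIA NIMs in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

Because the interface follows the widely used chat completions schema, existing applications written against that schema could, in principle, be pointed at the microservice with little more than a URL change.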
NIMs will solve some of the challenges inherent in enterprise generative AI deployment. Of course, there are constraints. The biggest is that NIMs are only compatible with, and optimized for, CUDA-accelerated NVIDIA hardware. This deepens vendor lock-in within the NVIDIA ecosystem: a boon for NVIDIA, as it drives demand for its hardware, but restrictive for customers seeking alternative AI platforms. Some developers may see this additional layer of lock-in as a step too far; more likely, however, most will see NIMs as an opportunity to accelerate generative AI deployment.
Will the UXL Foundation Help Competitors Combat NVIDIA's Growing Software Dominance?
IMPACT
NVIDIA is certainly investing in its software offering to support its market-leading CUDA framework. The rest of the market still lags behind, but competitors are trying to catch up through internal Research and Development (R&D), acquisitions targeting the software stack, and cooperation. A recently announced consortium called the UXL Foundation (whose members include Qualcomm, Google, and Intel) aims to develop a suite of software tools that supports multiple types of AI accelerator chips. This open-source project, built on Intel’s oneAPI, aims to enable chip-agnostic code through a standard programming model and common specification designed for AI. The goal is to loosen NVIDIA’s grip on the AI market by undercutting the role of CUDA.
Driving competition at the software layer is the right approach given NVIDIA’s chipset dominance, but ABI Research has reservations. The key challenge is that developers have been using CUDA for more than 15 years, have built their code around it, and still see the best performance from NVIDIA hardware. Given these constraints, persuading developers to migrate models and workloads is easier said than done, even with a competitive open-source alternative. In addition, using oneAPI as a starting point may not be beneficial: the toolkit remains complex and challenging to use, which will hinder its ability to scale across enterprise use cases. Contrasted with NIMs’ high degree of simplicity, ABI Research sees obvious challenges, even if the consortium brings substantial improvements to oneAPI. Lastly, oneAPI focuses on supporting developers deploying training workloads, which is certainly important, but the emphasis will increasingly shift toward inference, as this will be the fastest-growing workload. Success for the UXL Foundation will rely on fostering common innovation that targets inference workloads and on creating a clear business case that justifies the difficult transition away from NVIDIA.
R&D and Investment in Optimization Are Essential for Sustained Competitive Differentiation
RECOMMENDATIONS
The UXL Foundation may succeed in opening up the AI market, but chip vendors cannot rely on this alone. They must continue to develop in-house software capabilities to build a differentiated value proposition. The first area they must target is optimization. As AI scales rapidly, rising costs, constrained resources, and performance limits will create problems for stakeholders. Deep model optimization with hardware, application, and domain awareness will be the key lever for clearing these AI bottlenecks.
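To ground what such optimization involves, the sketch below applies post-training dynamic quantization in PyTorch, shrinking linear-layer weights to INT8 for cheaper inference; this is a generic, framework-level illustration under simplified assumptions, not any specific vendor's optimization toolchain, and the toy model stands in for a trained network.

```python
# Minimal sketch of one common inference optimization: post-training
# dynamic quantization in PyTorch, converting Linear layers to INT8
# so they need less memory and compute. A generic example, not a
# vendor-specific toolchain; the model below is a toy stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Hardware- and domain-aware optimization goes well beyond this single step (calibration, operator fusion, and target-specific kernels, for example), which is precisely where differentiated vendor tooling can add value.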
Most chip vendors (e.g., AMD, Qualcomm, and Intel) already offer tools to support developers in optimizing AI for their chipsets, but these platforms are immature and complex, requiring deep AI expertise to use effectively. This is just one of the challenges for chip vendors looking to develop a strong software value proposition. Moving forward, chip vendors must enhance their solutions by investing in, or building partnerships with, third parties.
As NVIDIA continues to expand its software armory, competitors must follow suit, investing appropriately in R&D or third-party partnerships to build an enticing software value proposition that gains traction with developers and enterprise customers.