The Market Is Shifting Its Focus from Large Language Models to Small Language Models
NEWS
The Artificial Intelligence (AI) market is seeing a profound shift in foundational model innovation. Although AI leaders are still innovating at the top end in pursuit of Artificial General Intelligence (AGI), opportunities in the enterprise market are driving investment and developer activity in a new class: Small Language Models (SLMs). SLMs are a new category of foundational models with fewer than 15 billion parameters. They are built by extracting and compressing application-specific parameters from Large Language Models (LLMs) and are then fine-tuned on datasets that target specific tasks, verticals, or applications. SLMs have much lower memory and power requirements than LLMs, which improves AI inferencing efficiency and shortens time-to-train.
Innovation in SLMs is being spurred by several interconnected supply-side factors. Open-source expansion has accelerated innovation, with platforms like Hugging Face and big tech investors like Meta providing a foundation on which developers can build new leading-edge models. Part of this has been the growing availability and maturity of Machine Learning Operations (MLOps) tools. Significant advances in model compression and optimization techniques such as quantization and pruning, many of them open-sourced, are supporting the development of competitive SLMs. The transition from cloud to edge and devices has necessitated software innovation to build models capable of bringing productivity AI applications to market. However, certain complex applications, such as mathematical reasoning and video understanding, still require larger models.
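To make the compression techniques named above more concrete, the following is a minimal sketch of two of them, symmetric 8-bit quantization and magnitude pruning, applied to a short list of made-up weights. The function names and the sample values are illustrative assumptions, not taken from any specific SLM or toolkit, and production pipelines typically work per layer or per channel on tensors rather than on plain lists.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


def prune_by_magnitude(weights, sparsity):
    """Set the smallest-magnitude fraction `sparsity` of the weights to zero."""
    n_prune = int(len(weights) * sparsity)
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(smallest_first[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.03, 0.88, -0.05, 0.61]

quantized, scale = quantize_int8(weights)      # 8 bits per weight instead of 32
restored = dequantize(quantized, scale)        # close to, but not equal to, the originals
pruned = prune_by_magnitude(weights, 0.5)      # half of the weights become zero

print(quantized)
print(pruned)
```

Quantization trades a small rounding error (at most half of `scale` per weight in this sketch) for a smaller footprint, while pruning removes the weights that contribute least by magnitude. Both are ways of shrinking a model after training.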
Two types of SLMs (“tailored” and “tiny”) emerged in 2023, which was a breakout year for “tailored” models (vertical or application-specific models with between 2 billion and 15 billion parameters). Marquee companies like Meta (LLaMA 2 7b, 13b) and Microsoft (Orca 13b) have poured capital into innovation, and startups have also shown success. Mistral (a 9-month-old startup) built an open-source 7 billion parameter language model offering performance similar to OpenAI’s 175 billion parameter model, GPT-3.5.
Moving into 2024, ABI Research sees new innovation targeting tiny language models. These models, with fewer than 2 billion parameters today and potentially fewer than 1 billion in the future, will further reduce the power, memory, and computation demands of running generative AI applications without compromising processing efficiency, allowing deployment in even more constrained locations. Most tiny models from AI leaders remain in the Research and Development (R&D) phase, although some are already deployed in commercial applications:
- Google Gemini Nano (1.8b parameters): Runs on the Pixel 8 Pro, enabling text summarization, contextual smart replies, and advanced proofreading.
- Microsoft Phi-1.5 (1.3b parameters): Built to support open-source research around biases, controllability, and safety challenges.
- TinyLlama (1.1b parameters): 500 Megabyte (MB) model trained on a 3 trillion token dataset in just 90 days.
- MosaicML (owned by Databricks) MPT-1B (1b parameters)
- Eleuther AI’s Pythia (1b, 1.4b parameters): Research model suite.
These models remain fairly basic, but early evidence suggests that they are moving in the right direction. For example, Microsoft Phi-1.5 has shown higher performance on difficult reasoning tasks than larger models like Vicuna-13B.
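As a rough illustration of why parameter count matters for constrained devices, the sketch below estimates the memory needed to hold a model's weights at a given numeric precision and labels the model using the size bands described above. It counts weights only, ignoring activations and runtime overhead. The bit widths and the handling of the exact 2 billion and 15 billion boundaries are assumptions made for this example.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def size_class(params_billion):
    """Label a model using the parameter bands described in the text."""
    if params_billion < 2:
        return "tiny"
    if params_billion < 15:
        return "tailored"
    return "large"


# Parameter counts come from the article; the precisions are assumed.
examples = [
    ("TinyLlama", 1.1, 4),
    ("Gemini Nano", 1.8, 4),
    ("LLaMA 2 7b", 7, 16),
    ("LLaMA 2 13b", 13, 16),
]

for name, params_b, bits in examples:
    gb = weight_memory_gb(params_b, bits)
    print(f"{name}: {size_class(params_b)}, about {gb:.2f} GB at {bits}-bit")
```

Under these assumptions, a 1.1 billion parameter model at 4-bit precision needs roughly 0.55 GB for its weights, while a 7 billion parameter model at 16-bit precision needs about 14 GB, which is the gap that makes tiny models candidates for phones and other edge devices.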
Can "Tailored" and "Tiny" Language Models Support Enterprise Deployment?
IMPACT
Enterprises still face enormous challenges that hinder generative AI adoption and deployment. First, models are not reliable enough for mission-critical or customer-facing use cases, as they remain subject to hallucinations and bias. Second, enterprises still lack the infrastructure, datasets, structures, and skill sets to support effective deployment. Third, running generative AI is costly. Leveraging multiple targeted SLMs could offer a better route to AI deployment than “giant” LLMs by delivering the following benefits:
- Reducing Operating Costs: SLMs will reduce hardware utilization for training, fine-tuning, and inferencing, which will help optimize cloud bills and lower energy consumption.
- Improving Model Transparency: Especially when combined with Retrieval Augmented Generation (RAG), SLMs are easier to understand and audit, helping enterprises gain deeper insight into generative AI.
- Better Computational Efficiency: Fine-tuning giant models is time-consuming and takes hundreds of Graphics Processing Unit (GPU) hours, whereas the process is much cheaper and quicker with SLMs. One example is TinyLlama, an open-source project that is testing whether it can pretrain a tiny 1.1b Llama model on 3 trillion tokens in just 90 days.
- Alleviating Hardware Constraints: SLMs do not require specialized hardware to effectively run training or inferencing workloads. Given supply chain challenges around high-end GPUs, SLMs will enable enterprises to use Neural Processing Units (NPUs) or Central Processing Units (CPUs) to support AI deployment.
- Lower Memory Requirement to Open Up Deployments at the Far Edge: Performant tiny LLMs could enable every device to have a local language model capable of supporting generative use cases. This will help alleviate cloud dependency, support productive AI at scale, and solve data privacy/sovereignty challenges.
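The computational efficiency point above can be put into rough numbers with a back-of-the-envelope estimate. The sketch below uses the commonly cited rule of thumb that training costs about 6 floating-point operations per parameter per token, applied to the TinyLlama figures quoted in the list (1.1 billion parameters, 3 trillion tokens, 90 days). The accelerator peak throughput and utilization values are hypothetical placeholders, not measurements from the TinyLlama project.

```python
import math


def training_flops(params, tokens):
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


def sustained_flops_per_second(total_flops, days):
    """Average throughput needed to finish the run in the given number of days."""
    return total_flops / (days * 24 * 3600)


def accelerators_needed(total_flops, days, peak_flops, utilization):
    """Accelerators required, given a peak rate and an assumed utilization."""
    effective_rate = peak_flops * utilization
    return math.ceil(sustained_flops_per_second(total_flops, days) / effective_rate)


total = training_flops(params=1.1e9, tokens=3e12)
rate = sustained_flops_per_second(total, days=90)

# Hypothetical accelerator: 300 TFLOP/s peak at 50% utilization (assumed values).
count = accelerators_needed(total, days=90, peak_flops=300e12, utilization=0.5)

print(f"Total compute: {total:.2e} FLOPs")
print(f"Sustained rate needed: {rate:.2e} FLOP/s")
print(f"Accelerators needed under these assumptions: {count}")
```

Because the estimate scales linearly with parameter count, the same token budget applied to a model ten times larger would need roughly ten times the compute, which is the core of the cost argument for SLMs.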
But challenges remain. SLMs are memory constrained, so their knowledge will be limited. This can be alleviated with RAG, but RAG may not be appropriate for every use case. In addition, running hundreds of task-specific SLMs within or across different business units will be operationally challenging. Both LLMs and SLMs offer enterprise opportunities and can serve different purposes. Decision makers must focus on aligning the task, expectations, and infrastructure/skill set to determine which model to deploy.
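To show how RAG can compensate for an SLM's limited built-in knowledge, the following toy sketch retrieves the most relevant document for a question and places it in the prompt that would be sent to the model. It uses simple word-count cosine similarity in place of a learned embedding model, and the documents, question, and function names are all hypothetical. A real deployment would typically use a vector database and an actual SLM for the generation step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(query_vec, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Place the retrieved context ahead of the question for the model to use."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical enterprise knowledge base.
docs = [
    "The warranty period for the X100 router is 24 months.",
    "Support tickets are answered within one business day.",
    "The cafeteria opens at 8 am on weekdays.",
]
question = "How long is the warranty period for the X100 router?"

print(build_prompt(question, docs))
```

The design choice here is that the facts live in the document store rather than in the model's parameters, so a small model only needs to read and rephrase the supplied context. This also makes answers easier to audit, since the retrieved passage is visible alongside the response.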
Supply Chain Has Opportunities to Take Advantage of SLM Innovation
RECOMMENDATIONS
So far, the generative AI market has been dominated by the loss-leading, cloud-focused consumer segment (which continues to expect free access to applications like ChatGPT). The enterprise and device spaces are lagging behind. Competitive SLMs may offer stakeholders an opportunity to build new monetization strategies and crack the enterprise space. ABI Research believes key players in the value chain must embrace, invest in, and support SLMs as part of a wider AI commercialization strategy:
- AI Leaders (e.g., Anthropic, DataRobot, Hyperscalers): SLMs will be crucial to building a competitive value proposition. They should target integration with emerging players in the space (e.g., Mistral) and offer enterprises tools that enable effective fine-tuning. Effective and safe integration with platforms like Hugging Face will be necessary to expose clients to leading-edge model development.
- Chipset Vendors: Edge and device generative AI will be massive growth areas, especially for those competing with NVIDIA’s leading GPUs. Developing deep optimizations and common Go-to-Market (GTM) strategies with leading SLM developers can help drive differentiation among competitors and create a clear case for AI deployment in these locations. Qualcomm and MediaTek are already exploring tiny LLMs to support generative AI applications in lower tiers of Personal Computers (PCs) and smartphones. Companies like Intel that are pursuing Independent Software Vendor (ISV) acceleration programs (aiming to build an AI application ecosystem) should look to leverage tiny LLMs effectively within this process to target all form factors.
- Original Equipment Manufacturers (OEMs): As with chipset vendors, the growing but highly fragmented on-device AI market is a perfect fit for SLMs. Developing GTM strategies with ISVs running on leading SLMs will offer a highly differentiated value proposition. Tiny models may not even require specialized hardware to process workloads on-device, which will enable on-device AI to be deployed across the long tail of devices.
In 2024, the SLM market will continue to expand. “Tailored” models will begin to mature with commercial initiatives starting to leverage 3 billion to 15 billion parameter models as exemplified by the PC and smartphone markets. Tiny models will receive greater R&D and investment, supported by open-source commitments and greater developer interest. One area that needs attention in 2024 is multi-modality, as productive applications will rely on this feature. Looking forward, accelerated innovation within MLOps techniques will enable further model compression, resulting in even smaller (sub-1 billion) and more competitive models.