Multimodal Learning Explained: How It's Changing the AI Industry So Quickly

According to ABI Research forecasts, the total installed base of devices with Artificial Intelligence will grow from 2.7 billion in 2019 to 4.5 billion in 2024. Billions of petabytes of data flow through AI devices every day, yet right now most of these devices work independently of one another. As the volume of data flowing through them increases in the coming years, technology companies and implementers will turn to multimodal learning, which is fast becoming one of the most exciting and potentially transformative fields of AI.

What Is Multimodal Learning?

Multimodal learning consolidates disconnected, heterogeneous data from various sensors and data inputs into a single model. Unlike traditional unimodal learning systems, a multimodal system can exploit the complementary information that different modalities carry about one another, which only becomes evident when both are included in the learning process. Deep learning-based methods that combine signals from different modalities can therefore generate more robust inferences, or even new insights, that would be impossible to obtain from a unimodal system.

Multimodal learning presents two primary benefits:

  • Multiple sensors observing the same phenomenon can make more robust predictions, because some changes may only be detectable when both modalities are present.
  • The fusion of multiple sensors can capture complementary information or trends that individual modalities would miss.
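
To make the fusion idea concrete, here is a minimal, hypothetical sketch of a late-fusion model in PyTorch. It is not taken from ABI Research's report: the choice of image and audio modalities, the class name, and all layer sizes are illustrative assumptions only. Each modality gets its own small encoder, and the resulting feature vectors are concatenated before a shared classification head, so the final prediction sees evidence from both sensors at once.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative two-modality model: each modality has its own encoder,
    and the feature vectors are fused (concatenated) before a shared
    classifier head. All dimensions are arbitrary placeholders."""

    def __init__(self, image_dim=2048, audio_dim=128, hidden_dim=256, num_classes=10):
        super().__init__()
        # Encoder for pre-extracted image features (e.g., from a CNN backbone)
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden_dim), nn.ReLU())
        # Encoder for pre-extracted audio features (e.g., spectrogram statistics)
        self.audio_encoder = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
        # Shared head operates on the concatenated (fused) representation
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, image_feats, audio_feats):
        img = self.image_encoder(image_feats)
        aud = self.audio_encoder(audio_feats)
        fused = torch.cat([img, aud], dim=-1)  # complementary signals combined here
        return self.classifier(fused)

# Quick smoke test with random tensors standing in for real sensor data
model = LateFusionClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

In practice the fusion point can sit earlier (combining raw features) or later (combining per-modality predictions); the key design choice is that the shared head reasons over both modalities jointly rather than in isolation.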

Generative and Multimodal AI is expected to drive the next phase of cloud AI growth. Learn how in our technology trends whitepaper (go to trend #20 on the list).

How Multimodal Learning Is Scaling

Multimodal learning is well placed to scale, as its underlying technologies, such as deep learning with Deep Neural Networks (DNNs), have already scaled in unimodal applications like image recognition in camera surveillance, and voice recognition and Natural Language Processing (NLP) in virtual assistants such as Amazon’s Alexa. Furthermore, the cost of developing new multimodal systems has fallen because the market landscape for both hardware sensors and perception software is already very competitive.

In addition, organizations are beginning to embrace the need to invest in multimodal learning in order to break out of AI silos. Instead of independent AI devices, they want to manage and automate processes that span the entirety of their operations.

Given these factors, ABI Research projects that the total number of devices shipped with multimodal learning applications will grow from 3.9 million in 2017 to 514.1 million in 2023, at a Compound Annual Growth Rate (CAGR) of 83%.
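
As a quick aside for readers unfamiliar with the metric, CAGR is the constant year-over-year growth rate that carries a starting value to an ending value over a given number of years. The short sketch below uses made-up figures, not the forecast numbers above, purely to show how the metric is computed:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound Annual Growth Rate: the constant annual rate that grows
    start_value into end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# Hypothetical example: 2.0 million units growing to 50.0 million over 6 years
print(f"{cagr(2.0, 50.0, 6):.1%}")  # roughly 71.0%
```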

However, most AI platform companies, including IBM, Microsoft, Amazon, and Google, continue to focus on predominantly unimodal systems. Even the most widely known multimodal systems, IBM Watson and Microsoft Azure, have failed to gain much commercial traction, a result of poor marketing and positioning of multimodal learning's capabilities.

This gap between demand and supply presents opportunities for platform companies and other partners. Multimodal learning will also create an opportunity for chip vendors, as some use cases will need to be implemented at the edge. The implementation requirements of sophisticated edge multimodal learning systems will favor heterogeneous chip systems, because of their ability to handle both sequential and parallel processing workloads.

Opportunities That Multimodal Learning Presents for Key End Markets

Momentum around driving multimodal learning applications into devices continues to build, with five end-market verticals most eagerly on board:

In the automotive space, multimodal learning is being introduced to Advanced Driver Assistance Systems (ADAS), In-Vehicle Human Machine Interface (HMI) assistants, and Driver Monitoring Systems (DMS) for real-time inferencing and prediction. 

Robotics vendors are incorporating multimodal learning into robot HMIs and movement automation to broaden consumer appeal and enable closer collaboration between workers and robots in industrial settings.

Consumer device companies, especially those in the smartphone and smart home markets, are competing fiercely to demonstrate the value of their products over those of their rivals. New features and refined systems are critical to generating a marketing edge, making consumer electronics companies prime candidates for integrating multimodal learning-enabled systems into their products. Growing use cases include security and payment authentication, recommendation and personalization engines, and personal assistants.

Medical companies and hospitals are still relatively early in their exploration of multimodal learning techniques, but there are already some promising emerging applications in medical imaging. The value of multimodal learning to patients and doctors will be a difficult proposition for health services to resist, even if adoption starts out slowly.

Media and entertainment companies are already using multimodal learning to help structure their content into labeled metadata that improves content recommendation systems, personalized advertising, and automated compliance marking. So far, deployments of metadata tagging systems have been limited, as the technology has only recently been made available to the industry.

Where Does Multimodal Learning Go from Here?

Multimodal learning has the potential to connect the disparate landscape of AI devices and deep learning models, and truly power business intelligence and enterprise-wide optimization. Learn more about the exciting features of multimodal learning and its impact on key verticals in our free whitepaper, Artificial Intelligence Meets Business Intelligence, which is part of ABI Research's AI & Machine Learning service.
