The Financial News 247

The Rise Of The Multimodal LLM

By News Room, May 23, 2026

There’s a new bit of jargon in the AI world, but it’s more than just a detail. It involves adding a familiar letter to a familiar acronym, and although that may sound glib, catching up might feel a little like déjà vu.

Do a quick conventional search for “LLMM.” You won’t come up with much, unless you check out the AI overviews, where Gemini in Google or Copilot in Bing tells you what this is.

“MLLM” does a bit better – you might find a result from IBM, some academic papers, and a page on GitHub. But the idea of the Multimodal Large Language Model, or to some, the Large Language Multimodal Model, hasn’t really made it into the mainstream, to places like CNBC or Newsweek. It’s still sort of the province of the true tech geek – for now.

What is a Multimodal Large Language Model?

The essential concept of a Multimodal Large Language Model is that it works on different kinds of data, although there’s the implication that it does this through specific kinds of design. PhD researcher and engineer Sebastian Raschka defines the MLLM this way on a self-published platform:

“Multimodal LLMs are large language models capable of processing multiple types of inputs, where each ‘modality’ refers to a specific type of data—such as text (like in traditional LLMs), sound, images, videos, and more.”

If you assume that the machines do this by boiling each kind of input down into a compact representation the model can work with, a sophisticated form of distillation in the loose sense, you’d be right. But there’s another component to this, too. In some ways, it sounds like engineers are going back to the well, using classical ML techniques to enhance what an LLM, as a central “brain,” can do.

This starts with attaching sensor tools to the LLM itself, to bring that multimodal data in.

“Recent research shows that Multimodal Large Language Models (MLLMs) can be enhanced with sensory gear (e.g., IoT sensors, wearables, cameras) by using visual prompting to ground them in real-world sensor data,” explains a summary of “By My Eyes,” a paper pioneering this kind of research, whose authors write:

“We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge.”
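
As a rough illustration of the idea in that passage, the sketch below turns a short run of sensor readings into a simple line chart and pairs it with a task description. It is not the paper’s implementation: the function names, the SVG rendering, the made-up accelerometer values, and the prompt layout are all assumptions made for this example.

```python
def readings_to_svg(readings, width=200, height=100):
    """Render a list of numeric sensor readings as an SVG line chart."""
    if len(readings) < 2:
        raise ValueError("need at least two readings to draw a line")
    lo, hi = min(readings), max(readings)
    span = (hi - lo) or 1.0              # avoid dividing by zero on flat signals
    step = width / (len(readings) - 1)   # horizontal spacing between samples
    points = []
    for i, value in enumerate(readings):
        x = i * step
        # SVG's y axis grows downward, so flip the scaled value.
        y = height - (value - lo) / span * height
        points.append(f"{x:.1f},{y:.1f}")
    joined = " ".join(points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{joined}"/>'
        "</svg>"
    )


def build_visual_prompt(task, readings):
    """Pair the chart with the task text (hypothetical prompt layout)."""
    return [
        {"type": "image", "format": "svg", "data": readings_to_svg(readings)},
        {"type": "text", "text": f"The chart shows sensor readings over time. {task}"},
    ]


# Made-up accelerometer magnitudes, purely for illustration.
accel = [0.1, 0.2, 1.5, 0.3, 0.1, 1.6, 0.2]
prompt = build_visual_prompt("Is the wearer walking or standing still?", accel)
```

The point of the design is that the model receives a picture of the signal rather than a long list of raw numbers, with the task stated in plain text beside it.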

The Art of Imitation

If the traditional token-based LLM approach imitated human writing by scouring the internet and applying prediction models, the new MLLM/LLMM system is able to, in a sense, learn by seeing. It’s not limited to text as an input, or an output. And it’s interactive.

“From a Human Computer Interaction (HCI) and Human Augmentation (HA) perspective, MLLMs also offer various opportunities,” writes Jun Rekimoto in an article maintained at the Association for Computing Machinery’s Digital Library. “If such models can recognize the world in ways similar to humans, a range of applications becomes possible. These include technologies that can record and understand skilled human actions for transfer to others, assess skill development, recognize real-world behaviors to provide personalized assistance and assist individuals with disabilities by augmenting their sensory perception of the environment.”

That said, there’s a lot that MLLMs can do that bypasses traditional text-only inference. That’s especially true when it comes to real-world tasks involving physics. Developers spent about a year pondering how to teach LLMs about physics through text, and then realized that they could just equip the LLM to see, and teach it that way.

Terms from the Aughts

Take the term “feature extraction.”

A model, perhaps a convolutional neural network (CNN), can look at an image, analyze it, and extract features to classify and identify what’s in view. Now, you can attach that CNN to an LLM, which will then process what the CNN sees and identifies. That’s a powerful combination, and it’s feeding a good deal of research into this kind of build.
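
To make the pairing described above a little more concrete, here is a toy sketch in plain Python: a single hand-written convolution kernel picks out a vertical edge, and the resulting features are projected to the width of a language model’s embeddings and placed in front of some text embeddings. Everything in it is an assumption for illustration, including the tiny image, the kernel, the projection weights, and the embedding width of 2. A real system would learn these values and use a deep learning library.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Zero out negative responses, as a CNN activation would."""
    return [[max(0.0, v) for v in row] for row in fmap]


def project(features, weights):
    """Map each feature vector to the LLM's embedding width (matrix multiply)."""
    return [
        [sum(f * w for f, w in zip(vec, col)) for col in zip(*weights)]
        for vec in features
    ]


# 4x4 grayscale image with a vertical edge between dark (0) and bright (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written vertical-edge kernel; a real CNN would learn its kernels.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d_valid(image, edge_kernel))

# Treat each row of the feature map as one "visual token" and project it
# to a toy embedding width of 2 (weights are made up, not learned).
projection = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]
visual_tokens = project(feature_map, projection)

# Stand-in text embeddings; the LLM would consume the combined sequence.
text_tokens = [[0.2, 0.7], [0.9, 0.1]]
sequence = visual_tokens + text_tokens
```

The feature map lights up only in the column where the image goes from dark to bright, and the language model sees those features as extra entries in its input sequence.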

Suppose you have a ball bouncing through a room and you want the LLM to “follow the ball.” How do you encode all of that information into the neural net? How do you “show” the model what the ball’s trajectory is like based on real-world physics?

Well, it’s a lot easier if the LLM can see.

Some experts also point out that LLMs equipped this way can know more about relational data from the jump, eliminating repetitive querying. Some sources estimate that these novel models can cut compute by up to 75%, measured in FLOPs (floating-point operations).

More Techniques

Within the realm of MLLM design, there’s more jargon emerging. For example, there’s the idea of token sparsification or compression. Here’s an explanation from a page on GitHub:

“Token compression reduces the number of visual tokens processed by MLLMs while preserving critical cross-modal semantics, enabling more efficient training and faster inference without large accuracy regressions. The field is fragmented across encoders, projectors, and LLM-side techniques; a centralized, searchable resource is needed.”
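
One simple way to picture the token compression described above is score-based pruning: keep only the visual tokens judged most important and drop the rest. The sketch below does that and then estimates the effect on compute. The importance scores, the token counts, and the model width are all invented for the example, and the FLOP formula is a common back-of-the-envelope approximation for one transformer layer, not a measurement of any particular model.

```python
def prune_visual_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep])
    return [tokens[i] for i in kept_indices]


def layer_flops(n_tokens, d_model):
    """Rough forward-pass FLOPs for one transformer layer.

    Uses the common approximation 24*n*d^2 + 4*n^2*d (attention plus a
    4x-wide MLP); real models differ in the details.
    """
    return 24 * n_tokens * d_model**2 + 4 * n_tokens**2 * d_model


# Eight placeholder visual tokens with made-up importance scores.
visual = [f"v{i}" for i in range(8)]
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.3, 0.7, 0.2]
kept = prune_visual_tokens(visual, scores, keep=4)

# Hypothetical sizes: 64 text tokens, visual tokens cut from 576 to 144.
d_model = 4096
before = layer_flops(64 + 576, d_model)
after = layer_flops(64 + 144, d_model)
saving = 1 - after / before
```

With these invented sizes, dropping three quarters of the visual tokens trims the per-layer estimate by roughly two thirds, because the text tokens still have to be processed.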

Then there’s structural pruning and knowledge distillation, in which similar goals apply. Engineers are finding many ways to increase the efficiency of these models. As for attention mechanisms, there’s a lot of work being done on that, too, but maybe that’s another article.
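
For readers curious about what knowledge distillation looks like in practice, here is a minimal sketch of the classic soft-target loss, in which a small “student” model is trained to match the softened output distribution of a larger “teacher.” The logits and the temperature below are made up, and this is the general textbook formulation rather than the method of any specific MLLM paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from teacher to student on temperature-softened outputs.

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature**2 * kl


# Made-up logits over three classes.
teacher = [4.0, 1.0, 0.5]
student_close = [3.5, 1.2, 0.4]
student_far = [0.5, 1.0, 4.0]

loss_close = distillation_loss(student_close, teacher)
loss_far = distillation_loss(student_far, teacher)
```

A student whose outputs already resemble the teacher’s gets a small loss, while one that disagrees gets a larger one, which is the signal that pushes the smaller model toward the larger model’s behavior.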

So although it may look a little like Roman numerals, the MLLM, as a descendant of the LLM, has a lot of potential. You may indeed hear a lot more about it, this year and in the years to come.

© 2026 The Financial 247. All Rights Reserved.
