TheSequence

Edge 431: Meet the Multimodal State Space Models

Extending SSMs beyond language.

Sep 17, 2024

Created Using Ideogram

In this issue:

  • An introduction to Cobra, a multimodal SSM.

  • A review of the original Cobra research paper.

  • A walkthrough of NVIDIA’s TensorRT-LLM framework.

💡 ML Concept of the Day: Cobra Extends SSMs to Multiple Modalities

The efficiency of state space models (SSMs) initially positioned them as an alternative to transformer-based LLMs. A constant question in that space has been whether SSMs could scale to other modalities. That is the goal of a novel SSM known as Cobra (you know, we need to keep the snake names coming 😊).
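For context, the efficiency claim comes from the SSM recurrence itself, which processes a sequence in a single linear-time pass rather than attending over all token pairs. Below is a minimal sketch of a discretized SSM recurrence in plain NumPy; it is only an illustration of the O(L) scaling, not Cobra's or Mamba's actual implementation (which uses selective, hardware-aware parallel scans).

```python
# Minimal sketch of a discretized state space model (SSM) recurrence.
# Illustrates why SSMs run in O(L) time over a length-L sequence;
# this is NOT Mamba's or Cobra's real implementation.
import numpy as np

def ssm_recurrence(u, A, B, C):
    """Compute y_t = C h_t with h_t = A h_{t-1} + B u_t, one step per token."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for u_t in u:                # single pass over the sequence: O(L)
        h = A @ h + B * u_t      # update the hidden state
        ys.append(C @ h)         # read out the output
    return np.array(ys)

# Toy 1-D input channel with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
L, d_state = 16, 4
A = 0.9 * np.eye(d_state)        # stable state transition
B = rng.normal(size=d_state)
C = rng.normal(size=d_state)
u = rng.normal(size=L)

y = ssm_recurrence(u, A, B, C)
print(y.shape)                   # (16,) — one output per token, linear in L
```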

In recent years, multimodal large language models (MLLMs) have advanced rapidly across a range of fields. Most of these models rely on the well-known Transformer network, which, despite its popularity, suffers from quadratic computational complexity in sequence length. To address this inefficiency, Cobra was introduced as an MLLM with linear computational complexity. Cobra achieves this by extending the efficient Mamba language model to the visual modality.
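To make that pattern concrete, here is a hedged sketch of the general recipe: visual features from an image encoder are projected into the backbone's embedding space, prepended to the text tokens, and the combined sequence is processed in one linear-time pass. All module names and dimensions below are hypothetical placeholders, and a GRU stands in for the stack of Mamba blocks, so this shows the wiring rather than Cobra's actual architecture.

```python
# Hedged sketch of the Cobra-style multimodal pattern: project visual
# features into the text embedding space and run one recurrent backbone
# over the combined sequence. Names and sizes are hypothetical; a GRU is
# a stand-in for Mamba's selective SSM blocks, which also run in O(L).
import torch
import torch.nn as nn

class MultimodalSSM(nn.Module):
    def __init__(self, d_model=512, vis_dim=768, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)      # text token embeddings
        self.vis_proj = nn.Linear(vis_dim, d_model)    # align the two modalities
        # Placeholder backbone: linear-time recurrence, like an SSM stack.
        self.backbone = nn.GRU(d_model, d_model, num_layers=2, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, vis_feats, text_ids):
        # vis_feats: (B, n_vis, vis_dim) from a pretrained vision encoder
        # text_ids:  (B, T) token ids for the text prompt
        vis = self.vis_proj(vis_feats)                 # (B, n_vis, d_model)
        txt = self.embed(text_ids)                     # (B, T, d_model)
        seq = torch.cat([vis, txt], dim=1)             # prepend image tokens
        h, _ = self.backbone(seq)                      # one linear-time pass
        return self.lm_head(h[:, vis.shape[1]:])      # logits over text positions

model = MultimodalSSM()
logits = model(torch.randn(2, 49, 768), torch.randint(0, 32000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 32000])
```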
