TheSequence

Edge 431: Meet the Multimodal State Space Models

Extending SSMs beyond language.

Sep 17, 2024

Created Using Ideogram

In this issue:

  • An introduction to Cobra, a multimodal SSM.

  • A review of the original Cobra research paper.

  • A walkthrough of NVIDIA’s TensorRT-LLM framework.

💡 ML Concept of the Day: Cobra Extends SSMs to Multiple Modalities

The efficiency of state space models (SSMs) initially positioned them as an alternative to transformer-based LLMs. A constant question in that space has been whether SSMs could scale to other modalities. That is the goal of a novel SSM known as Cobra (you know, we need to keep the snake names coming 😊).
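For context, the efficiency claim comes from the SSM recurrence itself, which processes a sequence in a single linear-time pass rather than attending over all token pairs. Below is a minimal sketch of a discretized SSM recurrence in plain NumPy; it is only an illustration of the O(L) scaling, not Cobra's or Mamba's actual implementation (which uses selective, hardware-aware parallel scans).

```python
# Minimal sketch of a discretized state space model (SSM) recurrence.
# Illustrates why SSMs run in O(L) time over a length-L sequence;
# this is NOT Mamba's or Cobra's real implementation.
import numpy as np

def ssm_recurrence(u, A, B, C):
    """Compute y_t = C h_t with h_t = A h_{t-1} + B u_t, one step per token."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for u_t in u:                # single pass over the sequence: O(L)
        h = A @ h + B * u_t      # update the hidden state
        ys.append(C @ h)         # read out the output
    return np.array(ys)

# Toy 1-D input channel with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
L, d_state = 16, 4
A = 0.9 * np.eye(d_state)        # stable state transition
B = rng.normal(size=d_state)
C = rng.normal(size=d_state)
u = rng.normal(size=L)

y = ssm_recurrence(u, A, B, C)
print(y.shape)                   # (16,) — one output per token, linear in L
```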

In recent years, multimodal large language models (MLLMs) have advanced rapidly across a range of fields. Most of these models rely on the well-known Transformer network, which, despite its popularity, suffers from quadratic computational complexity in sequence length. To address this inefficiency, Cobra was introduced as an MLLM with linear computational complexity. Cobra achieves this by extending the efficient Mamba language model to the visual modality.
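To make that pattern concrete, here is a hedged sketch of the general recipe: visual features from an image encoder are projected into the backbone's embedding space, prepended to the text tokens, and the combined sequence is processed in one linear-time pass. All module names and dimensions below are hypothetical placeholders, and a GRU stands in for the stack of Mamba blocks, so this shows the wiring rather than Cobra's actual architecture.

```python
# Hedged sketch of the Cobra-style multimodal pattern: project visual
# features into the text embedding space and run one recurrent backbone
# over the combined sequence. Names and sizes are hypothetical; a GRU is
# a stand-in for Mamba's selective SSM blocks, which also run in O(L).
import torch
import torch.nn as nn

class MultimodalSSM(nn.Module):
    def __init__(self, d_model=512, vis_dim=768, vocab=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)      # text token embeddings
        self.vis_proj = nn.Linear(vis_dim, d_model)    # align the two modalities
        # Placeholder backbone: linear-time recurrence, like an SSM stack.
        self.backbone = nn.GRU(d_model, d_model, num_layers=2, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, vis_feats, text_ids):
        # vis_feats: (B, n_vis, vis_dim) from a pretrained vision encoder
        # text_ids:  (B, T) token ids for the text prompt
        vis = self.vis_proj(vis_feats)                 # (B, n_vis, d_model)
        txt = self.embed(text_ids)                     # (B, T, d_model)
        seq = torch.cat([vis, txt], dim=1)             # prepend image tokens
        h, _ = self.backbone(seq)                      # one linear-time pass
        return self.lm_head(h[:, vis.shape[1]:])      # logits over text positions

model = MultimodalSSM()
logits = model(torch.randn(2, 49, 768), torch.randint(0, 32000, (2, 10)))
print(logits.shape)  # torch.Size([2, 10, 32000])
```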
