In this issue:
An introduction to Cobra, a multimodal SSM.
A review of the original Cobra research paper.
A walkthrough of NVIDIA’s TensorRT-LLM framework.
💡 ML Concept of the Day: Cobra Extends SSMs to Multiple Modalities
State space models (SSMs) were initially positioned as an efficient alternative to transformer-based LLMs. A constant question in that space has been whether SSMs can scale to other modalities. That is the goal of a novel SSM model known as Cobra (you know, we need to keep the snake names coming 😊).
In recent years, multimodal large language models (MLLMs) have driven significant advances across a wide range of fields. These models typically rely on the well-known Transformer architecture, which, despite its popularity, suffers from quadratic computational complexity in sequence length. To address this inefficiency, Cobra was introduced as a solution with linear computational complexity. Cobra achieves this by extending the efficient Mamba language model to the visual modality.
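To make the idea concrete, here is a minimal, hedged sketch of what a Cobra-style multimodal SSM pipeline could look like: visual features from a vision encoder are projected into the language backbone's token space, concatenated with text embeddings, and processed by a recurrent backbone whose cost grows linearly with sequence length. All names, dimensions, and the GRU stand-in for the Mamba block are illustrative assumptions, not the actual Cobra implementation.

```python
import torch
import torch.nn as nn

class MultimodalSSMSketch(nn.Module):
    """Schematic of a Cobra-style multimodal SSM (illustrative, not the official code)."""

    def __init__(self, vision_dim=768, text_vocab=32000, d_model=512):
        super().__init__()
        # Projector maps image-patch features into the backbone's embedding space.
        self.projector = nn.Linear(vision_dim, d_model)
        # Standard text token embeddings.
        self.embed = nn.Embedding(text_vocab, d_model)
        # Placeholder for a Mamba-style selective SSM block; a GRU is used here
        # only to keep the sketch dependency-free while preserving the key
        # property of linear-time recurrence over the sequence.
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, text_vocab)

    def forward(self, image_features, text_ids):
        vis_tokens = self.projector(image_features)       # (B, N_img, d_model)
        txt_tokens = self.embed(text_ids)                  # (B, N_txt, d_model)
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)   # fused multimodal sequence
        hidden, _ = self.backbone(seq)                      # linear-time pass, no quadratic attention
        return self.lm_head(hidden)                         # next-token logits

# Toy usage: 16 image-patch features followed by 8 text tokens.
model = MultimodalSSMSketch()
img = torch.randn(1, 16, 768)
txt = torch.randint(0, 32000, (1, 8))
print(model(img, txt).shape)  # torch.Size([1, 24, 32000])
```

The design choice to illustrate is the fusion step: once visual features are projected into the same token space as text, the SSM backbone treats the combined sequence uniformly, which is what lets the linear-complexity recurrence replace quadratic self-attention.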