Edge 437: Inside BlackMamba, One of the Most Important SSM Models Ever Created

The model combines SSMs and MoEs in a single architecture.

Oct 08, 2024
Created Using Ideogram

In this issue:

  1. An introduction to BlackMamba, which combines MoEs and SSMs in a single architecture.

  2. A review of the BlackMamba paper.

  3. An overview of Princeton University’s SWE-Bench benchmark for software engineering tasks.

A small self-serving note before we start 😉:

For the past year, I’ve been working on several ideas in AI evaluation and benchmarking—an area that, as many of you know, presents a massive challenge in today’s AI landscape. After experimenting with various approaches, I decided to incubate LayerLens, a new AI company focused on streamlining the evaluation and benchmarking of foundation models. This marks my third venture-backed AI project in the last 18 months. We've assembled a phenomenal team, with experience at companies like Google, Microsoft, and Cisco, as well as top universities. We’ve also raised a sizable pre-seed round. More details about that in the next few weeks.

We are currently hiring across the board, particularly for roles in AI research and engineering with a focus on benchmarking and evaluation. If you’re interested in this space and looking for a new challenge, feel free to reach out to me at jr@layerlens.ai. I look forward to hearing from some of you!

💡 ML Concept of the Day: BlackMamba Combines MoEs and SSMs in a Single Architecture

The combination of mixture-of-experts (MoEs) and transformers is an incredibly popular architectural choice in generative AI. Can we do the same with state space models (SSMs)?

BlackMamba introduces a new approach to processing long sequences and handling diverse AI tasks by merging the strengths of SSM and MoE frameworks. Specifically, BlackMamba builds on the Mamba SSM, known for its efficiency on long sequences, and incorporates MoE routing to grow parameter count without a proportional increase in compute per token.
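Conceptually, a BlackMamba layer alternates a Mamba (SSM) sublayer with an MoE MLP sublayer in the residual stream. Below is a minimal PyTorch sketch of that layout, not the paper's implementation: `MambaBlock` here is a simplified gated-linear stand-in for the real selective SSM, and the top-1 router and expert sizes are illustrative assumptions.

```python
# Illustrative sketch of a BlackMamba-style block (not the paper's code).
import torch
import torch.nn as nn

class MambaBlock(nn.Module):
    """Stand-in for a Mamba SSM layer (here just a gated linear mix)."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        h, gate = self.in_proj(x).chunk(2, dim=-1)
        return self.out_proj(h * torch.sigmoid(gate))

class MoEMLP(nn.Module):
    """Top-1 routed mixture-of-experts MLP: each token is sent to one expert."""
    def __init__(self, d_model, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):
        top1 = self.router(x).argmax(dim=-1)   # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask])    # only active tokens run this expert
        return out

class BlackMambaBlock(nn.Module):
    """Alternates an SSM sublayer with an MoE MLP sublayer,
    each wrapped in a pre-normalized residual connection."""
    def __init__(self, d_model):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.ssm = MambaBlock(d_model)
        self.moe = MoEMLP(d_model)

    def forward(self, x):
        x = x + self.ssm(self.norm1(x))
        x = x + self.moe(self.norm2(x))
        return x

x = torch.randn(2, 16, 64)        # (batch, seq_len, d_model)
print(BlackMambaBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```

The key design point is that only one expert MLP runs per token, so total parameters grow with the number of experts while per-token compute stays roughly constant, on top of the SSM's linear-time sequence processing.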
