Edge 439: SSMs with Attention, Understanding Zamba
Combining the best of SSMs and transformers in a single architecture.
In this issue:
Combining SSMs and Attention layers.
The original Zamba paper.
The LitServe framework for high-performance model serving.
💡 ML Concept of the Day: SSMs with Attention, Understanding Zamba
In this series, we have explored state space models (SSMs) as an alternative to transformer architectures. However, there is a lot of merit in building hybrid architectures that combine aspects of transformers and SSMs. AI21's Jamba is one example of this approach that has gotten considerable attention. One of the models that outlined this idea is Zamba, created by Zyphra, one of the most innovative AI labs in the market.
Zamba's architecture blends Mamba blocks with a single, globally shared attention layer that is applied after every six Mamba blocks. This hybrid design improves Zamba's ability to learn long-range dependencies and perform in-context learning relative to pure Mamba models, while keeping compute and memory demands during training and inference below those of standard transformers.
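To make the shared-attention pattern concrete, here is a minimal PyTorch sketch, not Zyphra's implementation: the `MambaBlock` class is just a placeholder for a real selective-SSM block, and a single `SharedAttentionBlock` instance is reused, with the same weights, after every sixth Mamba block.

```python
# Sketch of a Zamba-style backbone: many Mamba blocks, ONE shared attention block.
# Assumptions: MambaBlock is a stand-in for a real Mamba/SSM mixer, and the
# shared block is plain multi-head self-attention + MLP reused every 6 layers.
import torch
import torch.nn as nn


class MambaBlock(nn.Module):
    """Placeholder for a real Mamba (selective SSM) block."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the SSM mixer

    def forward(self, x):
        return x + self.mixer(self.norm(x))


class SharedAttentionBlock(nn.Module):
    """Global attention + MLP block whose weights are shared across the stack."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class ZambaStyleBackbone(nn.Module):
    """Stack of Mamba blocks with one attention block reused every `share_every` blocks."""
    def __init__(self, d_model: int = 512, n_mamba: int = 24, share_every: int = 6):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(MambaBlock(d_model) for _ in range(n_mamba))
        self.shared_attention = SharedAttentionBlock(d_model)  # single set of weights
        self.share_every = share_every

    def forward(self, x):
        for i, block in enumerate(self.mamba_blocks, start=1):
            x = block(x)
            if i % self.share_every == 0:
                x = self.shared_attention(x)  # same parameters at every call site
        return x


if __name__ == "__main__":
    model = ZambaStyleBackbone()
    tokens = torch.randn(2, 128, 512)  # (batch, sequence, d_model)
    print(model(tokens).shape)  # torch.Size([2, 128, 512])
```

Because the attention parameters are instantiated once and reused, the parameter count stays close to that of a pure Mamba stack while the model still gets periodic global token mixing.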