Edge 394: Not Just Transformers: Jamba is a New LLM that Brings the Best of SSMs, Transformers, and MoEs in a Single Architecture
Jamba addresses some of the limitations of transformers with a novel architectural paradigm.
Transformer architectures have been the dominant paradigm in LLMs, leading to exceptional advances in research and development. Whether transformers will be the final architecture on the path to AGI, or whether a new architectural paradigm will emerge, has been a passionate topic of debate in the AI community. Recently, researchers from Princeton University and Carnegie Mellon proposed the Mamba architecture, based on state space models (SSMs), which has become the most viable alternative to transformers.
Instead of framing it as SSMs vs. transformers, could we combine the two? This is the thesis behind Jamba, a new model released by the ambitious team at AI21 Labs. Jamba combines transformers and SSMs in a single architecture that could open new avenues for the future of LLMs.
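To make the idea of a hybrid architecture concrete, here is a minimal sketch of a layer stack that interleaves attention (Transformer-style) blocks with SSM-style blocks. This is not AI21's implementation: the layer ratio is illustrative, and the `ToySSMBlock` is a simple gated causal convolution standing in for a real selective state-space (Mamba) mixer.

```python
# Illustrative sketch only: interleaving attention and SSM-style blocks
# in the spirit of Jamba. Names, ratios, and the toy SSM are assumptions.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Standard pre-norm self-attention + MLP block."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class ToySSMBlock(nn.Module):
    """Placeholder for a Mamba-style mixer: a gated causal depthwise
    convolution stands in for the selective state-space recurrence."""
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        h = self.norm(x)
        seq_len = h.shape[1]
        # Causal depthwise conv over the sequence dimension
        conv = self.conv(h.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        return x + self.out(conv * torch.sigmoid(self.gate(h)))

class HybridStack(nn.Module):
    """Interleave attention and SSM blocks at a fixed ratio (here 1:3)."""
    def __init__(self, d_model: int = 256, n_layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            [AttentionBlock(d_model) if i % attn_every == 0 else ToySSMBlock(d_model)
             for i in range(n_layers)]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(2, 128, 256)   # (batch, sequence, d_model)
    print(model(tokens).shape)          # torch.Size([2, 128, 256])
```

The intuition is that most layers use the cheaper, constant-memory SSM mixing, while the occasional attention layer preserves the content-based lookups transformers excel at; the exact interleaving ratio and the MoE placement in Jamba itself are design choices described in AI21's technical report.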
The Problem
Until now, LLM development has largely relied on traditional Transformer architectures, known for their strong capabilities. However, these architectures have two significant limitations: