Edge 423: Understanding the SSM Fundamental Equation
Some of the foundations of SSMs plus an exploration of a classic model.
In this issue:
An introduction to the fundamental equation of SSMs.
A deep dive into the S4 SSM model.
An overview of the NVIDIA NIM framework.
💡 ML Concept of the Day: Understanding the SSM Fundamental Equation
In the previous issue of this newsletter, we introduced state space models (SSMs) as the most viable alternative to transformers. Fundamentally, SSMs are a type of sequence model that maps a discrete or continuous sequential input, such as the tokens in a prompt, to another discrete or continuous sequential output. The precursor of SSMs is the Linear State Space Layer (LSSL), which maps an input sequence u to an output sequence y by simulating a linear continuous-time state-space representation:
$$x'(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t) + Du(t)$$
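To make the mapping u → y concrete, here is a minimal Python sketch of how such a continuous-time system can be turned into a step-by-step sequence map. It is not taken from the LSSL or S4 code. It assumes a diagonal A (so the discretization can be computed elementwise without a matrix inverse), uses the bilinear discretization with a hypothetical step size `dt`, and the function names `discretize_bilinear` and `run_ssm` as well as all parameter values are illustrative only.

```python
# Minimal sketch (not the LSSL/S4 implementation): simulate
#   x'(t) = A x(t) + B u(t),  y(t) = C x(t) + D u(t)
# on a sampled input sequence u_0, u_1, ... with step size dt.
# Assumption: A is diagonal and stored as a list `a`, so the bilinear
# discretization reduces to elementwise arithmetic (no numpy needed).


def discretize_bilinear(a, b, dt):
    """Bilinear discretization of a diagonal continuous-time SSM.

    A_bar = (I - dt/2 A)^{-1} (I + dt/2 A)
    B_bar = (I - dt/2 A)^{-1} dt B
    """
    a_bar = [(1 + dt * ai / 2) / (1 - dt * ai / 2) for ai in a]
    b_bar = [dt * bi / (1 - dt * ai / 2) for ai, bi in zip(a, b)]
    return a_bar, b_bar


def run_ssm(a, b, c, d, u, dt):
    """Map an input sequence u to an output sequence y via a linear recurrence."""
    a_bar, b_bar = discretize_bilinear(a, b, dt)
    x = [0.0] * len(a)  # hidden state starts at zero
    y = []
    for uk in u:
        # State update: x_k = A_bar x_{k-1} + B_bar u_k
        x = [ai * xi + bi * uk for ai, xi, bi in zip(a_bar, x, b_bar)]
        # Readout: y_k = C x_k + D u_k
        y.append(sum(ci * xi for ci, xi in zip(c, x)) + d * uk)
    return y


if __name__ == "__main__":
    # Hypothetical 2-dimensional state with stable (negative) eigenvalues.
    a = [-1.0, -0.5]
    b = [1.0, 1.0]
    c = [0.5, -0.25]
    d = 0.1
    u = [1.0, 0.0, 0.0, 0.0]  # an impulse input
    print(run_ssm(a, b, c, d, u, dt=0.1))
```

Once discretized, the model is simply a linear recurrence over the sequence: each output depends on a fixed-size hidden state rather than on the whole history of inputs.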
Although effective, LSSLs are computationally expensive and struggle to scale to long sequences. SSMs improve upon LSSLs by introducing matrices that address the problem of memorizing continuous signals over long horizons. From that perspective, an SSM can be expressed by the following equation, in which A, B, C and D are the state-space parameter matrices: