State-space models (SSMs) have emerged as a potential alternative architecture for building large language models (LLMs) compared to the previously ubiquitous transformer architecture. One theoretical weakness of transformers is that they cannot express certain kinds of sequential computation and state tracking (Merrill & Sabharwal, 2023), which SSMs are explicitly designed to address via their close architectural similarity to recurrent neural networks (RNNs). But do SSMs truly have an advantage (over transformers) in expressive power for state tracking? Surprisingly, the answer is no. Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$. In particular, this means they cannot solve simple state-tracking problems like permutation composition. It follows that SSMs are provably unable to accurately track chess moves with certain notation, evaluate code, or track entities in a long narrative. To supplement our formal analysis, we report experiments showing that Mamba-style SSMs indeed struggle with state tracking. Thus, despite its recurrent formulation, the state in an SSM is an illusion: SSMs have similar expressiveness limitations to non-recurrent models like transformers, which may fundamentally limit their ability to solve real-world state-tracking problems.
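
To make the "permutation composition" problem named in the abstract concrete, here is a minimal Python sketch. The function names (`compose`, `track_state`) and the choice of five elements are illustrative assumptions, not taken from the paper. The task is to read a sequence of permutations and report their running composition, which is easy for a sequential program that keeps one explicit state.

```python
from functools import reduce

def compose(p, q):
    """Return the permutation 'apply p first, then q'.

    Permutations are tuples: p[i] is where element i is sent.
    """
    return tuple(q[i] for i in p)

def track_state(perms, n=5):
    """Fold a sequence of permutations of n elements into one composition."""
    identity = tuple(range(n))
    return reduce(compose, perms, identity)

# Hypothetical input sequence: swap the first two elements, rotate, swap again.
swap01 = (1, 0, 2, 3, 4)
cycle = (1, 2, 3, 4, 0)
final = track_state([swap01, cycle, swap01])
print(final)  # (2, 0, 3, 4, 1)
```

The loop hidden inside `reduce` updates a single state once per input, which is exactly the kind of sequential state tracking the abstract says SSMs cannot express.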

## Overview

- Examines the concept of "state" in state-space models (SSMs), a framework with roots in control theory that is now used as an alternative to transformers in large language models
- Argues that the notion of "state" in these models is often an illusion, and that the models may be better characterized as "history-based" rather than "state-based"
- Shows that SSMs, like transformers, cannot express computation outside the complexity class $\mathsf{TC}^0$, and reports experiments in which Mamba-style SSMs struggle with state tracking

## Plain English Explanation

State-space models are a popular tool in machine learning and control systems, used to represent and analyze dynamic systems. These models assume the existence of a hidden "state" that captures all the relevant information about the system at a given time. The state is then used to predict the future behavior of the system.

However, the paper argues that the idea of "state" in these models is often an illusion. The authors suggest that state-space models may be better understood as "history-based" models, where the current output of the system depends on its entire past history rather than on a single, well-defined state.

This perspective challenges the traditional view of state-space models and offers a new way of thinking about their foundations. Recognizing the limitations of the state concept could inform more accurate and robust modeling techniques, with potential applications in fields like [machine learning](https://aimodels.fyi/papers/arxiv/mambabyte-token-free-selective-state-space-model), [event-based sensing](https://aimodels.fyi/papers/arxiv/state-space-models-event-cameras), and [video processing](https://aimodels.fyi/papers/arxiv/ssm-meets-video-diffusion-models-efficient-video).

## Technical Explanation

The paper begins by outlining the standard architecture of state-space models, which typically consist of a state transition equation and an observation equation. The state transition equation describes how the hidden state evolves over time, while the observation equation relates the state to the observed outputs of the system.
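
The two equations described above can be sketched in a few lines of Python. This is a minimal illustration and not the paper's formulation: it assumes a diagonal, time-invariant linear SSM with scalar inputs, and the parameter names `a`, `b`, `c` and the function `ssm_scan` are hypothetical.

```python
def ssm_scan(a, b, c, xs):
    """Run a diagonal linear SSM over a sequence of scalar inputs.

    State transition: h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    Observation:      y_t    = sum_i c[i] * h_t[i]
    Returns the list of outputs and the final hidden state.
    """
    h = [0.0] * len(a)  # assumed initial state: all zeros
    ys = []
    for x in xs:
        # State transition equation: update every state coordinate.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Observation equation: read the output off the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys, h

ys, h_final = ssm_scan(a=[0.5, 1.0], b=[1.0, 1.0], c=[1.0, 1.0], xs=[1.0, 2.0, 3.0])
print(ys)       # [2.0, 5.5, 10.25]
print(h_final)  # [4.25, 6.0]
```

Mamba-style models make the transition depend on the input at each step. That extension is deliberately left out here to keep the sketch short.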

The authors then argue that the notion of "state" in these models is often an illusion. They observe that the state at any given time is fully determined by the system's entire input history, so it carries no information beyond that history. This suggests that state-space models may be better characterized as "history-based" models, in which the current output is a function of everything the system has seen so far.
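
One way to see the "history-based" reading is to unroll a linear recurrence. The sketch below assumes a scalar, time-invariant SSM with zero initial state, and the function names are hypothetical. Under those assumptions the state after `t` steps equals a weighted sum over all past inputs, $h_t = \sum_{k=1}^{t} a^{t-k}\, b\, x_k$, so the recurrent and the history-based computations agree.

```python
def state_recurrent(a, b, xs):
    """Compute the final state step by step: h_t = a * h_{t-1} + b * x_t."""
    h = 0.0
    for x in xs:
        h = a * h + b * x
    return h

def state_from_history(a, b, xs):
    """Compute the same final state directly from the whole input history."""
    t = len(xs)
    # Input k (0-indexed) is scaled by a once for every later step.
    return sum(a ** (t - 1 - k) * b * x for k, x in enumerate(xs))

xs = [1.0, 2.0, 3.0]
print(state_recurrent(0.5, 2.0, xs))     # 8.5
print(state_from_history(0.5, 2.0, xs))  # 8.5
```

The equivalence shown here holds only for this simple linear case. It is meant as intuition for the argument, not as a reproduction of the paper's proof.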

The paper presents several examples and theoretical analyses to support this perspective, including [a discussion of linear state-space models](https://aimodels.fyi/papers/arxiv/mambaad-exploring-state-space-models-multi-class) and a comparison to other modeling frameworks, such as [event-based sensing](https://aimodels.fyi/papers/arxiv/state-space-models-event-cameras) and [video diffusion models](https://aimodels.fyi/papers/arxiv/ssm-meets-video-diffusion-models-efficient-video). The authors also explore the implications of this view for the design and interpretation of state-space models.

## Critical Analysis

The paper raises some valid concerns about the foundations of state-space models and the potential limitations of the state concept. By challenging the traditional view of these models, the authors encourage readers to think critically about the assumptions and limitations of state-space modeling.

However, the paper does not provide a comprehensive solution or alternative to state-space models. While the "history-based" perspective offers a new way of thinking about these models, it is not clear how this insight can be directly applied to practical modeling and analysis tasks.

Additionally, the paper does not address some of the well-established strengths and applications of state-space models, such as their ability to handle uncertainty, their connections to Kalman filtering and control theory, and their widespread use in areas like [machine learning](https://aimodels.fyi/papers/arxiv/mambabyte-token-free-selective-state-space-model) and [signal processing](https://aimodels.fyi/papers/arxiv/state-space-model-new-generation-network-alternative). Further research may be needed to fully assess the implications of the authors' perspective and its potential impact on the field.

## Conclusion

This paper challenges the conventional wisdom surrounding state-space models by arguing that the notion of "state" is often an illusion. The authors propose a "history-based" view of these models, suggesting that the current output may be better characterized by the system's entire past, rather than a single, well-defined state.

While this perspective offers a new way of thinking about the foundations of state-space models, it also raises questions about the practical implications and limitations of this view. The paper encourages critical thinking about the assumptions and interpretations of these widely used models, which may lead to the development of more accurate and robust modeling techniques in the future.