§Mamba-2 Inference Caches
This module defines the state that must be preserved between calls during
autoregressive (token-by-token) generation. During training or prefill
the full sequence is available at once and the chunked SSD algorithm is used
(see [crate::mamba2::Mamba2::forward]). During decoding the model
processes one token per step and the SSM operates in its pure recurrent
form (see [crate::mamba2::Mamba2::step]):
    hₜ = Āₜ hₜ₋₁ + B̄ₜ xₜ    (state update)
    yₜ = Cₜᵀ hₜ + D xₜ      (output)
Two pieces of state are required per layer:
- Convolution cache: the last conv_kernel inputs to the depthwise Conv1d, kept so that every decoding step can apply the causal filter without re-processing previous tokens.
- SSM hidden state: the matrix hₜ ∈ ℝ^{P×N} (per head), which compresses the entire past context into a fixed-size representation regardless of how many tokens have been generated. This is the key memory-efficiency advantage of SSMs over attention: the KV-cache of a Transformer grows as O(T·N) with sequence length, whereas the SSM state is always O(P·N).
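The two pieces of state and the recurrence above can be sketched in plain Rust for a single head. This is a minimal illustration, not the crate's implementation: `HeadCache`, `conv_step`, and `ssm_step` are hypothetical names, the state is stored as nested `Vec<f32>` rather than tensors, and Āₜ is assumed to be a scalar per head with the discretized `a_bar` and `b_bar` supplied by the caller.

```rust
/// Hypothetical single-head cache; names are illustrative, not the crate's API.
struct HeadCache {
    /// Rolling window of the last `conv_kernel` inputs per channel: [channels][conv_kernel].
    conv: Vec<Vec<f32>>,
    /// SSM hidden state h: [P][N]; its size never depends on sequence length.
    ssm: Vec<Vec<f32>>,
}

impl HeadCache {
    /// Start from an all-zero cache (assumes `conv_kernel >= 1`).
    fn new(channels: usize, conv_kernel: usize, p: usize, n: usize) -> Self {
        Self {
            conv: vec![vec![0.0; conv_kernel]; channels],
            ssm: vec![vec![0.0; n]; p],
        }
    }

    /// Depthwise causal convolution for one token: shift each channel's
    /// window left, append the new input, then dot with that channel's kernel.
    fn conv_step(&mut self, input: &[f32], weight: &[Vec<f32>]) -> Vec<f32> {
        self.conv
            .iter_mut()
            .zip(input.iter().zip(weight))
            .map(|(window, (&x, w))| {
                window.rotate_left(1);
                let last = window.len() - 1;
                window[last] = x;
                window.iter().zip(w).map(|(a, b)| a * b).sum::<f32>()
            })
            .collect()
    }

    /// One recurrent step:
    ///   h[p][n] = a_bar * h[p][n] + b_bar[n] * x[p]   (state update)
    ///   y[p]    = sum_n c[n] * h[p][n] + d * x[p]     (output)
    fn ssm_step(&mut self, x: &[f32], a_bar: f32, b_bar: &[f32], c: &[f32], d: f32) -> Vec<f32> {
        self.ssm
            .iter_mut()
            .zip(x)
            .map(|(row, &xp)| {
                let mut y = d * xp;
                for ((h, &b), &cn) in row.iter_mut().zip(b_bar).zip(c) {
                    *h = a_bar * *h + b * xp;
                    y += cn * *h;
                }
                y
            })
            .collect()
    }
}
```

Each call mutates the cache in place and returns the output for the current token only, so the memory held between steps stays at `channels × conv_kernel + P × N` values however many tokens have been generated.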
Structs§
- Mamba2Cache: The mutable state carried between decoding steps for a single Mamba-2 layer.
- Mamba2CacheConfig: Configuration / factory for a single Mamba2Cache.
- Mamba2CacheRecord: The record type for the module.
- Mamba2CacheRecordItem: The record item type for the module.
- Mamba2Caches: A collection of per-layer caches for a complete Mamba-2 network.
- Mamba2CachesConfig: Configuration / factory for Mamba2Caches.
- Mamba2CachesRecord: The record type for the module.
- Mamba2CachesRecordItem: The record item type for the module.