Module cache


§Mamba-2 Inference Caches

This module defines the state that must be preserved between calls during autoregressive (token-by-token) generation. During training or prefill the full sequence is available at once and the chunked SSD algorithm is used (see [crate::mamba2::Mamba2::forward]). During decoding the model processes one token per step and the SSM operates in its pure recurrent form (see [crate::mamba2::Mamba2::step]):

  hₜ = Āₜ hₜ₋₁ + B̄ₜ xₜ        (state update)
  yₜ = Cₜᵀ hₜ + D xₜ            (output)
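
As a rough illustration of the recurrence above, the following sketch performs one decoding step for a single head using plain `Vec<f32>` storage. It is not the crate's implementation: `HeadState` and its `step` signature are hypothetical names, Ā is assumed to be a scalar per head (as in Mamba-2's scalar-times-identity parameterisation), and `a_bar` and `b_bar` are assumed to be already discretised. The term Cₜᵀ hₜ is read as a contraction over the state dimension N.

```rust
/// Hypothetical per-head hidden state: P rows (head dim) by N columns
/// (state dim), stored row-major. Not the crate's actual type.
pub struct HeadState {
    p: usize,
    n: usize,
    h: Vec<f32>,
}

impl HeadState {
    /// Start from an all-zero state, as before the first token.
    pub fn zeros(p: usize, n: usize) -> Self {
        Self { p, n, h: vec![0.0; p * n] }
    }

    /// One recurrent step:
    ///   h[i][j] = a_bar * h[i][j] + b_bar[j] * x[i]   (state update)
    ///   y[i]    = sum_j c[j] * h[i][j] + d * x[i]     (output)
    /// `a_bar` is assumed scalar per head; `b_bar` and `c` have length N,
    /// `x` has length P.
    pub fn step(&mut self, a_bar: f32, b_bar: &[f32], c: &[f32], d: f32, x: &[f32]) -> Vec<f32> {
        let (p, n) = (self.p, self.n);
        assert_eq!(x.len(), p);
        assert_eq!(b_bar.len(), n);
        assert_eq!(c.len(), n);

        let mut y = vec![0.0f32; p];
        for i in 0..p {
            let row = &mut self.h[i * n..(i + 1) * n];
            let mut acc = 0.0f32;
            for j in 0..n {
                // Update the state in place, then read the new value for the output.
                row[j] = a_bar * row[j] + b_bar[j] * x[i];
                acc += c[j] * row[j];
            }
            y[i] = acc + d * x[i];
        }
        y
    }
}
```

The state is overwritten in place, so the cost per token is O(P·N) in both time and memory, independent of how many tokens came before.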

Two pieces of state are required per layer:

  1. Convolution cache — the last conv_kernel inputs to the depthwise Conv1d, kept so that every decoding step can apply the causal filter without re-processing previous tokens.

  2. SSM hidden state — the matrix hₜ ∈ ℝ^{P×N} (per head), which compresses the entire past context into a fixed-size representation regardless of how many tokens have been generated. This is the key memory-efficiency advantage of SSMs over attention: a Transformer's KV-cache grows linearly with the number of tokens T processed so far, whereas the SSM state stays at O(P·N) per head no matter how long the sequence gets.
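
The convolution cache from item 1 can be pictured as a sliding window over the most recent inputs. The sketch below is a minimal plain-Rust version, assuming a depthwise (per-channel) causal filter whose last tap is applied to the newest input. `ConvCache`, its fields, and the `weights[channel][tap]` layout are hypothetical and chosen for illustration; the crate's `Mamba2Cache` presumably stores tensors instead.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling window holding the last `kernel` input frames,
/// oldest first. Each frame has one value per channel.
pub struct ConvCache {
    kernel: usize,
    channels: usize,
    window: VecDeque<Vec<f32>>,
}

impl ConvCache {
    /// Zero-filled window, equivalent to left-padding the sequence with zeros.
    pub fn zeros(channels: usize, kernel: usize) -> Self {
        let window = (0..kernel).map(|_| vec![0.0; channels]).collect();
        Self { kernel, channels, window }
    }

    /// Push the newest input frame, drop the oldest, and apply the depthwise
    /// causal filter. `weights[c][k]` is tap `k` of channel `c` (assumed
    /// layout); tap `kernel - 1` multiplies the newest input.
    pub fn step(&mut self, x: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.channels);
        assert_eq!(weights.len(), self.channels);
        assert_eq!(bias.len(), self.channels);

        self.window.pop_front();
        self.window.push_back(x.to_vec());
        debug_assert_eq!(self.window.len(), self.kernel);

        (0..self.channels)
            .map(|c| {
                let mut acc = bias[c];
                for (k, frame) in self.window.iter().enumerate() {
                    acc += weights[c][k] * frame[c];
                }
                acc
            })
            .collect()
    }
}
```

Each call touches only `kernel` frames, so a decoding step never needs to revisit earlier tokens. Both this window and the SSM hidden state have a fixed size, which is why the per-layer cache does not grow during generation.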

Structs§

Mamba2Cache
The mutable state carried between decoding steps for a single Mamba-2 layer.
Mamba2CacheConfig
Configuration / factory for a single Mamba2Cache.
Mamba2CacheRecord
The record type for the module.
Mamba2CacheRecordItem
The record item type for the module.
Mamba2Caches
A collection of per-layer caches for a complete Mamba-2 network.
Mamba2CachesConfig
Configuration / factory for Mamba2Caches.
Mamba2CachesRecord
The record type for the module.
Mamba2CachesRecordItem
The record item type for the module.