§Mamba-2 Layer and Layer Stack
A Mamba-2 layer is the standard Pre-LN residual block used throughout
the network. It wraps a single Mamba2 SSM block with an RMSNorm
(applied to the input, before the block) and adds the input back as a
residual connection:
y = x + Mamba2( RMSNorm(x) )

This matches the architecture described in §5 of the Mamba-2 paper and is identical in structure to Pre-LN Transformer layers.
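The residual formula can be sketched in plain Rust over a single feature vector. This is only an illustration of the Pre-LN pattern, not this crate's implementation: `rms_norm` and `pre_ln_residual` are hypothetical names, the norm omits the learned gain, and the `block` closure stands in for the Mamba2 SSM block.

```rust
/// RMSNorm over one feature vector.
/// Assumption: no learned gain, to keep the sketch short.
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv_rms).collect()
}

/// Computes y = x + block(RMSNorm(x)).
/// `block` is a placeholder for the Mamba2 SSM block and must return
/// a vector of the same length as its input.
fn pre_ln_residual<F>(x: &[f32], block: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    // Normalise first (Pre-LN), then run the block on the normalised input.
    let normed = rms_norm(x, 1e-5);
    let out = block(&normed);
    // Add the *un-normalised* input back as the residual.
    x.iter().zip(out.iter()).map(|(a, b)| a + b).collect()
}
```

Because the residual path carries the raw input, a block that outputs zeros leaves `x` unchanged, which is the property that makes deep Pre-LN stacks easy to train.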
§Virtual layers
Mamba2Layers supports virtual layers: a larger logical depth achieved
by cycling through a smaller set of real (weight-bearing) layers
according to a Schedule. For example, 48 virtual layers over 12 real
layers repeats each weight set 4 times. Each virtual layer still has its
own cache (the hidden state evolves independently), but shares the
underlying parameters.
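The mapping from virtual to real layers can be illustrated with a small sketch. It assumes a simple round-robin order (virtual layer `v` uses real layer `v % n_real`); the crate's `Schedule` may define other orderings, and `cyclic_schedule` and `use_counts` are hypothetical helper names, not part of the API.

```rust
/// Maps each virtual layer index to a real (weight-bearing) layer index.
/// Assumption: plain round-robin cycling; the real `Schedule` may differ.
fn cyclic_schedule(n_virtual: usize, n_real: usize) -> Vec<usize> {
    assert!(n_real > 0, "need at least one real layer");
    (0..n_virtual).map(|v| v % n_real).collect()
}

/// Counts how many virtual layers reuse each real layer's weights.
fn use_counts(schedule: &[usize], n_real: usize) -> Vec<usize> {
    let mut counts = vec![0; n_real];
    for &real in schedule {
        counts[real] += 1;
    }
    counts
}
```

With 48 virtual layers over 12 real layers, `use_counts` reports 4 uses per weight set. The caches would still be indexed by the virtual position (48 entries), since only the parameters are shared.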
§Residual scale
The first and/or last residual connection in the stack can optionally be
zeroed out (ignore_first_residual / ignore_last_residual), which is
useful when composing Mamba-2 blocks with other module types (e.g. in a
hybrid Mamba-2 + attention architecture where neighbouring blocks already
carry residuals).
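One way to picture these flags is as a per-layer multiplier on the residual branch, so that each layer computes `y = scale * x + Mamba2(RMSNorm(x))`. The sketch below assumes that "first" and "last" refer to the first and last layer of the stack and that the scale is either 0 or 1; `residual_scales` is a hypothetical helper, not a function of this crate.

```rust
/// Returns one residual multiplier per layer: 1.0 keeps the residual,
/// 0.0 drops it. Assumption: the flags affect only the first and last
/// layer of the stack, mirroring `ignore_first_residual` and
/// `ignore_last_residual`.
fn residual_scales(n_layers: usize, ignore_first: bool, ignore_last: bool) -> Vec<f32> {
    let mut scales = vec![1.0; n_layers];
    if ignore_first {
        if let Some(first) = scales.first_mut() {
            *first = 0.0;
        }
    }
    if ignore_last {
        if let Some(last) = scales.last_mut() {
            *last = 0.0;
        }
    }
    scales
}
```

In a single-layer stack the first and last layer coincide, so either flag zeroes the only residual.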
Structs§
- Mamba2Layer - A single Mamba-2 residual block.
- Mamba2LayerConfig - Configuration / factory for Mamba2Layer.
- Mamba2LayerRecord - The record type for the module.
- Mamba2LayerRecordItem - The record item type for the module.
- Mamba2Layers - A stack of Mamba-2 layers with optional virtual-layer scheduling.
- Mamba2LayersConfig - Configuration / factory for Mamba2Layers.
- Mamba2LayersRecord - The record type for the module.
- Mamba2LayersRecordItem - The record item type for the module.