§Mamba-3 Layer and Layer Stack
A Mamba-3 layer is the standard Pre-LN residual block used throughout
the network. It wraps a single Mamba3 SSM block with an RMSNorm
(applied to the input, before the block) and adds the input back as a
residual connection:
y = x + Mamba3( RMSNorm(x) )
This matches the architecture described in §5 of the Mamba-2 paper and is identical in structure to Pre-LN Transformer layers.
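The residual formula above can be illustrated with a minimal, framework-free sketch. It is not this crate's implementation: `rms_norm` and `pre_norm_residual` are hypothetical helpers operating on plain `f32` slices, the norm has no learned scale, and the SSM block is stood in for by an arbitrary closure. The input is assumed to be non-empty.

```rust
/// RMS-normalise a vector: x / sqrt(mean(x^2) + eps).
/// Hypothetical helper; a real RMSNorm usually also has a learned scale.
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().map(|v| v * inv_rms).collect()
}

/// Pre-norm residual wrapper: y = x + block(rms_norm(x)).
/// `block` stands in for the Mamba3 SSM block and must preserve the length.
fn pre_norm_residual<F>(x: &[f32], block: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    let normed = rms_norm(x, 1e-6);
    let out = block(&normed);
    // The skip path adds the un-normalised input back.
    x.iter().zip(out.iter()).map(|(a, b)| a + b).collect()
}
```

Note that the norm is applied only on the branch that feeds the block; the skip path carries `x` unchanged, which is what makes the layer "pre-norm".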
§Virtual layers
Mamba3Layers supports virtual layers: a larger logical depth achieved
by cycling through a smaller set of real (weight-bearing) layers
according to a Schedule. For example, 48 virtual layers over 12 real
layers repeats each weight set 4 times. Each virtual layer still has its
own cache (the hidden state evolves independently), but shares the
underlying parameters.
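As a rough illustration of weight sharing with per-layer caches, the sketch below assumes a simple round-robin mapping (virtual layer `i` uses real layer `i % n_real`). The actual `Schedule` type may order layers differently; `real_index`, `reuse_counts`, and `ToyStack` are hypothetical names, and the scalar recurrence is a toy stand-in for the SSM state update.

```rust
/// Hypothetical round-robin schedule: virtual layer i uses real layer i % n_real.
fn real_index(virtual_idx: usize, n_real: usize) -> usize {
    virtual_idx % n_real
}

/// Count how many virtual layers map to each real layer.
fn reuse_counts(n_virtual: usize, n_real: usize) -> Vec<usize> {
    let mut counts = vec![0; n_real];
    for v in 0..n_virtual {
        counts[real_index(v, n_real)] += 1;
    }
    counts
}

/// Toy stack: one scalar weight per real layer, one scalar cache per virtual layer.
struct ToyStack {
    real_weights: Vec<f32>,
    caches: Vec<f32>,
}

impl ToyStack {
    fn new(real_weights: Vec<f32>, n_virtual: usize) -> Self {
        Self { real_weights, caches: vec![0.0; n_virtual] }
    }

    /// Run one input through every virtual layer in order.
    fn step(&mut self, mut x: f32) -> f32 {
        let n_real = self.real_weights.len();
        for v in 0..self.caches.len() {
            // Parameters are shared between virtual layers...
            let w = self.real_weights[real_index(v, n_real)];
            // ...but each virtual layer updates its own cache.
            self.caches[v] = w * self.caches[v] + x;
            x = self.caches[v];
        }
        x
    }
}
```

Under this assumed schedule, `reuse_counts(48, 12)` yields 4 for every real layer, matching the 48-over-12 example. In `ToyStack`, two virtual layers that share a weight still end up with different cache values after a few steps, because each sees a different input.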
§Residual scale
The first and/or last residual connection in the stack can optionally be
zeroed out (ignore_first_residual / ignore_last_residual), which is
useful when composing Mamba-3 blocks with other module types (e.g. in a
hybrid Mamba-3 + attention architecture where neighbouring blocks already
carry residuals).
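One way to picture the two flags is as a per-layer scale on the skip path, as sketched below. This assumes that "zeroed out" means multiplying the skip term by 0 so that the layer outputs only the block result; the crate may implement it differently, and whether "first" and "last" count virtual or real layers is not specified here. `residual_scale` and `apply_residual` are hypothetical helpers.

```rust
/// Scale applied to the skip path of layer `layer_idx` in a stack of `n_layers`.
/// Hypothetical helper: 0.0 drops the residual, 1.0 keeps it.
fn residual_scale(
    layer_idx: usize,
    n_layers: usize,
    ignore_first_residual: bool,
    ignore_last_residual: bool,
) -> f32 {
    let is_first = layer_idx == 0;
    let is_last = layer_idx + 1 == n_layers;
    if (ignore_first_residual && is_first) || (ignore_last_residual && is_last) {
        0.0
    } else {
        1.0
    }
}

/// y = scale * x + block_out, where block_out = Mamba3(RMSNorm(x)).
fn apply_residual(x: f32, block_out: f32, scale: f32) -> f32 {
    scale * x + block_out
}
```

With both flags off, every layer gets a scale of 1.0 and the stack reduces to the plain pre-norm residual form.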
Structs§
- Mamba3Layer - A single Mamba-3 residual block:
- Mamba3LayerConfig - Configuration / factory for Mamba3Layer.
- Mamba3LayerRecord - The record type for the module.
- Mamba3LayerRecordItem - The record item type for the module.
- Mamba3Layers - A stack of Mamba-3 layers with optional virtual-layer scheduling.
- Mamba3LayersConfig - Configuration / factory for Mamba3Layers.
- Mamba3LayersRecord - The record type for the module.
- Mamba3LayersRecordItem - The record item type for the module.