
Module layer



Mamba-2 Layer and Layer Stack

A Mamba-2 layer is the standard Pre-LN residual block used throughout the network. It applies RMSNorm to the input, passes the normalized result through a single Mamba2 SSM block, and adds the original input back as a residual connection:

  y = x + Mamba2( RMSNorm(x) )

This matches the architecture described in §5 of the Mamba-2 paper and has the same structure as a Pre-LN Transformer sub-layer, with the Mamba-2 block taking the place of attention or the MLP.
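A minimal, framework-free sketch of the Pre-LN residual pattern on a single feature vector follows. It assumes plain `Vec<f32>` data instead of the crate's tensor types, a weighted RMSNorm of the form x / sqrt(mean(x²) + eps) · w, and a caller-supplied closure standing in for the Mamba-2 SSM block. `rms_norm` and `pre_ln_residual` are hypothetical helpers, not this module's API.

```rust
/// Weighted RMSNorm over one feature vector: x / sqrt(mean(x^2) + eps) * weight.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Pre-LN residual block: y = x + mixer(RMSNorm(x)).
/// `mixer` is a hypothetical stand-in for the Mamba-2 SSM block.
fn pre_ln_residual<F>(x: &[f32], norm_weight: &[f32], eps: f32, mixer: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    // Normalize first (Pre-LN), then mix.
    let normed = rms_norm(x, norm_weight, eps);
    let mixed = mixer(&normed);
    // Residual: add the un-normalized input back.
    x.iter().zip(&mixed).map(|(a, b)| a + b).collect()
}
```

Passing a mixer that returns zeros yields `y == x`, which makes the role of the residual path easy to check.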

Virtual layers

Mamba2Layers supports virtual layers: a larger logical depth obtained by cycling through a smaller set of real (weight-bearing) layers according to a Schedule. For example, 48 virtual layers over 12 real layers reuse each set of weights 4 times. Each virtual layer still keeps its own cache, so its hidden state evolves independently, while the underlying parameters are shared.
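To illustrate the virtual-layer idea, the sketch below assumes a simple cyclic schedule in which virtual layer `v` reuses real layer `v % n_real`; the crate's actual `Schedule` may offer other orderings. `cyclic_schedule`, `LayerCache`, and `step` are hypothetical names, and a scalar weight per real layer stands in for a full Mamba-2 layer.

```rust
/// Hypothetical cyclic schedule: virtual layer `v` uses real layer `v % n_real`.
fn cyclic_schedule(n_virtual: usize, n_real: usize) -> Vec<usize> {
    assert!(n_real > 0, "need at least one real layer");
    (0..n_virtual).map(|v| v % n_real).collect()
}

/// Placeholder for a per-virtual-layer cache (the SSM hidden state).
#[derive(Clone, Debug, Default)]
struct LayerCache {
    state: f32,
    steps_seen: usize,
}

/// Run one step through the virtual stack.
fn step(x: f32, real_weights: &[f32], schedule: &[usize], caches: &mut [LayerCache]) -> f32 {
    assert_eq!(schedule.len(), caches.len(), "one cache per virtual layer");
    let mut h = x;
    for (v, &r) in schedule.iter().enumerate() {
        // Shared parameters: every virtual layer mapped to `r` reads the same weight.
        h += real_weights[r] * h; // stand-in for a Mamba-2 layer
        // Independent state: only this virtual layer's cache is updated.
        caches[v].state = 0.5 * caches[v].state + h;
        caches[v].steps_seen += 1;
    }
    h
}
```

The design point the sketch captures is that the schedule indexes into shared weights, while the cache vector is sized by the virtual depth, not the real one.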

Residual scale

The first and/or last residual connection in the stack can optionally be zeroed out (ignore_first_residual / ignore_last_residual), so that the affected layer outputs only Mamba2(RMSNorm(x)) without adding x back. This is useful when composing Mamba-2 blocks with other module types, e.g. in a hybrid Mamba-2 + attention architecture where neighbouring blocks already carry residuals.
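The following sketch shows one way to read the two flags, assuming they act as a residual scale of 0 or 1 on the edge layers of the stack. `stack_forward` is a hypothetical helper and `block` stands in for Mamba2(RMSNorm(·)); only the flag names come from this module.

```rust
/// Hypothetical stack forward: y = scale * x + block(x) per layer, where
/// `scale` is 0.0 for an ignored edge residual and 1.0 otherwise.
fn stack_forward<F>(
    x: &[f32],
    n_layers: usize,
    ignore_first_residual: bool,
    ignore_last_residual: bool,
    block: F,
) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    let mut h = x.to_vec();
    for i in 0..n_layers {
        let out = block(&h);
        let is_ignored = (i == 0 && ignore_first_residual)
            || (i + 1 == n_layers && ignore_last_residual);
        let scale: f32 = if is_ignored { 0.0 } else { 1.0 };
        // Residual connection, possibly zeroed out.
        h = h.iter().zip(&out).map(|(a, b)| scale * a + b).collect();
    }
    h
}
```

With a block that doubles its input and an input of 1.0, two layers give 9.0 by default, 6.0 with `ignore_last_residual`, and 4.0 with both flags set.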

Structs

Mamba2Layer
A single Mamba-2 residual block: y = x + Mamba2(RMSNorm(x)).
Mamba2LayerConfig
Configuration / factory for Mamba2Layer.
Mamba2LayerRecord
The record type for the module.
Mamba2LayerRecordItem
The record item type for the module.
Mamba2Layers
A stack of Mamba-2 layers with optional virtual-layer scheduling.
Mamba2LayersConfig
Configuration / factory for Mamba2Layers.
Mamba2LayersRecord
The record type for the module.
Mamba2LayersRecordItem
The record item type for the module.