
Module layer



Mamba-3 Layer and Layer Stack

A Mamba-3 layer is the standard Pre-LN residual block used throughout the network. It wraps a single Mamba3 SSM block with an RMSNorm (applied to the input, before the block) and adds the input back as a residual connection:

  y = x + Mamba3(RMSNorm(x))

This matches the architecture described in §5 of the Mamba-2 paper and follows the same Pre-LN residual pattern (normalize, apply the sublayer, add the input back) used by each sublayer of a Pre-LN Transformer.
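The residual formula above can be sketched on a single feature vector as follows. This is a minimal illustration, not the crate's implementation: it uses plain `Vec<f32>` instead of tensors, assumes RMSNorm has a learned per-feature scale `gamma` and a small `eps`, and replaces the real Mamba3 SSM block with a hypothetical closure `block`.

```rust
/// RMSNorm over one feature vector: x_i / sqrt(mean(x^2) + eps) * gamma_i.
fn rms_norm(x: &[f32], gamma: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gamma).map(|(v, g)| v * inv_rms * g).collect()
}

/// Pre-LN residual block: y = x + block(rms_norm(x)).
/// `block` is a hypothetical stand-in for the Mamba3 SSM block.
fn pre_ln_residual<F>(x: &[f32], gamma: &[f32], eps: f32, block: F) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    let normed = rms_norm(x, gamma, eps); // normalize the input first (Pre-LN)
    let out = block(&normed); // run the inner block on the normalized input
    // Add the un-normalized input back as the residual.
    x.iter().zip(out.iter()).map(|(a, b)| a + b).collect()
}

fn main() {
    let x = vec![3.0_f32, 4.0];
    let gamma = vec![1.0_f32, 1.0];
    // Toy block that doubles its input, used in place of Mamba3.
    let y = pre_ln_residual(&x, &gamma, 0.0, |h| h.iter().map(|v| 2.0 * *v).collect());
    println!("{:?}", y);
}
```

Note that the residual adds back the raw `x`, not the normalized value; this is what distinguishes Pre-LN from Post-LN, where the norm is applied after the addition.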

Virtual layers

Mamba3Layers supports virtual layers: a larger logical depth achieved by cycling through a smaller set of real (weight-bearing) layers according to a Schedule. For example, scheduling 48 virtual layers over 12 real layers uses each weight set 4 times (48 / 12 = 4). Each virtual layer still has its own cache (its hidden state evolves independently) but shares parameters with every other virtual layer mapped to the same real layer.
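The mapping from virtual to real layers can be sketched as below. This assumes the simplest cyclic schedule, where virtual layer `v` reuses real layer `v % n_real`; the crate's actual `Schedule` type may offer other orderings. `real_layer_for`, `LayerCache`, and `build_plan` are hypothetical names used only for illustration.

```rust
/// Hypothetical cyclic schedule: virtual layer `v` reuses real layer `v % n_real`.
fn real_layer_for(virtual_idx: usize, n_real: usize) -> usize {
    virtual_idx % n_real
}

/// Hypothetical per-virtual-layer state. Each virtual layer owns one,
/// even when several virtual layers share the same weights.
#[derive(Debug, Clone, Default)]
struct LayerCache {
    hidden: Vec<f32>,
}

/// Returns the real-layer index for each virtual layer, plus one cache per virtual layer.
fn build_plan(n_virtual: usize, n_real: usize) -> (Vec<usize>, Vec<LayerCache>) {
    let plan: Vec<usize> = (0..n_virtual).map(|v| real_layer_for(v, n_real)).collect();
    // Caches are sized by the *virtual* depth, not the number of weight sets.
    let caches = vec![LayerCache::default(); n_virtual];
    (plan, caches)
}

fn main() {
    let (plan, caches) = build_plan(48, 12);
    // Count how often each real weight set is used.
    let mut uses = vec![0usize; 12];
    for &r in &plan {
        uses[r] += 1;
    }
    println!("uses per real layer: {:?}", uses);
    println!("caches: {}, first cache len: {}", caches.len(), caches[0].hidden.len());
}
```

The key design point is the asymmetry: parameter count scales with the number of real layers, while cache memory scales with the number of virtual layers.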

Residual scale

The first and/or last residual connection in the stack can optionally be zeroed out (ignore_first_residual / ignore_last_residual), which is useful when composing Mamba-3 blocks with other module types (e.g. in a hybrid Mamba-3 + attention architecture where neighbouring blocks already carry residuals).
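One way to read "zeroed out" is as a per-layer multiplier on the skip term, so a layer computes y = scale * x + Mamba3(RMSNorm(x)) with scale set to 0 for the ignored layers. The sketch below assumes that interpretation; `residual_scales` and `apply_residual` are hypothetical helpers, and only the flag names `ignore_first_residual` / `ignore_last_residual` come from this module.

```rust
/// Hypothetical helper: residual multiplier for each layer in the stack.
/// 1.0 keeps the skip connection; 0.0 drops it so the layer outputs only the block result.
fn residual_scales(n_layers: usize, ignore_first_residual: bool, ignore_last_residual: bool) -> Vec<f32> {
    let mut scales = vec![1.0_f32; n_layers];
    if n_layers > 0 {
        if ignore_first_residual {
            scales[0] = 0.0;
        }
        if ignore_last_residual {
            scales[n_layers - 1] = 0.0;
        }
    }
    scales
}

/// y = scale * x + block_out, where block_out stands for Mamba3(RMSNorm(x)).
fn apply_residual(x: &[f32], block_out: &[f32], scale: f32) -> Vec<f32> {
    x.iter().zip(block_out).map(|(a, b)| scale * a + b).collect()
}

fn main() {
    let scales = residual_scales(4, true, false);
    println!("scales: {:?}", scales);
    // With scale 0.0 the input does not pass through; only the block output remains.
    println!("{:?}", apply_residual(&[1.0, 2.0], &[0.5, 0.5], scales[0]));
}
```

In a hybrid stack this avoids adding the same input twice when a neighbouring block has already applied its own residual.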

Structs

Mamba3Layer
A single Mamba-3 residual block, y = x + Mamba3(RMSNorm(x)).
Mamba3LayerConfig
Configuration / factory for Mamba3Layer.
Mamba3LayerRecord
The record type for the module.
Mamba3LayerRecordItem
The record item type for the module.
Mamba3Layers
A stack of Mamba-3 layers with optional virtual-layer scheduling.
Mamba3LayersConfig
Configuration / factory for Mamba3Layers.
Mamba3LayersRecord
The record type for the module.
Mamba3LayersRecordItem
The record item type for the module.