RMS normalisation fused with a SiLU(z) gate (the Mamba-2 output norm).
norm_before_gate selects the order of the two operations:
- true (normalise, then gate): y = (x / rms(x) · γ) · SiLU(z)
- false (gate, then normalise): y = (u / rms(u)) · γ, where u = x · SiLU(z)
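The two orderings can be sketched in plain Rust over a single row (the last dimension). This is a minimal illustration, not the crate's implementation: the function names `silu`, `rms`, and `rms_norm_gated` as written here, the explicit `eps` argument, and its placement under the square root are all assumptions made for the example (the real layer uses the per-dtype div_eps and exposes no epsilon).

```rust
/// SiLU activation: z * sigmoid(z).
fn silu(z: f32) -> f32 {
    z / (1.0 + (-z).exp())
}

/// Root mean square of one row.
/// ASSUMPTION: `eps` is an explicit parameter added under the square root,
/// for this sketch only.
fn rms(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    (mean_sq + eps).sqrt()
}

/// Hypothetical reference version of the gated RMS norm over one row.
fn rms_norm_gated(
    x: &[f32],
    z: &[f32],
    gamma: &[f32],
    norm_before_gate: bool,
    eps: f32,
) -> Vec<f32> {
    assert_eq!(x.len(), z.len());
    assert_eq!(x.len(), gamma.len());
    if norm_before_gate {
        // y = (x / rms(x) * gamma) * SiLU(z)
        let r = rms(x, eps);
        x.iter()
            .zip(z)
            .zip(gamma)
            .map(|((&xi, &zi), &g)| (xi / r * g) * silu(zi))
            .collect()
    } else {
        // u = x * SiLU(z); y = (u / rms(u)) * gamma
        let u: Vec<f32> = x.iter().zip(z).map(|(&xi, &zi)| xi * silu(zi)).collect();
        let r = rms(&u, eps);
        u.iter().zip(gamma).map(|(&ui, &g)| ui / r * g).collect()
    }
}
```

When every element of z is the same positive value, the gate-first ordering normalises the common gate factor away, while the norm-first ordering keeps it as a final scale. That is the practical difference between the two settings.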
The numerical-stability epsilon is the per-dtype [div_eps] (so there is no
configurable epsilon); the fp16 path uses the same max(|x|)-rescaling
trick as RmsNorm.
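The page does not spell out the rescaling trick, so the following is only a sketch of one common form of it: divide by m = max(|x|) before squaring, then multiply the result back by m. In fp16 the largest finite value is 65504, so squaring anything above roughly 255 overflows. Stable Rust has no built-in f16 type, so the example uses f32 with very large inputs to show the same effect. The names `rms_naive` and `rms_rescaled` are hypothetical.

```rust
/// Naive RMS: squares the raw values, so the sum can overflow to infinity.
fn rms_naive(x: &[f32]) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    mean_sq.sqrt()
}

/// Rescaled RMS (ASSUMED form of the trick): divide by m = max(|x|) first,
/// so every squared term is at most 1, then scale the result back by m.
fn rms_rescaled(x: &[f32]) -> f32 {
    let m = x.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
    if m == 0.0 {
        // All-zero row: the RMS is zero and there is nothing to rescale.
        return 0.0;
    }
    let mean_sq = x
        .iter()
        .map(|v| {
            let s = v / m;
            s * s
        })
        .sum::<f32>()
        / x.len() as f32;
    m * mean_sq.sqrt()
}
```

Both functions compute the same quantity mathematically, since rms(x) = m · rms(x / m). Only the rescaled version keeps the intermediate squares inside the representable range.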
Structs
- RmsNormGated - Applies Gated Rms Normalization over an input tensor along the last dimension.
- RmsNormGatedConfig - Configuration to create a RmsNormGated layer.
- RmsNormGatedRecord - The record type for the module.
- RmsNormGatedRecordItem - The record item type for the module.