Recompute-based gradient math (the memory-efficient backward).
§Recompute-based gradient math for the Mamba-3 single-SSD
This module implements the analytic backward of the single-pass MIMO-first scan. Forward
intermediates (K1–K4) are recomputed from the saved leaf inputs, then a
reverse per-chunk loop fuses the K5 state-to-output (BLUE), the strict
lower-triangular intra-chunk (LOWER), and the K4 state-passing backwards; the
γ-weighted same-step (DIAG) term is computed batched (no recurrence, tiny
m × m tensors). Because this pathway applies the trapezoid weights
internally, it additionally returns d_gamma and d_scale. The shared K3
extended helper (and K1/K2/K4) are reused from the double-SSD module.
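
The recompute-then-reverse pattern described above can be illustrated with a toy scalar scan. This is only a minimal sketch under stated assumptions, not this module's implementation: the recurrence h_t = a_t * h_{t-1} + b_t * x_t with y_t = c_t * h_t, and the names Leaves, Grads, forward and backward, are all hypothetical and chosen for illustration. It has no chunking, no MIMO dimensions and no trapezoid weights. It only shows that the backward can rebuild the hidden states from the saved leaf inputs and then pass a state gradient backwards through time.

```rust
/// Hypothetical leaf inputs of a toy scalar scan:
/// h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t, with h_{-1} = 0.
pub struct Leaves {
    pub a: Vec<f64>,
    pub b: Vec<f64>,
    pub c: Vec<f64>,
    pub x: Vec<f64>,
}

/// Per-input gradients of the toy scan (illustrative, not the crate's struct).
pub struct Grads {
    pub da: Vec<f64>,
    pub db: Vec<f64>,
    pub dc: Vec<f64>,
    pub dx: Vec<f64>,
}

/// Forward pass: returns the outputs only; hidden states are discarded.
pub fn forward(l: &Leaves) -> Vec<f64> {
    let mut h = 0.0;
    (0..l.x.len())
        .map(|t| {
            h = l.a[t] * h + l.b[t] * l.x[t];
            l.c[t] * h
        })
        .collect()
}

/// Backward pass: recomputes the hidden states from the leaves, then runs a
/// reverse loop that passes the state gradient from step t to step t - 1.
pub fn backward(l: &Leaves, dy: &[f64]) -> Grads {
    let n = l.x.len();

    // Recompute the forward intermediates instead of loading saved ones.
    let mut h = vec![0.0; n];
    let mut prev = 0.0;
    for t in 0..n {
        prev = l.a[t] * prev + l.b[t] * l.x[t];
        h[t] = prev;
    }

    let mut g = Grads {
        da: vec![0.0; n],
        db: vec![0.0; n],
        dc: vec![0.0; n],
        dx: vec![0.0; n],
    };

    // dh carries the gradient arriving at h_t from all later steps.
    let mut dh = 0.0;
    for t in (0..n).rev() {
        g.dc[t] = dy[t] * h[t];
        dh += dy[t] * l.c[t]; // add the state-to-output contribution
        let h_prev = if t == 0 { 0.0 } else { h[t - 1] };
        g.da[t] = dh * h_prev;
        g.db[t] = dh * l.x[t];
        g.dx[t] = dh * l.b[t];
        dh *= l.a[t]; // state-passing backward: hand the gradient to h_{t-1}
    }
    g
}
```

The trade-off in this sketch is one extra forward sweep in exchange for not keeping the hidden states alive between the forward and backward calls.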
Everything operates on backend primitives through the rank-tagged [F]
wrapper: the custom Backward node
runs with a generic backend B, so the high-level Tensor is unavailable
and the math uses B’s float_* ops.
Structs§
- CombinedSingleSsdGrads - Per-input gradients produced by combined_backward for the Single-SSD. Adds d_gamma_bnlh and d_scale_bnlh over the double-SSD form crate::mamba3::double_ssd::ssd::serial_recalculated::combined_backward::CombinedGrads.
Functions§
- combined_backward - Memory-efficient backward for the Mamba-3 MIMO-first chunkwise Single-SSD.