burn_mamba::mamba3::single_ssd::ssd::ssd_path

Struct Mamba3SingleSsdInput

pub struct Mamba3SingleSsdInput {
    pub v_bnlmhp: Tensor<6>,
    pub b_bnlmhr: Tensor<6>,
    pub c_bnlmhr: Tensor<6>,
    pub da_bnlh: Tensor<4>,
    pub gamma_bnlh: Tensor<4>,
    pub scale_bnlh: Tensor<4>,
    pub initial_state_bhpr: Tensor<4>,
    pub init_state_hpr: Option<Tensor<3>>,
}

MIMO-first input bundle for the merged-form SSD.

All tensors are pre-processed by the caller (Mamba3::forward_single_ssd). B and C are already QK-normed, RoPE-applied, bias-added, and expanded per head; V is the raw, unscaled, MIMO-expanded value. The combined log-decay da = Δ·A is pre-computed. The two trapezoidal coefficients γₜ (gamma) and scaleₜ are supplied separately because the SSD itself performs the K-scaling and the γ-weighted diagonal correction internally. D-skip and Z-gating are handled by the caller.

Fields§

§v_bnlmhp: Tensor<6>

Value tensor, MIMO-expanded but not trapezoidally scaled.

§Shape

[batch, nchunks, chunk_len, mimo_rank, nheads, per_head_dim]

§b_bnlmhr: Tensor<6>

K/B tensor: QK-normed, RoPE-applied, bias-added, expanded to per-head. Not pre-scaled — the SSD multiplies by scaleₜ internally for the lower-triangular and state-recurrence paths, while the diagonal correction reuses the unscaled tensor.

§Shape

[batch, nchunks, chunk_len, mimo_rank, nheads, state_rank]
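
To illustrate why K is passed unscaled, here is a minimal scalar sketch of a merged-form serial scan for a single head, with mimo_rank = 1 and one state entry. The function `merged_scan` is hypothetical and is not the crate's implementation; it only assumes that the state update uses scaleₜ · Kₜ while the output at step t uses γₜ · Kₜ, as the field description states.

```rust
/// Hypothetical scalar sketch (one head, mimo_rank = 1, one state entry).
/// Not the crate's kernel: it only mirrors the scaling roles described above.
fn merged_scan(
    v: &[f32],
    b: &[f32],
    c: &[f32],
    da: &[f32],
    gamma: &[f32],
    scale: &[f32],
    initial_state: f32,
) -> (Vec<f32>, f32) {
    let mut state = initial_state;
    let mut y = Vec::with_capacity(v.len());
    for t in 0..v.len() {
        // Decay the carried state by exp(da_t).
        let decayed = da[t].exp() * state;
        // Diagonal correction: the unscaled K (b) weighted by gamma_t.
        y.push(c[t] * (decayed + gamma[t] * b[t] * v[t]));
        // State recurrence: K (b) multiplied by scale_t.
        state = decayed + scale[t] * b[t] * v[t];
    }
    (y, state)
}
```

Because the same unscaled `b` feeds both branches, a caller that pre-multiplied K by scaleₜ would make the diagonal term incorrect in this sketch.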

§c_bnlmhr: Tensor<6>

Q/C tensor: same processing as b_bnlmhr.

§Shape

[batch, nchunks, chunk_len, mimo_rank, nheads, state_rank]

§da_bnlh: Tensor<4>

Pre-combined log-decay Δ·A (negative).

§Shape

[batch, nchunks, chunk_len, nheads]
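
As a small illustration of what "pre-combined" means here, the sketch below forms Δ·A for one head and converts it to per-token decay factors. The helper names `log_decay` and `decay_factors` are hypothetical, and the sketch assumes Δ is positive and A is a negative scalar per head, which is what makes the product negative.

```rust
/// Hypothetical helper: per-token log-decay Δ·A for one head.
/// Assumes delta values are positive and `a` is negative.
fn log_decay(delta: &[f32], a: f32) -> Vec<f32> {
    delta.iter().map(|d| d * a).collect()
}

/// Hypothetical helper: decay factor exp(Δ·A), which lies in (0, 1)
/// whenever the log-decay is negative.
fn decay_factors(da: &[f32]) -> Vec<f32> {
    da.iter().map(|x| x.exp()).collect()
}
```

Working in log space lets decays over several tokens be combined by summation before a single exponentiation.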

§gamma_bnlh: Tensor<4>

γₜ = λₜ · Δₜ — used as the per-token diagonal multiplier.

§Shape

[batch, nchunks, chunk_len, nheads]

§scale_bnlh: Tensor<4>

scaleₜ = γₜ + (1 − λₜ₊₁) · Δₜ₊₁ — K is multiplied by this for the lower-triangular and state recurrence paths. The shifted term is zero at the very last sequence position (no future token exists).

§Shape

[batch, nchunks, chunk_len, nheads]
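
The two coefficient definitions can be sketched for a single head over a flat sequence as follows. The helper `trapezoid_coeffs` is hypothetical (it is not part of the documented API) and assumes λ and Δ are given per token as plain slices.

```rust
/// Hypothetical helper: trapezoidal coefficients for one head.
/// gamma[t] = lambda[t] * delta[t]
/// scale[t] = gamma[t] + (1 - lambda[t+1]) * delta[t+1], with the shifted
/// term set to zero at the last position.
fn trapezoid_coeffs(lambda: &[f32], delta: &[f32]) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(lambda.len(), delta.len());
    let n = lambda.len();
    let gamma: Vec<f32> = lambda.iter().zip(delta).map(|(l, d)| l * d).collect();
    let scale: Vec<f32> = (0..n)
        .map(|t| {
            // No future token exists at the final position.
            let shifted = if t + 1 < n {
                (1.0 - lambda[t + 1]) * delta[t + 1]
            } else {
                0.0
            };
            gamma[t] + shifted
        })
        .collect();
    (gamma, scale)
}
```

For example, with λ = [0.5, 0.5, 1.0] and Δ = [1.0, 2.0, 4.0], this gives γ = [0.5, 1.0, 4.0] and scale = [1.5, 1.0, 4.0].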

§initial_state_bhpr: Tensor<4>

Initial SSM hidden state (merged-form accumulator).

When continuing from a prior call, this should already include the boundary β contribution (1 − λ₀) · Δ₀ · Σₘ Kₜ₋₁[m] ⊗ (xₜ₋₁ ⊙ mimo_xₘ). The previous call could not add this term itself because λ₀ and Δ₀ of the new call were not yet known to it.

§Shape

[batch, nheads, per_head_dim, state_rank]
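
A rough sketch of folding the boundary β term into a carried state, for one batch element and one head, might look like this. The function `add_boundary_beta` and its nested-`Vec` layout are hypothetical; the index order [per_head_dim][state_rank] is assumed from the shape above.

```rust
/// Hypothetical helper: adds (1 - lambda0) * delta0 * sum_m K[m] ⊗ (x ⊙ mimo_x[m])
/// to a state laid out as [per_head_dim][state_rank].
/// k_prev: [mimo_rank][state_rank], x_prev: [per_head_dim],
/// mimo_x: [mimo_rank][per_head_dim].
fn add_boundary_beta(
    state: &mut [Vec<f32>],
    lambda0: f32,
    delta0: f32,
    k_prev: &[Vec<f32>],
    x_prev: &[f32],
    mimo_x: &[Vec<f32>],
) {
    let coeff = (1.0 - lambda0) * delta0;
    for (k_m, mimo_m) in k_prev.iter().zip(mimo_x) {
        for (p, row) in state.iter_mut().enumerate() {
            // MIMO-expanded value for rank m at channel p.
            let v = x_prev[p] * mimo_m[p];
            for (r, s) in row.iter_mut().enumerate() {
                *s += coeff * v * k_m[r];
            }
        }
    }
}
```

The caller would apply this once, using the first λ and Δ of the new call, before handing the state over as initial_state_bhpr.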

§init_state_hpr: Option<Tensor<3>>

Optional learnable initial state (broadcast over batch).

§Shape

[nheads, per_head_dim, state_rank]
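
The shape annotations above can be collected into a small helper for checking inputs before building the struct. `SsdDims` and its methods are hypothetical names, not part of this crate, and `seq_len` assumes the sequence is split into nchunks equal chunks of chunk_len tokens.

```rust
/// Hypothetical dimension bundle; field names mirror the shape annotations.
#[derive(Clone, Copy, Debug)]
struct SsdDims {
    batch: usize,
    nchunks: usize,
    chunk_len: usize,
    mimo_rank: usize,
    nheads: usize,
    per_head_dim: usize,
    state_rank: usize,
}

impl SsdDims {
    /// Expected shape of v_bnlmhp.
    fn v_bnlmhp(&self) -> [usize; 6] {
        [self.batch, self.nchunks, self.chunk_len, self.mimo_rank, self.nheads, self.per_head_dim]
    }
    /// Expected shape of b_bnlmhr and c_bnlmhr.
    fn bc_bnlmhr(&self) -> [usize; 6] {
        [self.batch, self.nchunks, self.chunk_len, self.mimo_rank, self.nheads, self.state_rank]
    }
    /// Expected shape of da_bnlh, gamma_bnlh and scale_bnlh.
    fn per_token_bnlh(&self) -> [usize; 4] {
        [self.batch, self.nchunks, self.chunk_len, self.nheads]
    }
    /// Expected shape of initial_state_bhpr.
    fn initial_state_bhpr(&self) -> [usize; 4] {
        [self.batch, self.nheads, self.per_head_dim, self.state_rank]
    }
    /// Expected shape of init_state_hpr, when present.
    fn init_state_hpr(&self) -> [usize; 3] {
        [self.nheads, self.per_head_dim, self.state_rank]
    }
    /// Assumed total sequence length covered by the chunked layout.
    fn seq_len(&self) -> usize {
        self.nchunks * self.chunk_len
    }
}
```

Comparing these arrays against each tensor's actual dimensions gives a quick consistency check before any SSD path is run.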

Implementations§

impl Mamba3SingleSsdInput

pub fn single_ssd_minimal(self) -> (Tensor<6>, Tensor<4>)

§Shapes

impl Mamba3SingleSsdInput

pub fn single_ssd_serial(self) -> (Tensor<6>, Tensor<4>)

§Returns

impl Mamba3SingleSsdInput

pub fn single_ssd_serial_recalculated(self) -> (Tensor<6>, Tensor<4>)

§Returns

impl Mamba3SingleSsdInput

pub fn sanity(&self)

impl Mamba3SingleSsdInput

pub fn run(self, path: &Mamba3SsdPath) -> (Tensor<6>, Tensor<4>)

§Returns

Auto Trait Implementations§

impl Freeze for Mamba3SingleSsdInput

impl RefUnwindSafe for Mamba3SingleSsdInput

impl Send for Mamba3SingleSsdInput

impl Sync for Mamba3SingleSsdInput

impl Unpin for Mamba3SingleSsdInput

impl UnsafeUnpin for Mamba3SingleSsdInput

impl UnwindSafe for Mamba3SingleSsdInput

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>