pub enum Mamba3SsdPath {
Minimal(Option<usize>),
Serial(Option<usize>),
SerialRecalculated(Option<usize>),
}
Algorithm selection for the Mamba-3 chunkwise SSD.
The pathway (double- vs. single-SSD) is selected separately, by the supplied
cache variant (see crate::mamba3::cache::Mamba3Caches). Mamba3::forward
threads this same selection into whichever pathway the cache implies: it
converts the selection into the per-pathway input bundle
(crate::mamba3::double_ssd::ssd::Mamba3DoubleSsdInput or
crate::mamba3::single_ssd::ssd::Mamba3SingleSsdInput) and calls that
bundle’s run.
Each variant carries an optional chunk length. Larger values increase the
intra-chunk GEMM work and reduce the inter-chunk scan length; the optimal
value is approximately √(state_rank · per_head_dim) (see
Self::optimal_chunk_len). None falls back to that optimal value.
If no path is specified, the cache defaults to
crate::mamba3::cache::Mamba3Caches::SingleSsd with Self::default
(i.e. Self::SerialRecalculated with an unset chunk length).
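The default described above can be pictured with a small stand-in type. This is only a sketch: `SsdPathSketch` is a hypothetical mirror of the enum's shape written for illustration, not the crate's own definition, and the real `Default` implementation may be derived rather than written by hand.

```rust
/// Hypothetical stand-in that mirrors the three documented variants.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum SsdPathSketch {
    Minimal(Option<usize>),
    Serial(Option<usize>),
    SerialRecalculated(Option<usize>),
}

impl Default for SsdPathSketch {
    /// Assumed to match the documented default: `SerialRecalculated`
    /// with an unset chunk length.
    fn default() -> Self {
        Self::SerialRecalculated(None)
    }
}
```

Leaving the chunk length as `None` defers the choice until the block's dimensions are known, which is when the optimal value can be computed.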
Variants
Minimal(Option<usize>)
Minimal/segsum SSD: mostly batched matmuls; backward via autodiff.
See crate::mamba3::double_ssd::ssd::Mamba3DoubleSsdInput::double_ssd_minimal
/ crate::mamba3::single_ssd::ssd::Mamba3SingleSsdInput::single_ssd_minimal.
For training, prefer Self::SerialRecalculated.
Serial(Option<usize>)
(Hybrid) serial SSD: a serial loop over the chunks plus batched matmuls; backward via autodiff.
See crate::mamba3::double_ssd::ssd::Mamba3DoubleSsdInput::double_ssd_serial
/ crate::mamba3::single_ssd::ssd::Mamba3SingleSsdInput::single_ssd_serial.
For a memory-saving custom backward, see Self::SerialRecalculated.
SerialRecalculated(Option<usize>)
(Hybrid) serial SSD with a custom, memory-efficient backward that recomputes the forward intermediates instead of storing them.
See crate::mamba3::double_ssd::ssd::Mamba3DoubleSsdInput::double_ssd_serial_recalculated
/ crate::mamba3::single_ssd::ssd::Mamba3SingleSsdInput::single_ssd_serial_recalculated.
For a plain autodiff backward, see Self::Serial.
Implementations
impl Mamba3SsdPath
pub fn optimal_chunk_len(state_rank: usize, per_head_dim: usize) -> usize
Optimal chunk length, approximately √(state_rank · per_head_dim),
rounded up to a multiple of 32 and capped at 512.
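One way to read the rule above is sketched below. The function name `optimal_chunk_len_sketch` is hypothetical, and the order of operations (take the square root, round it up, round up to the next multiple of 32, then cap at 512) is an assumption based on the description; the crate's actual implementation may round differently.

```rust
/// Hypothetical re-statement of the documented rule; not the crate's code.
fn optimal_chunk_len_sketch(state_rank: usize, per_head_dim: usize) -> usize {
    // Square root of the product, rounded up to a whole number.
    let root = ((state_rank * per_head_dim) as f64).sqrt().ceil() as usize;
    // Round up to the next multiple of 32.
    let rounded = (root + 31) / 32 * 32;
    // Cap at 512.
    rounded.min(512)
}
```

Under these assumptions, `state_rank = 128` and `per_head_dim = 64` give √8192 ≈ 90.5, which rounds up to 96, while very large dimensions saturate at 512.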
pub fn chunk_len_or_optimal(&self, state_rank: usize, per_head_dim: usize) -> usize
The chunk length carried by this variant, or Self::optimal_chunk_len
when unset.
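The fallback behaviour can be sketched as follows. `SsdPathSketch` is a hypothetical stand-in for the enum, and its `optimal_chunk_len` body is an assumed reading of the documented rule (square root, rounded up to a multiple of 32, capped at 512), not the crate's source.

```rust
/// Hypothetical stand-in that mirrors the three documented variants.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum SsdPathSketch {
    Minimal(Option<usize>),
    Serial(Option<usize>),
    SerialRecalculated(Option<usize>),
}

impl SsdPathSketch {
    /// Assumed form of the documented rule.
    fn optimal_chunk_len(state_rank: usize, per_head_dim: usize) -> usize {
        let root = ((state_rank * per_head_dim) as f64).sqrt().ceil() as usize;
        ((root + 31) / 32 * 32).min(512)
    }

    /// Returns the carried chunk length, or the optimal one when unset.
    fn chunk_len_or_optimal(&self, state_rank: usize, per_head_dim: usize) -> usize {
        // Every variant carries the same payload, so one arm covers all three.
        let len = match *self {
            Self::Minimal(len) | Self::Serial(len) | Self::SerialRecalculated(len) => len,
        };
        len.unwrap_or_else(|| Self::optimal_chunk_len(state_rank, per_head_dim))
    }
}
```

An explicit length always wins, whatever the dimensions; only `None` triggers the computation.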
pub fn default_optimal_from_block(block: &Mamba3) -> Self
The recommended default path for a given block: Self::SerialRecalculated
with Self::optimal_chunk_len for the block’s dimensions.
Trait Implementations
impl Clone for Mamba3SsdPath
fn clone(&self) -> Mamba3SsdPath
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.