pub enum MambaBidiLayersConfig {
    Mamba1 {
        n_real_layers: usize,
        n_virtual_layers: Option<(usize, BidiSchedule)>,
        mamba_block: Mamba1Config,
        ignore_first_residual: bool,
        ignore_last_residual: bool,
        outputs_merge: Vec<OutputMergeConfig>,
        class_latents: Vec<ClassLatent>,
        residuals: ResidualsConfig,
    },
    Mamba2 {
        n_real_layers: usize,
        n_virtual_layers: Option<(usize, BidiSchedule)>,
        mamba_block: Mamba2Config,
        ignore_first_residual: bool,
        ignore_last_residual: bool,
        outputs_merge: Vec<OutputMergeConfig>,
        class_latents: Vec<ClassLatent>,
        residuals: ResidualsConfig,
    },
    Mamba3 {
        n_real_layers: usize,
        n_virtual_layers: Option<(usize, BidiSchedule)>,
        mamba_block: Mamba3Config,
        ignore_first_residual: bool,
        ignore_last_residual: bool,
        outputs_merge: Vec<OutputMergeConfig>,
        class_latents: Vec<ClassLatent>,
        residuals: ResidualsConfig,
    },
}
The serializable config for MambaBidiLayers. Each variant is concrete
(per-family), so #[derive(Config)] applies; init builds the matching
stack variant.
Variants§
Mamba1
Build a Mamba-1 bidirectional stack.
Fields
n_virtual_layers: Option<(usize, BidiSchedule)>
mamba_block: Mamba1Config
Shared block config.
outputs_merge: Vec<OutputMergeConfig>
One merge config per pair, length n_real_layers / 2.
class_latents: Vec<ClassLatent>
Stack-level class latents, spliced into the sequence before the
first pair (e.g. a Middle summary latent in place of mean-pooling).
residuals: ResidualsConfig
Inter-pair residual scheme (plain additive vs Multi-Gate).
Mamba2
Build a Mamba-2 bidirectional stack.
Fields
n_virtual_layers: Option<(usize, BidiSchedule)>
mamba_block: Mamba2Config
Shared block config.
outputs_merge: Vec<OutputMergeConfig>
One merge config per pair, length n_real_layers / 2.
class_latents: Vec<ClassLatent>
Stack-level class latents, spliced into the sequence before the
first pair (e.g. a Middle summary latent in place of mean-pooling).
residuals: ResidualsConfig
Inter-pair residual scheme (plain additive vs Multi-Gate).
Mamba3
Build a Mamba-3 bidirectional stack.
Fields
n_virtual_layers: Option<(usize, BidiSchedule)>
mamba_block: Mamba3Config
Shared block config.
outputs_merge: Vec<OutputMergeConfig>
One merge config per pair, length n_real_layers / 2.
class_latents: Vec<ClassLatent>
Stack-level class latents, spliced into the sequence before the
first pair (e.g. a Middle summary latent in place of mean-pooling).
residuals: ResidualsConfig
Inter-pair residual scheme (plain additive vs Multi-Gate).
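The length rule stated for outputs_merge (one entry per pair, n_real_layers / 2) is the same in all three variants and can be checked before construction. The following is a minimal sketch, not the crate's own validation: OutputMergeConfig here is a hypothetical unit-struct stand-in, the helper names expected_merge_len and check_outputs_merge are invented for illustration, and the division is assumed to be integer division (how an odd n_real_layers is treated is not stated in the source).

```rust
// Hypothetical stand-in: the real OutputMergeConfig is defined by the crate.
#[derive(Debug, Clone, PartialEq)]
struct OutputMergeConfig;

/// Number of pairs implied by `n_real_layers`, following the documented
/// "length n_real_layers / 2" rule (integer division is an assumption).
fn expected_merge_len(n_real_layers: usize) -> usize {
    n_real_layers / 2
}

/// Returns Ok(()) when `outputs_merge` holds exactly one entry per pair,
/// otherwise an error message describing the mismatch.
fn check_outputs_merge(
    n_real_layers: usize,
    outputs_merge: &[OutputMergeConfig],
) -> Result<(), String> {
    let expected = expected_merge_len(n_real_layers);
    if outputs_merge.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "expected {expected} merge configs for {n_real_layers} real layers, got {}",
            outputs_merge.len()
        ))
    }
}

fn main() {
    // Eight real layers form four pairs, so four merge configs are expected.
    let merges = vec![OutputMergeConfig; 4];
    assert!(check_outputs_merge(8, &merges).is_ok());
    // Six real layers form three pairs; four merge configs is a mismatch.
    assert!(check_outputs_merge(6, &merges).is_err());
}
```

Running a check like this before calling init would surface a mismatched outputs_merge as a readable error; whether the crate itself validates this at construction time is not documented here.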
Implementations§
impl MambaBidiLayersConfig
pub fn init(&self, device: &Device) -> MambaBidiLayers
Allocate and initialise the selected bidirectional stack on device.
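The per-family design described above (a concrete config variant per Mamba family, with init building the matching stack variant) can be illustrated with a reduced model. This is a sketch under stated assumptions rather than the crate's implementation: LayersConfig, Layers, and the unit-struct block configs are hypothetical stand-ins, most fields are omitted, and the device parameter is dropped so that the example compiles on its own.

```rust
// Hypothetical stand-ins for the crate's per-family block configs.
#[derive(Debug, Clone)]
struct Mamba1Config;
#[derive(Debug, Clone)]
struct Mamba2Config;
#[derive(Debug, Clone)]
struct Mamba3Config;

// Reduced model of the config enum: one concrete variant per family.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum LayersConfig {
    Mamba1 { n_real_layers: usize, mamba_block: Mamba1Config },
    Mamba2 { n_real_layers: usize, mamba_block: Mamba2Config },
    Mamba3 { n_real_layers: usize, mamba_block: Mamba3Config },
}

// Reduced model of the built stack, again one variant per family.
#[derive(Debug, PartialEq)]
enum Layers {
    Mamba1 { n_layers: usize },
    Mamba2 { n_layers: usize },
    Mamba3 { n_layers: usize },
}

impl LayersConfig {
    /// Builds the stack variant that matches the config variant.
    /// The real `init` also takes a device; it is omitted in this sketch.
    fn init(&self) -> Layers {
        match self {
            LayersConfig::Mamba1 { n_real_layers, .. } => {
                Layers::Mamba1 { n_layers: *n_real_layers }
            }
            LayersConfig::Mamba2 { n_real_layers, .. } => {
                Layers::Mamba2 { n_layers: *n_real_layers }
            }
            LayersConfig::Mamba3 { n_real_layers, .. } => {
                Layers::Mamba3 { n_layers: *n_real_layers }
            }
        }
    }
}

fn main() {
    let cfg = LayersConfig::Mamba2 { n_real_layers: 4, mamba_block: Mamba2Config };
    // A Mamba2 config yields a Mamba2 stack with the same layer count.
    assert_eq!(cfg.init(), Layers::Mamba2 { n_layers: 4 });
}
```

Keeping every variant concrete avoids a generic parameter on the config type, which is consistent with the note that #[derive(Config)] applies because each variant is per-family.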