pub struct Mamba2NetworkConfig {
    pub n_real_layers: usize,
    pub n_virtual_layers: Option<(usize, Schedule)>,
    pub vocab_size: usize,
    pub pad_vocab_size_multiple: usize,
    pub mamba_block: Mamba2Config,
    pub missing_lm_head: bool,
}
Configuration / factory for Mamba2Network.
Fields§
§n_real_layers: usize
Number of real (weight-bearing) Mamba-2 layers.
§n_virtual_layers: Option<(usize, Schedule)>
Optional virtual-layer scheduling. See Mamba2Layers for details.
§vocab_size: usize
The unpadded vocabulary size as specified by the tokenizer.
At initialisation this value is rounded up to the nearest multiple of
pad_vocab_size_multiple to obtain the actual embedding / logit
dimension padded_vocab_size.
§pad_vocab_size_multiple: usize
Vocabulary size will be rounded up to a multiple of this value.
Set to 1 to disable rounding. Common values: 8, 16, 64.
§mamba_block: Mamba2Config
Configuration shared by all Mamba-2 blocks.
§missing_lm_head: bool
When true, the LM head weight is not allocated separately; instead
the transposed embedding matrix is used directly (weight tying).
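The rounding described for vocab_size and pad_vocab_size_multiple can be sketched as ordinary ceiling-to-multiple arithmetic. This is a minimal illustration, not the crate's code: the function name padded_vocab_size is borrowed from the field description, and the exact formula used at initialisation is an assumption.

```rust
/// Round `vocab_size` up to the nearest multiple of `multiple`.
/// Assumption: plain ceiling-to-multiple arithmetic; the crate's own
/// implementation is not shown on this page.
fn padded_vocab_size(vocab_size: usize, multiple: usize) -> usize {
    assert!(multiple > 0, "pad_vocab_size_multiple must be at least 1");
    // Integer ceiling division, then scale back up to a multiple.
    (vocab_size + multiple - 1) / multiple * multiple
}
```

Under this sketch, padded_vocab_size(1000, 64) gives 1024, and a multiple of 1 returns 1000 unchanged, which matches the "set to 1 to disable rounding" behaviour described above.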
Implementations§
impl Mamba2NetworkConfig
pub fn new(
    n_real_layers: usize,
    vocab_size: usize,
    pad_vocab_size_multiple: usize,
    mamba_block: Mamba2Config,
    missing_lm_head: bool,
) -> Self
Create a new instance of the config.
§Arguments
§Required Arguments
§n_real_layers
Number of real (weight-bearing) Mamba-2 layers.
§vocab_size
The unpadded vocabulary size as specified by the tokenizer.
At initialisation this value is rounded up to the nearest multiple of
pad_vocab_size_multiple to obtain the actual embedding / logit
dimension padded_vocab_size.
§pad_vocab_size_multiple
Vocabulary size will be rounded up to a multiple of this value.
Set to 1 to disable rounding. Common values: 8, 16, 64.
§mamba_block
Configuration shared by all Mamba-2 blocks.
§missing_lm_head
When true, the LM head weight is not allocated separately; instead
the transposed embedding matrix is used directly (weight tying).
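The weight tying selected by missing_lm_head can be illustrated with plain vectors. This is a sketch under stated assumptions: tied_logits is a hypothetical helper, the embedding is stored as one row per token, and the real network operates on backend tensors rather than Vec<f32>.

```rust
/// Compute logits for one hidden state by reusing the embedding matrix
/// as the LM head (weight tying).
///
/// `embedding` has shape [padded_vocab_size][d_model]; row `v` is the
/// embedding of token `v`. Multiplying by its transpose reduces to one
/// dot product per row. (Hypothetical helper, not part of the crate.)
fn tied_logits(embedding: &[Vec<f32>], hidden: &[f32]) -> Vec<f32> {
    embedding
        .iter()
        .map(|row| row.iter().zip(hidden).map(|(w, h)| w * h).sum::<f32>())
        .collect()
}
```

The output has one logit per embedding row, so its length equals the padded vocabulary size and no separate head weight needs to be stored.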
§Default Arguments
§n_virtual_layers
Optional virtual-layer scheduling. See Mamba2Layers for details.
- Defaults to None
impl Mamba2NetworkConfig
pub fn with_n_virtual_layers(
    self,
    n_virtual_layers: Option<(usize, Schedule)>,
) -> Self
Sets the value for the field n_virtual_layers.
Optional virtual-layer scheduling. See Mamba2Layers for details.
- Defaults to None
impl Mamba2NetworkConfig
pub fn init<B: Backend>(&self, device: &B::Device) -> Mamba2Network<B>
Allocate and initialise the full network on device.
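The constructor and builder method documented above follow a common pattern: new takes the required arguments and leaves n_virtual_layers at None, and with_n_virtual_layers consumes the config and returns an updated copy. The sketch below mirrors that pattern with local stand-ins. NetworkConfigSketch, and the unit structs Schedule and Mamba2Config, are hypothetical placeholders, since the real types are not defined on this page; init is not modelled.

```rust
// Hypothetical stand-ins; the crate's real `Schedule` and `Mamba2Config`
// carry fields that are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct Schedule;
#[derive(Debug, Clone, PartialEq)]
struct Mamba2Config;

// Local mirror of the documented fields, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct NetworkConfigSketch {
    n_real_layers: usize,
    n_virtual_layers: Option<(usize, Schedule)>,
    vocab_size: usize,
    pad_vocab_size_multiple: usize,
    mamba_block: Mamba2Config,
    missing_lm_head: bool,
}

impl NetworkConfigSketch {
    /// Required arguments only; `n_virtual_layers` starts as `None`.
    fn new(
        n_real_layers: usize,
        vocab_size: usize,
        pad_vocab_size_multiple: usize,
        mamba_block: Mamba2Config,
        missing_lm_head: bool,
    ) -> Self {
        Self {
            n_real_layers,
            n_virtual_layers: None,
            vocab_size,
            pad_vocab_size_multiple,
            mamba_block,
            missing_lm_head,
        }
    }

    /// Consume the config and return it with `n_virtual_layers` replaced.
    fn with_n_virtual_layers(mut self, n_virtual_layers: Option<(usize, Schedule)>) -> Self {
        self.n_virtual_layers = n_virtual_layers;
        self
    }
}
```

Because the builder method takes self by value, calls chain naturally: NetworkConfigSketch::new(...).with_n_virtual_layers(Some((48, Schedule))).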