pub struct Mamba1NetworkConfig {
    pub n_layer: usize,
    pub vocab_size: usize,
    pub pad_vocab_size_multiple: usize,
    pub mamba_block: Mamba1Config,
    pub missing_lm_head: bool,
}
Fields§
§n_layer: usize
§vocab_size: usize
If vocab_size is divisible by pad_vocab_size_multiple, it is left unchanged and treated as the unpadded vocab size.
Otherwise, it is padded up to ((vocab_size / pad_vocab_size_multiple) + 1) * pad_vocab_size_multiple, using integer division.
§pad_vocab_size_multiple: usize
If no padding is required, vocab_size must already be divisible by pad_vocab_size_multiple. If padding is required, vocab_size is increased until it is divisible by pad_vocab_size_multiple.
To disable vocab padding, set this to 1.
§mamba_block: Mamba1Config
§missing_lm_head: bool
If set to true, lm_head is set to None and the transposed embedding weights are reused in its place.
Implementations§
impl Mamba1NetworkConfig
pub fn new(
    n_layer: usize,
    vocab_size: usize,
    pad_vocab_size_multiple: usize,
    mamba_block: Mamba1Config,
    missing_lm_head: bool,
) -> Self
Create a new instance of the config.
§Arguments
§Required Arguments
§n_layer
§vocab_size
If vocab_size is divisible by pad_vocab_size_multiple, it is left unchanged and treated as the unpadded vocab size.
Otherwise, it is padded up to ((vocab_size / pad_vocab_size_multiple) + 1) * pad_vocab_size_multiple, using integer division.
§pad_vocab_size_multiple
If no padding is required, vocab_size must already be divisible by pad_vocab_size_multiple. If padding is required, vocab_size is increased until it is divisible by pad_vocab_size_multiple.
To disable vocab padding, set this to 1.
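The padding rule for vocab_size and pad_vocab_size_multiple can be sketched in plain Rust. The helper name padded_vocab_size is hypothetical and is not part of this crate's API; it only mirrors the arithmetic stated in the field documentation, and the sample values are illustrative.

```rust
/// Hypothetical helper (not part of the crate): applies the documented
/// padding rule to a vocab size.
fn padded_vocab_size(vocab_size: usize, pad_vocab_size_multiple: usize) -> usize {
    assert!(pad_vocab_size_multiple > 0, "multiple must be non-zero");
    if vocab_size % pad_vocab_size_multiple == 0 {
        // Already divisible: no padding is applied.
        vocab_size
    } else {
        // Integer division truncates, so this rounds up to the next multiple.
        ((vocab_size / pad_vocab_size_multiple) + 1) * pad_vocab_size_multiple
    }
}

fn main() {
    // 50277 is not divisible by 8, so it is rounded up to the next multiple.
    println!("{}", padded_vocab_size(50277, 8));
    // An already divisible size is returned unchanged.
    println!("{}", padded_vocab_size(50280, 8));
    // A multiple of 1 disables padding.
    println!("{}", padded_vocab_size(50277, 1));
}
```

Setting the multiple to 1 always takes the first branch, which is why it disables padding.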
§mamba_block
§missing_lm_head
If set to true, lm_head is set to None and the transposed embedding weights are reused in its place.
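To illustrate what reusing the transposed embedding weights means, here is a minimal sketch in plain Rust using nested Vecs rather than Burn tensors. The function tied_lm_head_logits and the shapes are assumptions made for illustration; the crate's actual implementation is not shown in this page. With an embedding matrix of shape [vocab_size, d_model], the logits for one hidden state are its product with the transposed matrix, that is, one dot product per embedding row.

```rust
/// Hypothetical illustration of a tied output head (not the crate's code).
/// `embedding` has shape [vocab_size, d_model]; `hidden` has length d_model.
/// Returns one logit per vocabulary entry: logits = hidden * embedding^T.
fn tied_lm_head_logits(embedding: &[Vec<f32>], hidden: &[f32]) -> Vec<f32> {
    embedding
        .iter()
        .map(|row| {
            assert_eq!(row.len(), hidden.len(), "d_model mismatch");
            // Dot product of one embedding row with the hidden state.
            row.iter().zip(hidden).map(|(w, h)| w * h).sum::<f32>()
        })
        .collect()
}

fn main() {
    // Toy embedding table: vocab_size = 3, d_model = 2.
    let embedding = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let hidden = [2.0, 3.0];
    let logits = tied_lm_head_logits(&embedding, &hidden);
    println!("{:?}", logits);
}
```

Because the same matrix serves as both the input embedding and the output projection, no separate lm_head weights need to be stored.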
impl Mamba1NetworkConfig
pub fn init<B: Backend>(&self, device: &B::Device) -> Mamba1Network<B>
Returns the initialized model.