burn_mamba::mamba3::network

Struct Mamba3Network

pub struct Mamba3Network<B: Backend> {
    pub embedding: Embedding<B>,
    pub layers: Mamba3Layers<B>,
    pub norm_f: RmsNorm<B>,
    pub lm_head: Option<Linear<B>>,
}

A complete Mamba-3 language model.

See the module-level documentation for an overview of the architecture and the two execution modes.

Fields§

§embedding: Embedding<B>

Token embedding table.

Shape of weight matrix: [padded_vocab_size, d_model]. Maps integer token IDs to d_model-dimensional vectors.

§layers: Mamba3Layers<B>

The stack of Mamba-3 residual blocks.

§norm_f: RmsNorm<B>

Final RMS normalisation, applied after all Mamba-3 blocks and before the LM head. It corresponds to norm_f in the original implementation.

§lm_head: Option<Linear<B>>

Optional separate LM head projection.

Some(linear) — dedicated weight matrix of shape [d_model, padded_vocab_size].
None — the embedding weights are reused (transposed). This is the “weight-tied” variant and is selected when missing_lm_head = true.
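
As a rough illustration of the weight-tied case, the sketch below uses plain Rust vectors rather than burn tensors. The function names `embed` and `tied_logits` are hypothetical and are not part of this crate; the only detail taken from the description above is that a single [padded_vocab_size, d_model] table serves both for token lookup and, transposed, for the output projection.

```rust
/// Embedding lookup: a token ID selects one row of the table.
/// `embedding` is laid out as [vocab, d_model]. (Hypothetical helper.)
fn embed(embedding: &[Vec<f32>], token: usize) -> &[f32] {
    &embedding[token]
}

/// Weight-tied LM head: logits[v] = sum_j hidden[j] * embedding[v][j].
/// Taking a dot product with every row is the same as multiplying the
/// hidden vector by the transpose of the embedding table.
/// (Hypothetical helper, not the crate's implementation.)
fn tied_logits(embedding: &[Vec<f32>], hidden: &[f32]) -> Vec<f32> {
    embedding
        .iter()
        .map(|row| {
            assert_eq!(row.len(), hidden.len(), "row length must equal d_model");
            row.iter().zip(hidden).map(|(w, h)| w * h).sum::<f32>()
        })
        .collect()
}
```

Because the projection reads the same rows that the lookup returns, no second [d_model, padded_vocab_size] matrix has to be stored in this variant.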

Implementations§

impl<B: Backend + Mamba3BackendExt> Mamba3Network<B>

pub fn forward( &self, x: Tensor<B, 2, Int>, caches: Option<Mamba3Caches<B>>, ssd_path: Mamba3SsdPath, ) -> (Tensor<B, 3>, Mamba3Caches<B>)

Process a full token sequence and return next-token logits.

Internally this calls Mamba3Layers::forward, which runs the chunkwise SSD algorithm over every layer. This is the mode to use during training (backpropagation through the entire sequence) and during the prefill phase of inference.

§Arguments

x — integer token IDs, shape [batch, sequence]
caches — optional pre-filled layer caches. Pass None to start from a zero state (training) or to create fresh caches that can be returned and reused for a subsequent decoding step.
ssd_path — SSD algorithm and chunk length selection.

§Returns

(logits, caches) where:

logits has shape [batch, sequence, padded_vocab_size]
caches contains the SSM and convolution state at the end of the sequence, ready to be passed to the first Self::step call.

pub fn step( &self, x: Tensor<B, 1, Int>, caches: Option<Mamba3Caches<B>>, ) -> (Tensor<B, 2>, Mamba3Caches<B>)

Process a single token and return next-token logits.

Internally this calls Mamba3Layers::step, which advances each layer’s recurrent state by one step:

  hₜ = Āₜ hₜ₋₁ + B̄ₜ xₜ
  yₜ = Cₜᵀ hₜ + D xₜ

This is O(H·P·N) per token — independent of sequence length — and is the correct mode for token-by-token generation after prefill.
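
The update can be sketched for a single channel in plain Rust. This is an illustration under simplifying assumptions, not the crate's kernel: Ā is taken to be a scalar, B̄ and C are length-N vectors, and the function name `ssm_step` is hypothetical.

```rust
/// One recurrent update for a single channel with an N-dimensional state:
///   h[n] <- a_bar * h[n] + b_bar[n] * x
///   y     = sum_n c[n] * h[n] + d * x
/// The state `h` is updated in place and the output `y` is returned.
/// (Hypothetical helper; a scalar `a_bar` is an assumption of this sketch.)
fn ssm_step(h: &mut [f32], a_bar: f32, b_bar: &[f32], c: &[f32], d: f32, x: f32) -> f32 {
    assert!(h.len() == b_bar.len() && h.len() == c.len());
    // Skip connection term D * x.
    let mut y = d * x;
    for ((h_n, b_n), c_n) in h.iter_mut().zip(b_bar).zip(c) {
        // Decay the previous state and add the new input contribution.
        *h_n = a_bar * *h_n + b_n * x;
        // Read the updated state out through C.
        y += c_n * *h_n;
    }
    y
}
```

The loop touches each of the N state entries once and never revisits earlier tokens, which is why the per-token cost does not grow with sequence length.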

§Arguments

x — current token IDs, shape [batch]
caches — layer caches from the previous step (or None for the very first token, which starts from a zero hidden state)

§Returns

(logits, caches) where:

logits has shape [batch, padded_vocab_size]
caches contains the updated state for the next step.
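
The prefill-then-decode pattern that `forward` and `step` are designed for can be sketched with a toy model. Everything below is a stand-in written for illustration: `ToyModel`, `argmax`, and `generate` are hypothetical names, there is no batch dimension, and the cache is a single `f32` rather than a `Mamba3Caches<B>`. The toy `forward` simply loops over `step`, whereas the real `forward` is described above as using the chunkwise SSD algorithm.

```rust
/// Toy stand-in for a recurrent language model. This is NOT the burn_mamba
/// API: the cache is one f32 and token IDs are plain usize values.
struct ToyModel {
    decay: f32,
    vocab: usize,
}

impl ToyModel {
    /// Score each vocabulary entry by its (negated) distance to the state.
    fn logits(&self, h: f32) -> Vec<f32> {
        (0..self.vocab).map(|v| -(h - v as f32).abs()).collect()
    }

    /// Single-token update: returns logits and the updated cache.
    /// A missing cache starts from a zero state.
    fn step(&self, token: usize, cache: Option<f32>) -> (Vec<f32>, f32) {
        let h = self.decay * cache.unwrap_or(0.0) + token as f32;
        (self.logits(h), h)
    }

    /// Whole-sequence pass: one logits row per position plus the final cache.
    fn forward(&self, tokens: &[usize], cache: Option<f32>) -> (Vec<Vec<f32>>, f32) {
        let mut h = cache.unwrap_or(0.0);
        let mut rows = Vec::with_capacity(tokens.len());
        for &t in tokens {
            let (row, next) = self.step(t, Some(h));
            rows.push(row);
            h = next;
        }
        (rows, h)
    }
}

/// Index of the largest value (first one wins on ties).
fn argmax(xs: &[f32]) -> usize {
    let mut best = 0;
    for (i, &x) in xs.iter().enumerate() {
        if x > xs[best] {
            best = i;
        }
    }
    best
}

/// Greedy decoding: prefill once with `forward`, then call `step` per token,
/// threading the cache returned by each call into the next one.
fn generate(model: &ToyModel, prompt: &[usize], n_new: usize) -> Vec<usize> {
    let (rows, mut cache) = model.forward(prompt, None);
    let mut next = argmax(rows.last().expect("prompt must be non-empty"));
    let mut out = Vec::with_capacity(n_new);
    for _ in 0..n_new {
        out.push(next);
        let (logits, updated) = model.step(next, Some(cache));
        cache = updated;
        next = argmax(&logits);
    }
    out
}
```

The point of the sketch is the hand-off: the cache returned by the sequence pass is fed to the first single-token call, and each later call consumes the cache produced by the one before it, so the prompt is never reprocessed.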

Trait Implementations§

impl<B> AutodiffModule<B> for Mamba3Network<B>
where B: AutodiffBackend + Backend, <B as AutodiffBackend>::InnerBackend: Backend,

type InnerModule = Mamba3Network<<B as AutodiffBackend>::InnerBackend>

Inner module without auto-differentiation.

fn valid(&self) -> Self::InnerModule

Returns the same module, but on the inner backend without auto-differentiation.

fn from_inner(module: Self::InnerModule) -> Self

Wraps an inner module back into an auto-diff module.

impl<B: Backend> Clone for Mamba3Network<B>

fn clone(&self) -> Self

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<B: Debug + Backend> Debug for Mamba3Network<B>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<B: Backend> Display for Mamba3Network<B>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<B> HasAutodiffModule<B> for Mamba3Network<B::InnerBackend>
where B: AutodiffBackend + Backend, <B as AutodiffBackend>::InnerBackend: Backend,

type TrainModule = Mamba3Network<B>

The module with auto-differentiation.

impl<B: Backend> Module<B> for Mamba3Network<B>

type Record = Mamba3NetworkRecord<B>

Type to save and load the module.

fn load_record(self, record: Self::Record) -> Self

Load the module state from a record.

fn into_record(self) -> Self::Record

Convert the module into a record containing the state.

fn num_params(&self) -> usize

Get the number of parameters the module has, including all of its sub-modules.

fn visit<Visitor: ModuleVisitor<B>>(&self, visitor: &mut Visitor)

Visit each tensor parameter in the module with a visitor.

fn map<Mapper: ModuleMapper<B>>(self, mapper: &mut Mapper) -> Self

Map each tensor parameter in the module with a mapper.

fn collect_devices(&self, devices: Devices<B>) -> Devices<B>

Return all the devices found in the underlying module tree, added to the given vector without duplicates.

fn to_device(self, device: &B::Device) -> Self

Move the module and all of its sub-modules to the given device.

fn fork(self, device: &B::Device) -> Self

Fork the module and all of its sub-modules to the given device.

fn devices(&self) -> Vec<<B as BackendTypes>::Device>

Return all the devices found in the underlying module tree, without duplicates.

fn no_grad(self) -> Self

Each tensor in the module tree will not require grad.

fn train<AB>(self) -> Self::TrainModule
where AB: AutodiffBackend<InnerBackend = B>, Self: HasAutodiffModule<AB>,

Move the module and all of its sub-modules to the autodiff backend.

fn quantize_weights(self, quantizer: &mut Quantizer) -> Self

Quantize the weights of the module.

impl<B: Backend> ModuleDisplay for Mamba3Network<B>

fn format(&self, passed_settings: DisplaySettings) -> String

Formats the module with provided display settings.

fn custom_settings(&self) -> Option<DisplaySettings>

Custom display settings for the module.

fn custom_content(&self, _content: Content) -> Option<Content>

Custom attributes for the module.

impl<B: Backend> ModuleDisplayDefault for Mamba3Network<B>

fn content(&self, content: Content) -> Option<Content>

Attributes of the module used for display purposes.

fn num_params(&self) -> usize

Gets the number of the parameters of the module.

Auto Trait Implementations§

impl<B> !Freeze for Mamba3Network<B>

impl<B> !RefUnwindSafe for Mamba3Network<B>

impl<B> Send for Mamba3Network<B>

impl<B> Sync for Mamba3Network<B>

impl<B> Unpin for Mamba3Network<B>
where <B as BackendTypes>::Device: Unpin, <B as BackendTypes>::FloatTensorPrimitive: Unpin, <B as BackendTypes>::QuantizedTensorPrimitive: Unpin,

impl<B> UnsafeUnpin for Mamba3Network<B>
where <B as BackendTypes>::Device: UnsafeUnpin, <B as BackendTypes>::FloatTensorPrimitive: UnsafeUnpin, <B as BackendTypes>::QuantizedTensorPrimitive: UnsafeUnpin,

impl<B> !UnwindSafe for Mamba3Network<B>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.