
MultiGateResidual

burn_mamba::modules::multi_gate

Struct MultiGateResidual

pub struct MultiGateResidual {
    pub w_beta: Param<Tensor<1>>,
    pub w_alpha: Param<Tensor<1>>,
    pub b_beta: Param<Tensor<1>>,
    pub d_model: usize,
    pub n_stream: usize,
}


One layer’s Multi-Gate Residual parameters: the mixer query w⁽ᵝ⁾ with its per-stream bias b⁽ᵝ⁾, and the aggregator (AttnPool) query w⁽ᵅ⁾.

Fields

w_beta: Param<Tensor<1>>

Mixer query w⁽ᵝ⁾ ∈ ℝ^d for the per-stream sigmoid gate, shape [d_model].

w_alpha: Param<Tensor<1>>

Aggregator query w⁽ᵅ⁾ ∈ ℝ^d for the AttnPool softmax, shape [d_model].

b_beta: Param<Tensor<1>>

Per-stream mixer gate bias b⁽ᵝ⁾ ∈ ℝ^n, shape [n_stream].

d_model: usize

Model width d.

n_stream: usize

Number of parallel residual streams n.
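All three parameters are 1-D, so each layer's trainable parameter count follows directly from the field shapes: 2 · d_model + n_stream (d_model and n_stream are plain usize configuration values, not parameters). A minimal plain-Rust sketch of that arithmetic; `ParamShapes` is a hypothetical stand-in, not a burn_mamba type.

```rust
/// Hypothetical mirror of MultiGateResidual's configuration (not burn_mamba API).
struct ParamShapes {
    d_model: usize,
    n_stream: usize,
}

impl ParamShapes {
    /// Count trainable scalars: w_beta [d_model] + w_alpha [d_model] + b_beta [n_stream].
    fn num_params(&self) -> usize {
        let w_beta = self.d_model;
        let w_alpha = self.d_model;
        let b_beta = self.n_stream;
        w_beta + w_alpha + b_beta
    }
}
```

For example, d_model = 768 with n_stream = 4 gives 768 + 768 + 4 = 1,540 parameters per layer.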

Implementations

impl MultiGateResidual

fn scale(&self) -> f32

fn rms_denom<const D: usize>(&self, x: Tensor<D>) -> Tensor<D>

The parameter-free RMS denominator d(x), of shape […, 1], such that the RMSNorm of x (matching the RmsNorm math with γ ≡ 1) is x / d(x). Returning the denominator rather than the normalised tensor lets Self::normed_score fold it out of the score reduction over the feature axis, so the full-width normalised tensor is never built. The fp16 path keeps the same overflow-safe max-rescale, folded into the same scalar denominator.
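A scalar sketch of the denominator for one feature row, in plain Rust on f32 slices. It assumes Burn's RmsNorm convention d(x) = sqrt(mean(x²) + ε); the exact ε placement in burn_mamba is not stated above. The second function shows one way to realise an overflow-safe max-rescale: divide by m = max|x| before squaring, then multiply m back in, which gives algebraically the same denominator. All function names are hypothetical.

```rust
/// ASSUMPTION: d(x) = sqrt(mean(x^2) + eps), as in Burn's RmsNorm with gamma = 1.
fn rms_denom(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    (mean_sq + eps).sqrt()
}

/// Overflow-safe variant: rescale by m = max|x| so the squared terms stay <= 1,
/// then use sqrt(mean(x^2) + eps) = m * sqrt(mean((x/m)^2) + eps/m^2).
fn rms_denom_rescaled(x: &[f32], eps: f32) -> f32 {
    let m = x.iter().fold(0.0f32, |acc, v| acc.max(v.abs()));
    if m == 0.0 {
        return eps.sqrt(); // all-zero row: only eps remains under the root
    }
    let mean_sq_scaled = x.iter().map(|v| (v / m) * (v / m)).sum::<f32>() / x.len() as f32;
    m * (mean_sq_scaled + eps / (m * m)).sqrt()
}

/// RMSNorm with gamma = 1: divide every feature by the same scalar d(x).
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let d = rms_denom(x, eps);
    x.iter().map(|v| v / d).collect()
}
```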

fn normed_score<const R: usize>(&self, x: Tensor<R>, w: Tensor<R>) -> Tensor<R>

The RMSNorm-then-dot score scale · Σ_feat(x · w) / d(x), of shape […, 1], where d(x) is the ε-stabilised RMS denominator from Self::rms_denom. Because d(x) is constant over the feature axis, it can be pulled out of the reduction: the result equals scale · Σ_feat(rms_norm(x) · w), but the full-width normalised tensor is never materialised.
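The folding argument in a few lines of plain Rust: since d(x) does not depend on the feature index j, Σ_j (x_j / d) · w_j = (Σ_j x_j · w_j) / d, so one division after the dot product replaces a full-width normalised temporary. The names are hypothetical, `scale` is passed in explicitly (Self::scale is not documented here), and d(x) is assumed to be sqrt(mean(x²) + ε).

```rust
/// ASSUMPTION: d(x) = sqrt(mean(x^2) + eps), as in Burn's RmsNorm with gamma = 1.
fn rms_denom(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    (mean_sq + eps).sqrt()
}

/// Reference version: materialise the normalised row, then dot with `w`.
fn normed_score_naive(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let d = rms_denom(x, eps);
    let normed: Vec<f32> = x.iter().map(|v| v / d).collect(); // full-width temporary
    scale * normed.iter().zip(w).map(|(a, b)| a * b).sum::<f32>()
}

/// Folded version: d(x) is constant over the feature axis, so divide once after the dot.
fn normed_score(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let dot: f32 = x.iter().zip(w).map(|(a, b)| a * b).sum();
    scale * dot / rms_denom(x, eps)
}
```

The two functions agree up to floating-point rounding; the folded one saves one tensor of shape […, d_model] per score.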

fn mix_pool<const R: usize>( &self, layer_output: Tensor<R>, streams: Tensor<R>, ) -> (Tensor<R>, Tensor<R>)

The shared mix + pool, generic over the rank R of streams (the stream axis is R-2, the feature axis R-1). Self::forward (R = 4) and Self::step (R = 3) differ only in that rank, so both lift their layer_output to a singleton stream axis, call this method, and drop the axis again. All reductions keep their reduced axis (size 1) for broadcasting, so scores and gates have shape […, n_stream, 1] throughout.

layer_output: F_l lifted to a unit stream axis, […, 1, d_model]
streams: the n_stream residual streams, […, n_stream, d_model]

Returns (h, streams') with h still carrying its unit stream axis ([…, 1, d_model]) and streams' the same shape as streams.
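A single-token sketch of the mix + pool shape logic on a [n_stream][d_model] slice, where each per-stream scalar score stands in for one entry of the […, n_stream, 1] column. The description above does not say which tensor the mixer gate reads or how the gate combines F_l with the streams; this sketch assumes, hypothetically, that both gates are scored from the streams and that the update is additive, streams'_i = streams_i + σ(score_i + b⁽ᵝ⁾_i) · F_l, followed by a softmax over the stream axis with w⁽ᵅ⁾. It also assumes d(x) = sqrt(mean(x²) + ε). Treat these choices as placeholders, not burn_mamba's actual rule.

```rust
fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

/// RMSNorm-then-dot score; ASSUMPTION: d(x) = sqrt(mean(x^2) + eps).
fn normed_score(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let dot: f32 = x.iter().zip(w).map(|(a, b)| a * b).sum();
    scale * dot / (mean_sq + eps).sqrt()
}

/// Hypothetical single-token mix + pool (not the burn_mamba implementation).
/// ASSUMPTIONS: gate scored from each stream; additive gated update.
fn mix_pool(
    layer_output: &[f32],  // F_l, [d_model] (the unit stream axis is implicit)
    streams: &[Vec<f32>],  // [n_stream][d_model]
    w_beta: &[f32],        // [d_model]
    b_beta: &[f32],        // [n_stream]
    w_alpha: &[f32],       // [d_model]
    scale: f32,
    eps: f32,
) -> (Vec<f32>, Vec<Vec<f32>>) {
    // Mix: one sigmoid gate per stream, broadcast over the feature axis.
    let new_streams: Vec<Vec<f32>> = streams
        .iter()
        .zip(b_beta)
        .map(|(s, &b)| {
            let beta = sigmoid(normed_score(s, w_beta, scale, eps) + b);
            s.iter().zip(layer_output).map(|(x, f)| x + beta * f).collect::<Vec<f32>>()
        })
        .collect();

    // Pool: softmax over the stream axis (max-subtracted), then a weighted sum.
    let scores: Vec<f32> = new_streams
        .iter()
        .map(|s| normed_score(s, w_alpha, scale, eps))
        .collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|z| (z - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    let mut h = vec![0.0f32; layer_output.len()];
    for (s, e) in new_streams.iter().zip(&exps) {
        let alpha = e / total;
        for (hj, xj) in h.iter_mut().zip(s) {
            *hj += alpha * xj;
        }
    }
    (h, new_streams)
}
```

Subtracting the maximum score before exponentiating keeps the softmax numerically stable without changing the weights, and because the weights sum to 1, h is a convex combination of the updated streams.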

pub fn forward( &self, layer_output: Tensor<3>, streams: Tensor<4>, ) -> (Tensor<3>, Tensor<4>)

Full-sequence mix + pool.

layer_output: this layer’s transform F_l, [batch, sequence, d_model]
streams: the n_stream residual streams, [batch, sequence, n_stream, d_model]

Returns (h, streams'): the pooled input h for the next layer ([batch, sequence, d_model]) and the updated streams (same shape as the input streams).
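Because the mix + pool only reduces over the stream and feature axes, nothing couples different sequence positions, so forward should agree with step applied independently at every (batch, sequence) position, up to floating-point reduction order. A shape-level sketch of that equivalence on nested Vecs; `forward_by_steps` and the `step_fn` closure are hypothetical stand-ins for the single-token math.

```rust
/// Hypothetical: run a per-token mix + pool `step_fn` at every (batch, sequence) position.
fn forward_by_steps<F>(
    layer_output: &[Vec<Vec<f32>>],      // [batch][sequence][d_model]
    streams: &[Vec<Vec<Vec<f32>>>],      // [batch][sequence][n_stream][d_model]
    mut step_fn: F,
) -> (Vec<Vec<Vec<f32>>>, Vec<Vec<Vec<Vec<f32>>>>)
where
    F: FnMut(&[f32], &[Vec<f32>]) -> (Vec<f32>, Vec<Vec<f32>>),
{
    let mut h = Vec::with_capacity(layer_output.len());
    let mut new_streams = Vec::with_capacity(streams.len());
    for (f_b, s_b) in layer_output.iter().zip(streams) {
        // Each position is independent, so one step call per token suffices.
        let (h_b, s_b_new): (Vec<_>, Vec<_>) = f_b
            .iter()
            .zip(s_b)
            .map(|(f_t, s_t)| step_fn(f_t.as_slice(), s_t.as_slice()))
            .unzip();
        h.push(h_b);
        new_streams.push(s_b_new);
    }
    (h, new_streams)
}
```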

pub fn step( &self, layer_output: Tensor<2>, streams: Tensor<3>, ) -> (Tensor<2>, Tensor<3>)

Single-token mix + pool (the Self::forward math with the sequence axis dropped).

layer_output: [batch, d_model]
streams: [batch, n_stream, d_model]

Returns (h, streams'): [batch, d_model] and [batch, n_stream, d_model].
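Both forward and step reach mix_pool the same way: insert a unit stream axis before the feature axis, let it broadcast against the n_stream axis of streams, and remove it from h afterwards. A sketch of that bookkeeping on plain shape vectors; these helpers are hypothetical and only check shapes, not tensors.

```rust
/// Hypothetical: insert a unit stream axis before the feature axis, [..., d] -> [..., 1, d].
fn lift_stream_axis(shape: &[usize]) -> Vec<usize> {
    assert!(!shape.is_empty(), "need at least a feature axis");
    let mut s = shape.to_vec();
    s.insert(s.len() - 1, 1);
    s
}

/// Hypothetical: remove the unit stream axis again, [..., 1, d] -> [..., d].
fn drop_stream_axis(shape: &[usize]) -> Vec<usize> {
    assert!(shape.len() >= 2, "need a stream axis and a feature axis");
    let axis = shape.len() - 2;
    assert_eq!(shape[axis], 1, "stream axis must be a singleton before dropping");
    let mut s = shape.to_vec();
    s.remove(axis);
    s
}

/// A lifted layer_output broadcasts against streams if the ranks match, its stream
/// axis (R-2) has size 1, and every other axis agrees exactly.
fn broadcasts_against(lifted: &[usize], streams: &[usize]) -> bool {
    let r = streams.len();
    lifted.len() == r
        && r >= 2
        && lifted[r - 2] == 1
        && lifted
            .iter()
            .zip(streams)
            .enumerate()
            .all(|(i, (a, b))| i == r - 2 || a == b)
}
```

With step shapes, [batch, d_model] = [2, 8] lifts to [2, 1, 8] and broadcasts against streams [2, 4, 8]; with forward shapes, [2, 5, 8] lifts to [2, 5, 1, 8] against [2, 5, 4, 8].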

Trait Implementations

impl AutodiffModule for MultiGateResidual

fn valid(&self) -> Self

Returns the same module, but on the inner backend without auto-differentiation.

fn from_inner(module: Self) -> Self

Wraps an inner module back into an auto-diff module.

impl Clone for MultiGateResidual

fn clone(&self) -> Self

Returns a duplicate of the value.


fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for MultiGateResidual

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for MultiGateResidual

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Module for MultiGateResidual

type Record = MultiGateResidualRecord

Type to save and load the module.

fn load_record(self, record: Self::Record) -> Self

Load the module state from a record.

fn into_record(self) -> Self::Record

Convert the module into a record containing the state.

fn num_params(&self) -> usize

Get the number of parameters the module has, including all of its sub-modules.

fn visit<Visitor: ModuleVisitor>(&self, visitor: &mut Visitor)

Visit each tensor parameter in the module with a visitor.

fn map<Mapper: ModuleMapper>(self, mapper: &mut Mapper) -> Self

Map each tensor parameter in the module with a mapper.

fn collect_devices(&self, devices: Devices) -> Devices

Return all the devices found in the module tree, added to the given vector without duplicates.

fn to_device(self, device: &Device) -> Self

Move the module and all of its sub-modules to the given device.

fn fork(self, device: &Device) -> Self

Fork the module and all of its sub-modules to the given device.

fn devices(&self) -> Vec<Device>

Return all the devices found in the module tree, without duplicates.

fn no_grad(self) -> Self

Each tensor in the module tree will not require grad.

fn train(self) -> Self
where Self: AutodiffModule,

Move the module and all of its sub-modules to the autodiff backend.

fn quantize_weights(self, quantizer: &mut Quantizer) -> Self

Quantize the weights of the module.

impl ModuleDisplay for MultiGateResidual

fn format(&self, passed_settings: DisplaySettings) -> String

Formats the module with the provided display settings.

fn custom_settings(&self) -> Option<DisplaySettings>

Custom display settings for the module.

fn custom_content(&self, _content: Content) -> Option<Content>

Custom attributes for the module.

impl ModuleDisplayDefault for MultiGateResidual

fn content(&self, content: Content) -> Option<Content>

Attributes of the module used for display purposes.

fn num_params(&self) -> usize

Gets the number of parameters of the module.

Auto Trait Implementations

impl !Freeze for MultiGateResidual

impl !RefUnwindSafe for MultiGateResidual

impl !UnwindSafe for MultiGateResidual

impl Send for MultiGateResidual

impl Sync for MultiGateResidual

impl Unpin for MultiGateResidual

impl UnsafeUnpin for MultiGateResidual

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.