Struct MultiGateResidual 

pub struct MultiGateResidual {
    pub w_beta: Param<Tensor<1>>,
    pub w_alpha: Param<Tensor<1>>,
    pub b_beta: Param<Tensor<1>>,
    pub d_model: usize,
    pub n_stream: usize,
}

One layer’s Multi-Gate Residual parameters: the mixer query w⁽ᵝ⁾ and its bias b⁽ᵝ⁾, and the aggregator (AttnPool) query w⁽ᵅ⁾.

Fields§

§w_beta: Param<Tensor<1>>

Mixer query w⁽ᵝ⁾ ∈ ℝ^d (the per-stream sigmoid gate), [d_model].

§w_alpha: Param<Tensor<1>>

Aggregator query w⁽ᵅ⁾ ∈ ℝ^d (the AttnPool softmax), [d_model].

§b_beta: Param<Tensor<1>>

Per-stream mixer gate bias b⁽ᵝ⁾, [n_stream].

§d_model: usize

Model width d.

§n_stream: usize

Number of parallel residual streams n.
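
As a quick sanity check on the shapes listed above, the learnable parameter count should be 2·d_model + n_stream, assuming d_model and n_stream are stored as plain constants and contribute no parameters. The `MgrShapes` type below is a hypothetical shape-only stand-in, not part of the crate:

```rust
/// Hypothetical shape-only mirror of the struct: it tracks sizes, not tensors.
struct MgrShapes {
    d_model: usize,
    n_stream: usize,
}

impl MgrShapes {
    /// w_beta [d_model] + w_alpha [d_model] + b_beta [n_stream].
    /// Assumes d_model and n_stream themselves are not counted as parameters.
    fn num_params(&self) -> usize {
        2 * self.d_model + self.n_stream
    }
}
```

The gate adds only O(d_model + n_stream) parameters per layer, since both queries are vectors rather than matrices.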

Implementations§

Source§

impl MultiGateResidual

Source

fn scale(&self) -> f32

Source

fn rms_denom<const D: usize>(&self, x: Tensor<D>) -> Tensor<D>

The parameter-free RMS denominator d(x), shape [‥, 1], such that RMSNorm (the RmsNorm math with γ ≡ 1) is x / d(x). Returning the denominator rather than the normalised tensor lets Self::normed_score fold it out of the feature-axis score reduction, so the full-width normalised tensor is never built. The fp16 path keeps the same overflow-safe max-rescale, folded into the same scalar denominator.
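
A minimal sketch of the idea on a single feature row, using plain slices instead of tensors. It assumes d(x) = rms(x) + eps (the form used in the normed_score formula) and that the max-rescale factors out max|x| before squaring; the function names and the exact rescale are illustrative assumptions, not the crate's code:

```rust
/// RMS denominator for one non-empty feature row: rms(x) + eps.
fn rms_denom(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    mean_sq.sqrt() + eps
}

/// Overflow-safe variant (assumed form of the "max-rescale"):
/// divide by m = max|x| before squaring, then multiply m back in,
/// so the squares stay in range even when x*x would overflow.
fn rms_denom_rescaled(x: &[f32], eps: f32) -> f32 {
    let m = x.iter().fold(0.0f32, |acc, v| acc.max(v.abs()));
    if m == 0.0 {
        // All-zero row: rms is 0, only eps remains.
        return eps;
    }
    let mean_sq = x
        .iter()
        .map(|v| {
            let r = v / m;
            r * r
        })
        .sum::<f32>()
        / x.len() as f32;
    m * mean_sq.sqrt() + eps
}
```

Both variants return one scalar per row, which is why the rescale can live entirely inside the denominator.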

Source

fn normed_score<const R: usize>(&self, x: Tensor<R>, w: Tensor<R>) -> Tensor<R>

The RMSNorm-then-dot score, scale · Σ_feat(x · w) / (rms(x) + eps), shape [‥, 1]. The RMS denominator is constant over the feature axis, so it is folded out of the reduction (via Self::rms_denom). The result equals scale · Σ_feat(rms_norm(x) · w), but without materialising the full-width normalised tensor.
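
The equivalence can be checked on one feature row with plain slices. This is a sketch under stated assumptions: `scale` and `eps` are passed in explicitly because their values are not given on this page, and the function names are hypothetical:

```rust
/// rms(x) + eps for one non-empty feature row.
fn rms_denom(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    mean_sq.sqrt() + eps
}

/// Folded form: one dot product, then a single scalar division.
fn normed_score_folded(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let dot: f32 = x.iter().zip(w).map(|(a, b)| a * b).sum();
    scale * dot / rms_denom(x, eps)
}

/// Reference form: builds the full normalised row first, then takes the dot.
fn normed_score_materialised(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let d = rms_denom(x, eps);
    let normed: Vec<f32> = x.iter().map(|a| a / d).collect();
    scale * normed.iter().zip(w).map(|(a, b)| a * b).sum::<f32>()
}
```

The two agree up to floating-point rounding; the folded form skips the d_model-wide temporary.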

Source

fn mix_pool<const R: usize>( &self, layer_output: Tensor<R>, streams: Tensor<R>, ) -> (Tensor<R>, Tensor<R>)

The shared mix + pool, generic over the streams rank R (the stream axis is R-2, the feature axis R-1). Self::forward (R = 4) and Self::step (R = 3) differ only in that rank, so both lift their layer_output to a singleton stream axis, call this, and drop the axis again. All reductions keep their reduced axis (size 1) for broadcasting, so scores and gates are […, n_stream, 1] throughout.

  • layer_output: F_l lifted to a unit stream axis, […, 1, d_model]
  • streams: the n_stream residual streams, […, n_stream, d_model]

Returns (h, streams') with h still carrying its unit stream axis ([…, 1, d_model]) and streams' the same shape as streams.
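
This page does not state the exact mixing rule, so the following is only an illustrative sketch for a single position (one batch/sequence index). It assumes each stream adds the layer output scaled by its own sigmoid gate, and that the pooled h is the softmax-weighted sum of the updated streams. The update rule, the function names, and the explicit `scale`/`eps` arguments are all assumptions:

```rust
fn rms_denom(x: &[f32], eps: f32) -> f32 {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    mean_sq.sqrt() + eps
}

fn normed_score(x: &[f32], w: &[f32], scale: f32, eps: f32) -> f32 {
    let dot: f32 = x.iter().zip(w).map(|(a, b)| a * b).sum();
    scale * dot / rms_denom(x, eps)
}

fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

/// Numerically stable softmax over the stream axis.
fn softmax(scores: &[f32]) -> Vec<f32> {
    let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - m).exp()).collect();
    let z: f32 = exps.iter().sum();
    exps.iter().map(|e| e / z).collect()
}

/// One position: layer_output is [d_model], streams is [n_stream][d_model].
/// Returns (h, streams') with h of length d_model.
fn mix_pool_row(
    layer_output: &[f32],
    streams: &[Vec<f32>],
    w_beta: &[f32],
    b_beta: &[f32],
    w_alpha: &[f32],
    scale: f32,
    eps: f32,
) -> (Vec<f32>, Vec<Vec<f32>>) {
    // Mix (ASSUMED rule): stream_i += sigmoid(score_i + b_beta[i]) * layer_output.
    let mixed: Vec<Vec<f32>> = streams
        .iter()
        .zip(b_beta)
        .map(|(s, b)| {
            let gate = sigmoid(normed_score(s, w_beta, scale, eps) + b);
            s.iter().zip(layer_output).map(|(si, fi)| si + gate * fi).collect()
        })
        .collect();
    // Pool (ASSUMED rule): softmax over streams of the w_alpha scores, then a weighted sum.
    let scores: Vec<f32> = mixed
        .iter()
        .map(|s| normed_score(s, w_alpha, scale, eps))
        .collect();
    let alpha = softmax(&scores);
    let mut h = vec![0.0f32; layer_output.len()];
    for (a, s) in alpha.iter().zip(&mixed) {
        for (hi, si) in h.iter_mut().zip(s) {
            *hi += a * si;
        }
    }
    (h, mixed)
}
```

In the tensor version the per-stream scores and gates would keep a trailing size-1 axis so they broadcast against […, n_stream, d_model]; the nested loops here play that role.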

Source

pub fn forward( &self, layer_output: Tensor<3>, streams: Tensor<4>, ) -> (Tensor<3>, Tensor<4>)

Full-sequence mix + pool.

  • layer_output: this layer’s transform F_l, [batch, sequence, d_model]
  • streams: the n_stream residual streams, [batch, sequence, n_stream, d_model]

Returns (h, streams'): the pooled input h for the next layer ([batch, sequence, d_model]) and the updated streams (same shape as the input streams).

Source

pub fn step( &self, layer_output: Tensor<2>, streams: Tensor<3>, ) -> (Tensor<2>, Tensor<3>)

Single-token mix + pool (the Self::forward math with the sequence axis dropped).

  • layer_output: [batch, d_model]
  • streams: [batch, n_stream, d_model]

Returns (h, streams'): [batch, d_model] and [batch, n_stream, d_model].

Trait Implementations§

Source§

impl AutodiffModule for MultiGateResidual

Source§

fn valid(&self) -> Self

Returns the same module, but on the inner backend without auto-differentiation.
Source§

fn from_inner(module: Self) -> Self

Wraps an inner module back into an auto-diff module.
Source§

impl Clone for MultiGateResidual

Source§

fn clone(&self) -> Self

Returns a duplicate of the value.
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for MultiGateResidual

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Display for MultiGateResidual

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Module for MultiGateResidual

Source§

type Record = MultiGateResidualRecord

Type to save and load the module.
Source§

fn load_record(self, record: Self::Record) -> Self

Load the module state from a record.
Source§

fn into_record(self) -> Self::Record

Convert the module into a record containing the state.
Source§

fn num_params(&self) -> usize

Get the number of parameters the module has, including all of its sub-modules.
Source§

fn visit<Visitor: ModuleVisitor>(&self, visitor: &mut Visitor)

Visit each tensor parameter in the module with a visitor.
Source§

fn map<Mapper: ModuleMapper>(self, mapper: &mut Mapper) -> Self

Map each tensor parameter in the module with a mapper.
Source§

fn collect_devices(&self, devices: Devices) -> Devices

Return all the devices found in the underlying module tree, added to the given vector without duplicates.
Source§

fn to_device(self, device: &Device) -> Self

Move the module and all of its sub-modules to the given device.
Source§

fn fork(self, device: &Device) -> Self

Fork the module and all of its sub-modules to the given device.
§

fn devices(&self) -> Vec<Device>

Return all the devices found in the underlying module tree, without duplicates.
§

fn no_grad(self) -> Self

Each tensor in the module tree will not require grad.
§

fn train(self) -> Self
where Self: AutodiffModule,

Move the module and all of its sub-modules to the autodiff backend.
§

fn quantize_weights(self, quantizer: &mut Quantizer) -> Self

Quantize the weights of the module.
Source§

impl ModuleDisplay for MultiGateResidual

§

fn format(&self, passed_settings: DisplaySettings) -> String

Formats the module with provided display settings.
§

fn custom_settings(&self) -> Option<DisplaySettings>

Custom display settings for the module.
§

fn custom_content(&self, _content: Content) -> Option<Content>

Custom attributes for the module.
Source§

impl ModuleDisplayDefault for MultiGateResidual

Source§

fn content(&self, content: Content) -> Option<Content>

Attributes of the module used for display purposes.
Source§

fn num_params(&self) -> usize

Gets the number of the parameters of the module.

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T> ToString for T
where T: Display + ?Sized,

Source§

fn to_string(&self) -> String

Converts the given value to a String.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.