pub struct Mamba2Layer<B: Backend> {
    pub norm: RmsNorm<B>,
    pub mamba_block: Mamba2<B>,
}
A single Mamba-2 residual block:
output = x · scale + Mamba2( RMSNorm(x) )
where scale (passed as residual_scale to forward and step) is 1.0 normally and 0.0 when the residual connection is intentionally suppressed by the layer stack configuration.
Fields
norm: RmsNorm<B>
Pre-norm applied to the input before the SSM block. Using RMSNorm before the block (Pre-LN) is standard practice in modern LLMs and improves training stability.
mamba_block: Mamba2<B>
The Mamba-2 SSM block (see Mamba2).
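The pre-norm step can be illustrated without any tensor library. The following is a minimal sketch of root-mean-square normalization over one feature vector, assuming the usual formulation with a learned per-feature gain; the function name `rms_norm` and the parameters `gamma` and `eps` are illustrative and are not taken from this crate's RmsNorm.

```rust
/// Root-mean-square normalization of one feature vector (plain-Rust sketch).
/// `gamma` is a per-feature gain and `eps` guards against division by zero.
/// Both names are hypothetical and not part of the crate's API.
fn rms_norm(x: &[f32], gamma: &[f32], eps: f32) -> Vec<f32> {
    assert!(!x.is_empty(), "input must have at least one feature");
    assert_eq!(x.len(), gamma.len(), "gain must match the feature dimension");
    // Mean of the squared features.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // Scale every feature by the inverse RMS, then apply the gain.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gamma).map(|(v, g)| v * inv_rms * g).collect()
}
```

With a unit gain and `eps` set to zero, the output has a root-mean-square of 1, which is what keeps the input scale of the SSM block stable across layers.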
Implementations
impl<B: Backend + Mamba2BackendExt> Mamba2Layer<B>
pub fn forward(
    &self,
    x: Tensor<B, 3>,
    cache: Option<Mamba2Cache<B>>,
    ssd_path: Mamba2SsdPath,
    residual_scale: f32,
) -> (Tensor<B, 3>, Mamba2Cache<B>)
Run the Pre-LN residual block over a full sequence.
Computes:
output = x · residual_scale + Mamba2( RMSNorm(x) )
Shapes
- x: [batch, sequence, d_model]
- output: [batch, sequence, d_model]
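The effect of residual_scale can be seen in a small plain-Rust sketch of the formula above, applied to a single feature vector. The helper `pre_norm_residual` and its closure parameters `norm` and `mixer` are hypothetical stand-ins for RmsNorm and Mamba2; they are assumptions for illustration, not the crate's implementation.

```rust
/// Pre-norm residual update for one feature vector:
/// out = x * residual_scale + mixer(norm(x)).
/// `norm` and `mixer` are stand-in closures, not the crate's RmsNorm or Mamba2.
fn pre_norm_residual<N, M>(x: &[f32], residual_scale: f32, norm: N, mixer: M) -> Vec<f32>
where
    N: Fn(&[f32]) -> Vec<f32>,
    M: Fn(&[f32]) -> Vec<f32>,
{
    // The branch sees only the normalized input.
    let normed = norm(x);
    let branch = mixer(normed.as_slice());
    assert_eq!(branch.len(), x.len(), "the mixer must preserve d_model");
    // The skip path uses the raw input, scaled by residual_scale.
    x.iter()
        .zip(&branch)
        .map(|(xi, bi)| xi * residual_scale + bi)
        .collect()
}
```

Passing 1.0 keeps the skip connection, while 0.0 returns only the branch output, which matches the two cases described for the layer.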
pub fn step(
    &self,
    x: Tensor<B, 2>,
    cache: Option<Mamba2Cache<B>>,
    residual_scale: f32,
) -> (Tensor<B, 2>, Mamba2Cache<B>)
Run the Pre-LN residual block for a single decoding step.
Computes:
output = x · residual_scale + Mamba2.step( RMSNorm(x) )
Shapes
- x: [batch, d_model]
- output: [batch, d_model]
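The signature of step takes an optional cache and returns an updated one, which suggests a decoding loop that feeds each returned cache into the next call. The sketch below mirrors that calling pattern with toy types: `ToyLayer`, `ToyCache`, and `decode` are hypothetical, the scalar recurrence is not the Mamba-2 recurrence, normalization is omitted, and treating `None` as a fresh zero state is an assumption.

```rust
/// Toy recurrent state standing in for Mamba2Cache (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct ToyCache {
    h: f32,
}

/// Toy layer with a scalar recurrence h_t = decay * h_{t-1} + x_t.
/// This only mirrors the shape of `step`; it is not the Mamba-2 recurrence.
struct ToyLayer {
    decay: f32,
}

impl ToyLayer {
    /// One decoding step. Assumption: `None` starts from a zero state,
    /// `Some` resumes from the state returned by the previous step.
    fn step(&self, x: f32, cache: Option<ToyCache>, residual_scale: f32) -> (f32, ToyCache) {
        let prev = cache.unwrap_or(ToyCache { h: 0.0 });
        let h = self.decay * prev.h + x;
        (x * residual_scale + h, ToyCache { h })
    }
}

/// Decode a sequence token by token, threading the returned cache forward.
fn decode(layer: &ToyLayer, tokens: &[f32], residual_scale: f32) -> Vec<f32> {
    let mut cache = None;
    let mut outputs = Vec::with_capacity(tokens.len());
    for &x in tokens {
        let (y, next) = layer.step(x, cache, residual_scale);
        outputs.push(y);
        cache = Some(next);
    }
    outputs
}
```

Because the cache is moved into each call and replaced by the returned value, the loop never holds two copies of the state at once.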
Trait Implementations
impl<B> AutodiffModule<B> for Mamba2Layer<B>
where
    B: AutodiffBackend + Backend,
    <B as AutodiffBackend>::InnerBackend: Backend,
type InnerModule = Mamba2Layer<<B as AutodiffBackend>::InnerBackend>
Inner module without auto-differentiation.
fn valid(&self) -> Self::InnerModule
Returns the same module, but on the inner backend without auto-differentiation.
fn from_inner(module: Self::InnerModule) -> Self
Wraps an inner module back into an auto-diff module.
impl<B: Backend> Clone for Mamba2Layer<B>
impl<B: Debug + Backend> Debug for Mamba2Layer<B>
impl<B: Backend> Display for Mamba2Layer<B>
impl<B> HasAutodiffModule<B> for Mamba2Layer<B::InnerBackend>
where
    B: AutodiffBackend + Backend,
    <B as AutodiffBackend>::InnerBackend: Backend,
type TrainModule = Mamba2Layer<B>
The module with auto-differentiation.
impl<B: Backend> Module<B> for Mamba2Layer<B>
type Record = Mamba2LayerRecord<B>
Type to save and load the module.
fn load_record(self, record: Self::Record) -> Self
Load the module state from a record.
fn into_record(self) -> Self::Record
Convert the module into a record containing the state.
fn num_params(&self) -> usize
Get the number of parameters the module has, including all of its sub-modules.
fn visit<Visitor: ModuleVisitor<B>>(&self, visitor: &mut Visitor)
Visit each tensor parameter in the module with a visitor.
fn map<Mapper: ModuleMapper<B>>(self, mapper: &mut Mapper) -> Self
Map each tensor parameter in the module with a mapper.
fn collect_devices(&self, devices: Devices<B>) -> Devices<B>
Return all the devices found in the underlying module tree, added to the given vector without duplicates.
fn to_device(self, device: &B::Device) -> Self
Move the module and all of its sub-modules to the given device.
fn fork(self, device: &B::Device) -> Self
Fork the module and all of its sub-modules to the given device.
fn devices(&self) -> Vec<<B as BackendTypes>::Device>
Return all the devices found in the underlying module tree without duplicates.
fn train<AB>(self) -> Self::TrainModule
where
    AB: AutodiffBackend<InnerBackend = B>,
    Self: HasAutodiffModule<AB>,
Move the module and all of its sub-modules to the autodiff backend.
fn quantize_weights(self, quantizer: &mut Quantizer) -> Self
Quantize the weights of the module.
impl<B: Backend> ModuleDisplay for Mamba2Layer<B>
fn format(&self, passed_settings: DisplaySettings) -> String
Formats the module with provided display settings.
fn custom_settings(&self) -> Option<DisplaySettings>
Custom display settings for the module.
fn custom_content(&self, _content: Content) -> Option<Content>
Custom attributes for the module.
Auto Trait Implementations
impl<B> !Freeze for Mamba2Layer<B>
impl<B> !RefUnwindSafe for Mamba2Layer<B>
impl<B> Send for Mamba2Layer<B>
impl<B> Sync for Mamba2Layer<B>
impl<B> Unpin for Mamba2Layer<B>
impl<B> UnsafeUnpin for Mamba2Layer<B>
where
    <B as BackendTypes>::Device: UnsafeUnpin,
    <B as BackendTypes>::FloatTensorPrimitive: UnsafeUnpin,
    <B as BackendTypes>::QuantizedTensorPrimitive: UnsafeUnpin,
impl<B> !UnwindSafe for Mamba2Layer<B>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.