pub struct VocabNetwork<M: Module> {
    pub embedding: Embedding,
    pub layers: Layers<M>,
    pub norm_f: RmsNorm,
    pub lm_head: Option<Linear>,
}
A complete autoregressive language model over a token vocabulary:
Embedding (vocab → d_model) → Layers<M> → norm_f → LM head (d_model → vocab).
This is the token-LM counterpart of LatentNetwork; both are built on the
shared Layers core. The only differences are the I/O boundary (a token
Embedding and a vocab logit head, instead of two latent Linears) and a
final pre-head RmsNorm.
The LM head is either tied (lm_head = None, in which case the transposed
embedding weight is reused) or untied (a dedicated Linear). The vocabulary
size is rounded up to a multiple for GPU alignment, giving padded_vocab
(see VocabNetworkBuilder).
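The rounding of the vocabulary can be sketched as below. This is a minimal illustration, not the crate's code: pad_vocab is a hypothetical helper, and the multiple is left as a parameter because the value actually used is set through VocabNetworkBuilder and is not stated here.

```rust
/// Round `vocab` up to the next multiple of `multiple`.
/// Hypothetical helper for illustration; the real rounding is configured
/// through VocabNetworkBuilder.
fn pad_vocab(vocab: usize, multiple: usize) -> usize {
    assert!(multiple > 0, "multiple must be non-zero");
    // Integer ceiling division, then scale back up.
    (vocab + multiple - 1) / multiple * multiple
}
```

With an assumed multiple of 8, pad_vocab(50257, 8) is 50264, so the embedding table and the logits would carry 7 extra columns that correspond to no real token.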
Fields
embedding: Embedding
Token embedding table, weight shape [padded_vocab, d_model].
layers: Layers<M>
The shared Mamba-x layer stack.
norm_f: RmsNorm
Final RMSNorm applied before the LM head.
lm_head: Option<Linear>
Optional dedicated LM head. None ⇒ weight-tied (reuse embeddingᵀ).
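For readers unfamiliar with the norm_f step, the usual RMSNorm formulation can be sketched on a single feature vector. This is a generic sketch, not the crate's RmsNorm: the free function rms_norm, the eps parameter, and its placement inside the square root are assumptions.

```rust
/// RMSNorm over one feature vector: y_i = w_i * x_i / sqrt(mean(x^2) + eps).
/// `eps` and where it is added are assumptions of this sketch.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "weight must match the feature dimension");
    // Mean of the squared features (no mean subtraction, unlike LayerNorm).
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    // Rescale to unit RMS, then apply the learned per-feature gain.
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

In the network this would be applied independently at every [batch, sequence] position along the d_model axis.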
Implementations
impl<M: MambaBlock> VocabNetwork<M>
pub fn forward(
    &self,
    x: Tensor<2, Int>,
    caches: Option<M::Caches>,
    ssd_path: M::SsdPath,
) -> (Tensor<3>, M::Caches)
Full-sequence pass: token IDs [batch, sequence] → logits
[batch, sequence, padded_vocab].
pub fn step(
    &self,
    x: Tensor<1, Int>,
    caches: Option<M::Caches>,
    layers_own_index: Option<&mut usize>,
    layer_indices: Option<&mut Vec<usize>>,
) -> (Tensor<2>, M::Caches)
Single-token step: token IDs [batch] → logits [batch, padded_vocab].
The vocab network has no class tokens of its own (those would duplicate
the layers’ class latents); it simply forwards the inner Layers
cursors — layers_own_index (stack-level latents) and layer_indices
(per-virtual-layer) — to Layers::step.
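A single-token step is typically driven in a loop that feeds each prediction back in. The sketch below shows that pattern with greedy decoding. It does not call the real API: the step closure is a hypothetical stand-in that maps (token, optional state) to (logits, state), passing None on the first call is assumed to mean "start from fresh caches", and restricting the argmax to the first vocab columns assumes the padded columns should never be selected.

```rust
/// Index of the largest logit among the first `vocab` entries,
/// ignoring any padded columns beyond the real vocabulary.
fn greedy_token(logits: &[f32], vocab: usize) -> usize {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate().take(vocab) {
        if v > logits[best] {
            best = i;
        }
    }
    best
}

/// Generate `n` tokens by feeding each prediction back into `step`.
/// `step` is a hypothetical stand-in for a single-token step.
fn generate<S>(
    mut step: impl FnMut(usize, Option<S>) -> (Vec<f32>, S),
    first: usize,
    vocab: usize,
    n: usize,
) -> Vec<usize> {
    let mut out = Vec::with_capacity(n);
    let mut token = first;
    let mut state: Option<S> = None; // no state before the first call
    for _ in 0..n {
        let (logits, next_state) = step(token, state.take());
        token = greedy_token(&logits, vocab);
        out.push(token);
        state = Some(next_state); // thread the returned state into the next call
    }
    out
}
```

The state is moved in and out on every call, mirroring how step takes caches by value and returns the updated caches alongside the logits.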
pub fn step_infinite(&self, x: Tensor<1, Int>) -> Tensor<2>
Stationary fixed point of the LM under a constant token: the logits
[batch, padded_vocab] reached in the limit of infinitely many repeats of x.
Takes and returns no caches (see Layers::step_infinite).
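The idea of a stationary fixed point can be illustrated on a toy scalar recurrence. This is only an analogy under stated assumptions: toy_step and toy_step_infinite are hypothetical, the coefficients a and b stand in for whatever a layer's state update looks like under a constant input, and nothing here reproduces how Layers::step_infinite is actually computed.

```rust
/// One step of a toy scalar recurrence: h' = a * h + b * x.
fn toy_step(h: f64, a: f64, b: f64, x: f64) -> f64 {
    a * h + b * x
}

/// State the toy recurrence converges to under a constant input `x`.
/// Solving h = a * h + b * x gives h = b * x / (1 - a); repeated stepping
/// only converges to it when |a| < 1, so other cases return None.
fn toy_step_infinite(a: f64, b: f64, x: f64) -> Option<f64> {
    if a.abs() < 1.0 {
        Some(b * x / (1.0 - a))
    } else {
        None
    }
}
```

Because the limit does not depend on the starting state, no cache has to be passed in, which matches the cache-free signature above.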
pub fn step_n_approx(
    &self,
    x: Tensor<1, Int>,
    n: usize,
    caches: Option<M::Caches>,
) -> (Tensor<2>, M::Caches)
Approximate jump of n consecutive Self::step calls on the same
constant token — see Layers::step_n_approx for the approximation
contract.
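Jumping n steps at once on a constant input can be illustrated with a closed form on a toy scalar recurrence. This is a sketch under assumptions, not the crate's method: toy_step_n is hypothetical, and in this scalar case the jump is exact, whereas the approximation contract of the real method is defined by Layers::step_n_approx and is not reproduced here.

```rust
/// Closed-form result of `n` toy steps h' = a * h + b * x on a constant `x`:
/// h_n = a^n * h0 + b * x * (1 + a + ... + a^(n-1)).
fn toy_step_n(h0: f64, a: f64, b: f64, x: f64, n: u32) -> f64 {
    let a_n = a.powi(n as i32);
    // Geometric sum; the a == 1 case degenerates to n terms of 1.
    let geom = if (a - 1.0).abs() < f64::EPSILON {
        n as f64
    } else {
        (1.0 - a_n) / (1.0 - a)
    };
    a_n * h0 + b * x * geom
}
```

Unlike the infinite-step limit, the result depends on the starting state h0, which is consistent with step_n_approx taking and returning caches.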
fn apply_lm_head(&self, x: Tensor<3>) -> Tensor<3>
Project [batch, sequence, d_model] → [batch, sequence, padded_vocab]
using the dedicated head, or the tied (transposed embedding) weight.
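The choice between the dedicated and the tied head can be sketched on plain vectors. This is an illustration only: ToyLinear and apply_head are hypothetical, the weight is stored as [out_features][in_features] purely for convenience, bias is omitted, and a single hidden vector stands in for the [batch, sequence, d_model] tensor.

```rust
/// Toy dense layer with weight laid out as [out_features][in_features].
struct ToyLinear {
    weight: Vec<Vec<f32>>,
}

/// Project one hidden vector [d_model] to logits [padded_vocab].
/// `head == None` means weight tying: the embedding table
/// [padded_vocab][d_model] is reused, so logit j is the dot product of the
/// hidden vector with embedding row j (that is, hidden times embeddingᵀ).
fn apply_head(hidden: &[f32], embedding: &[Vec<f32>], head: Option<&ToyLinear>) -> Vec<f32> {
    let weight: &[Vec<f32>] = match head {
        Some(linear) => linear.weight.as_slice(),
        None => embedding,
    };
    weight
        .iter()
        .map(|row| row.iter().zip(hidden).map(|(w, h)| w * h).sum::<f32>())
        .collect()
}
```

Tying removes one [padded_vocab, d_model] parameter matrix, at the cost of forcing input and output token representations to share weights.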