Module gqa

Group-to-head expansion of the B and C projections, with each group shared across several heads in the style of Grouped-Query Attention (GQA).

Mamba-2 and Mamba-3 produce B and C projections per-group (size ngroups), but the chunkwise SSD algorithms consume them per-head (size nheads). This helper bridges the two by replicating each group’s vector across the heads_per_group = nheads / ngroups heads of that group.
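The replication described above can be sketched for a single time step as follows. This is only an illustration, not this module's implementation: `expand_groups_to_heads` is a hypothetical name, the per-group vectors are plain `Vec<f32>` values rather than tensors, and the sketch assumes that the heads of one group are contiguous, so that head `h` maps to group `h / heads_per_group`.

```rust
/// Hypothetical helper (not part of this module's API): expands per-group
/// vectors into per-head vectors for a single time step.
///
/// `groups` holds `ngroups` vectors, one per group. The result holds `nheads`
/// vectors, where head `h` receives a copy of group `h / heads_per_group`.
/// Assumption: the heads belonging to one group are contiguous.
fn expand_groups_to_heads(groups: &[Vec<f32>], nheads: usize) -> Vec<Vec<f32>> {
    let ngroups = groups.len();
    assert!(ngroups > 0, "need at least one group");
    assert!(nheads % ngroups == 0, "nheads must be a multiple of ngroups");
    let heads_per_group = nheads / ngroups;

    // Integer division maps each head index to the group it belongs to.
    (0..nheads)
        .map(|h| groups[h / heads_per_group].clone())
        .collect()
}
```

With `ngroups = 2` and `nheads = 4`, heads 0 and 1 receive a copy of group 0 and heads 2 and 3 receive a copy of group 1.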

Functions

gqa_expand_to_heads
Expand a tensor’s ngroups dimension (at group_dim) into an nheads dimension by replicating each group’s slice across the heads_per_group = nheads / ngroups heads of that group.
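To make the role of `group_dim` concrete, here is a minimal sketch of the same expansion over a flat row-major buffer. It is not the signature or body of `gqa_expand_to_heads`, which this page does not show: `expand_dim_to_heads`, the `&[f32]` buffer, and the explicit `shape` slice are all assumptions made for illustration, as is the contiguous head-to-group layout.

```rust
/// Hypothetical stand-in for the documented function, operating on a
/// row-major `f32` buffer instead of a tensor type.
/// Returns the expanded data together with its new shape.
fn expand_dim_to_heads(
    data: &[f32],
    shape: &[usize],
    group_dim: usize,
    nheads: usize,
) -> (Vec<f32>, Vec<usize>) {
    assert!(group_dim < shape.len(), "group_dim out of range");
    let ngroups = shape[group_dim];
    assert!(
        ngroups > 0 && nheads % ngroups == 0,
        "nheads must be a multiple of ngroups"
    );
    assert_eq!(
        data.len(),
        shape.iter().product::<usize>(),
        "data length does not match shape"
    );
    let heads_per_group = nheads / ngroups;

    // Row-major view: `outer` blocks, each holding `ngroups` slices of
    // `inner` contiguous elements.
    let outer: usize = shape[..group_dim].iter().product();
    let inner: usize = shape[group_dim + 1..].iter().product();

    let mut out = Vec::with_capacity(outer * nheads * inner);
    for o in 0..outer {
        for g in 0..ngroups {
            let start = (o * ngroups + g) * inner;
            let slice = &data[start..start + inner];
            // Copy this group's slice once for every head in the group.
            for _ in 0..heads_per_group {
                out.extend_from_slice(slice);
            }
        }
    }

    // Only the size at `group_dim` changes: ngroups becomes nheads.
    let mut out_shape = shape.to_vec();
    out_shape[group_dim] = nheads;
    (out, out_shape)
}
```

For example, a buffer of shape `[2, 2, 3]` expanded at `group_dim = 1` with `nheads = 4` yields shape `[2, 4, 3]`, with each length-3 group slice appearing twice in a row. A real tensor library may be able to express this as a broadcast view rather than a copy; the copy here just keeps the sketch self-contained.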