SiLU activation (fp16-aware).
SiLU (a.k.a. swish) activation: silu(x) = x · sigmoid(x).
Implemented as the single expression x / (1 + exp(−x)) rather than as a
separate sigmoid op followed by a multiply, which keeps it fp16-aware. Used for
the gating branches throughout the Mamba blocks.
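The formula above can be sketched in plain Rust as follows. This is a minimal scalar illustration in `f32`, not the crate's actual `Silu` module: the free functions `silu` and `silu_slice` are hypothetical names introduced here, and the real implementation presumably operates on tensors and may handle fp16 differently.

```rust
/// Hypothetical scalar sketch of SiLU; the crate's `Silu` struct may differ.
/// Computes x / (1 + exp(-x)) directly, with no separate sigmoid step.
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Hypothetical helper: applies `silu` element-wise to a slice.
fn silu_slice(xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|&x| silu(x)).collect()
}
```

In this form, a very negative input makes `exp(-x)` overflow to `+inf`, so the quotient collapses to zero instead of producing `NaN`; a very positive input leaves the denominator at 1, so the result is `x` itself.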
Structs§
- Silu: SiLU activation module, silu(x) = x · sigmoid(x) = x / (1 + exp(−x)).