Naive implementation in which the Mamba3 block itself is not adapted.
Two independent layers are executed as a bidirectional (bidi) pair: the input flip-split happens before the layer normalization, and the two paths are merged (by a ) after the block output, before the layer-pair skip connection.
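The data flow of this naive pair can be sketched as follows. This is a minimal illustration under several assumptions, not the crate's implementation. All names are hypothetical: `Seq`, `bidi_pair`, `layer_norm`, `flip`, `causal_cumsum` and `add_seqs`. Tensors are plain `Vec<Vec<f32>>` indexed as `[time][dim]`, and a causal cumulative sum stands in for the unmodified Mamba3 block. The text does not name the merge operation, so elementwise addition (`add_seqs`) is assumed. It is also assumed that the backward path's output is flipped back before the merge so that positions line up.

```rust
/// Hypothetical sequence type: `seq[t][d]` (time-major token vectors).
type Seq = Vec<Vec<f32>>;

const EPS: f32 = 1e-5;

/// Per-token layer normalization with no learned scale/shift (simplification).
fn layer_norm(x: &Seq, eps: f32) -> Seq {
    x.iter()
        .map(|tok| {
            let n = tok.len() as f32;
            let mean = tok.iter().sum::<f32>() / n;
            let var = tok.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
            let inv_std = 1.0 / (var + eps).sqrt();
            tok.iter().map(|v| (v - mean) * inv_std).collect::<Vec<f32>>()
        })
        .collect()
}

/// Reverse the time axis.
fn flip(x: &Seq) -> Seq {
    x.iter().rev().cloned().collect()
}

/// Elementwise sum of two sequences (assumed merge op; also used for the skip).
fn add_seqs(a: &Seq, b: &Seq) -> Seq {
    a.iter()
        .zip(b)
        .map(|(u, v)| u.iter().zip(v).map(|(p, q)| p + q).collect::<Vec<f32>>())
        .collect()
}

/// Toy causal block standing in for an (unadapted) Mamba3 block:
/// output at step t is the running sum of inputs 0..=t.
fn causal_cumsum(x: &Seq) -> Seq {
    let dim = x.first().map_or(0, |t| t.len());
    let mut acc = vec![0.0f32; dim];
    x.iter()
        .map(|tok| {
            for (a, v) in acc.iter_mut().zip(tok) {
                *a += *v;
            }
            acc.clone()
        })
        .collect()
}

/// Hypothetical naive bidirectional layer pair.
fn bidi_pair<F, B, M>(x: &Seq, fwd_block: F, bwd_block: B, merge: M) -> Seq
where
    F: Fn(&Seq) -> Seq,
    B: Fn(&Seq) -> Seq,
    M: Fn(&Seq, &Seq) -> Seq,
{
    // Flip-split before normalization: the forward path sees x, the backward path sees flip(x).
    let bwd_in = flip(x);
    // Each path: layer norm, then its own independent block.
    let fwd_out = fwd_block(&layer_norm(x, EPS));
    // Flip the backward output back so time steps align (assumption).
    let bwd_out = flip(&bwd_block(&layer_norm(&bwd_in, EPS)));
    // Merge after the block outputs...
    let merged = merge(&fwd_out, &bwd_out);
    // ...then apply the layer-pair skip connection.
    add_seqs(x, &merged)
}

fn main() {
    let x: Seq = vec![vec![0.0, 1.0]; 3];
    let y = bidi_pair(&x, causal_cumsum, causal_cumsum, add_seqs);
    println!("{:?}", y);
}
```

With this toy input every token normalizes to roughly `[-1, 1]`. The forward cumulative sum at step `t` covers steps `0..=t`, and the flipped-back backward sum covers steps `t..`. Their sum therefore spans the whole sequence, and each output token comes out near `[-4.0, 5.0]` after the skip connection. This illustrates why the pair is bidirectional even though neither block is modified.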