pub fn k1_ssd_chunk_cumsum<B: Backend>(
    dt_discretized_bhnl: Tensor<B, 4>,
    a_decay_h: Tensor<B, 1>,
) -> (Tensor<B, 4>, Tensor<B, 3>)
Based on the Kernel 1 Triton reference _chunk_cumsum_fwd_kernel (ssd_chunk_state.py).
Returns:
- da_cumsum_bhnl [used in K3+K5][*] - intra-chunk cumulative sum.
- da_chunk_end_bhn [used in K4][omitted][*] - last da_cumsum value of each chunk.
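
The two return values can be illustrated with a minimal plain-Rust sketch for a single (batch, head, chunk) row. This is not the crate's implementation: the helper name `chunk_cumsum_row` is hypothetical, it works on slices rather than `Tensor`s, and it assumes that dA is formed as `dt * a_decay` for the row's head before an inclusive cumulative sum is taken along the chunk-length axis. The real function would presumably apply the same computation to every (b, h, n) row, using `a_decay_h[h]` for head `h`.

```rust
/// Hypothetical reference for one (batch, head, chunk) row; not part of the crate.
/// Assumption: dA[l] = dt[l] * a_decay, followed by an inclusive cumulative sum over l.
pub fn chunk_cumsum_row(dt_row: &[f32], a_decay: f32) -> (Vec<f32>, f32) {
    let mut running = 0.0_f32;
    // Intra-chunk cumulative sum: entry l holds dA[0] + ... + dA[l].
    let da_cumsum: Vec<f32> = dt_row
        .iter()
        .map(|&dt| {
            running += dt * a_decay;
            running
        })
        .collect();
    // The chunk-end value is the last cumulative entry (0.0 for an empty chunk).
    let da_chunk_end = da_cumsum.last().copied().unwrap_or(0.0);
    (da_cumsum, da_chunk_end)
}
```

For example, `chunk_cumsum_row(&[0.5, 1.0, 1.5], -2.0)` yields the cumulative sums `[-1.0, -3.0, -6.0]` and the chunk-end value `-6.0`, which correspond to one row of da_cumsum_bhnl and one entry of da_chunk_end_bhn respectively.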