Module attention

atomr_accel_cuda::kernel::cudnn

Multi-head attention (cudnnFusedAttnFwd/cudnnFusedAttnBwd) request types.

Requests route through the cuDNN v9 frontend fusion path (OPERATION_MATMUL_DESCRIPTOR + softmax + dropout). Supported features: causal masking, sliding-window masking, paged KV (skeleton), and MQA / GQA via head-count split.
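To make the masking and head-count-split terms concrete, the following is a minimal CPU reference sketch in plain Rust. It does not use this module's API: `MaskKind`, `is_visible`, `kv_head_for`, and `attention_ref` are hypothetical names introduced for illustration only, and the sliding-window convention (query `q` sees keys in `(q - window, q]`) is an assumption rather than something this page specifies. Dropout is omitted.

```rust
/// Hypothetical mask kinds for illustration; NOT this crate's `AttentionMask`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MaskKind {
    /// Every query position sees every key position.
    None,
    /// Query `q` sees keys `k <= q`.
    Causal,
    /// Causal, limited to the most recent `window` keys (assumed `window >= 1`).
    SlidingWindow(usize),
}

/// Returns true if query position `q` may attend to key position `k`.
pub fn is_visible(mask: MaskKind, q: usize, k: usize) -> bool {
    match mask {
        MaskKind::None => true,
        MaskKind::Causal => k <= q,
        MaskKind::SlidingWindow(w) => k <= q && q - k < w,
    }
}

/// Maps a query head to the KV head it shares under GQA.
/// MQA is the special case `num_kv_heads == 1`.
pub fn kv_head_for(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    q_head / (num_q_heads / num_kv_heads)
}

/// Single-head scaled dot-product attention; `q`, `k`, `v` are `[seq][dim]`.
/// Each query row must see at least one key, otherwise the row is NaN.
pub fn attention_ref(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    mask: MaskKind,
) -> Vec<Vec<f32>> {
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    q.iter()
        .enumerate()
        .map(|(i, qi)| {
            // Scaled scores; masked-out positions are `None`.
            let scores: Vec<Option<f32>> = k
                .iter()
                .enumerate()
                .map(|(j, kj)| {
                    if is_visible(mask, i, j) {
                        Some(qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale)
                    } else {
                        None
                    }
                })
                .collect();
            // Numerically stable softmax over the visible positions only.
            let max = scores
                .iter()
                .flatten()
                .cloned()
                .fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores
                .iter()
                .map(|s| s.map_or(0.0, |x| (x - max).exp()))
                .collect();
            let sum: f32 = exps.iter().sum();
            // Weighted sum of value rows.
            let mut out = vec![0.0f32; v[0].len()];
            for (w, vj) in exps.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(vj) {
                    *o += w / sum * x;
                }
            }
            out
        })
        .collect()
}
```

With 8 query heads and 2 KV heads, `kv_head_for` assigns query heads 0 to 3 to KV head 0 and heads 4 to 7 to KV head 1. Under `MaskKind::Causal`, the first query row attends only to the first key, so its output equals the first value row.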

Structs

AttentionParams: Attention parameters.
MultiHeadAttnBwdRequest: MHA backward request.
MultiHeadAttnFwdRequest: MHA forward request.

Enums

AttentionMask: Mask kind applied to the attention scores.

Functions

build_mha_bwd_graph
build_mha_fwd_graph