
Module attention



Multi-head attention (cudnnFusedAttnFwd/cudnnFusedAttnBwd) request types.

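Requests are routed through the cuDNN v9 frontend fusion path that combines OPERATION_MATMUL_DESCRIPTOR, softmax, and dropout. Supported features: causal masking, sliding-window masking, paged KV cache (skeleton only), and MQA / GQA via separate query and key/value head counts (fewer key/value heads than query heads).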
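For orientation, the sketch below shows, unfused and on the CPU, what the fused path computes for a single head, plus the query-to-KV head mapping that MQA / GQA implies. Both `kv_head_for_query_head` and `reference_attention` are hypothetical helpers, not part of this module. The sketch assumes contiguous head grouping and a 1/sqrt(d) score scale, and it omits masking and dropout.

```rust
/// Hypothetical helper (not this module's API): maps a query head to the
/// key/value head it reads. MQA is the case `num_kv_heads == 1`; plain MHA
/// is `num_kv_heads == num_q_heads`. Assumes contiguous grouping.
fn kv_head_for_query_head(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(
        num_kv_heads > 0 && num_q_heads % num_kv_heads == 0,
        "query head count must be a multiple of the KV head count"
    );
    assert!(q_head < num_q_heads, "query head index out of range");
    // Each KV head is shared by a contiguous group of query heads.
    let group_size = num_q_heads / num_kv_heads;
    q_head / group_size
}

/// Hypothetical unfused reference for one head: softmax(Q K^T * scale) V.
/// `q` is [seq_q][d]; `k` and `v` are [seq_kv][d]. No mask, no dropout.
fn reference_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let d = q.first().map_or(0, |row| row.len());
    let d_v = v.first().map_or(0, |row| row.len());
    let scale = 1.0 / (d as f32).sqrt();
    q.iter()
        .map(|q_row| {
            // First matmul: one scaled dot product per key row.
            let scores: Vec<f32> = k
                .iter()
                .map(|k_row| q_row.iter().zip(k_row).map(|(a, b)| a * b).sum::<f32>() * scale)
                .collect();
            // Numerically stable softmax: subtract the row max before exp.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let sum: f32 = exps.iter().sum();
            // Second matmul: probability-weighted sum of value rows.
            let mut out = vec![0.0f32; d_v];
            for (w, v_row) in exps.iter().zip(v) {
                for (o, x) in out.iter_mut().zip(v_row) {
                    *o += w / sum * x;
                }
            }
            out
        })
        .collect()
}
```

With 8 query heads and 2 KV heads, query heads 0..=3 read KV head 0 and heads 4..=7 read KV head 1; the fused kernel avoids materializing the score matrix that this reference builds row by row.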

Structs

AttentionParams
Attention parameters.
MultiHeadAttnBwdRequest
Multi-head attention backward request.
MultiHeadAttnFwdRequest
Multi-head attention forward request.

Enums

AttentionMask
Mask kind applied to the attention scores.
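As an illustration of how mask kinds like the ones named in the module description act on scores, here is a minimal sketch. `MaskSketch` is a hypothetical stand-in; the real `AttentionMask` variants and fields may differ. It assumes a causal sliding window, where each query sees itself and the `window - 1` preceding keys, and that masked scores receive a `-inf` bias so they get zero weight after softmax.

```rust
/// Hypothetical stand-in for `AttentionMask`; variant names and fields are
/// assumptions, not this module's definitions.
#[derive(Clone, Copy, Debug)]
enum MaskSketch {
    /// Every query position may attend to every key position.
    NoMask,
    /// Query position q may attend to key positions k <= q.
    Causal,
    /// Causal, and additionally only keys with q - k < window (assumed semantics).
    SlidingWindow { window: usize },
}

impl MaskSketch {
    /// Returns true if query position `q` may attend to key position `k`.
    fn allows(self, q: usize, k: usize) -> bool {
        match self {
            MaskSketch::NoMask => true,
            MaskSketch::Causal => k <= q,
            // `k <= q` is checked first, so `q - k` cannot underflow.
            MaskSketch::SlidingWindow { window } => k <= q && q - k < window,
        }
    }

    /// Additive bias for a raw score: 0.0 if allowed, -inf if masked.
    fn bias(self, q: usize, k: usize) -> f32 {
        if self.allows(q, k) {
            0.0
        } else {
            f32::NEG_INFINITY
        }
    }
}
```

Expressing masks as an additive bias keeps the softmax step unchanged: `exp(-inf)` is 0, so masked keys drop out of the normalization.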

Functions

build_mha_bwd_graph
build_mha_fwd_graph