Multi-head attention (cudnnFusedAttnFwd/cudnnFusedAttnBwd) request types.
Requests route through the v9 frontend OPERATION_MATMUL_DESCRIPTOR + softmax + dropout fusion path. Supported features: causal masking, sliding-window masking, paged-KV (skeleton), and MQA / GQA via head-count split.
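To make the masking and head-count-split terms above concrete, here is a minimal sketch in plain Rust. It is not this crate's API: `MaskKind`, `is_visible`, and `kv_head_for` are hypothetical names introduced for illustration, and the sliding window is assumed to be causal (a query sees itself and the `window - 1` positions before it). The crate's own `AttentionMask` may define these cases differently.

```rust
/// Hypothetical mask kinds for illustration; NOT the crate's `AttentionMask`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum MaskKind {
    /// Every query position may attend to every key position.
    None,
    /// Query `q` may attend only to keys at positions `k <= q`.
    Causal,
    /// Assumed causal window: `q` sees keys in `(q - window, q]`.
    SlidingWindow { window: usize },
}

/// Returns true if query position `q` may attend to key position `k`.
fn is_visible(mask: MaskKind, q: usize, k: usize) -> bool {
    match mask {
        MaskKind::None => true,
        MaskKind::Causal => k <= q,
        // `k <= q` is checked first, so `q - k` cannot underflow.
        MaskKind::SlidingWindow { window } => k <= q && q - k < window,
    }
}

/// Maps a query head to the KV head it shares under a head-count split.
/// MQA is the case `num_kv_heads == 1`; GQA is `1 < num_kv_heads < num_q_heads`.
fn kv_head_for(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert!(q_head < num_q_heads);
    // Consecutive query heads are grouped onto one KV head.
    let group_size = num_q_heads / num_kv_heads;
    q_head / group_size
}
```

Under these assumptions, with 8 query heads and 2 KV heads, query heads 0 through 3 read KV head 0 and query heads 4 through 7 read KV head 1. A fused kernel would typically apply the visibility rule by setting masked scores to negative infinity before the softmax.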
Structs§
- AttentionParams - Attention parameters.
- MultiHeadAttnBwdRequest - MHA backward request.
- MultiHeadAttnFwdRequest - MHA forward request.
Enums§
- AttentionMask - Mask kind applied to the attention scores.