WGMMA (warp-group matrix multiply accumulate) intrinsic shim.
Hopper’s wgmma.mma_async.sync instruction is issued by a
128-thread warpgroup. There is nothing for the host to call; instead,
NVRTC kernels embed the instruction through inline PTX assembly. This
module ships a small set of macro shims (in atomr_hopper.cuh) that
pin the asm constraints to the supported
(M, N, K, dtype-A, dtype-B, dtype-D) shapes and give Rust callers
symbolic names for the descriptors they have to build host-side.
Only the most common matmul variants are wrapped. Adding a new
variant means adding a new WGMMA_MMA_ASYNC_* macro in
atomr_hopper.cuh and a matching variant in WgmmaShape.
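As a rough illustration of how a shape entry and its device-side macro might be kept in step, here is a minimal sketch. It does not reproduce the real WgmmaShape: the SketchShape type, its variant names, the dims and macro_name helpers, and the M{m}N{n}K{k} macro suffix are all assumptions made for the example. Only the WGMMA_MMA_ASYNC_ prefix and the (M, N, K) ordering come from this page.

```rust
/// Hypothetical stand-in for a WGMMA shape enum. Variant names are
/// illustrative and are NOT taken from the real `WgmmaShape`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SketchShape {
    /// 64 x 128 output tile, inner dimension 16.
    M64N128K16,
    /// 64 x 256 output tile, inner dimension 16.
    M64N256K16,
}

impl SketchShape {
    /// Returns the tile as (M, N, K) = (rows, cols, inner dimension).
    pub const fn dims(self) -> (u32, u32, u32) {
        match self {
            SketchShape::M64N128K16 => (64, 128, 16),
            SketchShape::M64N256K16 => (64, 256, 16),
        }
    }

    /// Builds the name of the matching device-side macro.
    /// The suffix format is an assumed convention, not the crate's.
    pub fn macro_name(self) -> String {
        let (m, n, k) = self.dims();
        format!("WGMMA_MMA_ASYNC_M{m}N{n}K{k}")
    }
}
```

Under these assumptions, a new shape would need one extra match arm in dims plus a macro of the derived name in the header, so the exhaustive match makes the compiler flag a variant that has no tuple.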
Enums§
- WgmmaShape - Subset of WGMMA matmul shapes commonly exercised by attention /
matmul kernels. The numeric tuple is
(M, N, K) (row × col × inner).