fp8 scale-pointer helpers for cuBLASLt matmul.
cuBLASLt’s fp8 path multiplies each operand by a per-tensor (or
per-row) f32 scale before accumulating. The scales are passed
as device pointers stored on the cublasLtMatmulDesc_t via
the A/B/C/D_SCALE_POINTER attributes.
ScaleSet bundles the four pointers and exposes
ScaleSet::apply, which writes them onto a descriptor. We keep
the wrapper small: the actual fp8 conversion (e4m3 / e5m2 packing)
happens on the GPU inside cuBLASLt itself.
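To make the shape of the wrapper concrete, here is a minimal sketch of the bundle-and-apply pattern described above. It is not this crate's implementation: the field names, the `ScaleSlot` enum, and `MockDesc` (a host-side stand-in for `cublasLtMatmulDesc_t` that only records what was written) are all hypothetical, and the return value of `apply` is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical names for the four slots, mirroring A/B/C/D_SCALE_POINTER.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ScaleSlot {
    A,
    B,
    C,
    D,
}

/// Hypothetical stand-in for a cublasLtMatmulDesc_t. It records the
/// addresses written to it instead of calling into cuBLASLt.
#[derive(Default)]
pub struct MockDesc {
    attrs: BTreeMap<ScaleSlot, usize>,
}

impl MockDesc {
    /// Records the pointer for one slot (a real wrapper would set the
    /// corresponding descriptor attribute here).
    pub fn set_scale_pointer(&mut self, slot: ScaleSlot, ptr: *const f32) {
        self.attrs.insert(slot, ptr as usize);
    }

    /// Returns the pointer recorded for a slot, if any.
    pub fn scale_pointer(&self, slot: ScaleSlot) -> Option<*const f32> {
        self.attrs.get(&slot).map(|&addr| addr as *const f32)
    }
}

/// Assumed layout: one optional f32 scale pointer per operand.
#[derive(Clone, Copy, Debug, Default)]
pub struct ScaleSet {
    pub a: Option<*const f32>,
    pub b: Option<*const f32>,
    pub c: Option<*const f32>,
    pub d: Option<*const f32>,
}

impl ScaleSet {
    /// Writes every pointer that is present onto the descriptor and
    /// returns how many slots were written. Absent slots are left alone.
    pub fn apply(&self, desc: &mut MockDesc) -> usize {
        let slots = [
            (ScaleSlot::A, self.a),
            (ScaleSlot::B, self.b),
            (ScaleSlot::C, self.c),
            (ScaleSlot::D, self.d),
        ];
        let mut written = 0;
        for (slot, ptr) in slots {
            if let Some(p) = ptr {
                desc.set_scale_pointer(slot, p);
                written += 1;
            }
        }
        written
    }
}
```

In this sketch a `None` slot is simply skipped, so a descriptor keeps whatever it held before. Whether the real ScaleSet::apply skips or clears absent slots is not stated in this documentation.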
Structs
- ScaleSet - Bundle of optional scale pointers for cuBLASLt fp8 matmul.
Functions
- null_scale_ptr - Best-effort sentinel used when a caller wants the scale pointer slot occupied but doesn’t actually have a device buffer. Mostly useful for tests; a real fp8 path always supplies device pointers minted by the calling DeviceActor.
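
The difference between an empty slot and a slot occupied by the sentinel can be illustrated with a short sketch. The signature below is an assumption inferred from the function's name (a null `*const f32`); the real return type and value may differ, and `SlotState` and `classify` are hypothetical helpers that exist only for this example.

```rust
use std::ptr;

/// Assumed signature: returns a null f32 pointer to use as a placeholder.
/// The real null_scale_ptr may return a different type or value.
pub fn null_scale_ptr() -> *const f32 {
    ptr::null()
}

/// Hypothetical classification of one optional scale slot.
#[derive(Debug, PartialEq, Eq)]
pub enum SlotState {
    /// No pointer supplied at all.
    Unset,
    /// Slot is occupied, but only by the null sentinel.
    Sentinel,
    /// Slot holds a non-null address (presumed to be a device pointer).
    Device,
}

/// Inspects a slot without ever dereferencing the pointer.
pub fn classify(slot: Option<*const f32>) -> SlotState {
    match slot {
        None => SlotState::Unset,
        Some(p) if p.is_null() => SlotState::Sentinel,
        Some(_) => SlotState::Device,
    }
}
```

A test can then fill a slot with `Some(null_scale_ptr())` to exercise the "slot occupied" code path without allocating device memory, which matches the test-only role described above.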