Module scaling

fp8 scale-pointer helpers for cuBLASLt matmul.

cuBLASLt’s fp8 path multiplies each operand by a per-tensor (or per-row) f32 scale before accumulating. The scales are passed as device pointers, set on the cublasLtMatmulDesc_t via the CUBLASLT_MATMUL_DESC_{A,B,C,D}_SCALE_POINTER attributes.
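To make the role of the scales concrete, here is a minimal CPU-side sketch of one output element under per-tensor scaling. It is an illustration only, not how cuBLASLt is implemented: the function name scaled_dot is hypothetical, and the inputs are assumed to be fp8 values already widened to f32.

```rust
/// CPU reference for one output element of a scaled matmul (illustrative only).
///
/// `a_row` and `b_col` are assumed to hold fp8 values already widened to f32.
/// `scale_a` and `scale_b` are the per-tensor f32 scales for each operand.
fn scaled_dot(a_row: &[f32], b_col: &[f32], scale_a: f32, scale_b: f32) -> f32 {
    assert_eq!(a_row.len(), b_col.len(), "operand lengths must match");
    a_row
        .iter()
        .zip(b_col)
        // Scale each operand first, then multiply and accumulate in f32.
        .map(|(&a, &b)| (a * scale_a) * (b * scale_b))
        .sum()
}
```

For example, scaled_dot(&[1.0, 2.0], &[3.0, 4.0], 0.5, 2.0) evaluates to 11.0. A per-row variant would look up scale_a by row index instead of taking a single value.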

ScaleSet bundles the four pointers and exposes ScaleSet::apply, which writes them onto a descriptor. We keep the wrapper small: the actual fp8 conversion (e4m3 / e5m2 packing) lives on the GPU in cuBLASLt itself.
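The shape of such a bundle can be sketched as follows. This is not the crate's actual definition: ScaleSetSketch, ScaleAttr, ScaleAttrSink, and RecordingDesc are hypothetical names, and the real apply presumably calls into cuBLASLt on a cublasLtMatmulDesc_t rather than a trait object. The sketch only shows the idea of skipping unset slots and writing the rest in a fixed order.

```rust
/// Hypothetical stand-in for the four scale-pointer descriptor attributes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScaleAttr {
    A,
    B,
    C,
    D,
}

/// Hypothetical abstraction over "set a pointer-valued attribute on a
/// matmul descriptor". A real implementation would wrap the FFI call.
trait ScaleAttrSink {
    fn set_scale_ptr(&mut self, attr: ScaleAttr, ptr: *const f32);
}

/// Sketch of a bundle of optional device scale pointers (assumed layout).
#[derive(Debug, Clone, Copy, Default)]
struct ScaleSetSketch {
    a: Option<*const f32>,
    b: Option<*const f32>,
    c: Option<*const f32>,
    d: Option<*const f32>,
}

impl ScaleSetSketch {
    /// Write every pointer that is present onto the descriptor; skip the rest.
    fn apply<S: ScaleAttrSink>(&self, desc: &mut S) {
        let slots = [
            (ScaleAttr::A, self.a),
            (ScaleAttr::B, self.b),
            (ScaleAttr::C, self.c),
            (ScaleAttr::D, self.d),
        ];
        for (attr, ptr) in slots {
            if let Some(p) = ptr {
                desc.set_scale_ptr(attr, p);
            }
        }
    }
}

/// Test double that records writes instead of touching a GPU descriptor.
#[derive(Default)]
struct RecordingDesc {
    writes: Vec<(ScaleAttr, usize)>,
}

impl ScaleAttrSink for RecordingDesc {
    fn set_scale_ptr(&mut self, attr: ScaleAttr, ptr: *const f32) {
        // The pointer is only recorded as an address, never dereferenced.
        self.writes.push((attr, ptr as usize));
    }
}
```

Routing the writes through a small trait keeps the bundle testable without a GPU: a recording descriptor can confirm which slots were written, while the production path would forward each call to the cuBLASLt attribute setter.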

Structs§

ScaleSet: Bundle of optional scale pointers for cuBLASLt fp8 matmul.

Functions§

null_scale_ptr: Best-effort sentinel used when a caller wants the scale pointer slot occupied but doesn’t actually have a device buffer. Mostly useful for tests; a real fp8 path always supplies device pointers minted by the calling DeviceActor.
