Module scaling

fp8 scaling-factor helpers.

Hopper+ fp8 cuBLAS calls (cublasGemmEx with CUDA_R_8F_E4M3 / CUDA_R_8F_E5M2 operands) take a per-tensor or per-row scaling factor that brings the input/output values into the representable fp8 range. This module factors out the small bookkeeping helpers used by both cuBLAS and cuBLASLt fp8 paths so they don’t have to be duplicated.
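As a rough illustration of what "bringing values into the representable fp8 range" means, a per-tensor factor is commonly derived from the tensor's absolute maximum (amax). The sketch below assumes that convention; `per_tensor_scale` and the two constants are hypothetical names, not items exported by this module, and the source does not say whether the helpers store this factor or its reciprocal.

```rust
/// Largest finite magnitudes of the two fp8 formats.
const E4M3_MAX: f32 = 448.0;
const E5M2_MAX: f32 = 57344.0;

/// Hypothetical helper (not part of this module): returns the factor that
/// maps the largest |value| in `values` onto `fp8_max`.
fn per_tensor_scale(values: &[f32], fp8_max: f32) -> f32 {
    // amax = max |x|; f32::max skips NaN operands, so NaNs are ignored here.
    let amax = values.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
    if amax == 0.0 || !amax.is_finite() {
        // Empty, all-zero, or infinite input: fall back to an identity scale.
        1.0
    } else {
        fp8_max / amax
    }
}
```

Multiplying each element by the returned factor before the fp8 conversion keeps the largest magnitude at the edge of the format's range; the reciprocal would then be used to undo the scaling on the output.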

The full fp8 path is enabled by the cublas-fp8 cargo feature. It is currently scaffolded: the Phase 1 cuBLAS slice ships only the helper types, and the wired call site lives in cuBLASLt’s own module.

Structs§

PerRowScale
Per-row scaling factor: a vector of m scalars, one per row of the matrix. Stored device-side; the caller passes a GpuRef<f32> when the cuBLASLt descriptor accepts row-wise amax.
PerTensorScale
Per-tensor scaling factor: a single multiplicative scalar.
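
The difference between the two granularities can be sketched on the host. This example assumes a row-major `m` x `n` matrix and the amax-based convention (`fp8_max / amax`); `per_row_scales` is a hypothetical function, and the `Vec<f32>` it returns is only a host-side stand-in for the device-side buffer that PerRowScale refers to.

```rust
/// Hypothetical host-side sketch (not part of this module): one scale per
/// row of a row-major `m` x `n` matrix, each mapping that row's largest
/// |value| onto `fp8_max`.
fn per_row_scales(matrix: &[f32], m: usize, n: usize, fp8_max: f32) -> Vec<f32> {
    assert_eq!(matrix.len(), m * n, "matrix must hold m * n elements");
    if n == 0 {
        // Rows with no columns carry no data; use identity scales.
        return vec![1.0; m];
    }
    matrix
        .chunks_exact(n) // one chunk per row
        .map(|row| {
            let amax = row.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
            if amax == 0.0 || !amax.is_finite() {
                1.0
            } else {
                fp8_max / amax
            }
        })
        .collect()
}
```

A per-tensor scale would collapse this vector to a single scalar driven by the global amax, so a row with small values would use less of the fp8 range than it does with its own per-row factor.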