Module scaling

fp8 scaling-factor helpers.

Hopper+ fp8 cuBLAS calls (cublasGemmEx with CUDA_R_8F_E4M3 / CUDA_R_8F_E5M2 operands) take a per-tensor or per-row scaling factor that brings the input/output values into the representable fp8 range. This module factors out the small bookkeeping helpers used by both cuBLAS and cuBLASLt fp8 paths so they don’t have to be duplicated.
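As an illustration of what such a factor looks like, the sketch below derives a per-tensor scale on the host from the absolute maximum (amax) of the data. The function name `per_tensor_scale_from_amax` and the constants are hypothetical and not part of this module's API. The largest finite magnitudes (448 for E4M3, 57344 for E5M2) are properties of the fp8 formats themselves. Whether a particular call expects this factor or its reciprocal is not settled here.

```rust
/// Largest finite magnitude representable in the E4M3 fp8 format.
const E4M3_MAX: f32 = 448.0;
/// Largest finite magnitude representable in the E5M2 fp8 format.
const E5M2_MAX: f32 = 57344.0;

/// Hypothetical helper: returns a multiplicative factor that maps the
/// largest absolute value in `values` onto `fp8_max`.
/// Falls back to 1.0 when the input is empty, all zero, or contains infinity.
fn per_tensor_scale_from_amax(values: &[f32], fp8_max: f32) -> f32 {
    // Absolute maximum over the whole tensor; NaN entries are ignored by f32::max.
    let amax = values.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
    if amax == 0.0 || !amax.is_finite() {
        1.0
    } else {
        fp8_max / amax
    }
}
```

For example, a tensor whose largest magnitude is 2.0 gets a factor of 224.0 under E4M3, so that value lands exactly on 448.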

The full fp8 path is enabled by the cublas-fp8 cargo feature. It is currently scaffolded: the Phase 1 cuBLAS slice ships the helper types, while the wired call site lives in cuBLASLt’s own module.

Structs

PerRowScale: Per-row scaling factor: a vector of m scalars, one per row of the matrix. Stored device-side; the caller passes a GpuRef<f32> when the cuBLASLt descriptor accepts row-wise amax.
PerTensorScale: Per-tensor scaling factor: a single multiplicative scalar.
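To make the difference between the two granularities concrete, here is a minimal host-side sketch that computes one factor per row of a row-major m × n matrix. The function `per_row_scales_from_amax` is hypothetical and does not reflect the actual fields or constructors of PerRowScale. In the real path the resulting vector would live on the device rather than in a `Vec<f32>`.

```rust
/// Largest finite magnitude representable in the E4M3 fp8 format.
const E4M3_MAX: f32 = 448.0;

/// Hypothetical helper: one multiplicative factor per row of a
/// row-major `m` x `n` matrix, each mapping that row's absolute
/// maximum onto `fp8_max`. Rows that are all zero get a factor of 1.0.
fn per_row_scales_from_amax(data: &[f32], m: usize, n: usize, fp8_max: f32) -> Vec<f32> {
    assert_eq!(data.len(), m * n, "data length must equal m * n");
    if n == 0 {
        // No columns: every row is empty, so use the neutral factor.
        return vec![1.0; m];
    }
    data.chunks_exact(n)
        .map(|row| {
            // Absolute maximum of this row only.
            let amax = row.iter().fold(0.0_f32, |acc, v| acc.max(v.abs()));
            if amax == 0.0 || !amax.is_finite() {
                1.0
            } else {
                fp8_max / amax
            }
        })
        .collect()
}
```

A per-tensor factor is the special case where a single scalar, driven by the largest row amax, is shared by every row. Per-row factors keep small-magnitude rows from being crushed toward zero by one large outlier elsewhere in the matrix.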
