Module matmul

Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch impl that routes it through the kernel envelope.

The pre-Phase-1 actor accepted only a MatmulConfig plus a GpuRef<f32>. MatmulRequest<T> widens that to:

- any T: GemmSupported (f32 / f64 / f16 / bf16 / fp8),
- an explicit D output buffer (so fp8 split-k and out-of-place cases work),
- the curated Epilogue enum,
- optional bias and gelu_aux,
- per-tensor / per-row fp8 scale pointers via ScaleSet,
- a workspace_size hint folded into the heuristic search.
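
To make the widened surface concrete, the following is a minimal sketch of what such a request type could look like. It is not the crate's code: the field names, the Epilogue variants, the layout of ScaleSet, the GpuRef stand-in, and the validate helper are all assumptions made for illustration, and GemmSupported is implemented only for the primitive types so the block compiles without external crates.

```rust
// Hypothetical sketch: every definition here is an assumption made for
// illustration, not the crate's actual code.
use std::marker::PhantomData;

/// Marker for element types the GEMM path accepts (f16 / bf16 / fp8 omitted
/// here to avoid external crates).
pub trait GemmSupported: Copy + 'static {}
impl GemmSupported for f32 {}
impl GemmSupported for f64 {}

/// Stand-in for a typed device buffer: an opaque id plus an element count.
#[derive(Debug, Clone, Copy)]
pub struct GpuRef<T> {
    pub id: u64,
    pub len: usize,
    _elem: PhantomData<T>,
}

impl<T> GpuRef<T> {
    pub fn new(id: u64, len: usize) -> Self {
        GpuRef { id, len, _elem: PhantomData }
    }
}

/// Invented subset of epilogues; the real curated list is not shown above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Epilogue {
    Plain,
    Bias,
    GeluAuxBias,
}

/// Optional fp8 scale pointers, modeled here as raw device addresses.
#[derive(Debug, Clone, Copy, Default)]
pub struct ScaleSet {
    pub a_scale: Option<u64>,
    pub b_scale: Option<u64>,
    pub d_scale: Option<u64>,
}

/// Typed request: D = epilogue(A * B), with A of m*k, B of k*n, D of m*n elements.
pub struct MatmulRequest<T: GemmSupported> {
    pub m: usize,
    pub n: usize,
    pub k: usize,
    pub a: GpuRef<T>,
    pub b: GpuRef<T>,
    pub d: GpuRef<T>, // explicit output buffer, separate from the inputs
    pub epilogue: Epilogue,
    pub bias: Option<GpuRef<T>>,
    pub gelu_aux: Option<GpuRef<T>>,
    pub scales: ScaleSet,
    pub workspace_size: usize, // hint in bytes for the heuristic search
}

impl<T: GemmSupported> MatmulRequest<T> {
    /// Checks buffer sizes and that the epilogue's required buffers are present.
    pub fn validate(&self) -> Result<(), String> {
        if self.a.len != self.m * self.k {
            return Err("A must hold m*k elements".into());
        }
        if self.b.len != self.k * self.n {
            return Err("B must hold k*n elements".into());
        }
        if self.d.len != self.m * self.n {
            return Err("D must hold m*n elements".into());
        }
        match self.epilogue {
            Epilogue::Bias if self.bias.is_none() => {
                Err("Bias epilogue needs a bias buffer".into())
            }
            Epilogue::GeluAuxBias if self.bias.is_none() || self.gelu_aux.is_none() => {
                Err("GeluAuxBias epilogue needs bias and gelu_aux buffers".into())
            }
            _ => Ok(()),
        }
    }
}
```

Keeping the optional buffers as Option fields and checking them against the epilogue in one place is only one possible design; the actual type may enforce these constraints differently.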

cudarc 0.19.4’s safe Matmul trait is implemented for f32 and, under the f16 feature, for half::f16 and half::bf16. For dtypes cudarc doesn’t yet wrap (fp8), the dispatch falls through to a typed Err(GpuError::Unrecoverable) until we land the sys-level path; see [dispatch_safe_path] below.
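
The fall-through described above can be pictured with a small routing function. This is a sketch under stated assumptions, not the real dispatch_safe_path: the DType and Route enums, the choose_route name, the payload-free GpuError::Unrecoverable variant, and the use of a runtime flag in place of the f16 Cargo feature are all hypothetical. f64 is left out because the text does not say how it is routed, and returning an error for f16 / bf16 when the feature is off is likewise an assumption.

```rust
// Hypothetical sketch of the dtype routing; names and shapes are assumptions.

/// Element types named in the paragraph above (f64 intentionally omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F32,
    F16,
    Bf16,
    Fp8,
}

/// Error type modeled with a single unit variant, mirroring how the text
/// spells the error; the real type may carry more information.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GpuError {
    Unrecoverable,
}

/// Where a request ends up. Only the safe cudarc wrapper exists in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    SafeCudarc,
}

/// Picks a route for `dtype`. `f16_feature` stands in for the Cargo feature
/// that enables the half-precision implementations.
pub fn choose_route(dtype: DType, f16_feature: bool) -> Result<Route, GpuError> {
    match dtype {
        // f32 is always covered by the safe wrapper.
        DType::F32 => Ok(Route::SafeCudarc),
        // Half-precision types are covered only when the feature is on.
        DType::F16 | DType::Bf16 if f16_feature => Ok(Route::SafeCudarc),
        // Assumption: without the feature, half types fail like unwrapped ones.
        DType::F16 | DType::Bf16 => Err(GpuError::Unrecoverable),
        // fp8 has no safe wrapper yet, so the caller gets a typed error.
        DType::Fp8 => Err(GpuError::Unrecoverable),
    }
}
```

Returning a typed error instead of panicking lets callers detect the unsupported case and pick another strategy until a sys-level path exists.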

Structs§

MatmulRequest: Typed matmul request. Public surface; instantiated by callers.
