Module matmul

Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch impl that routes it through the kernel envelope.

The pre-Phase-1 actor accepted only MatmulConfig + GpuRef<f32>. MatmulRequest<T> widens that to:

  • any T: GemmSupported (f32 / f64 / f16 / bf16 / fp8),
  • explicit D output buffer (so fp8 split-k and out-of-place cases work),
  • the curated Epilogue enum,
  • optional bias, gelu_aux,
  • per-tensor / per-row fp8 scale pointers via ScaleSet,
  • a workspace_size hint folded into the heuristic search.
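As a rough illustration of how the items above could fit into one request type, here is a minimal sketch. It is not the crate's actual definition: the field names, the Epilogue variants, the layout of ScaleSet, the stand-in GpuRef handle, and the with_bias / with_workspace_size builder methods are all assumptions made for this example.

```rust
use std::marker::PhantomData;

/// Stand-in for the crate's marker trait; only two impls are shown here.
pub trait GemmSupported: Copy + 'static {}
impl GemmSupported for f32 {}
impl GemmSupported for f64 {}

/// Hypothetical device-buffer handle (the real GpuRef is not shown on this page).
pub struct GpuRef<T> {
    pub len: usize,
    _elem: PhantomData<T>,
}

impl<T> GpuRef<T> {
    pub fn new(len: usize) -> Self {
        Self { len, _elem: PhantomData }
    }
}

/// Hypothetical variants; the curated list in the crate may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Epilogue {
    Default,
    Bias,
    GeluAux,
}

/// Hypothetical fp8 scale pointers, one optional slot per operand.
#[derive(Default)]
pub struct ScaleSet {
    pub a: Option<GpuRef<f32>>,
    pub b: Option<GpuRef<f32>>,
    pub d: Option<GpuRef<f32>>,
}

/// Assumed shape of the request: inputs, an explicit D output, and optional extras.
pub struct MatmulRequest<T: GemmSupported> {
    pub a: GpuRef<T>,
    pub b: GpuRef<T>,
    pub d: GpuRef<T>,
    pub epilogue: Epilogue,
    pub bias: Option<GpuRef<T>>,
    pub gelu_aux: Option<GpuRef<T>>,
    pub scales: ScaleSet,
    pub workspace_size: Option<usize>,
}

impl<T: GemmSupported> MatmulRequest<T> {
    /// Plain GEMM into `d`: no epilogue extras, no scales, no workspace hint.
    pub fn new(a: GpuRef<T>, b: GpuRef<T>, d: GpuRef<T>) -> Self {
        Self {
            a,
            b,
            d,
            epilogue: Epilogue::Default,
            bias: None,
            gelu_aux: None,
            scales: ScaleSet::default(),
            workspace_size: None,
        }
    }

    /// Attach a bias vector and switch to the bias epilogue.
    pub fn with_bias(mut self, bias: GpuRef<T>) -> Self {
        self.bias = Some(bias);
        self.epilogue = Epilogue::Bias;
        self
    }

    /// Record a workspace size hint in bytes.
    pub fn with_workspace_size(mut self, bytes: usize) -> Self {
        self.workspace_size = Some(bytes);
        self
    }
}
```

Under these assumptions a caller would write something like MatmulRequest::<f32>::new(a, b, d).with_bias(bias).with_workspace_size(1 << 20). Keeping the optional pieces as Option fields means the plain f32 case stays as small as the old MatmulConfig + GpuRef<f32> call.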

cudarc 0.19.4’s safe Matmul trait is implemented for f32 and (under feature f16) half::f16 / half::bf16. For dtypes cudarc doesn’t yet wrap (fp8), the dispatch falls through to a typed Err(GpuError::Unrecoverable) until the sys-level path lands; see [dispatch_safe_path] below.
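The fall-through described here can be pictured as a single match on the element type. The sketch below is only a model of that routing and does not call cudarc. The DType and SafePath enums, the route_dtype function, the runtime f16_feature flag (standing in for the Cargo feature), and the String payload on GpuError::Unrecoverable are all assumptions for illustration. f64 is left out because this page does not say how it is routed.

```rust
/// Hypothetical runtime tag for the element type of a request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DType {
    F32,
    F16,
    Bf16,
    Fp8,
}

/// Assumed error shape; the real GpuError may carry a different payload.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuError {
    Unrecoverable(String),
}

/// Hypothetical marker for "handled by cudarc's safe Matmul trait".
#[derive(Debug, PartialEq, Eq)]
pub enum SafePath {
    CudarcMatmul,
}

/// Route a dtype to the safe path, or return a typed error when no wrapper exists.
/// `f16_feature` models whether the crate was built with the f16 feature.
pub fn route_dtype(dtype: DType, f16_feature: bool) -> Result<SafePath, GpuError> {
    match dtype {
        DType::F32 => Ok(SafePath::CudarcMatmul),
        // The guard applies to both half-precision alternatives.
        DType::F16 | DType::Bf16 if f16_feature => Ok(SafePath::CudarcMatmul),
        // fp8, and half types without the feature, have no safe wrapper to call.
        other => Err(GpuError::Unrecoverable(format!(
            "no safe cudarc Matmul wrapper for {other:?}"
        ))),
    }
}
```

Returning a typed error instead of panicking lets the caller tell an unsupported dtype apart from a device failure and keeps the actor alive until a lower-level path exists.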

Structs

MatmulRequest
Typed matmul request. Public surface; instantiated by callers.