Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch
impl that routes it through the kernel envelope.
Today’s pre-Phase-1 actor accepts only MatmulConfig + GpuRef<f32>.
MatmulRequest<T> widens that to:
- any T: GemmSupported (f32 / f64 / f16 / bf16 / fp8),
- explicit D output buffer (so fp8 split-k and out-of-place cases work),
- the curated Epilogue enum,
- optional bias, gelu_aux,
- per-tensor / per-row fp8 scale pointers via ScaleSet,
- a workspace_size hint folded into the heuristic search.
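To make the widened surface concrete, here is a minimal, CPU-only sketch of how such a request might be shaped. Everything in it is an assumption for illustration: the name MatmulRequestSketch, the field names, the Epilogue variants, the ScaleSet layout, and the BufferId placeholder (standing in for a real device handle such as GpuRef<T>) are hypothetical and are not taken from the crate's actual definitions.

```rust
use std::fmt::Debug;
use std::marker::PhantomData;

/// Hypothetical stand-in for the crate's `GemmSupported` marker trait.
pub trait GemmSupported: Copy + Debug + 'static {
    const NAME: &'static str;
}
impl GemmSupported for f32 { const NAME: &'static str = "f32"; }
impl GemmSupported for f64 { const NAME: &'static str = "f64"; }

/// Hypothetical epilogue set; the real curated enum may differ.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum Epilogue {
    #[default]
    None,
    Bias,
    Gelu,
    GeluBias,
}

/// Placeholder for a device buffer handle (assumption, not the real GpuRef<T>).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BufferId(pub u64);

/// Hypothetical scale pointers; `per_row` switches per-tensor vs per-row scaling.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ScaleSet {
    pub a_scale: Option<BufferId>,
    pub b_scale: Option<BufferId>,
    pub d_scale: Option<BufferId>,
    pub per_row: bool,
}

/// Illustrative request shape: inputs, explicit D output, epilogue,
/// optional bias / gelu_aux, scales, and a workspace size hint.
#[derive(Clone, Debug)]
pub struct MatmulRequestSketch<T: GemmSupported> {
    pub a: BufferId,
    pub b: BufferId,
    pub d: BufferId,
    pub epilogue: Epilogue,
    pub bias: Option<BufferId>,
    pub gelu_aux: Option<BufferId>,
    pub scales: ScaleSet,
    pub workspace_size: usize,
    _dtype: PhantomData<T>,
}

impl<T: GemmSupported> MatmulRequestSketch<T> {
    /// Builds a request with no epilogue, no bias, no scales, zero workspace.
    pub fn new(a: BufferId, b: BufferId, d: BufferId) -> Self {
        Self {
            a,
            b,
            d,
            epilogue: Epilogue::default(),
            bias: None,
            gelu_aux: None,
            scales: ScaleSet::default(),
            workspace_size: 0,
            _dtype: PhantomData,
        }
    }

    pub fn with_epilogue(mut self, epilogue: Epilogue) -> Self {
        self.epilogue = epilogue;
        self
    }

    pub fn with_bias(mut self, bias: BufferId) -> Self {
        self.bias = Some(bias);
        self
    }

    pub fn with_workspace_size(mut self, bytes: usize) -> Self {
        self.workspace_size = bytes;
        self
    }

    /// Name of the element type carried in the type parameter.
    pub fn dtype_name(&self) -> &'static str {
        T::NAME
    }
}
```

Carrying the dtype as a type parameter rather than a runtime tag means an unsupported element type is rejected at compile time, while the optional fields keep the common f32 case cheap to construct.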
cudarc 0.19.4’s safe Matmul trait is implemented for f32 and
(under feature f16) half::f16 / half::bf16. For dtypes
cudarc doesn’t yet wrap (fp8), the dispatch falls through to a
typed Err(GpuError::Unrecoverable) until we land the sys-level
path; see [dispatch_safe_path] below.
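The fall-through described above can be sketched without any GPU dependency. This is only an illustration under stated assumptions: the SafePathDtype trait, the Route enum, the Fp8E4M3 placeholder type, and the function name dispatch_safe_path_sketch are hypothetical, and the &'static str payload on GpuError::Unrecoverable is assumed (the real variant may carry something else or nothing). No cudarc call is made here.

```rust
/// Error type mirroring the variant named in the text; the payload is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuError {
    Unrecoverable(&'static str),
}

/// Which backend path a request would take (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    SafeMatmul,
}

/// Hypothetical per-dtype flag: does a safe wrapper exist for this element type?
pub trait SafePathDtype {
    const NAME: &'static str;
    const HAS_SAFE_PATH: bool;
}

impl SafePathDtype for f32 {
    const NAME: &'static str = "f32";
    const HAS_SAFE_PATH: bool = true;
}

/// Placeholder fp8 element type, used only for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct Fp8E4M3(pub u8);

impl SafePathDtype for Fp8E4M3 {
    const NAME: &'static str = "fp8_e4m3";
    const HAS_SAFE_PATH: bool = false;
}

/// Routes wrapped dtypes to the safe path; everything else gets a typed error
/// instead of a panic, so the caller can decide how to react.
pub fn dispatch_safe_path_sketch<T: SafePathDtype>() -> Result<Route, GpuError> {
    if T::HAS_SAFE_PATH {
        Ok(Route::SafeMatmul)
    } else {
        Err(GpuError::Unrecoverable(T::NAME))
    }
}
```

Returning a typed error rather than panicking keeps the unsupported-dtype case visible in the function signature, so a later sys-level path could replace the Err branch without changing callers.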
Structs
- MatmulRequest - Typed matmul request. Public surface; instantiated by callers.