BlasLtActor — wraps [cudarc::cublaslt::CudaBlasLT] for
transformer-shaped fused matmul (matmul + bias + activation +
aux-store + bias-grad reduction) across the full dtype matrix
cuBLASLt accepts.
See epilogue for the curated Epilogue enum, heuristic
for the algorithm cache, workspace for the workspace pool,
scaling for the fp8 scale-pointer wiring, and matmul for
the typed MatmulRequest<T> plus its BlasLtDispatch impl.
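As a rough illustration of what a fused matmul with a bias-plus-activation epilogue computes, here is a minimal CPU reference sketch. It is not this module's API: `RefActivation`, `fused_matmul_ref`, and `bias_grad_ref` are hypothetical names, and the row-major layout with one bias value per output column is an assumption made for the example.

```rust
/// Hypothetical activation selector for this sketch
/// (not the crate's `Activation` enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RefActivation {
    Identity,
    Relu,
}

/// CPU reference for D = act(A * B + bias).
/// Assumes row-major A (m x k), B (k x n), and a bias of length n
/// that is broadcast over the rows of the output.
pub fn fused_matmul_ref(
    a: &[f32],
    b: &[f32],
    bias: &[f32],
    m: usize,
    k: usize,
    n: usize,
    act: RefActivation,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(bias.len(), n);
    let mut d = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Plain dot product of row i of A with column j of B.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // Epilogue: add bias, then apply the activation.
            let pre = acc + bias[j];
            d[i * n + j] = match act {
                RefActivation::Identity => pre,
                RefActivation::Relu => pre.max(0.0),
            };
        }
    }
    d
}

/// CPU reference for the bias-gradient reduction: sums the upstream
/// gradient dY (m x n, row-major) over its rows, giving one value per column.
pub fn bias_grad_ref(dy: &[f32], m: usize, n: usize) -> Vec<f32> {
    assert_eq!(dy.len(), m * n);
    let mut g = vec![0.0f32; n];
    for i in 0..m {
        for j in 0..n {
            g[j] += dy[i * n + j];
        }
    }
    g
}
```

A reference like this can be useful for checking a fused GPU result on small shapes; the fused path exists to avoid the separate passes over the output that the loops above make explicit.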
Re-exports§
pub use epilogue::Epilogue;
pub use heuristic::HeuristicCacheRef;
pub use heuristic::HeuristicEntry;
pub use heuristic::HeuristicKey;
pub use heuristic::DEFAULT_HEURISTIC_CAPACITY;
pub use matmul::MatmulRequest;
pub use scaling::ScaleSet;
pub use workspace::WorkspaceLease;
pub use workspace::WorkspacePool;
Modules§
- epilogue
- Epilogue enum: atomr-accel’s curated mapping over cuBLASLt’s cublasLtEpilogue_t.
- heuristic
- Heuristic-cache for cuBLASLt matmul algorithms.
- matmul
- Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch impl that routes it through the kernel envelope.
- scaling
- fp8 scale-pointer helpers for cuBLASLt matmul.
- workspace
- cuBLASLt workspace pool — recycles per-heuristic device buffers.
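To make the idea of an algorithm cache concrete, here is a minimal sketch of a bounded cache that remembers the outcome of an algorithm search per problem shape. It does not reflect the actual `heuristic` module: `SketchKey`, `SketchEntry`, and `SketchCache` are hypothetical, the key fields are a guess at what such a cache might hash on, and the oldest-first eviction policy is an assumption chosen for simplicity.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical cache key: problem shape plus small tags standing in
/// for the dtype and epilogue choice.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SketchKey {
    pub m: u64,
    pub n: u64,
    pub k: u64,
    pub dtype_tag: u8,
    pub epilogue_tag: u8,
}

/// Hypothetical cached result of an algorithm search.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SketchEntry {
    pub algo_index: u32,
    pub workspace_bytes: usize,
}

/// Bounded cache with oldest-first eviction (an assumption for this sketch).
pub struct SketchCache {
    capacity: usize,
    map: HashMap<SketchKey, SketchEntry>,
    order: VecDeque<SketchKey>,
}

impl SketchCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached entry, or runs `search` once and stores its result.
    pub fn get_or_insert_with<F>(&mut self, key: SketchKey, search: F) -> SketchEntry
    where
        F: FnOnce(&SketchKey) -> SketchEntry,
    {
        if let Some(hit) = self.map.get(&key) {
            return hit.clone();
        }
        // Make room by dropping the oldest inserted key.
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        let entry = search(&key);
        self.order.push_back(key.clone());
        self.map.insert(key, entry.clone());
        entry
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

In this sketch the `search` closure stands in for the comparatively expensive heuristic query, so repeated requests with the same shape and configuration skip it entirely.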
Structs§
Enums§
- Activation
- Activations available for kernel fusion in matmul.
- BlasLtMsg
- Public message surface.