Module blas_lt


BlasLtActor — wraps [cudarc::cublaslt::CudaBlasLT] for transformer-shaped fused matmul (matmul + bias + activation + aux-store + bias-grad reduction) across the full dtype matrix cuBLASLt accepts.
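To make the fused operation concrete, the following is a minimal CPU reference of what "matmul + bias + activation" computes. It is a sketch, not the crate's code path: `SketchActivation` and `fused_matmul_reference` are hypothetical names, the row-major layout with a per-output-column bias is an assumption of this example, and the aux-store and bias-grad epilogues are left out.

```rust
/// Hypothetical activation set for this sketch; not the crate's `Activation` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchActivation {
    Identity,
    Relu,
}

/// CPU reference for D = act(A * B + bias).
///
/// Assumptions of this sketch: row-major storage, `a` is m x k, `b` is k x n,
/// `bias` has n entries and is broadcast across rows of the output.
pub fn fused_matmul_reference(
    a: &[f32],
    b: &[f32],
    bias: &[f32],
    m: usize,
    k: usize,
    n: usize,
    act: SketchActivation,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "a must be m x k");
    assert_eq!(b.len(), k * n, "b must be k x n");
    assert_eq!(bias.len(), n, "bias must have n entries");

    let mut d = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Plain dot product of row i of A with column j of B.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // Epilogue: add bias, then apply the activation in the same pass.
            let pre = acc + bias[j];
            d[i * n + j] = match act {
                SketchActivation::Identity => pre,
                SketchActivation::Relu => pre.max(0.0),
            };
        }
    }
    d
}
```

A reference like this is mainly useful for checking a fused GPU result against an unfused computation on small shapes.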

See epilogue for the curated Epilogue enum, heuristic for the algorithm cache, workspace for the workspace pool, scaling for the fp8 scale-pointer wiring, and matmul for the typed MatmulRequest<T> plus its BlasLtDispatch impl.
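As an illustration of the algorithm-cache idea, the sketch below memoizes one algorithm choice per problem shape in a capacity-bounded map. Everything here is hypothetical: `SketchKey`, `SketchEntry`, and `SketchCache` are not the crate's `HeuristicKey`, `HeuristicEntry`, or `HeuristicCacheRef`, the key fields are a guess, and FIFO eviction is an assumption made only to keep the example short.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical key: problem shape only. A real key may carry dtype, layout, epilogue.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SketchKey {
    pub m: usize,
    pub n: usize,
    pub k: usize,
}

/// Hypothetical entry: an opaque algorithm id plus the workspace it needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchEntry {
    pub algo_id: u32,
    pub workspace_bytes: usize,
}

/// Capacity-bounded cache with FIFO eviction (an assumption of this sketch).
pub struct SketchCache {
    capacity: usize,
    map: HashMap<SketchKey, SketchEntry>,
    order: VecDeque<SketchKey>, // insertion order, oldest first
}

impl SketchCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns the cached entry for `key`, or runs `query` once and caches its result.
    pub fn get_or_insert_with<F>(&mut self, key: SketchKey, query: F) -> SketchEntry
    where
        F: FnOnce() -> SketchEntry,
    {
        if let Some(entry) = self.map.get(&key) {
            return *entry; // hit: the expensive heuristic query is skipped
        }
        // Miss: evict the oldest key if the cache is full.
        if self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        let entry = query();
        self.map.insert(key, entry);
        self.order.push_back(key);
        entry
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

The point of the pattern is that repeated matmuls with the same shape pay for algorithm selection once; the closure stands in for whatever heuristic query the real implementation performs.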

Re-exports§

pub use epilogue::Epilogue;
pub use heuristic::HeuristicCacheRef;
pub use heuristic::HeuristicEntry;
pub use heuristic::HeuristicKey;
pub use heuristic::DEFAULT_HEURISTIC_CAPACITY;
pub use matmul::MatmulRequest;
pub use scaling::ScaleSet;
pub use workspace::WorkspaceLease;
pub use workspace::WorkspacePool;

Modules§

epilogue: Epilogue enum — atomr-accel’s curated mapping over cuBLASLt’s cublasLtEpilogue_t.
heuristic: Heuristic cache for cuBLASLt matmul algorithms.
matmul: Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch impl that routes it through the kernel envelope.
scaling: fp8 scale-pointer helpers for cuBLASLt matmul.
workspace: cuBLASLt workspace pool — recycles per-heuristic device buffers.

Structs§

BlasLtActor

Enums§

Activation: Activations available for kernel fusion in matmul.
BlasLtMsg: Public message surface.
