
Module blas_lt

BlasLtActor wraps cudarc::cublaslt::CudaBlasLT for transformer-shaped fused matmul (matmul + bias + activation + aux store + bias-gradient reduction) across the full dtype matrix that cuBLASLt accepts.
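To make the fused epilogue concrete, here is a CPU reference for what a matmul + bias + activation call computes, D = act(A·B + bias). It is a sketch for checking results, not how BlasLtActor executes the op (that happens on the GPU through cuBLASLt). Assumptions: row-major f32 buffers; a hypothetical `Act` enum in place of the crate's `Activation`; GELU in its tanh approximation (this sketch's choice); and one bias value per row of D, broadcast across that row, following cuBLASLt's documented rule that the bias vector length equals the number of rows of D. The aux-store and bias-gradient epilogues are not modeled.

```rust
/// Hypothetical activation choices (a stand-in, not the crate's `Activation` enum).
#[derive(Clone, Copy, Debug)]
enum Act {
    Identity,
    Relu,
    Gelu,
}

/// Applies the activation to one element; GELU uses the tanh approximation.
fn apply(act: Act, x: f32) -> f32 {
    match act {
        Act::Identity => x,
        Act::Relu => x.max(0.0),
        Act::Gelu => {
            let c = (2.0_f32 / std::f32::consts::PI).sqrt();
            0.5 * x * (1.0 + (c * (x + 0.044_715 * x * x * x)).tanh())
        }
    }
}

/// Reference for D = act(A·B + bias) with row-major A (m×k), B (k×n), D (m×n).
/// `bias` holds one value per row of D, broadcast across that row's n columns.
fn fused_matmul_ref(a: &[f32], b: &[f32], bias: &[f32], m: usize, k: usize, n: usize, act: Act) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must be m×k");
    assert_eq!(b.len(), k * n, "B must be k×n");
    assert_eq!(bias.len(), m, "bias must have one entry per row of D");
    let mut d = vec![0.0_f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Plain dot product of row i of A with column j of B.
            let dot: f32 = (0..k).map(|p| a[i * k + p] * b[p * n + j]).sum();
            // Epilogue: bias add, then activation, in the same pass.
            d[i * n + j] = apply(act, dot + bias[i]);
        }
    }
    d
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [1.0, 0.0, 0.0, -1.0];
    let bias = [0.5, -10.0];
    let d = fused_matmul_ref(&a, &b, &bias, 2, 2, 2, Act::Relu);
    println!("{d:?}"); // [1.5, 0.0, 0.0, 0.0]
}
```

Fusing the bias add and activation into the matmul's epilogue is what saves the separate elementwise passes (and their extra trips through device memory) that an unfused version would need.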

See epilogue for the curated Epilogue enum, heuristic for the algorithm cache, workspace for the workspace pool, scaling for the fp8 scale-pointer wiring, and matmul for the typed MatmulRequest<T> plus its BlasLtDispatch impl.
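The typed-request idea can be pictured as follows: a trait bound restricts which element types a request may carry, so an unsupported dtype is rejected at compile time rather than by a cuBLASLt status at run time. This is a hypothetical sketch: `GemmSupported` here is a local stand-in implemented only for f32 and f64 to stay dependency-free (the crate's real bound covers its own dtype set), and the `m`/`n`/`k` fields, `new`, `output_len`, `flops`, and the zero-dimension check are illustrative assumptions, not the crate's `MatmulRequest<T>` API.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for the crate's `GemmSupported` bound.
trait GemmSupported: Copy + 'static {
    const NAME: &'static str;
}
impl GemmSupported for f32 {
    const NAME: &'static str = "f32";
}
impl GemmSupported for f64 {
    const NAME: &'static str = "f64";
}

/// Illustrative request for D (m×n) = A (m×k) · B (k×n); field names are assumptions.
struct MatmulRequest<T: GemmSupported> {
    m: usize,
    n: usize,
    k: usize,
    _elem: PhantomData<T>,
}

impl<T: GemmSupported> MatmulRequest<T> {
    /// Validates the shape up front (this sketch rejects zero-sized dimensions).
    fn new(m: usize, n: usize, k: usize) -> Result<Self, String> {
        if m == 0 || n == 0 || k == 0 {
            return Err(format!("{}: degenerate GEMM {m}x{n}x{k}", T::NAME));
        }
        Ok(Self { m, n, k, _elem: PhantomData })
    }

    /// Number of elements the output buffer D must hold.
    fn output_len(&self) -> usize {
        self.m * self.n
    }

    /// Multiply-add count expressed as flops (2·m·n·k).
    fn flops(&self) -> usize {
        2 * self.m * self.n * self.k
    }
}

fn main() {
    let req = MatmulRequest::<f32>::new(128, 64, 256).expect("valid shape");
    println!("D holds {} elements; {} flops", req.output_len(), req.flops());
    // `MatmulRequest::<u8>::new(1, 1, 1)` would not compile:
    // u8 does not implement GemmSupported.
}
```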

Re-exports

pub use epilogue::Epilogue;
pub use heuristic::HeuristicCacheRef;
pub use heuristic::HeuristicEntry;
pub use heuristic::HeuristicKey;
pub use heuristic::DEFAULT_HEURISTIC_CAPACITY;
pub use matmul::MatmulRequest;
pub use scaling::ScaleSet;
pub use workspace::WorkspaceLease;
pub use workspace::WorkspacePool;
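The re-exported HeuristicKey, HeuristicEntry, and DEFAULT_HEURISTIC_CAPACITY suggest a bounded map from a problem description to the algorithm chosen for it. The sketch below shows one way such a cache can work: it memoizes the expensive heuristic query per key and evicts the least-recently-used entry once full. Everything in it is hypothetical: the key fields, the `Entry` payload, `HeuristicCache`, `get_or_insert_with`, and the LRU policy are assumptions rather than the crate's implementation, and the capacity is passed in instead of using the real default.

```rust
use std::collections::HashMap;

/// Hypothetical key: the inputs that change which algorithm a heuristic would pick.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Key {
    m: u32,
    n: u32,
    k: u32,
    dtype: u8,
    epilogue: u8,
}

/// Hypothetical cached result: which algorithm to run and the workspace it needs.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    algo_id: u32,
    workspace_bytes: usize,
}

/// Bounded cache with least-recently-used eviction (linear scan, fine for small capacities).
struct HeuristicCache {
    capacity: usize,
    tick: u64,
    map: HashMap<Key, (Entry, u64)>,
}

impl HeuristicCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, tick: 0, map: HashMap::new() }
    }

    /// Returns the cached entry, or runs `query` (the expensive heuristic call) and caches it.
    fn get_or_insert_with(&mut self, key: Key, query: impl FnOnce() -> Entry) -> Entry {
        self.tick += 1;
        let now = self.tick;
        if let Some((entry, last_used)) = self.map.get_mut(&key) {
            *last_used = now; // refresh recency on a hit
            return *entry;
        }
        if self.map.len() == self.capacity {
            // Evict the entry whose last use is oldest.
            let oldest = *self.map.iter().min_by_key(|(_, (_, t))| *t).map(|(k, _)| k).unwrap();
            self.map.remove(&oldest);
        }
        let entry = query();
        self.map.insert(key, (entry, now));
        entry
    }
}

fn main() {
    let mut cache = HeuristicCache::new(2);
    let key = Key { m: 4096, n: 4096, k: 1024, dtype: 1, epilogue: 2 };
    let first = cache.get_or_insert_with(key, || Entry { algo_id: 7, workspace_bytes: 1 << 20 });
    // Second lookup is a hit, so this closure never runs.
    let again = cache.get_or_insert_with(key, || unreachable!("cached"));
    println!("algo {} needs {} bytes; hit matches: {}", first.algo_id, first.workspace_bytes, first == again);
}
```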

Modules

epilogue
Epilogue enum — atomr-accel’s curated mapping over cuBLASLt’s cublasLtEpilogue_t.
heuristic
Heuristic cache for cuBLASLt matmul algorithms.
matmul
Typed MatmulRequest<T: GemmSupported> plus the BlasLtDispatch impl that routes it through the kernel envelope.
scaling
fp8 scale-pointer helpers for cuBLASLt matmul.
workspace
cuBLASLt workspace pool — recycles per-heuristic device buffers.
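A workspace pool that recycles buffers can be sketched with an RAII lease that returns its buffer to the pool on drop. This is a hypothetical illustration: `Pool`, `Lease`, `lease`, and `idle` are not the crate's WorkspacePool or WorkspaceLease API, buffers are bucketed by exact byte size (an assumption about what "per-heuristic" means here), and a host `Vec<u8>` stands in for a device allocation.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical pool: idle buffers bucketed by byte size.
/// `Vec<u8>` stands in for device memory in this sketch.
#[derive(Clone, Default)]
struct Pool {
    free: Rc<RefCell<HashMap<usize, Vec<Vec<u8>>>>>,
}

/// RAII lease: the buffer goes back to its size bucket when the lease is dropped.
struct Lease {
    buf: Option<Vec<u8>>,
    pool: Pool,
}

impl Pool {
    /// Reuses an idle buffer of exactly `bytes` if one exists, else allocates a new one.
    fn lease(&self, bytes: usize) -> Lease {
        let recycled = self.free.borrow_mut().get_mut(&bytes).and_then(|v| v.pop());
        let buf = recycled.unwrap_or_else(|| vec![0u8; bytes]);
        Lease { buf: Some(buf), pool: self.clone() }
    }

    /// How many idle buffers of this size are waiting to be reused.
    fn idle(&self, bytes: usize) -> usize {
        self.free.borrow().get(&bytes).map_or(0, |v| v.len())
    }
}

impl Lease {
    fn len(&self) -> usize {
        self.buf.as_ref().map_or(0, |b| b.len())
    }
}

impl Drop for Lease {
    fn drop(&mut self) {
        if let Some(buf) = self.buf.take() {
            self.pool.free.borrow_mut().entry(buf.len()).or_default().push(buf);
        }
    }
}

fn main() {
    let pool = Pool::default();
    {
        let lease = pool.lease(4096);
        println!("leased {} bytes, idle now {}", lease.len(), pool.idle(4096));
    } // lease dropped here; its buffer returns to the pool
    println!("idle after drop: {}", pool.idle(4096));
}
```

Tying the return to `Drop` means a buffer cannot leak out of the pool on an early return or error path, which is the usual reason to model a workspace as a lease rather than a raw allocation.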

Structs

BlasLtActor
Actor that wraps cudarc::cublaslt::CudaBlasLT for fused matmul work.

Enums

Activation
Activation functions available for fusion into the matmul kernel.
BlasLtMsg
Public message surface: the messages BlasLtActor accepts.
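The actor-plus-message-enum shape can be sketched with std channels: one thread owns the state and serves requests that arrive on its inbox, and each request carries its own reply channel. This is a hypothetical illustration, not the crate's BlasLtMsg: the `Msg` variants, `run_actor`, and the CPU matmul that stands in for a cuBLASLt launch are all assumptions.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical reply channel: the product D, or an error message.
type Reply = Sender<Result<Vec<f32>, String>>;

/// Hypothetical message set (not the crate's BlasLtMsg variants).
enum Msg {
    /// Compute D = A·B on row-major f32 buffers and send D back.
    Matmul { a: Vec<f32>, b: Vec<f32>, m: usize, k: usize, n: usize, reply: Reply },
    /// Report how many matmuls have completed successfully.
    Served { reply: Sender<u64> },
}

/// The actor owns its state (here a counter; in a real actor, the library handle,
/// caches, and workspace) and is the only thread that touches it.
fn run_actor(inbox: Receiver<Msg>) {
    let mut served = 0_u64;
    // The loop ends once every Sender for the inbox has been dropped.
    for msg in inbox {
        match msg {
            Msg::Matmul { a, b, m, k, n, reply } => {
                let result = if a.len() != m * k || b.len() != k * n {
                    Err(format!("shape mismatch: A has {} elements, B has {}", a.len(), b.len()))
                } else {
                    let mut d = vec![0.0_f32; m * n];
                    for i in 0..m {
                        for p in 0..k {
                            for j in 0..n {
                                d[i * n + j] += a[i * k + p] * b[p * n + j];
                            }
                        }
                    }
                    served += 1;
                    Ok(d)
                };
                // Ignore send errors: the caller may have stopped waiting.
                let _ = reply.send(result);
            }
            Msg::Served { reply } => {
                let _ = reply.send(served);
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel::<Msg>();
    let actor = thread::spawn(move || run_actor(rx));
    let (reply_tx, reply_rx) = channel();
    tx.send(Msg::Matmul { a: vec![1.0, 2.0], b: vec![3.0, 4.0], m: 1, k: 2, n: 1, reply: reply_tx })
        .expect("actor is running");
    println!("{:?}", reply_rx.recv().unwrap()); // Ok([11.0])
    drop(tx); // closing the inbox ends the actor's loop
    actor.join().unwrap();
}
```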