BlasActor — full cuBLAS surface (Phase 1 cuBLAS slice).
Wraps a [cudarc::cublas::CudaBlas] handle, performs cuBLAS L1/L2/L3
ops on its assigned stream, and returns completion via the
configured CompletionStrategy (§3.2 stateless-handle archetype +
§5.10 callback wiring).
Sub-modules:
- gemm: typed Gemm<T> for f32/f64/f16/bf16 (cudarc safe layer) and the legacy SgemmRequest adapter that routes through Gemm<f32> for back-compat.
- gemm_strided_batched: strided-batched gemm via cudarc’s safe layer for f32/f64/f16/bf16; can drop to crate::sys::cublas::gemm_strided_batched_ex if more dtypes are needed in a follow-up.
- l1: axpy / dot / nrm2 / scal / asum / iamax / iamin / copy / swap / rot via the cuBLAS ex-suffix entry points.
- l2: gemv / ger via cudarc’s Gemv<T> and the local cublasGemv_v2 / cublasGer_v2 wrappers.
- l3: geam / syrk / trsm via the local cublasSgeam / cublasSsyrk_v2 / cublasStrsm_v2 wrappers (and dgeam / dsyrk / dtrsm).
- scaling: fp8 scaling-factor helpers (per-tensor / per-row), stubbed under the cublas-fp8 feature for use by cublasGemmEx on Hopper+.
The mailbox is freed immediately after the kernel is enqueued — the
actor never blocks on the GPU (§5.2). Reply delivery happens on the
Tokio task spawned by crate::kernel::envelope::run_kernel.
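The enqueue-then-return behaviour described above can be illustrated with a minimal sketch. It assumes nothing about the crate's internals: std threads and channels stand in for the Tokio task and the CUDA stream, and AxpyMsg, handle, and demo_non_blocking are hypothetical names invented for this example.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical message: axpy inputs plus a channel for the reply.
struct AxpyMsg {
    alpha: f32,
    x: Vec<f32>,
    y: Vec<f32>,
    reply: mpsc::Sender<Vec<f32>>,
}

/// Stand-in for the actor's handler. It starts the work and returns at once,
/// so the caller (the "mailbox") is free to accept the next message.
fn handle(msg: AxpyMsg) -> thread::JoinHandle<()> {
    let AxpyMsg { alpha, x, y, reply } = msg;
    thread::spawn(move || {
        // Simulated device latency; a real actor would wait on stream completion.
        thread::sleep(Duration::from_millis(20));
        let out: Vec<f32> = x.iter().zip(y.iter()).map(|(a, b)| alpha * a + b).collect();
        // Reply delivery happens off the handler's call path.
        let _ = reply.send(out);
    })
}

/// Sends two requests back to back, then collects both replies.
fn demo_non_blocking() -> (Vec<f32>, Vec<f32>) {
    let (tx1, rx1) = mpsc::channel();
    let (tx2, rx2) = mpsc::channel();
    // Both calls return before either result is ready.
    let h1 = handle(AxpyMsg { alpha: 2.0, x: vec![1.0, 2.0], y: vec![10.0, 20.0], reply: tx1 });
    let h2 = handle(AxpyMsg { alpha: 0.5, x: vec![4.0, 8.0], y: vec![1.0, 1.0], reply: tx2 });
    let r1 = rx1.recv().expect("first reply");
    let r2 = rx2.recv().expect("second reply");
    h1.join().expect("worker 1 panicked");
    h2.join().expect("worker 2 panicked");
    (r1, r2)
}
```

The point of the shape is that the handler's return does not depend on the result being ready; only the reply channel does.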
Re-exports§
pub use gemm::GemmRequest;
pub use gemm_strided_batched::GemmStridedBatchedRequest;
pub use l1::AsumRequest;
pub use l1::AxpyRequest;
pub use l1::CopyRequest;
pub use l1::DotRequest;
pub use l1::IamaxRequest;
pub use l1::IaminRequest;
pub use l1::Nrm2Request;
pub use l1::RotRequest;
pub use l1::ScalRequest;
pub use l1::SwapRequest;
pub use l2::GemvRequest;
pub use l2::GerRequest;
pub use l3::GeamRequest;
pub use l3::SyrkRequest;
pub use l3::TrsmRequest;
Modules§
- gemm
- Typed GemmRequest<T> + GemmDispatch impls.
- gemm_strided_batched
- Typed GemmStridedBatchedRequest<T> + GemmStridedBatchedDispatch impls.
- l1
- Typed L1 ops: axpy, dot, nrm2, scal, asum, iamax, iamin, copy, swap, rot.
- l2
- Typed L2 ops: gemv, ger.
- l3
- Typed L3 ops other than gemm: geam (matrix add/scale), syrk (symmetric rank-k update), trsm (triangular solve).
- scaling
- fp8 scaling-factor helpers.
Structs§
- BlasActor
- Two-track construction: a real cuBLAS-backed actor (props), and a mock variant used by examples/echo_no_gpu and unit tests where no GPU is present.
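One way to picture the two-track construction is a backend trait with a CPU-only implementation for the mock track. This is a sketch under assumptions, not the crate's actual layout: BlasBackend, MockBackend, the generic parameter on BlasActor, and the mock constructor are all hypothetical, and the real cuBLAS-backed track (built via props) is not shown.

```rust
/// Hypothetical backend trait: one method per BLAS op the actor exposes.
trait BlasBackend {
    fn scal(&self, alpha: f32, x: &mut [f32]);
}

/// CPU-only stand-in for environments without a GPU (assumed design).
struct MockBackend;

impl BlasBackend for MockBackend {
    fn scal(&self, alpha: f32, x: &mut [f32]) {
        // Same contract as BLAS scal: x <- alpha * x, computed on the host.
        for v in x.iter_mut() {
            *v *= alpha;
        }
    }
}

/// Illustrative actor shell; the real BlasActor's fields are not documented here.
struct BlasActor<B: BlasBackend> {
    backend: B,
}

impl BlasActor<MockBackend> {
    /// Hypothetical constructor for the mock track.
    fn mock() -> Self {
        BlasActor { backend: MockBackend }
    }
}

impl<B: BlasBackend> BlasActor<B> {
    /// Forwards to whichever backend the actor was built with.
    fn scal(&self, alpha: f32, x: &mut [f32]) {
        self.backend.scal(alpha, x);
    }
}
```

With this shape, tests and GPU-less examples exercise the same call path as the real actor and differ only in the backend they were constructed with.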
Enums§
- BlasMsg
- Public messages for BlasActor. Each variant boxes a typed dispatcher trait object so the dtype dimension travels through the box without forcing an N-fold mailbox explosion.
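The boxing idea can be sketched as follows. The trait and type names echo the ones listed on this page, but every signature here is an assumption made for illustration (including the Dtype helper trait and the handle function); the crate's real definitions may differ. The dispatcher just reports its dtype and a work size instead of launching a kernel.

```rust
use std::marker::PhantomData;

/// Hypothetical helper trait naming each supported element type.
trait Dtype: Send + 'static {
    const NAME: &'static str;
}
impl Dtype for f32 {
    const NAME: &'static str = "f32";
}
impl Dtype for f64 {
    const NAME: &'static str = "f64";
}

/// Assumed shape of a dispatcher trait: object-safe, so it can be boxed.
trait GemmDispatch: Send {
    fn dtype(&self) -> &'static str;
    /// Consumes the request; a real impl would enqueue the cuBLAS call here.
    fn dispatch(self: Box<Self>) -> usize;
}

/// Typed request; the dtype lives in the type parameter, not in the message enum.
struct GemmRequest<T: Dtype> {
    m: usize,
    n: usize,
    k: usize,
    _dtype: PhantomData<T>,
}

impl<T: Dtype> GemmRequest<T> {
    fn new(m: usize, n: usize, k: usize) -> Self {
        GemmRequest { m, n, k, _dtype: PhantomData }
    }
}

impl<T: Dtype> GemmDispatch for GemmRequest<T> {
    fn dtype(&self) -> &'static str {
        T::NAME
    }
    fn dispatch(self: Box<Self>) -> usize {
        // Placeholder for the kernel launch: report the multiply-add count.
        self.m * self.n * self.k
    }
}

/// One variant per op, regardless of how many dtypes the op supports.
enum BlasMsg {
    Gemm(Box<dyn GemmDispatch>),
}

/// The handler never matches on the dtype; the boxed object carries it.
fn handle(msg: BlasMsg) -> (&'static str, usize) {
    match msg {
        BlasMsg::Gemm(req) => {
            let dtype = req.dtype();
            (dtype, req.dispatch())
        }
    }
}
```

Without the box, each op would need one variant per dtype (for example a separate f32 and f64 gemm variant), which is the N-fold growth the description refers to.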