
Module blas


BlasActor — full cuBLAS surface (Phase 1 cuBLAS slice).

Wraps a [cudarc::cublas::CudaBlas] handle, performs cuBLAS L1/L2/L3 ops on its assigned stream, and returns completion via the configured CompletionStrategy (§3.2 stateless-handle archetype + §5.10 callback wiring).

Sub-modules:

  • gemm — typed Gemm<T> for f32/f64/f16/bf16 (cudarc safe layer) and the legacy SgemmRequest adapter that routes through Gemm<f32> for back-compat.
  • gemm_strided_batched — strided-batched gemm via cudarc’s safe layer for f32/f64/f16/bf16; can drop to crate::sys::cublas::gemm_strided_batched_ex if more dtypes are needed in a follow-up.
  • l1 — axpy / dot / nrm2 / scal / asum / iamax / iamin / copy / swap / rot via the cuBLAS ex-suffix entry points.
  • l2 — gemv / ger via cudarc’s Gemv<T> and the local cublasGemv_v2 / cublasGer_v2 wrappers.
  • l3 — geam / syrk / trsm via the local cublasSgeam / cublasSsyrk_v2 / cublasStrsm_v2 wrappers (and dgeam/dsyrk/dtrsm).
  • scaling — fp8 scaling-factor helpers (per-tensor / per-row), stubbed under the cublas-fp8 feature for use by cublasGemmEx on Hopper+.
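For readers less familiar with the BLAS naming used in the l1 bullet, the following is a plain-CPU reference sketch of what a few Level-1 ops compute. It is not how this module runs them (those go through cuBLAS on the GPU); the function signatures are illustrative assumptions, and only the f32 case is shown. Note that iamax follows the BLAS convention of reporting a 1-based index.

```rust
/// axpy: y <- alpha * x + y (element-wise, in place).
fn axpy(alpha: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// dot: sum of x[i] * y[i].
fn dot(x: &[f32], y: &[f32]) -> f32 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// nrm2: Euclidean norm of x.
fn nrm2(x: &[f32]) -> f32 {
    x.iter().map(|v| v * v).sum::<f32>().sqrt()
}

/// asum: sum of absolute values of x.
fn asum(x: &[f32]) -> f32 {
    x.iter().map(|v| v.abs()).sum()
}

/// iamax: 1-based index of the first element with the largest |x[i]|.
/// Returning Option for the empty case is a choice made for this sketch.
fn iamax(x: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, v) in x.iter().enumerate() {
        let a = v.abs();
        // Strict '>' keeps the first occurrence on ties.
        if best.map_or(true, |(_, b)| a > b) {
            best = Some((i + 1, a));
        }
    }
    best.map(|(i, _)| i)
}
```

scal, copy, swap, and rot follow the same pattern (scale in place, copy x into y, exchange x and y, apply a plane rotation to the pair).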

The mailbox is freed immediately after the kernel is enqueued — the actor never blocks on the GPU (§5.2). Reply delivery happens on the Tokio task spawned by crate::kernel::envelope::run_kernel.
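The enqueue-then-reply shape described above can be sketched with the standard library alone. This is only an analogy under stated assumptions: a plain thread stands in for the Tokio task spawned by run_kernel, a CPU dot product stands in for the enqueued kernel, and Request, actor_loop, and demo_dot are hypothetical names that do not come from this crate.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message: input data plus a channel for the reply.
struct Request {
    x: Vec<f32>,
    y: Vec<f32>,
    reply: mpsc::Sender<f32>,
}

/// Actor loop: take a message, hand the work off, return to the mailbox.
/// The loop never waits for the result, so the next message can be
/// picked up while earlier work is still in flight.
fn actor_loop(mailbox: mpsc::Receiver<Request>) {
    for req in mailbox {
        // Stand-in for "enqueue the kernel, let a separate task await it".
        thread::spawn(move || {
            let dot: f32 = req.x.iter().zip(&req.y).map(|(a, b)| a * b).sum();
            // The caller may have gone away; ignore a failed send.
            let _ = req.reply.send(dot);
        });
    }
}

/// Sends one request through the actor and waits for the reply.
fn demo_dot(x: Vec<f32>, y: Vec<f32>) -> f32 {
    let (tx, rx) = mpsc::channel();
    let actor = thread::spawn(move || actor_loop(rx));

    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request { x, y, reply: reply_tx }).unwrap();
    drop(tx); // closing the mailbox lets the actor loop finish

    let out = reply_rx.recv().unwrap();
    actor.join().unwrap();
    out
}
```

The point of the split is that the reply travels on its own channel from the completion side, so the time a message occupies the actor is bounded by the cost of handing the work off, not by the cost of the work itself.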

Re-exports

pub use gemm::GemmRequest;
pub use gemm_strided_batched::GemmStridedBatchedRequest;
pub use l1::AsumRequest;
pub use l1::AxpyRequest;
pub use l1::CopyRequest;
pub use l1::DotRequest;
pub use l1::IamaxRequest;
pub use l1::IaminRequest;
pub use l1::Nrm2Request;
pub use l1::RotRequest;
pub use l1::ScalRequest;
pub use l1::SwapRequest;
pub use l2::GemvRequest;
pub use l2::GerRequest;
pub use l3::GeamRequest;
pub use l3::SyrkRequest;
pub use l3::TrsmRequest;

Modules

gemm
Typed GemmRequest<T> + GemmDispatch impls.
gemm_strided_batched
Typed GemmStridedBatchedRequest<T> + GemmStridedBatchedDispatch impls.
l1
Typed L1 ops: axpy, dot, nrm2, scal, asum, iamax, iamin, copy, swap, rot.
l2
Typed L2 ops: gemv, ger.
l3
Typed L3 ops other than gemm: geam (matrix add/scale), syrk (symmetric rank-k update), trsm (triangular solve).
scaling
fp8 scaling-factor helpers.

Structs

BlasActor
Two-track construction: a real cuBLAS-backed actor (props), and a mock variant used by examples/echo_no_gpu and unit tests where no GPU is present.

Enums

BlasMsg
Public messages for BlasActor. Each variant boxes a typed dispatcher trait object so the dtype dimension travels through the box without forcing an N-fold mailbox explosion.
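A minimal sketch of the boxing idea, assuming a much simpler trait than the real one: the names GemmDispatch, GemmRequest, and BlasMsg appear on this page, but the trait method, the struct fields, the Dtype helper trait, and the handle function below are hypothetical and only illustrate how one enum variant can carry requests of any element type.

```rust
use std::fmt::Debug;

/// Hypothetical helper: element types the sketch supports.
trait Dtype: Debug + Send + 'static {
    const NAME: &'static str;
}
impl Dtype for f32 {
    const NAME: &'static str = "f32";
}
impl Dtype for f64 {
    const NAME: &'static str = "f64";
}

/// Hypothetical dispatcher trait: hides the element type T behind
/// a trait object. The real trait would launch a cuBLAS call.
trait GemmDispatch: Send {
    fn dispatch(self: Box<Self>) -> String;
}

/// Typed request (fields are assumptions made for this sketch).
struct GemmRequest<T> {
    m: usize,
    n: usize,
    k: usize,
    alpha: T,
}

impl<T: Dtype> GemmDispatch for GemmRequest<T> {
    fn dispatch(self: Box<Self>) -> String {
        // T is known again here, inside the typed impl.
        format!(
            "gemm<{}> m={} n={} k={} alpha={:?}",
            T::NAME, self.m, self.n, self.k, self.alpha
        )
    }
}

/// One variant per op, not one per (op, dtype) pair.
enum BlasMsg {
    Gemm(Box<dyn GemmDispatch>),
}

/// The actor side matches on the op and never names the dtype.
fn handle(msg: BlasMsg) -> String {
    match msg {
        BlasMsg::Gemm(req) => req.dispatch(),
    }
}
```

With this layout, supporting another element type means adding one Dtype impl; the message enum and the actor's match stay unchanged.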