Module workspace


cuBLASLt workspace pool — recycles per-heuristic device buffers.

cuBLASLt’s cublasLtMatmul takes an opaque workspace device pointer and a workspaceSizeInBytes argument. The required size depends on the selected algorithm: the heuristic cache reports a per-algorithm workspaceSize, and we want to avoid allocating a fresh slab on every call. This pool buckets free slabs by size class (the requested size rounded up to the next power of two, with a 1 KiB floor) and hands them out under a WorkspaceLease RAII guard that returns the slab to the pool on Drop.
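A minimal sketch of the size-class bucketing described above, assuming host Vec<u8> buffers as stand-ins for device slabs. Slab, FreeLists, acquire, release and fresh_allocs are hypothetical names chosen for illustration, not this module's API; a real pool would hold CUDA device pointers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a device allocation (host memory here, not VRAM).
pub struct Slab {
    pub bytes: Vec<u8>,
}

/// Free slabs bucketed by power-of-two size class (assumed 1 KiB floor).
#[derive(Default)]
pub struct FreeLists {
    free: HashMap<usize, Vec<Slab>>,
    /// Counts how many times a brand-new slab had to be allocated.
    pub fresh_allocs: usize,
}

impl FreeLists {
    /// Next power of two >= the request, never below 1 KiB.
    fn class_of(requested: usize) -> usize {
        requested.max(1024).next_power_of_two()
    }

    /// Reuse a free slab from the matching class, or allocate a new one.
    pub fn acquire(&mut self, requested: usize) -> Slab {
        let class = Self::class_of(requested);
        if let Some(slab) = self.free.get_mut(&class).and_then(|v| v.pop()) {
            return slab;
        }
        self.fresh_allocs += 1;
        Slab { bytes: vec![0u8; class] }
    }

    /// Put a slab back in the bucket for its class (its length is the class).
    pub fn release(&mut self, slab: Slab) {
        let class = slab.bytes.len();
        self.free.entry(class).or_default().push(slab);
    }
}
```

Because a 3000-byte and a 4000-byte request both round to the 4096-byte class, the second request reuses the slab released by the first instead of allocating.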

The pool is intentionally a plain struct (not a separate actor) because BlasLtActor already has single-threaded ownership of all matmul calls; wrapping it in another actor would only add a mailbox hop on the hot path. If a future phase needs cross-actor sharing, we’ll lift the pool into its own actor, structured the same way as PinnedBufferPool (whose allocator pattern this module mirrors).
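To illustrate why no extra actor is needed, here is a rough sketch of a single actor thread that owns its free lists as plain local state and mutates them directly while serving matmul messages. It uses std::sync::mpsc as a stand-in mailbox; Msg, spawn_actor and the reply-with-allocation-count protocol are hypothetical and are not how BlasLtActor is actually defined.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical actor messages; only the workspace size matters for this sketch.
pub enum Msg {
    Matmul { workspace_bytes: usize, reply: mpsc::Sender<usize> },
    Shutdown,
}

/// Spawns an actor that owns its workspace free lists outright.
/// Each Matmul reply carries the number of fresh allocations made so far.
pub fn spawn_actor() -> (mpsc::Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let handle = thread::spawn(move || {
        // Only this thread touches the pool state, so no Mutex or Arc is needed
        // and no second mailbox hop sits between the matmul and its workspace.
        let mut free: HashMap<usize, Vec<Vec<u8>>> = HashMap::new();
        let mut allocs = 0usize;
        for msg in rx {
            match msg {
                Msg::Matmul { workspace_bytes, reply } => {
                    let class = workspace_bytes.max(1024).next_power_of_two();
                    let ws = free
                        .get_mut(&class)
                        .and_then(|v| v.pop())
                        .unwrap_or_else(|| {
                            allocs += 1;
                            vec![0u8; class]
                        });
                    // A real actor would launch the matmul with `ws` here.
                    free.entry(class).or_default().push(ws);
                    let _ = reply.send(allocs);
                }
                Msg::Shutdown => break,
            }
        }
    });
    (tx, handle)
}
```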

Structs

WorkspaceLease: RAII lease — returns the slab to the pool on Drop. Callers can take a shared reference to the inner slice for kernel launch via WorkspaceLease::slice.
WorkspacePool: Cloneable handle to the workspace pool.
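The lease-plus-handle pattern can be sketched roughly as follows. PoolHandle and Lease are hypothetical names that play the roles of WorkspacePool and WorkspaceLease; the Rc<RefCell<...>> sharing is an assumption based on the single-threaded ownership described above (the real handle may use something else, such as Arc), and host Vec<u8> stands in for device memory.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Default)]
struct Inner {
    free: HashMap<usize, Vec<Vec<u8>>>,
}

/// Hypothetical cloneable handle: every clone shares the same free lists.
#[derive(Clone, Default)]
pub struct PoolHandle {
    inner: Rc<RefCell<Inner>>,
}

/// RAII guard: the slab goes back to the shared pool when the lease drops.
pub struct Lease {
    slab: Option<Vec<u8>>,
    pool: PoolHandle,
}

impl PoolHandle {
    /// Hand out a slab of the request's power-of-two class (1 KiB floor).
    pub fn lease(&self, requested: usize) -> Lease {
        let class = requested.max(1024).next_power_of_two();
        let reused = self.inner.borrow_mut().free.get_mut(&class).and_then(|v| v.pop());
        let slab = reused.unwrap_or_else(|| vec![0u8; class]);
        Lease { slab: Some(slab), pool: self.clone() }
    }

    /// Number of idle slabs currently pooled for `class`.
    pub fn pooled(&self, class: usize) -> usize {
        self.inner.borrow().free.get(&class).map_or(0, Vec::len)
    }
}

impl Lease {
    /// Shared view of the slab, e.g. to pass to a kernel launch.
    pub fn slice(&self) -> &[u8] {
        self.slab.as_deref().expect("slab is present until drop")
    }
}

impl Drop for Lease {
    fn drop(&mut self) {
        if let Some(slab) = self.slab.take() {
            let class = slab.len();
            self.pool.inner.borrow_mut().free.entry(class).or_default().push(slab);
        }
    }
}
```

Storing the slab in an Option lets Drop move it out with take() and push it back into the free list without cloning.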

Constants

DEFAULT_POOL_CAPACITY_PER_CLASS: Default cap on the number of pooled slabs per size class. Beyond this, excess returns are dropped instead of pooled. With the default 256 distinct heuristic shapes and typically 2–3 distinct workspace classes (4 MiB, 32 MiB, 256 MiB), we expect the pool to hold at most a few hundred MiB of VRAM in steady state.
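A sketch of how a per-class cap might be enforced on return, assuming host Vec<u8> stand-ins. CappedFreeLists, release and retained_bytes are hypothetical, and the cap is passed in as a parameter because the actual value of DEFAULT_POOL_CAPACITY_PER_CLASS is not stated here.

```rust
use std::collections::HashMap;

/// Hypothetical free lists with a cap on idle slabs per size class.
pub struct CappedFreeLists {
    cap_per_class: usize,
    free: HashMap<usize, Vec<Vec<u8>>>,
}

impl CappedFreeLists {
    pub fn new(cap_per_class: usize) -> Self {
        Self { cap_per_class, free: HashMap::new() }
    }

    /// Returns true if the slab was pooled, false if its class was already
    /// full, in which case the slab is dropped here and its memory released.
    pub fn release(&mut self, slab: Vec<u8>) -> bool {
        let bucket = self.free.entry(slab.len()).or_default();
        if bucket.len() < self.cap_per_class {
            bucket.push(slab);
            true
        } else {
            false
        }
    }

    /// Total bytes currently held idle across all classes.
    pub fn retained_bytes(&self) -> usize {
        self.free.values().flatten().map(Vec::len).sum()
    }
}
```

Under this scheme the idle memory is bounded by the cap times the sum of the class sizes in use; for example, one retained slab in each of the 4 MiB, 32 MiB and 256 MiB classes holds 292 MiB.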

Functions

size_class: Round a workspace request up to the next power of two, with a minimum of 1 KiB. Bucketing by power of two limits the long tail of unique sizes the pool tracks.
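A minimal sketch of the rounding rule, using the standard library's usize::checked_next_power_of_two. The name round_to_class and the Option return type (None on overflow) are assumptions for illustration; the real size_class signature is not shown here.

```rust
/// Smallest size class: 1 KiB.
pub const MIN_CLASS_BYTES: usize = 1024;

/// Hypothetical equivalent of size_class: round `requested` up to the next
/// power of two, never below 1 KiB. Returns None if the result would
/// overflow usize.
pub fn round_to_class(requested: usize) -> Option<usize> {
    requested.max(MIN_CLASS_BYTES).checked_next_power_of_two()
}
```

With this rule every request from 1 KiB up to 256 MiB lands in one of 19 classes (2^10 through 2^28 bytes), at the cost of over-allocating by just under 2x in the worst case, e.g. a request of 2^k + 1 bytes gets a 2^(k+1)-byte slab.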