cuBLASLt workspace pool — recycles per-heuristic device buffers.
cuBLASLt’s cublasLtMatmul takes an opaque workspace device
pointer and a workspaceSizeInBytes. The required size depends on
the selected algorithm: the heuristic cache reports a per-algorithm
workspaceSize, and allocating a fresh slab on every call would be
wasteful. This pool buckets free slabs by rounded-up size class
(the next power of two ≥ the requested size) and hands them out under
a WorkspaceLease RAII guard that returns the slab on Drop.
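The bucketing and lease mechanics can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not this module's actual code: `Pool`, `Lease`, `Inner`, `acquire`, `idle`, and `CAP_PER_CLASS` are hypothetical names, a host `Vec<u8>` stands in for a device allocation, and the handle is shared through `Rc<RefCell<..>>` on the assumption that a single thread owns it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical per-class cap; the real default is not stated in these docs.
const CAP_PER_CLASS: usize = 4;

/// Free slabs bucketed by size class. `Vec<u8>` stands in for a device buffer.
#[derive(Default)]
struct Inner {
    free: HashMap<usize, Vec<Vec<u8>>>,
}

/// Cloneable handle; all clones share the same free lists.
#[derive(Clone, Default)]
struct Pool {
    inner: Rc<RefCell<Inner>>,
}

/// RAII guard: hands the slab back to the pool when dropped.
struct Lease {
    slab: Option<Vec<u8>>,
    class: usize,
    pool: Pool,
}

impl Pool {
    /// Reuse a free slab of the matching class, or allocate a new one.
    fn acquire(&self, bytes: usize) -> Lease {
        let class = bytes.max(1).next_power_of_two();
        let reused = self.inner.borrow_mut().free.get_mut(&class).and_then(|v| v.pop());
        let slab = reused.unwrap_or_else(|| vec![0u8; class]);
        Lease { slab: Some(slab), class, pool: self.clone() }
    }

    /// Number of idle slabs currently pooled for `class`.
    fn idle(&self, class: usize) -> usize {
        self.inner.borrow().free.get(&class).map_or(0, |v| v.len())
    }
}

impl Lease {
    /// Shared view of the slab, e.g. to pass to a kernel launch.
    fn slice(&self) -> &[u8] {
        self.slab.as_deref().expect("slab is present until drop")
    }
}

impl Drop for Lease {
    fn drop(&mut self) {
        if let Some(slab) = self.slab.take() {
            let mut inner = self.pool.inner.borrow_mut();
            let bucket = inner.free.entry(self.class).or_default();
            // Past the cap, the slab is freed instead of being pooled.
            if bucket.len() < CAP_PER_CLASS {
                bucket.push(slab);
            }
        }
    }
}
```

In this sketch, acquiring 3000 bytes yields a 4096-byte slab; once the lease goes out of scope the slab sits in the 4096 bucket, and the next request in that class reuses it rather than allocating.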
The pool is intentionally a plain struct (not a separate
actor) because BlasLtActor already has single-threaded ownership
of all matmul calls. Wrapping it in another actor would just add a
mailbox hop on the hot path. If a future phase needs cross-actor
sharing we’ll lift this to its own actor exactly the way
PinnedBufferPool (whose allocator pattern this module mirrors)
is structured.
Structs§
- WorkspaceLease - RAII lease: returns the slab to the pool on Drop. Callers can take a shared reference to the inner slice for kernel launch via WorkspaceLease::slice.
- WorkspacePool - Cloneable handle to the workspace pool.
Constants§
- DEFAULT_POOL_CAPACITY_PER_CLASS - Default cap on number of pooled slabs per size class. Beyond this, excess returns are dropped instead of pooled. With the default 256 distinct heuristic shapes and a typical 2-3 distinct workspace classes (4 MiB, 32 MiB, 256 MiB) we expect ≤ a few hundred MiB of pinned VRAM in steady state.
Functions§
- size_class - Round a workspace request up to the next power of two ≥ 1 KiB. Bucketing by power-of-two limits the long tail of unique sizes the pool tracks.
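As an illustration of the rounding rule, here is one way such a function could look. The signature, the `MIN_CLASS_BYTES` constant, and the reading of "≥ 1 KiB" as a floor applied before rounding are assumptions; the real `size_class` may differ.

```rust
/// Assumed floor: no class is smaller than 1 KiB.
const MIN_CLASS_BYTES: usize = 1024;

/// Round `bytes` up to the next power of two, never below 1 KiB.
/// Note: `next_power_of_two` panics in debug builds if the result
/// would overflow `usize`.
fn size_class(bytes: usize) -> usize {
    bytes.max(MIN_CLASS_BYTES).next_power_of_two()
}
```

Under these assumptions a request that is already a power of two keeps its size, while a request one byte over 4 MiB lands in the 8 MiB class, so any request is over-allocated by less than a factor of two.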