LRU plan cache for cuTENSOR operations.
cutensorCreatePlan is expensive enough that real workloads
amortise it across many calls with identical shape signatures.
PlanCache holds an LruCache<PlanKey, CachedPlan> so the actor
can hash a description once, look up an existing plan, and only
pay for the descriptor + plan + workspace-estimate triplet on a
miss.
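The hit/miss flow described above can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: `TinyLru` and `get_or_insert_with` are hypothetical names, the cached value is a plain clonable type standing in for a plan, and eviction uses a linear scan where a real LRU would use a linked structure.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Minimal LRU sketch (hypothetical type): a HashMap plus a monotonically
/// increasing "last used" tick. Eviction scans for the smallest tick, which
/// is O(n) but adequate for an illustration.
pub struct TinyLru<K, V> {
    capacity: usize,
    tick: u64,
    entries: HashMap<K, (u64, V)>,
}

impl<K: Hash + Eq + Copy, V: Clone> TinyLru<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, tick: 0, entries: HashMap::new() }
    }

    /// Returns the cached value for `key`, calling `build` only on a miss.
    pub fn get_or_insert_with(&mut self, key: K, build: impl FnOnce() -> V) -> V {
        self.tick += 1;
        let now = self.tick;
        if let Some((last_used, value)) = self.entries.get_mut(&key) {
            // Hit: refresh recency and skip the expensive build.
            *last_used = now;
            return value.clone();
        }
        if self.entries.len() >= self.capacity {
            // Miss at capacity: evict the least recently used entry.
            let oldest = self
                .entries
                .iter()
                .min_by_key(|(_, (last_used, _))| *last_used)
                .map(|(k, _)| *k);
            if let Some(k) = oldest {
                self.entries.remove(&k);
            }
        }
        // Miss: pay for construction once, then remember the result.
        let value = build();
        self.entries.insert(key, (now, value.clone()));
        value
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch the `build` closure plays the role of the descriptor + plan + workspace-estimate construction, so it runs only when the key is absent.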
§Key
Keyed by (op_kind, modes_hash, extents_hash, alignment, compute_descriptor_tag, scalar_dtype_tag, autotune_algo) —
everything that influences cuTENSOR’s choice of internal kernel
and workspace size. The autotune-picked algo is folded into the
key so an autotuned plan never collides with a default-algo plan.
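A key of this shape might look roughly like the sketch below. The field names follow the tuple listed above, but the field types, the `OpKind` variants, and the `hash_slice` helper are assumptions made for illustration; the actual definitions may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical variants; the real enum may list different operations.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum OpKind {
    Contraction,
    Reduction,
    ElementwiseBinary,
}

/// Sketch of the key. Field types are assumed, not taken from the source.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PlanKey {
    pub op_kind: OpKind,
    pub modes_hash: u64,
    pub extents_hash: u64,
    pub alignment: u32,
    pub compute_descriptor_tag: u32,
    pub scalar_dtype_tag: u32,
    /// `None` stands for the default algorithm, `Some(id)` for an autotuned pick.
    pub autotune_algo: Option<i32>,
}

/// Hypothetical helper: collapses a variable-length list (modes or extents)
/// into a fixed-size u64 so the key can stay `Copy`.
pub fn hash_slice<T: Hash>(items: &[T]) -> u64 {
    let mut hasher = DefaultHasher::new();
    items.hash(&mut hasher);
    hasher.finish()
}

/// Builds a key for a contraction with the given modes and extents.
pub fn contraction_key(modes: &[i32], extents: &[i64], algo: Option<i32>) -> PlanKey {
    PlanKey {
        op_kind: OpKind::Contraction,
        modes_hash: hash_slice(modes),
        extents_hash: hash_slice(extents),
        alignment: 256,
        compute_descriptor_tag: 0,
        scalar_dtype_tag: 0,
        autotune_algo: algo,
    }
}
```

Because `autotune_algo` takes part in `Eq` and `Hash`, two keys that agree on every other field but differ in the chosen algorithm occupy separate cache slots.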
Structs§
- CachedPlan
- Newtype around the cuTENSOR descriptor pointers so we can unsafe impl Send. Each CachedPlan owns its descriptors and the plan itself; on Drop we tear them down in reverse construction order.
- PlanCache
- Thread-safe wrapper around LruCache<PlanKey, Arc<CachedPlan>>. Arc lets the actor hand the plan out to a kernel-launch closure that may outlive a subsequent cache eviction.
- PlanKey
- Hashable plan key. Modes / extents arrive pre-hashed (u64) so the key remains Copy + Eq.
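The ownership behaviour described for CachedPlan and PlanCache can be illustrated without any GPU code. In the sketch below, `FakePlan` and `SharedCache` are hypothetical stand-ins: the "resources" are labels pushed into a log, the map is a plain `HashMap` with no LRU policy, and the construction order (tensor descriptors, then operation descriptor, then plan) is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for a cached plan. A real entry would hold cuTENSOR handles;
/// here teardown just records labels into a shared log.
pub struct FakePlan {
    pub log: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for FakePlan {
    fn drop(&mut self) {
        // Assumed construction order: tensor descriptors, operation
        // descriptor, plan. Teardown walks that order in reverse.
        let mut log = self.log.lock().unwrap();
        log.push("plan");
        log.push("operation descriptor");
        log.push("tensor descriptors");
    }
}

/// Stand-in for the thread-safe wrapper: a mutex around a map of Arc'd plans.
pub struct SharedCache {
    inner: Mutex<HashMap<u64, Arc<FakePlan>>>,
}

impl SharedCache {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Stores the plan and hands back a second owner for the caller.
    pub fn insert(&self, key: u64, plan: FakePlan) -> Arc<FakePlan> {
        let plan = Arc::new(plan);
        self.inner.lock().unwrap().insert(key, Arc::clone(&plan));
        plan
    }

    /// Drops the cache's own reference; other holders keep the plan alive.
    pub fn evict(&self, key: u64) {
        self.inner.lock().unwrap().remove(&key);
    }
}
```

With this arrangement an eviction only removes the cache's reference. The teardown in `Drop` runs when the last `Arc` goes away, which is what lets a launch closure keep using a plan after it has left the cache.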
Enums§
- OpKind
- Operation kind discriminator embedded in the cache key.
Constants§
- DEFAULT_PLAN_CACHE_SIZE
- Default LRU capacity. 256 is a generous upper bound: each entry owns a cutensorPlan_t plus a cutensorOperationDescriptor_t plus tensor descriptors, which together cost a few KiB on the host, so the total is on the order of a MiB at full occupancy.
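The order-MiB estimate is simple arithmetic. The sketch below assumes 4 KiB per entry purely for illustration, since the documentation only says "a few KiB"; the constant's type and the helper name are likewise assumptions.

```rust
/// Capacity as documented; the `usize` type is an assumption.
pub const DEFAULT_PLAN_CACHE_SIZE: usize = 256;

/// Assumed per-entry host footprint, chosen only to make the arithmetic concrete.
pub const ASSUMED_BYTES_PER_ENTRY: usize = 4 * 1024;

/// Host memory held by a cache that is completely full.
pub const fn full_occupancy_bytes(capacity: usize, bytes_per_entry: usize) -> usize {
    capacity * bytes_per_entry
}
```

Under that assumption a full cache holds 256 × 4 KiB = 1 MiB of host memory.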