Module plan_cache

LRU plan cache for cuTENSOR operations.

cutensorCreatePlan is expensive enough that real workloads amortise it across many calls with identical shape signatures. PlanCache holds an LruCache<PlanKey, Arc<CachedPlan>> so the actor can hash an operation description once, look up an existing plan, and pay for the descriptor, plan, and workspace-estimate triplet only on a miss.
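The hit/miss flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's implementation: `TinyLru` and `get_or_build` are hypothetical names, the `build` closure stands in for the expensive descriptor, plan, and workspace-estimate construction, and a real LRU would track recency more efficiently than this `VecDeque` scan.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;
use std::sync::Arc;

/// Hypothetical stand-in for the LRU inside PlanCache: a map plus a recency queue.
pub struct TinyLru<K, V> {
    capacity: usize,
    map: HashMap<K, Arc<V>>,
    order: VecDeque<K>, // front = least recently used
}

impl<K: Copy + Eq + Hash, V> TinyLru<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Return the cached value for `key`; run `build` only on a miss.
    pub fn get_or_build<E>(
        &mut self,
        key: K,
        build: impl FnOnce() -> Result<V, E>,
    ) -> Result<Arc<V>, E> {
        if let Some(hit) = self.map.get(&key).cloned() {
            // Hit: move the key to the most-recently-used position.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
            return Ok(hit);
        }
        // Miss: pay the construction cost once, then cache the result.
        let value = Arc::new(build()?);
        if self.map.len() >= self.capacity {
            // Evict the least recently used entry to make room.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key, Arc::clone(&value));
        self.order.push_back(key);
        Ok(value)
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

A failed build returns the error without inserting anything, so a later call with the same key simply retries.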

§Key

Keyed by (op_kind, modes_hash, extents_hash, alignment, compute_descriptor_tag, scalar_dtype_tag, autotune_algo) — everything that influences cuTENSOR’s choice of internal kernel and workspace size. The autotune-picked algo is folded into the key so an autotuned plan never collides with a default-algo plan.
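A minimal sketch of what such a key could look like follows. The field names come from the tuple above, but every field type, the `OpKind` variants, and the `distinct_keys` helper are assumptions made for illustration; the real definitions may differ.

```rust
use std::collections::HashSet;

/// Hypothetical variants; the real OpKind may list different operations.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum OpKind {
    Contraction,
    Reduction,
}

/// Field names follow the documented key tuple; the types are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PlanKey {
    pub op_kind: OpKind,
    pub modes_hash: u64,
    pub extents_hash: u64,
    pub alignment: u32,
    pub compute_descriptor_tag: u32,
    pub scalar_dtype_tag: u32,
    /// None = default algorithm, Some(id) = algorithm picked by autotuning.
    pub autotune_algo: Option<i32>,
}

/// Count how many distinct cache slots a list of keys would occupy.
pub fn distinct_keys(keys: &[PlanKey]) -> usize {
    keys.iter().copied().collect::<HashSet<_>>().len()
}
```

Because `autotune_algo` takes part in both `Eq` and `Hash`, two keys that agree on every other field but differ in the chosen algorithm occupy separate cache slots.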

Structs§

CachedPlan: Newtype around the cuTENSOR descriptor pointers so we can unsafe impl Send. Each CachedPlan owns its descriptors and the plan itself; on Drop we tear them down in reverse construction order.
PlanCache: Thread-safe wrapper around LruCache<PlanKey, Arc<CachedPlan>>. Arc lets the actor hand the plan out to a kernel-launch closure that may outlive a subsequent cache eviction.
PlanKey: Hashable plan key. Modes / extents arrive pre-hashed (u64) so the key remains Copy + Eq.
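The interaction between Arc, eviction, and Drop described for CachedPlan and PlanCache can be illustrated without any cuTENSOR calls. In the sketch below, `FakePlan`, `Log`, and `evict_while_in_flight` are hypothetical names, a plain HashMap stands in for the LRU, and the teardown steps are recorded as strings rather than performed on real handles.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<&'static str>>>;

/// Hypothetical stand-in for CachedPlan: records teardown steps instead of
/// destroying real cuTENSOR handles.
pub struct FakePlan {
    log: Log,
}

impl FakePlan {
    /// Assumed construction order: tensor descriptors, operation descriptor, plan.
    pub fn new(log: Log) -> Self {
        Self { log }
    }
}

impl Drop for FakePlan {
    fn drop(&mut self) {
        // Tear down in reverse construction order.
        let mut log = self.log.lock().unwrap();
        log.push("destroy plan");
        log.push("destroy operation descriptor");
        log.push("destroy tensor descriptors");
    }
}

/// Evict an entry while a launch closure still holds it.
/// Returns the number of teardown steps recorded right after eviction.
pub fn evict_while_in_flight(log: Log) -> usize {
    let key: u64 = 7;
    let mut cache: HashMap<u64, Arc<FakePlan>> = HashMap::new();
    cache.insert(key, Arc::new(FakePlan::new(Arc::clone(&log))));

    // The launch closure keeps its own handle to the plan.
    let in_flight = Arc::clone(cache.get(&key).unwrap());

    // Eviction drops only the cache's handle; the plan stays alive.
    cache.remove(&key);
    let steps_after_eviction = log.lock().unwrap().len();

    // Teardown runs once the last handle is released.
    drop(in_flight);
    steps_after_eviction
}
```

The plan is destroyed only when the final Arc goes away, so eviction never invalidates a plan that a pending launch still references.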

Enums§

OpKind: Operation kind discriminator embedded in the cache key.

Constants§

DEFAULT_PLAN_CACHE_SIZE: Default LRU capacity. 256 is a generous upper bound: each entry owns a cutensorPlan_t, a cutensorOperationDescriptor_t, and its tensor descriptors, which together cost a few KiB on the host, so a full cache is on the order of 1 MiB.

Functions§

hash_i32s: Hash a slice of i32 modes into a u64.
hash_i64s: Hash a slice of i64 (extents or strides) into a u64. Uses the standard library's default hasher, which is currently SipHash-based rather than FxHash. Cheap and stable within a single process, which is all plan-cache lookups need.
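One plausible shape for these helpers, assuming they feed the slice through `std::collections::hash_map::DefaultHasher`, is sketched below. The `_sketch` suffix marks the functions as illustrative; the module's actual bodies are not shown in this documentation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative version of hash_i32s: hash a slice of i32 modes into a u64.
pub fn hash_i32s_sketch(modes: &[i32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    // Hashing a slice feeds its length and then each element in order.
    modes.hash(&mut hasher);
    hasher.finish()
}

/// Illustrative version of hash_i64s: hash extents or strides into a u64.
pub fn hash_i64s_sketch(values: &[i64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    values.hash(&mut hasher);
    hasher.finish()
}
```

Equal slices always produce equal hashes within a process, which is the only property a cache lookup relies on. The standard library does not guarantee the algorithm across Rust releases, so these values should not be persisted.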