NvrtcActor — JIT-compile and launch user-supplied CUDA C++
kernels at runtime.
Two-step lifecycle:
1. Compile { src, kernel_name, opts, reply } → returns a KernelHandle tied to the current DeviceState generation.
2. Launch { kernel, args, cfg, reply } → enqueues a kernel call on the actor’s stream. Replies after stream completion.
KernelHandle is Send + Sync + 'static and survives across actor
boundaries. It carries a generation token; if the underlying
context is rebuilt, KernelHandle::launch_check returns
GpuError::GpuRefStale and the launch fails fast.
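The generation check described above can be sketched with plain std types. This is a minimal illustration, not the crate's implementation: the DeviceState, KernelHandle, and GpuError definitions here are simplified stand-ins, and the rebuild and compile helpers are hypothetical names invented for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Simplified stand-in for the crate's device state (assumption):
/// only the generation counter matters for this sketch.
#[derive(Default)]
struct DeviceState {
    generation: AtomicU64,
}

impl DeviceState {
    /// Hypothetical helper: simulates a context rebuild by bumping the generation.
    fn rebuild(&self) {
        self.generation.fetch_add(1, Ordering::SeqCst);
    }
}

#[derive(Debug, PartialEq)]
enum GpuError {
    GpuRefStale,
}

/// Stand-in handle: plain owned data plus the generation it was compiled
/// under, which makes it Send + Sync + 'static automatically.
#[derive(Clone, Debug)]
struct KernelHandle {
    name: String,
    generation: u64,
}

impl KernelHandle {
    /// Fails fast when the handle predates the current context generation.
    fn launch_check(&self, state: &DeviceState) -> Result<(), GpuError> {
        if self.generation == state.generation.load(Ordering::SeqCst) {
            Ok(())
        } else {
            Err(GpuError::GpuRefStale)
        }
    }
}

/// Hypothetical compile step: stamps the handle with the current generation.
fn compile(kernel_name: &str, state: &DeviceState) -> KernelHandle {
    KernelHandle {
        name: kernel_name.to_string(),
        generation: state.generation.load(Ordering::SeqCst),
    }
}

/// Compile-time check that a type can cross actor (thread) boundaries.
fn assert_send_sync_static<T: Send + Sync + 'static>() {}
```

A handle produced by compile passes launch_check until rebuild is called; afterwards the same handle yields GpuError::GpuRefStale, so the caller must recompile rather than launch against a dead context.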
§Phase 0.3 — boxed-dispatch arg types
KernelArg previously had eleven explicit variants (one per dtype
for each of slice / scalar) and handle_launch matched on each
twice (once to validate, once to push). Phase 0.3 collapses the
typed pairs into two boxed-dyn variants plus a Usize fallback.
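As a rough sketch of what the collapsed shape could look like: the trait names ScalarArg and SliceArg, the FakeDeviceSlice type, and the marshal function below are all hypothetical, chosen only to show how two boxed-dyn variants plus a Usize fallback let validation and marshalling share a single match.

```rust
use std::marker::PhantomData;

/// Hypothetical trait for by-value scalar arguments.
trait ScalarArg: Send {
    /// Raw bytes of the value as they would be passed to the kernel.
    fn param_bytes(&self) -> Vec<u8>;
}

// One macro invocation covers every scalar dtype instead of one enum variant each.
macro_rules! impl_scalar_arg {
    ($($t:ty),*) => {$(
        impl ScalarArg for $t {
            fn param_bytes(&self) -> Vec<u8> {
                self.to_ne_bytes().to_vec()
            }
        }
    )*};
}
impl_scalar_arg!(f32, f64, i32, u32, i64, u64);

/// Hypothetical trait for device-slice arguments.
trait SliceArg: Send {
    fn device_ptr(&self) -> u64;
    fn len(&self) -> usize;
}

/// Stand-in for a typed device allocation (no real GPU memory involved).
struct FakeDeviceSlice<T> {
    ptr: u64,
    len: usize,
    _elem: PhantomData<T>,
}

impl<T> FakeDeviceSlice<T> {
    fn new(ptr: u64, len: usize) -> Self {
        FakeDeviceSlice { ptr, len, _elem: PhantomData }
    }
}

impl<T: Send> SliceArg for FakeDeviceSlice<T> {
    fn device_ptr(&self) -> u64 {
        self.ptr
    }
    fn len(&self) -> usize {
        self.len
    }
}

/// Two boxed-dyn variants plus a Usize fallback.
enum KernelArg {
    Slice(Box<dyn SliceArg>),
    Scalar(Box<dyn ScalarArg>),
    Usize(usize),
}

/// Validates and marshals each argument in one pass over one match.
fn marshal(args: &[KernelArg]) -> Result<Vec<Vec<u8>>, String> {
    args.iter()
        .enumerate()
        .map(|(i, arg)| match arg {
            KernelArg::Slice(s) if s.len() == 0 => Err(format!("arg {}: empty slice", i)),
            KernelArg::Slice(s) => Ok(s.device_ptr().to_ne_bytes().to_vec()),
            KernelArg::Scalar(v) => Ok(v.param_bytes()),
            KernelArg::Usize(n) => Ok(n.to_ne_bytes().to_vec()),
        })
        .collect()
}
```

Adding a new dtype in this sketch means adding it to the macro invocation rather than adding an enum variant and two match arms.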
§Phase 5 — NVRTC v2
NvrtcOpts now exposes:
- lto: --dlink-time-opt / -dlto for link-time optimisation (CUDA 12.0+; gated behind the nvrtc-lto cargo feature).
- cpp_std: --std=c++17 / --std=c++20.
- arch: typed SmArch selection (sm_80, sm_86, sm_89, sm_90, sm_90a, sm_100, sm_120).
- name_expressions: nvrtcAddNameExpression / nvrtcGetLoweredName for templated kernels: pass C++ name expressions (for example, a template instantiation) and look up the lowered (mangled) symbol from the resulting KernelHandle.
- extra_options: escape hatch for arbitrary -D… / -I… flags.
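To make the mapping from options to compiler flags concrete, here is a minimal sketch. The field types, the variant names (Cpp17, Sm90a, and so on), and the to_flags method are assumptions made for illustration; the real NvrtcOpts may be shaped differently. Only the flag strings themselves come from the list above.

```rust
/// Assumed variant names; the flag text comes from the option list.
#[derive(Clone, Copy, Debug)]
enum CppStd {
    Cpp17,
    Cpp20,
}

/// Assumed variant names for the documented SM targets.
#[derive(Clone, Copy, Debug)]
enum SmArch {
    Sm80,
    Sm86,
    Sm89,
    Sm90,
    Sm90a,
    Sm100,
    Sm120,
}

impl SmArch {
    fn as_str(self) -> &'static str {
        match self {
            SmArch::Sm80 => "sm_80",
            SmArch::Sm86 => "sm_86",
            SmArch::Sm89 => "sm_89",
            SmArch::Sm90 => "sm_90",
            SmArch::Sm90a => "sm_90a",
            SmArch::Sm100 => "sm_100",
            SmArch::Sm120 => "sm_120",
        }
    }
}

/// Hypothetical field layout for the options struct.
#[derive(Clone, Debug, Default)]
struct NvrtcOpts {
    lto: bool,
    cpp_std: Option<CppStd>,
    arch: Option<SmArch>,
    /// Not rendered as flags: these would be registered through
    /// nvrtcAddNameExpression before compilation.
    name_expressions: Vec<String>,
    extra_options: Vec<String>,
}

impl NvrtcOpts {
    /// Hypothetical helper: renders the typed options as NVRTC command-line flags.
    fn to_flags(&self) -> Vec<String> {
        let mut flags = Vec::new();
        if let Some(arch) = self.arch {
            flags.push(format!("--gpu-architecture={}", arch.as_str()));
        }
        if let Some(std) = self.cpp_std {
            let value = match std {
                CppStd::Cpp17 => "c++17",
                CppStd::Cpp20 => "c++20",
            };
            flags.push(format!("--std={}", value));
        }
        if self.lto {
            flags.push("-dlto".to_string());
        }
        // Free-form flags go last so they can override the typed ones.
        flags.extend(self.extra_options.iter().cloned());
        flags
    }
}
```

Placing extra_options last is a design choice of this sketch, not something the source states.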
Compilation is also available asynchronously via
NvrtcMsg::CompileAsync, which off-loads the NVRTC call to a
Tokio blocking thread pool so callers don’t block the actor mailbox
on a 10-second template instantiation. Both the sync and async
paths read through the crate::nvrtc_cache::NvrtcCache persistent
disk cache so repeated invocations replay the cubin instead of
re-running NVRTC.
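The read-through behaviour can be illustrated with a small std-only sketch. Several substitutions are made here and should not be read as the crate's design: an in-memory map stands in for the persistent disk cache, std::thread stands in for Tokio's blocking pool, and fake_nvrtc is a placeholder for the real compiler call. The names MemCache, cache_key, and compile_async are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};
use std::thread;

/// Cache key derived from everything that affects the output:
/// the source text and the full flag list.
fn cache_key(src: &str, flags: &[String]) -> u64 {
    let mut hasher = DefaultHasher::new();
    src.hash(&mut hasher);
    flags.hash(&mut hasher);
    hasher.finish()
}

/// In-memory stand-in for the persistent disk cache (assumption).
#[derive(Default)]
struct MemCache {
    entries: Mutex<HashMap<u64, Vec<u8>>>,
}

impl MemCache {
    /// Read-through lookup: return the cached bytes on a hit,
    /// otherwise run the compiler and store its output.
    fn get_or_compile<F>(&self, src: &str, flags: &[String], compile: F) -> Vec<u8>
    where
        F: FnOnce(&str, &[String]) -> Vec<u8>,
    {
        let key = cache_key(src, flags);
        let cached = self.entries.lock().unwrap().get(&key).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        // The lock is not held while compiling, so other lookups are not blocked.
        let bytes = compile(src, flags);
        self.entries.lock().unwrap().insert(key, bytes.clone());
        bytes
    }
}

/// Placeholder for the real NVRTC invocation: just echoes the source bytes.
fn fake_nvrtc(src: &str, _flags: &[String]) -> Vec<u8> {
    src.as_bytes().to_vec()
}

/// Runs the (possibly slow) compile on a worker thread so the caller
/// can keep servicing other messages until it joins the handle.
fn compile_async(
    cache: Arc<MemCache>,
    src: String,
    flags: Vec<String>,
) -> thread::JoinHandle<Vec<u8>> {
    thread::spawn(move || cache.get_or_compile(&src, &flags, fake_nvrtc))
}
```

Because the sync and async entry points share get_or_compile, a kernel compiled through one path is a cache hit for the other, which mirrors the shared-cache behaviour described above.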
Structs§
- KernelHandle - Handle to a JIT-compiled, loaded kernel function. Validity is gated by crate::device::DeviceState::generation.
- NvrtcActor
- NvrtcOpts - Subset of cudarc’s CompileOptions exposed at our message surface, plus Phase-5 additions for LTO, C++ standard selection, per-arch SM targeting, name-expression registration, and free-form extra flags.
Enums§
- CppStd - C++ standard version for the NVRTC --std=... flag.
- KernelArg - A single argument to an NVRTC kernel launch.
- NvrtcMsg
- SmArch - Selected target SM architecture for NVRTC compilation. Each variant maps to a --gpu-architecture=... flag understood by the bundled NVRTC toolchain. Variant naming matches NVCC’s published list.
Functions§
- compile_to_ptx - Phase 5: stand-alone PTX/CUBIN emission for callers that want the raw bytes without spawning an actor. Bypasses the actor mailbox; honours the same cache and arch-selection logic. The returned tuple is (ptx, cubin) where cubin is Some only when LTO is on or the cache hit happened to carry one.
- default_disk_cache_path - Phase 5: convenience to construct a builder-style NVRTC compile task that lives behind a default cache directory. Returns the resolved cache path as a hint for tooling that wants to surface the on-disk location.