Module nvrtc

NvrtcActor — JIT-compile and launch user-supplied CUDA C++ kernels at runtime.

Two-step lifecycle:

  1. Compile { src, kernel_name, opts, reply } → returns a KernelHandle tied to the current DeviceState generation.
  2. Launch { kernel, args, cfg, reply } → enqueues a kernel call on the actor’s stream. Replies after stream completion.

KernelHandle is Send + Sync + 'static and survives across actor boundaries. It carries a generation token; if the underlying context is rebuilt, KernelHandle::launch_check returns GpuError::GpuRefStale and the launch fails fast.
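
The generation check described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the field layout, the atomic counter, `rebuild_context`, and the `launch_check(&DeviceState)` signature are all assumptions, and `GpuError` is reduced to the one variant named in the text.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the crate's error type; only the stale case is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuError {
    GpuRefStale,
}

/// Hypothetical stand-in for `DeviceState`: a counter bumped on every context rebuild.
#[derive(Default)]
pub struct DeviceState {
    generation: AtomicU64,
}

impl DeviceState {
    pub fn generation(&self) -> u64 {
        self.generation.load(Ordering::Acquire)
    }

    /// Simulates a context rebuild, which invalidates every outstanding handle.
    pub fn rebuild_context(&self) {
        self.generation.fetch_add(1, Ordering::AcqRel);
    }
}

/// Holds only owned, thread-safe data, so it is `Send + Sync + 'static`.
#[derive(Clone, Debug)]
pub struct KernelHandle {
    kernel_name: Arc<str>,
    generation: u64,
}

impl KernelHandle {
    /// Captures the generation that was current when the kernel was compiled.
    pub fn new(kernel_name: &str, state: &DeviceState) -> Self {
        Self {
            kernel_name: Arc::from(kernel_name),
            generation: state.generation(),
        }
    }

    pub fn name(&self) -> &str {
        &self.kernel_name
    }

    /// Fails fast when the context has been rebuilt since compilation.
    pub fn launch_check(&self, state: &DeviceState) -> Result<(), GpuError> {
        if self.generation == state.generation() {
            Ok(())
        } else {
            Err(GpuError::GpuRefStale)
        }
    }
}
```

Comparing a stored integer against the current counter keeps the check cheap enough to run on every launch, and the handle never needs to borrow the context it was created from.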

§Phase 0.3 — boxed-dispatch arg types

KernelArg previously had eleven explicit variants (one per dtype for each of slice / scalar) and handle_launch matched on each twice (once to validate, once to push). Phase 0.3 collapses the typed pairs into two boxed-dyn variants plus a Usize fallback.
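
As a rough sketch of what the collapsed shape might look like: two boxed trait-object variants plus `Usize`, handled by a single match that validates and pushes in one pass. The trait names (`SliceArg`, `ScalarArg`), their methods, `FakeSlice`, and `push_args` are hypothetical and stand in for whatever the crate actually defines.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical trait implemented by any device slice, regardless of dtype.
pub trait SliceArg: Debug + Send {
    fn num_elems(&self) -> usize;
    fn elem_size(&self) -> usize;
}

/// Hypothetical trait implemented by any scalar that can be passed by value.
pub trait ScalarArg: Debug + Send {
    fn to_bytes(&self) -> Vec<u8>;
}

impl ScalarArg for f32 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_ne_bytes().to_vec()
    }
}

impl ScalarArg for i32 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_ne_bytes().to_vec()
    }
}

/// Host-side stand-in for a device buffer, used only for illustration.
#[derive(Debug)]
pub struct FakeSlice<T> {
    pub data: Vec<T>,
}

impl<T: Debug + Send> SliceArg for FakeSlice<T> {
    fn num_elems(&self) -> usize {
        self.data.len()
    }
    fn elem_size(&self) -> usize {
        size_of::<T>()
    }
}

/// Two boxed-dyn variants replace the per-dtype pairs; `Usize` is the fallback.
#[derive(Debug)]
pub enum KernelArg {
    Slice(Box<dyn SliceArg>),
    Scalar(Box<dyn ScalarArg>),
    Usize(usize),
}

/// Validates and "pushes" each argument in one match, returning its byte size.
pub fn push_args(args: &[KernelArg]) -> Result<Vec<usize>, String> {
    args.iter()
        .enumerate()
        .map(|(i, arg)| match arg {
            KernelArg::Slice(s) if s.num_elems() == 0 => {
                Err(format!("arg {}: empty slice", i))
            }
            KernelArg::Slice(s) => Ok(s.num_elems() * s.elem_size()),
            KernelArg::Scalar(v) => Ok(v.to_bytes().len()),
            KernelArg::Usize(_) => Ok(size_of::<usize>()),
        })
        .collect()
}
```

With this shape, supporting a new dtype means adding a trait impl rather than a new enum variant and two new match arms.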

§Phase 5 — NVRTC v2

NvrtcOpts now exposes:

  • lto: --dlink-time-opt / -dlto for link-time optimisation (CUDA 12.0+; gated behind the nvrtc-lto cargo feature).
  • cpp_std: --std=c++17 / --std=c++20.
  • arch: typed SmArch selection (sm_80, sm_86, sm_89, sm_90, sm_90a, sm_100, sm_120).
  • name_expressions: nvrtcAddNameExpression / nvrtcGetLoweredName for templated kernels. Pass C++ name expressions (for example, a template instantiation such as my_kernel<float>) and look up the lowered (mangled) symbol from the resulting KernelHandle.
  • extra_options: escape hatch for arbitrary -D… / -I… flags.
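
To make the mapping from options to compiler flags concrete, here is a minimal sketch. The field types, the `Option` wrappers, the variant names, and the `to_flags` helper are assumptions made for illustration; only the field names and flag spellings come from the list above.

```rust
/// Variant names are assumed; the flags they map to are the ones listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CppStd {
    Cpp17,
    Cpp20,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SmArch {
    Sm80,
    Sm86,
    Sm89,
    Sm90,
    Sm90a,
    Sm100,
    Sm120,
}

impl SmArch {
    fn as_str(self) -> &'static str {
        match self {
            SmArch::Sm80 => "sm_80",
            SmArch::Sm86 => "sm_86",
            SmArch::Sm89 => "sm_89",
            SmArch::Sm90 => "sm_90",
            SmArch::Sm90a => "sm_90a",
            SmArch::Sm100 => "sm_100",
            SmArch::Sm120 => "sm_120",
        }
    }
}

/// Hypothetical layout; the real struct may use different types or defaults.
#[derive(Clone, Debug, Default)]
pub struct NvrtcOpts {
    pub lto: bool,
    pub cpp_std: Option<CppStd>,
    pub arch: Option<SmArch>,
    /// Registered through nvrtcAddNameExpression, so not emitted as flags.
    pub name_expressions: Vec<String>,
    pub extra_options: Vec<String>,
}

impl NvrtcOpts {
    /// Builds the command-line style option list handed to NVRTC.
    pub fn to_flags(&self) -> Vec<String> {
        let mut flags = Vec::new();
        if let Some(arch) = self.arch {
            flags.push(format!("--gpu-architecture={}", arch.as_str()));
        }
        if let Some(cpp_std) = self.cpp_std {
            let version = match cpp_std {
                CppStd::Cpp17 => "c++17",
                CppStd::Cpp20 => "c++20",
            };
            flags.push(format!("--std={}", version));
        }
        if self.lto {
            flags.push("-dlto".to_string());
        }
        // Free-form flags go last so they can refine anything set above.
        flags.extend(self.extra_options.iter().cloned());
        flags
    }
}
```

Under these assumptions, an `NvrtcOpts` with `arch: Some(SmArch::Sm90a)`, `cpp_std: Some(CppStd::Cpp17)`, `lto: true`, and one extra `-DBLOCK=256` would produce `--gpu-architecture=sm_90a`, `--std=c++17`, `-dlto`, `-DBLOCK=256`, in that order.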

Compilation is also available asynchronously via NvrtcMsg::CompileAsync, which offloads the NVRTC call to a Tokio blocking thread pool so that a slow compile (for example, a 10-second template instantiation) does not block the actor mailbox. Both the sync and async paths read through the crate::nvrtc_cache::NvrtcCache persistent disk cache, so repeated invocations replay the cached cubin instead of re-running NVRTC.
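
The read-through behaviour can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: `CompileCache`, its key derivation, and `get_or_compile` are hypothetical, and a `HashMap` replaces the on-disk store that NvrtcCache actually uses. In the async path, the `compile` closure is the part that would presumably run on a blocking thread (for instance via `tokio::task::spawn_blocking`).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for a persistent compile cache.
#[derive(Default)]
pub struct CompileCache {
    entries: HashMap<u64, Vec<u8>>,
    /// Number of times the compiler actually ran and succeeded.
    pub misses: usize,
}

impl CompileCache {
    /// Keys on source text plus flags, so changing either forces a recompile.
    /// A real disk cache would need a hash that is stable across Rust releases,
    /// which `DefaultHasher` does not guarantee.
    fn key(src: &str, flags: &[String]) -> u64 {
        let mut hasher = DefaultHasher::new();
        src.hash(&mut hasher);
        flags.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached artefact on a hit; otherwise runs `compile` once
    /// and stores its output for later calls.
    pub fn get_or_compile<F>(
        &mut self,
        src: &str,
        flags: &[String],
        compile: F,
    ) -> Result<Vec<u8>, String>
    where
        F: FnOnce() -> Result<Vec<u8>, String>,
    {
        let key = Self::key(src, flags);
        if let Some(hit) = self.entries.get(&key) {
            return Ok(hit.clone());
        }
        let artefact = compile()?;
        self.misses += 1;
        self.entries.insert(key, artefact.clone());
        Ok(artefact)
    }
}
```

Because failed compiles return early through `?`, errors are never cached in this sketch, so a later call with the same source and flags retries the compile.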

Structs§

KernelHandle
Handle to a JIT-compiled, loaded kernel function. Validity is gated by crate::device::DeviceState::generation.
NvrtcActor
NvrtcOpts
Subset of cudarc’s CompileOptions exposed at our message surface, plus Phase-5 additions for LTO, C++ standard selection, per-arch SM targeting, name-expression registration, and free-form extra flags.

Enums§

CppStd
C++ standard version for the NVRTC --std=... flag.
KernelArg
A single argument to an NVRTC kernel launch.
NvrtcMsg
SmArch
Selected target SM architecture for NVRTC compilation. Each variant maps to a --gpu-architecture=... flag understood by the bundled NVRTC toolchain. Variant naming matches NVCC’s published list:

Functions§

compile_to_ptx
Phase 5: stand-alone PTX/CUBIN emission for callers that want the raw bytes without spawning an actor. Bypasses the actor mailbox; honours the same cache and arch-selection logic. The returned tuple is (ptx, cubin) where cubin is Some only when LTO is on or the cache hit happened to carry one.
default_disk_cache_path
Phase 5: convenience to construct a builder-style NVRTC compile task that lives behind a default cache directory. Returns the resolved cache path as a hint for tooling that wants to surface the on-disk location.