Crate atomr_accel_cuda

§atomr-accel-cuda

GPU acceleration via the actor model. Wraps NVIDIA CUDA libraries as actors on top of atomr. See README.md and the architecture document under docs/ for the full design.

§Foundation Phase F1 (current)

Two-tier supervision: device::DeviceActor (stable address) ↔ device::ContextActor (owns Arc<CudaContext>, restartable).
gpu_ref::GpuRef with generation-token validity checks.
dispatcher::GpuDispatcher pinning actor execution to a single OS thread.
completion::HostFnCompletion for sub-microsecond stream completion via cuLaunchHostFunc.
stream::PerActorAllocator as the default §5.7 strategy.
kernel::BlasActor performing cuBLAS SGEMM as the canonical demo.
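
The generation-token idea behind gpu_ref::GpuRef can be illustrated with a small standalone sketch. It assumes that every context restart bumps a shared counter and that each handle remembers the counter value it was created under. The names ContextGeneration, Handle, and StaleHandle are hypothetical and are not the crate's API; the offset field merely stands in for a device pointer.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the restartable context tier:
/// a counter that is bumped every time the context is rebuilt.
#[derive(Debug, Default)]
pub struct ContextGeneration(AtomicU64);

impl ContextGeneration {
    /// Generation of the context that is live right now.
    pub fn current(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }

    /// Simulate a context restart; returns the new generation.
    pub fn restart(&self) -> u64 {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }
}

/// Error returned when a handle outlives the context it was created in.
#[derive(Debug, PartialEq, Eq)]
pub struct StaleHandle {
    pub handle_generation: u64,
    pub current_generation: u64,
}

/// Hypothetical buffer handle; `offset` is a placeholder for a device pointer.
#[derive(Debug, Clone)]
pub struct Handle {
    generation: u64,
    live: Arc<ContextGeneration>,
    offset: usize,
}

impl Handle {
    /// Capture the generation that is current at allocation time.
    pub fn new(live: &Arc<ContextGeneration>, offset: usize) -> Self {
        Self {
            generation: live.current(),
            live: Arc::clone(live),
            offset,
        }
    }

    /// Hand out the "pointer" only if the context has not been restarted
    /// since this handle was created.
    pub fn validate(&self) -> Result<usize, StaleHandle> {
        let current = self.live.current();
        if current == self.generation {
            Ok(self.offset)
        } else {
            Err(StaleHandle {
                handle_generation: self.generation,
                current_generation: current,
            })
        }
    }
}
```

In this sketch a handle created before restart() fails validate() afterwards, while a handle created after the restart validates normally. The point of the pattern is that a stale handle is rejected with an error instead of dereferencing memory that belonged to a destroyed context.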

Phases F2–F5 (cuDNN, cuFFT, NCCL, TensorRT, the PythonGpuBridge) and the four blueprint sub-crates are deferred.

Modules§

completion: Completion strategies (§5.10).
device: DeviceActor (outer tier) + ContextActor (inner tier) — §5.11.
dispatcher: GpuDispatcher (§5.1) — pinned single-thread runtime that ensures the actor’s CUDA context stays current on the same OS thread for the actor’s whole lifetime.
dtype: CudaDtype — CUDA-side dtype mappings and capability markers.
error: Error taxonomy and the supervisor decider for context-poisoning recovery (§5.3, §5.11 of the architecture document).
event: EventActor — typed actor surface around CudaEvent.
gpu_ref: GpuRef<T> — opaque, message-friendly handle to a GPU buffer (§5.8).
graph: GraphActor — record a CUDA stream-capture once, replay many.
hopper: Hopper / Blackwell host-side primitives (Phase 5). The module surface is always compiled, because the tma::TensorMapDescriptor builder and cluster::LaunchSpec types are useful even on hosts that don’t link a Hopper driver. The hopper cargo feature gates the FFI implementations of cuTensorMapEncodeTiled / cudaLaunchKernelExC.
host: Host-side support: pinned (page-locked) memory pool + PinnedBuf<T>.
kernel: Kernel-actor wrappers around CUDA library handles (§3.2).
memory: Managed (unified) memory + Phase 3 driver-API helpers.
module: ModuleActor — load prebuilt cubin/PTX from disk (or memory) and launch its kernels.
multi_device: Top-level multi-device actors that span multiple DeviceActors.
nvrtc_cache: Persistent disk cache for NVRTC-compiled CUDA kernels (Phase 0.6).
observability: Observability glue: install atomr_telemetry::TelemetryExtension on a host ActorSystem and expose a small set of GPU-specific probes that callers feed from kernel actors / placement actors / stream allocators.
p2p: P2P (peer-to-peer) topology + cross-device async memcpy.
pipeline: Multi-stream pipeline pattern.
placement: PlacementActor — picks the best-fit DeviceActor for each request based on a configurable PlacementPolicy.
prelude: Common imports for users of atomr-accel-cuda.
replay: Deterministic-replay harness.
stream: Stream allocation strategies (§5.7).
streams_pipeline: atomr-streams-based pipeline helpers — the F10 successor to the actor-based crate::pipeline::PipelineExecutor.
sys: Thin Rust wrappers over cudarc’s raw *::sys FFI for library entry points that aren’t yet exposed by the safe layer.
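
The thread-pinning behaviour described for dispatcher can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the crate's GpuDispatcher: PinnedExecutor and its methods are hypothetical names, and the place where a real dispatcher would bind its CUDA context is marked only by a comment.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

/// A unit of work shipped to the pinned thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical executor that runs every submitted closure on one
/// dedicated OS thread for its whole lifetime.
pub struct PinnedExecutor {
    tx: Option<mpsc::Sender<Job>>,
    worker: Option<JoinHandle<()>>,
}

impl PinnedExecutor {
    /// Start the dedicated worker thread.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let worker = thread::spawn(move || {
            // A real dispatcher would make its CUDA context current here,
            // once, before entering the loop.
            for job in rx {
                job();
            }
        });
        Self {
            tx: Some(tx),
            worker: Some(worker),
        }
    }

    /// Run `f` on the pinned thread and block until its result is available.
    pub fn run<R, F>(&self, f: F) -> R
    where
        R: Send + 'static,
        F: FnOnce() -> R + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore the send error: it only means the caller stopped waiting.
            let _ = reply_tx.send(f());
        });
        self.tx
            .as_ref()
            .expect("executor already shut down")
            .send(job)
            .expect("worker thread exited");
        reply_rx.recv().expect("worker dropped the reply")
    }
}

impl Drop for PinnedExecutor {
    fn drop(&mut self) {
        // Closing the channel ends the worker's loop; then wait for it.
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}
```

Because all jobs flow through one channel into one thread, two calls such as exec.run(|| std::thread::current().id()) observe the same thread id, and that id differs from the caller's. Thread-affine state, such as a current CUDA context, therefore only needs to be set up once on the worker.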
