
Crate atomr_accel_cuda

atomr-accel-cuda

GPU acceleration via the actor model. Wraps NVIDIA CUDA libraries as actors on top of atomr. See README.md and the architecture document under docs/ for the full design.
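The core idea of wrapping a library handle as an actor can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not atomr's API: Msg, FakeHandle and spawn_actor are hypothetical names, and a plain std::sync::mpsc channel stands in for an actor mailbox. The handle is created on a dedicated OS thread and never leaves it. That mirrors the property the dispatcher module describes, where the CUDA context stays current on one thread for the actor's whole lifetime.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle, ThreadId};

/// Hypothetical messages the actor understands. Each request carries its own
/// reply channel, so callers never touch the underlying handle directly.
enum Msg {
    Add(u64, Sender<u64>),
    WhichThread(Sender<ThreadId>),
    Stop,
}

/// Hypothetical stand-in for a library handle that must only be used from the
/// thread that created it (as a CUDA context must be current on its OS thread).
struct FakeHandle {
    total: u64,
}

/// Spawns one dedicated OS thread that owns the handle for its whole lifetime
/// and processes messages sequentially, one at a time.
fn spawn_actor() -> (Sender<Msg>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Msg>();
    let join = thread::spawn(move || {
        // Created on the actor thread and never moved off it.
        let mut handle = FakeHandle { total: 0 };
        for msg in rx {
            match msg {
                Msg::Add(n, reply) => {
                    handle.total += n;
                    let _ = reply.send(handle.total);
                }
                Msg::WhichThread(reply) => {
                    let _ = reply.send(thread::current().id());
                }
                Msg::Stop => break,
            }
        }
    });
    (tx, join)
}

fn main() {
    let (actor, join) = spawn_actor();
    for n in 1..=3 {
        let (reply_tx, reply_rx) = mpsc::channel();
        actor.send(Msg::Add(n, reply_tx)).unwrap();
        println!("running total: {}", reply_rx.recv().unwrap());
    }
    actor.send(Msg::Stop).unwrap();
    join.join().unwrap();
}
```

Because every message is processed sequentially on the owning thread, callers need no locks around the handle. The trade-off is that one slow request delays every later request sent to the same actor.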

Foundation Phase F1 (current)

Phases F2–F5 (cuDNN, cuFFT, NCCL, TensorRT, the PythonGpuBridge) and the four blueprint sub-crates are deferred.

Modules

completion
Completion strategies (§5.10).
device
DeviceActor (outer tier) + ContextActor (inner tier) — §5.11.
dispatcher
GpuDispatcher (§5.1) — pinned single-thread runtime that ensures the actor’s CUDA context stays current on the same OS thread for the actor’s whole lifetime.
dtype
CudaDtype — CUDA-side dtype mappings and capability markers.
error
Error taxonomy and the supervisor decider for context-poisoning recovery (§5.3, §5.11 of the architecture document).
event
EventActor — typed actor surface around CudaEvent.
gpu_ref
GpuRef<T> — opaque, message-friendly handle to a GPU buffer (§5.8).
graph
GraphActor — record a CUDA stream-capture once, replay many.
hopper
Phase 5: Hopper / Blackwell host-side primitives. The module surface is always compiled, because the tma::TensorMapDescriptor builder and the cluster::LaunchSpec types are useful even on hosts that don’t link a Hopper driver. The hopper cargo feature gates the FFI implementations of cuTensorMapEncodeTiled / cudaLaunchKernelExC.
host
Host-side support: pinned (page-locked) memory pool + PinnedBuf<T>.
kernel
Kernel-actor wrappers around CUDA library handles (§3.2).
memory
Managed (unified) memory + Phase 3 driver-API helpers.
module
ModuleActor — load prebuilt cubin/PTX from disk (or memory) and launch its kernels.
multi_device
Top-level multi-device actors that span multiple DeviceActors.
nvrtc_cache
Persistent disk cache for NVRTC-compiled CUDA kernels (Phase 0.6).
observability
Observability glue: installs atomr_telemetry::TelemetryExtension on a host ActorSystem and exposes a small set of GPU-specific probes that callers feed from kernel actors, placement actors, and stream allocators.
p2p
P2P (peer-to-peer) topology + cross-device async memcpy.
pipeline
Multi-stream pipeline pattern.
placement
PlacementActor — picks the best-fit DeviceActor for each request based on a configurable PlacementPolicy.
prelude
Common imports for users of atomr-accel-cuda.
replay
Deterministic-replay harness.
stream
Stream allocation strategies (§5.7).
streams_pipeline
atomr-streams-based pipeline helpers — the F10 successor to the actor-based crate::pipeline::PipelineExecutor.
sys
sys — thin Rust wrappers over cudarc’s raw *::sys FFI for library entry points that aren’t yet exposed by the safe layer.
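The nvrtc_cache entry above describes a persistent disk cache for NVRTC-compiled kernels. The sketch below shows one plausible way such a cache can be keyed and consulted; it is an assumption-laden illustration, not the crate's implementation. cache_key, get_or_compile and the .ptx file layout are hypothetical, and the compile closure stands in for a real NVRTC call. The key is built with FNV-1a rather than std's DefaultHasher because a disk cache needs a hash that stays stable across processes and Rust releases.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// FNV-1a (64-bit). Its output is fixed by definition, whereas std's
/// DefaultHasher algorithm is unspecified and may change between Rust
/// releases, which would silently invalidate an on-disk cache.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical key over everything that affects the compiled output.
/// A 0 byte separates fields so ("ab", "c") and ("a", "bc") hash differently.
fn cache_key(source: &str, options: &[&str], arch: &str) -> u64 {
    let mut buf = Vec::new();
    buf.extend_from_slice(source.as_bytes());
    for opt in options {
        buf.push(0);
        buf.extend_from_slice(opt.as_bytes());
    }
    buf.push(0);
    buf.extend_from_slice(arch.as_bytes());
    fnv1a64(&buf)
}

/// Returns cached PTX for this key if present; otherwise calls `compile`
/// (a stand-in for an NVRTC compile step) and stores its output on disk.
fn get_or_compile<F>(
    cache_dir: &Path,
    source: &str,
    options: &[&str],
    arch: &str,
    compile: F,
) -> io::Result<String>
where
    F: FnOnce() -> String,
{
    let key = cache_key(source, options, arch);
    let path: PathBuf = cache_dir.join(format!("{:016x}.ptx", key));
    if let Ok(ptx) = fs::read_to_string(&path) {
        return Ok(ptx); // cache hit: skip compilation entirely
    }
    let ptx = compile();
    fs::create_dir_all(cache_dir)?;
    fs::write(&path, &ptx)?;
    Ok(ptx)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("nvrtc_cache_sketch");
    let src = "extern \"C\" __global__ void k() {}";
    // The closure only runs on a cache miss.
    let ptx = get_or_compile(&dir, src, &["-std=c++17"], "sm_80", || "// fake PTX".to_string())?;
    println!("{ptx}");
    Ok(())
}
```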
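The placement entry describes picking the best-fit DeviceActor per request under a configurable PlacementPolicy. A minimal sketch of that selection step follows, assuming each device reports its free memory and queue depth. DeviceSnapshot, Policy and place are hypothetical names, and the crate's real PlacementPolicy may weigh different signals. Devices that cannot hold the request are filtered out first, and the policy only ranks the remaining candidates.

```rust
/// Hypothetical snapshot of one device's state, as a placement actor might
/// receive it from each DeviceActor. Field names are illustrative only.
#[derive(Debug, Clone)]
struct DeviceSnapshot {
    ordinal: usize,
    free_bytes: u64,
    queued_requests: u32,
}

/// Hypothetical policies; not the crate's actual PlacementPolicy.
#[derive(Debug, Clone, Copy)]
enum Policy {
    /// Among devices that can hold the request, pick the one with the most free memory.
    MostFreeMemory,
    /// Among devices that can hold the request, pick the one with the shortest queue.
    LeastLoaded,
}

/// Returns the ordinal of the best-fit device, or None if no device has
/// enough free memory for `bytes_needed`.
fn place(devices: &[DeviceSnapshot], bytes_needed: u64, policy: Policy) -> Option<usize> {
    let fits = devices.iter().filter(|d| d.free_bytes >= bytes_needed);
    let best = match policy {
        Policy::MostFreeMemory => fits.max_by_key(|d| d.free_bytes),
        Policy::LeastLoaded => fits.min_by_key(|d| d.queued_requests),
    };
    best.map(|d| d.ordinal)
}

fn main() {
    let devices = vec![
        DeviceSnapshot { ordinal: 0, free_bytes: 8 << 30, queued_requests: 5 },
        DeviceSnapshot { ordinal: 1, free_bytes: 2 << 30, queued_requests: 0 },
        DeviceSnapshot { ordinal: 2, free_bytes: 12 << 30, queued_requests: 9 },
    ];
    // 4 GiB request: device 1 is too small, so it is never considered.
    println!("{:?}", place(&devices, 4 << 30, Policy::MostFreeMemory)); // Some(2)
    println!("{:?}", place(&devices, 4 << 30, Policy::LeastLoaded)); // Some(0)
}
```

Filtering before ranking keeps the policies simple. It also means a lightly loaded but nearly full device is never chosen for a request it cannot hold.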