pub enum DeviceMsg {
    Alloc(Box<dyn AllocDispatch>),
    CopyToHost(Box<dyn CopyToHostDispatch>),
    CopyFromHost(Box<dyn CopyFromHostDispatch>),
    Allocate {
        len: usize,
        reply: Sender<Result<GpuRef<f32>, GpuError>>,
    },
    AllocateF32 {
        len: usize,
        reply: Sender<Result<GpuRef<f32>, GpuError>>,
    },
    AllocateF64 {
        len: usize,
        reply: Sender<Result<GpuRef<f64>, GpuError>>,
    },
    AllocateI8 {
        len: usize,
        reply: Sender<Result<GpuRef<i8>, GpuError>>,
    },
    AllocateI32 {
        len: usize,
        reply: Sender<Result<GpuRef<i32>, GpuError>>,
    },
    AllocateI64 {
        len: usize,
        reply: Sender<Result<GpuRef<i64>, GpuError>>,
    },
    AllocateU8 {
        len: usize,
        reply: Sender<Result<GpuRef<u8>, GpuError>>,
    },
    AllocateU32 {
        len: usize,
        reply: Sender<Result<GpuRef<u32>, GpuError>>,
    },
    AllocateU64 {
        len: usize,
        reply: Sender<Result<GpuRef<u64>, GpuError>>,
    },
    CopyToHostF32 {
        src: GpuRef<f32>,
        dst: HostBuf<f32>,
        reply: Sender<Result<HostBuf<f32>, GpuError>>,
    },
    CopyFromHostF32 {
        src: HostBuf<f32>,
        dst: GpuRef<f32>,
        reply: Sender<Result<HostBuf<f32>, GpuError>>,
    },
    CopyToHostF64 {
        src: GpuRef<f64>,
        dst: HostBuf<f64>,
        reply: Sender<Result<HostBuf<f64>, GpuError>>,
    },
    CopyFromHostF64 {
        src: HostBuf<f64>,
        dst: GpuRef<f64>,
        reply: Sender<Result<HostBuf<f64>, GpuError>>,
    },
    CopyToHostI32 {
        src: GpuRef<i32>,
        dst: HostBuf<i32>,
        reply: Sender<Result<HostBuf<i32>, GpuError>>,
    },
    CopyFromHostI32 {
        src: HostBuf<i32>,
        dst: GpuRef<i32>,
        reply: Sender<Result<HostBuf<i32>, GpuError>>,
    },
    CopyToHostU32 {
        src: GpuRef<u32>,
        dst: HostBuf<u32>,
        reply: Sender<Result<HostBuf<u32>, GpuError>>,
    },
    CopyFromHostU32 {
        src: HostBuf<u32>,
        dst: GpuRef<u32>,
        reply: Sender<Result<HostBuf<u32>, GpuError>>,
    },
    CopyToHostU8 {
        src: GpuRef<u8>,
        dst: HostBuf<u8>,
        reply: Sender<Result<HostBuf<u8>, GpuError>>,
    },
    CopyFromHostU8 {
        src: HostBuf<u8>,
        dst: GpuRef<u8>,
        reply: Sender<Result<HostBuf<u8>, GpuError>>,
    },
    Sgemm(Box<SgemmRequest>),
    SnapshotContext {
        reply: Sender<Option<Arc<CudaContext>>>,
    },
    SnapshotStream {
        reply: Sender<Option<Arc<CudaStream>>>,
    },
    SnapshotChildren {
        reply: Sender<Option<KernelChildren>>,
    },
    WatchGeneration {
        reply: Sender<Receiver<u64>>,
    },
    Stats {
        reply: Sender<DeviceLoad>,
    },
    ContextReady {
        children: KernelChildren,
    },
    ContextLost,
}
Public messages sent to a DeviceActor.
Phase 0.4 — the formerly-21 dtype-enumerated Allocate* /
CopyToHost* / CopyFromHost* variants collapse into 3 boxed
dispatchers:
- DeviceMsg::Alloc: typed allocation
- DeviceMsg::CopyToHost: D2H async copy
- DeviceMsg::CopyFromHost: H2D async copy
Each carries a Box<dyn …Dispatch> whose concrete payload is an
AllocReq<T> / CopyToHostReq<T> / CopyFromHostReq<T> for some
T: CudaDtype. GpuRef<T> keeps its static dtype on both ends —
the box is purely a uniform mailbox surface.
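The boxed-dispatcher pattern described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `GpuRef`, `GpuError` and `CudaDtype` are hypothetical stand-ins, and the `dispatch` method name and its byte-count return value are assumptions made so the example has something observable.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::sync::mpsc::{channel, Sender};

// Hypothetical stand-ins: the crate's real types are not shown on this page.
#[derive(Debug)]
pub struct GpuRef<T> {
    pub len: usize,
    _dtype: PhantomData<T>,
}

#[derive(Debug)]
pub struct GpuError;

pub trait CudaDtype: Send + 'static {}
impl CudaDtype for f32 {}
impl CudaDtype for u8 {}

/// Typed payload: the dtype lives in the request, not in the enum variant.
pub struct AllocReq<T: CudaDtype> {
    pub len: usize,
    pub reply: Sender<Result<GpuRef<T>, GpuError>>,
}

/// Object-safe surface seen by the mailbox (method name assumed).
pub trait AllocDispatch: Send {
    fn dispatch(self: Box<Self>) -> usize;
}

impl<T: CudaDtype> AllocDispatch for AllocReq<T> {
    fn dispatch(self: Box<Self>) -> usize {
        // Inside the impl, T is known again, so the reply stays statically typed.
        let bytes = self.len * size_of::<T>();
        let _ = self.reply.send(Ok(GpuRef { len: self.len, _dtype: PhantomData }));
        bytes
    }
}

pub enum DeviceMsg {
    Alloc(Box<dyn AllocDispatch>),
}

/// One match arm serves every dtype.
pub fn handle(msg: DeviceMsg) -> usize {
    match msg {
        DeviceMsg::Alloc(req) => req.dispatch(),
    }
}

/// Sends an f32 request and returns (bytes requested, elements in the reply).
pub fn demo() -> (usize, usize) {
    let (tx, rx) = channel::<Result<GpuRef<f32>, GpuError>>();
    let bytes = handle(DeviceMsg::Alloc(Box::new(AllocReq { len: 16, reply: tx })));
    let gpu_ref = rx.recv().unwrap().unwrap();
    (bytes, gpu_ref.len)
}
```

The caller holds a `Receiver` typed to `GpuRef<f32>`, while the enum itself never names the dtype.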
The legacy Allocate* / CopyToHost* / CopyFromHost* variants
remain as #[deprecated] aliases. Existing call sites compile and
run unchanged; the handler arm constructs the equivalent
Box<dyn …Dispatch> and forwards through the new path.
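A legacy arm that forwards through the generic path might look like the sketch below. The types are hypothetical stand-ins, the `run` method name is an assumption, and only one legacy variant is modelled.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::{channel, Sender};

// Hypothetical stand-ins for the crate's types.
pub struct GpuRef<T> {
    pub len: usize,
    _dtype: PhantomData<T>,
}

#[derive(Debug)]
pub struct GpuError;

pub struct AllocReq<T> {
    pub len: usize,
    pub reply: Sender<Result<GpuRef<T>, GpuError>>,
}

pub trait AllocDispatch {
    fn run(self: Box<Self>);
}

impl<T> AllocDispatch for AllocReq<T> {
    fn run(self: Box<Self>) {
        // Mock allocation: reply with a handle of the requested length.
        let _ = self.reply.send(Ok(GpuRef { len: self.len, _dtype: PhantomData }));
    }
}

pub enum DeviceMsg {
    Alloc(Box<dyn AllocDispatch>),
    #[deprecated(note = "use DeviceMsg::Alloc")]
    AllocateF32 {
        len: usize,
        reply: Sender<Result<GpuRef<f32>, GpuError>>,
    },
}

#[allow(deprecated)]
pub fn handle(msg: DeviceMsg) {
    match msg {
        DeviceMsg::Alloc(req) => req.run(),
        // Legacy arm: rebuild the typed request and re-enter the new path.
        DeviceMsg::AllocateF32 { len, reply } => {
            handle(DeviceMsg::Alloc(Box::new(AllocReq::<f32> { len, reply })))
        }
    }
}

/// An old-style call site: still compiles, only emits a deprecation warning
/// (silenced here with `allow`).
#[allow(deprecated)]
pub fn legacy_roundtrip(len: usize) -> usize {
    let (tx, rx) = channel();
    handle(DeviceMsg::AllocateF32 { len, reply: tx });
    rx.recv().unwrap().unwrap().len
}
```

Keeping the forwarding inside the handler means there is a single allocation code path to maintain, whichever variant the caller used.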
Variants§
Alloc(Box<dyn AllocDispatch>)
Phase 0.4 generic alloc. Construct via
DeviceMsg::alloc::<T> or
Box::new(AllocReq::<T> { … }) directly.
CopyToHost(Box<dyn CopyToHostDispatch>)
Phase 0.4 generic D2H copy.
CopyFromHost(Box<dyn CopyFromHostDispatch>)
Phase 0.4 generic H2D copy.
Allocate
Deprecated alias for DeviceMsg::AllocateF32. F1
callers wrote Allocate { len, reply } — kept for back-compat.
AllocateF32
AllocateF64
AllocateI8
AllocateI32
AllocateI64
AllocateU8
AllocateU32
AllocateU64
CopyToHostF32
D2H async copy — buffer round-trips back via the reply so a pinned buffer can return to its pool.
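The round-trip of the destination buffer can be sketched as below. This is an illustration under stated assumptions: the variant is modelled as a standalone struct, `GpuRef` and `HostBuf` are mocked with `Vec`, and `PinnedPool` with its `take` / `give_back` methods is a hypothetical pool, not an API documented here.

```rust
use std::sync::mpsc::{channel, Sender};

#[derive(Debug)]
pub struct GpuError;

// Mock device and host buffers; the real ones wrap CUDA memory.
pub struct GpuRef<T> {
    pub data: Vec<T>,
}
pub struct HostBuf<T> {
    pub data: Vec<T>,
}

/// Same fields as the CopyToHostF32 variant, modelled as a struct.
pub struct CopyToHostF32 {
    pub src: GpuRef<f32>,
    pub dst: HostBuf<f32>,
    pub reply: Sender<Result<HostBuf<f32>, GpuError>>,
}

/// Fills `dst` and hands the same buffer back through the reply channel.
pub fn handle_copy(msg: CopyToHostF32) {
    let CopyToHostF32 { src, mut dst, reply } = msg;
    dst.data.clear();
    dst.data.extend_from_slice(&src.data);
    let _ = reply.send(Ok(dst));
}

/// Hypothetical pool of reusable (pinned) host buffers.
pub struct PinnedPool {
    pub free: Vec<HostBuf<f32>>,
}

impl PinnedPool {
    pub fn take(&mut self) -> HostBuf<f32> {
        self.free.pop().unwrap_or_else(|| HostBuf { data: Vec::new() })
    }
    pub fn give_back(&mut self, buf: HostBuf<f32>) {
        self.free.push(buf);
    }
}

/// Borrows a buffer, copies device data into it, then returns it to the pool.
pub fn read_back(pool: &mut PinnedPool, src: GpuRef<f32>) -> Result<Vec<f32>, GpuError> {
    let (tx, rx) = channel();
    handle_copy(CopyToHostF32 { src, dst: pool.take(), reply: tx });
    let buf = rx.recv().map_err(|_| GpuError)??;
    let out = buf.data.clone();
    pool.give_back(buf);
    Ok(out)
}
```

Because ownership of `dst` moves into the message and comes back in the reply, the pool never loses track of a buffer while a copy is in flight.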
CopyFromHostF32
CopyToHostF64
CopyFromHostF64
CopyToHostI32
CopyFromHostI32
CopyToHostU32
CopyFromHostU32
CopyToHostU8
CopyFromHostU8
Sgemm(Box<SgemmRequest>)
Fire an SGEMM through the context’s BlasActor.
SnapshotContext
F4: Snapshot the underlying Arc<CudaContext> so a top-level
observer (P2pTopology, NcclWorldActor) can build cross-device
machinery. Replies None if the context isn’t ready.
SnapshotStream
Phase 4.5++ — Snapshot the device’s primary Arc<CudaStream>
(the stream owned by ContextActor). Returned to downstream
raw-pointer FFI users (TensorRT enqueueV3, custom kernel
launchers) that need to share a single CUDA execution timeline
with the rest of the device’s library actors.
Replies None if the context isn’t ready (e.g. mock mode, or
before ContextReady). On real hardware the returned stream
is the same one that BLAS / cuDNN / cuFFT child actors were
minted off.
SnapshotChildren
F7: Snapshot the current KernelChildren so application code
can talk to library actors directly (e.g. RngActor,
CudnnActor). Replies None until ContextActor::Init
completes.
Fields
reply: Sender<Option<KernelChildren>>
WatchGeneration
F9: Subscribe to the device’s DeviceState::generation_watch.
The receiver fires every time the underlying CudaContext
rebuilds. Used by NcclWorldActor and P2pTopology to
react to context loss.
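The subscription handshake can be sketched with standard-library channels. This is only an approximation: the concrete `Receiver` type behind `generation_watch` is not shown on this page, so `std::sync::mpsc` is used as a stand-in, and the `watchers` list, `context_rebuilt` and `subscribe` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

pub enum DeviceMsg {
    WatchGeneration { reply: Sender<Receiver<u64>> },
}

/// Mock of the device state that owns the generation counter.
pub struct DeviceState {
    pub generation: u64,
    watchers: Vec<Sender<u64>>,
}

impl DeviceState {
    pub fn new() -> Self {
        DeviceState { generation: 0, watchers: Vec::new() }
    }

    pub fn handle(&mut self, msg: DeviceMsg) {
        match msg {
            DeviceMsg::WatchGeneration { reply } => {
                // Create a fresh channel per subscriber and send back its receiving end.
                let (tx, rx) = channel();
                self.watchers.push(tx);
                let _ = reply.send(rx);
            }
        }
    }

    /// Called whenever the underlying context is rebuilt.
    pub fn context_rebuilt(&mut self) {
        self.generation += 1;
        let generation = self.generation;
        // Notify subscribers and drop the ones whose receiver has gone away.
        self.watchers.retain(|w| w.send(generation).is_ok());
    }
}

/// Subscriber side: ask for a receiver and wait for it to arrive.
pub fn subscribe(state: &mut DeviceState) -> Receiver<u64> {
    let (tx, rx) = channel();
    state.handle(DeviceMsg::WatchGeneration { reply: tx });
    rx.recv().unwrap()
}
```

A subscriber that sees a new generation number knows any handles tied to the previous context are stale and can rebuild its own cross-device state.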
Stats
F5: Per-device load snapshot for placement scheduling.
Fields
reply: Sender<DeviceLoad>
ContextReady
Internal: ContextActor has finished initialising and the
kernel actors are live.
Fields
children: KernelChildren
ContextLost
Internal: ContextActor notifies that the context was torn
down (e.g. on poisoning); pending work should be re-stashed
until a new ContextReady arrives.
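The stash-and-replay behaviour around ContextLost and ContextReady could be modelled roughly as below. The `Msg` enum, the `Work(u32)` payload and the `DeviceActor` fields are simplified assumptions; the real actor carries typed requests and a `KernelChildren` payload.

```rust
use std::collections::VecDeque;

/// Simplified message set: `Work` stands in for any request that needs a live context.
pub enum Msg {
    Work(u32),
    ContextReady,
    ContextLost,
}

pub struct DeviceActor {
    pub ready: bool,
    pub stash: VecDeque<u32>,
    pub executed: Vec<u32>,
}

impl DeviceActor {
    pub fn new() -> Self {
        DeviceActor { ready: false, stash: VecDeque::new(), executed: Vec::new() }
    }

    pub fn handle(&mut self, msg: Msg) {
        match msg {
            // Context is live: run the work immediately.
            Msg::Work(id) if self.ready => self.executed.push(id),
            // No context: keep the work in arrival order for later.
            Msg::Work(id) => self.stash.push_back(id),
            Msg::ContextReady => {
                self.ready = true;
                // Replay everything that arrived while the context was down.
                while let Some(id) = self.stash.pop_front() {
                    self.executed.push(id);
                }
            }
            Msg::ContextLost => self.ready = false,
        }
    }
}
```

Using a FIFO stash preserves the order in which callers submitted work across a context rebuild.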
Implementations§
impl DeviceMsg
pub fn alloc<T: CudaDtype>(
    len: usize,
    reply: Sender<Result<GpuRef<T>, GpuError>>,
) -> Self
Phase 0.4: typed-allocation constructor. Boxes an
AllocReq<T> into the generic DeviceMsg::Alloc variant.
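A constructor with this signature could be written as in the sketch below. The body of `alloc` is inferred from the description, not copied from the crate; the stand-in types, the `requested_len` accessor and `build_f64_request` are hypothetical.

```rust
use std::marker::PhantomData;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-ins for the crate's types.
pub struct GpuRef<T> {
    _dtype: PhantomData<T>,
}
pub struct GpuError;

pub trait CudaDtype: Send + 'static {}
impl CudaDtype for f64 {}

pub struct AllocReq<T: CudaDtype> {
    pub len: usize,
    pub reply: Sender<Result<GpuRef<T>, GpuError>>,
}

/// Accessor added only so the sketch can inspect the boxed request.
pub trait AllocDispatch: Send {
    fn requested_len(&self) -> usize;
}

impl<T: CudaDtype> AllocDispatch for AllocReq<T> {
    fn requested_len(&self) -> usize {
        self.len
    }
}

pub enum DeviceMsg {
    Alloc(Box<dyn AllocDispatch>),
}

impl DeviceMsg {
    /// Boxes a typed request into the dtype-agnostic variant.
    pub fn alloc<T: CudaDtype>(
        len: usize,
        reply: Sender<Result<GpuRef<T>, GpuError>>,
    ) -> Self {
        DeviceMsg::Alloc(Box::new(AllocReq { len, reply }))
    }
}

/// Call site: the turbofish fixes T, and the receiver keeps the matching type.
pub fn build_f64_request(len: usize) -> (DeviceMsg, Receiver<Result<GpuRef<f64>, GpuError>>) {
    let (tx, rx) = channel();
    (DeviceMsg::alloc::<f64>(len, tx), rx)
}
```

The turbofish is optional whenever the type of `reply` already pins down `T`.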