pub struct GpuRef<T> { /* private fields */ }
A live device-buffer handle.
Holds a strong Arc to the slice (keeping the underlying memory
alive even if the DeviceActor has begun shutdown) plus a Weak to
the surrounding DeviceState (so reference cycles cannot trap the
system in a non-terminating state). Calling GpuRef::access before
each use validates that the context generation has not advanced.
Implementations
impl<T> GpuRef<T>
pub fn new(slice: Arc<CudaSlice<T>>, state: &Arc<DeviceState>) -> Self
Wrap a raw Arc<CudaSlice<T>> produced by a DeviceActor into a
GpuRef<T>.
Only DeviceActor (and code reachable from its dispatcher) should
call this — outside callers must obtain GpuRefs by asking the
DeviceActor to allocate.
pub fn access(&self) -> Result<&Arc<CudaSlice<T>>, GpuError>
Validate the reference and return access to the underlying slice.
Returns GpuError::GpuRefStale if any of the following holds:
- the owning DeviceState has been dropped,
- the device is no longer accepting operations, or
- the context generation has advanced past the one this ref was minted with (i.e. a poisoned-context rebuild has happened).
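The three staleness conditions can be illustrated with a minimal, std-only sketch. Everything here is a hypothetical stand-in: MockState, MockRef and MockError are not the crate's types, a Vec<T> plays the role of the CudaSlice<T>, and the real DeviceState may track generation and shutdown differently.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{Arc, Weak};

// Hypothetical stand-in for DeviceState (assumed fields, not the real layout).
pub struct MockState {
    pub generation: AtomicU64,
    pub accepting: AtomicBool,
}

// Hypothetical stand-in for GpuError::GpuRefStale.
#[derive(Debug, PartialEq)]
pub enum MockError {
    Stale,
}

pub struct MockRef<T> {
    slice: Arc<Vec<T>>,     // strong: keeps the buffer alive
    state: Weak<MockState>, // weak: no reference cycle with the state
    generation: u64,        // generation observed at construction
}

impl<T> MockRef<T> {
    pub fn new(slice: Arc<Vec<T>>, state: &Arc<MockState>) -> Self {
        Self {
            slice,
            state: Arc::downgrade(state),
            generation: state.generation.load(Ordering::Acquire),
        }
    }

    pub fn access(&self) -> Result<&Arc<Vec<T>>, MockError> {
        // 1. The owning state has been dropped.
        let state = self.state.upgrade().ok_or(MockError::Stale)?;
        // 2. The device is no longer accepting operations.
        if !state.accepting.load(Ordering::Acquire) {
            return Err(MockError::Stale);
        }
        // 3. The generation has advanced since this ref was minted.
        if state.generation.load(Ordering::Acquire) != self.generation {
            return Err(MockError::Stale);
        }
        Ok(&self.slice)
    }
}
```

In this sketch a ref minted before a generation bump fails on its next access(), while a ref minted afterwards succeeds, which mirrors the "validate before each use" guidance.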
pub fn generation(&self) -> u64
Generation token at construction. Exposed for tests.
pub fn is_empty(&self) -> bool
pub fn device_id(&self) -> Option<u32>
Device id this GpuRef was minted on, or None if the owning
DeviceState has been dropped.
pub fn record_write(&self, stream: &Arc<CudaStream>)
Record the stream that most recently wrote to this buffer. Library actors (BlasActor, CudnnActor, FftActor, etc.) call this after enqueueing a kernel that mutates the slice so that downstream consumers can inject a cross-stream wait.
pub fn last_write_stream(&self) -> Option<Arc<CudaStream>>
Most recent producing stream, if any. Returns None when no
kernel has been recorded against this buffer.
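A rough sketch of how a consumer might use record_write and last_write_stream to decide whether a cross-stream wait is needed. MockStream, WriteTracker and wait_target are hypothetical names introduced for illustration. How GpuRef stores the stream internally, and how the wait itself is injected, are not specified above; the Mutex here is an assumption.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a CUDA stream handle.
#[derive(Debug)]
pub struct MockStream {
    pub id: u32,
}

// Assumed storage: a mutex-guarded slot holding the last producing stream.
#[derive(Default)]
pub struct WriteTracker {
    last_write: Mutex<Option<Arc<MockStream>>>,
}

impl WriteTracker {
    // Called by a producer after enqueueing a kernel that mutates the buffer.
    pub fn record_write(&self, stream: &Arc<MockStream>) {
        *self.last_write.lock().unwrap() = Some(Arc::clone(stream));
    }

    // None means no kernel has been recorded against this buffer.
    pub fn last_write_stream(&self) -> Option<Arc<MockStream>> {
        self.last_write.lock().unwrap().clone()
    }
}

// Returns the producer stream the consumer should wait on, or None when no
// write was recorded or the producer and consumer are the same stream.
pub fn wait_target(
    tracker: &WriteTracker,
    consumer: &Arc<MockStream>,
) -> Option<Arc<MockStream>> {
    tracker
        .last_write_stream()
        .filter(|producer| !Arc::ptr_eq(producer, consumer))
}
```

Skipping the wait when producer and consumer are the same stream is a design choice of this sketch: work on a single stream is already ordered, so only a different producing stream needs synchronization.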
pub fn raw_device_ptr(&self) -> Result<u64, GpuError>
Phase 4.5++ — opaque CUdeviceptr (u64) for downstream
raw-pointer FFI APIs (TensorRT enqueueV3, cuStreamWriteValue64,
custom CUDA modules that aren’t fronted by cudarc).
Validates the GpuRef first via GpuRef::access. The pointer
is captured against the slice’s own associated stream — the
_guard returned by cudarc’s device_ptr() is dropped before
the function returns, but the underlying allocation outlives
this call because the inner Arc<CudaSlice<T>> is held by
self. Callers must ensure they don’t dispatch the resulting
pointer on a stream that has already gone out of scope; in
practice the pointer is consumed immediately by an FFI shim
(TensorRT enqueueV3, etc.) on a stream the caller owns.
Returns GpuError::GpuRefStale if the underlying generation
token is stale or the device is shutting down.
Trait Implementations
impl<T> DevSliceArg for GpuRef<T>
where
    T: CudaDtype,
fn validate(&self) -> Result<Box<dyn Any + Send>, GpuError>
Validate the GpuRef and return a keep-alive
owner. The caller stores this Box<dyn Any + Send> in a Vec
to keep the device buffer alive until the kernel completes.
fn push<'a>(&'a self, builder: &mut LaunchArgs<'a>) -> Result<(), GpuError>
Push this argument onto builder. Implementors
re-access() the GpuRef (cheap: a pointer-equality check
against DeviceState.generation) and call
PushKernelArg::arg with &CudaSlice<T>.
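The keep-alive pattern returned by validate can be sketched with std types only. MockBuffer and collect_keep_alive are hypothetical, a Vec<f32> stands in for the device allocation, and String stands in for GpuError. The point is only that a type-erased Box<dyn Any + Send> holding a cloned Arc keeps the allocation alive after the original handle is gone.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for a GpuRef<f32>; the Vec plays the device buffer.
pub struct MockBuffer {
    pub slice: Arc<Vec<f32>>,
}

impl MockBuffer {
    // Hand back a type-erased owner that holds one strong reference.
    pub fn validate(&self) -> Result<Box<dyn Any + Send>, String> {
        let owner: Box<dyn Any + Send> = Box::new(Arc::clone(&self.slice));
        Ok(owner)
    }
}

// Validate every argument up front and gather the owners in a Vec that the
// caller holds until the (hypothetical) kernel has completed.
pub fn collect_keep_alive(
    args: &[MockBuffer],
) -> Result<Vec<Box<dyn Any + Send>>, String> {
    args.iter().map(|arg| arg.validate()).collect()
}
```

Dropping the Vec of owners after completion releases the extra strong references. Until then the allocation survives even if every original handle has been dropped.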