Module collective

CollectiveActor wraps a [cudarc::nccl::Comm] for one rank within an NcclWorldActor group.

Phase 2 NCCL slice: the full collective surface (AllReduce, AllGather, ReduceScatter, AllToAll(v), Reduce, Broadcast), point-to-point Send/Recv, a typed group scope guard, an NVLS/SHARP/fp8 capability probe, and a custom PreMulSum reduce op. The surface is dtype-generic via the NcclReduceSupported marker, which is defined here until Phase 0 lands.
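
To make the semantics of the listed collectives concrete, here is a small host-side reference model. It is only a sketch: it runs on plain `Vec<f32>` buffers, touches neither NCCL nor cudarc, and every function name in it is hypothetical rather than part of this module. It assumes all ranks contribute equal-length buffers and that PreMulSum takes one scalar per rank.

```rust
/// Host-side reference model: `inputs[r]` is the buffer contributed by rank `r`.
/// Each function returns one output buffer per rank.

/// AllReduce(sum): every rank ends up with the element-wise sum.
pub fn all_reduce_sum(inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let len = inputs.first().map_or(0, |b| b.len());
    let mut sum = vec![0.0f32; len];
    for buf in inputs {
        assert_eq!(buf.len(), len, "all ranks must contribute equal-length buffers");
        for (acc, x) in sum.iter_mut().zip(buf) {
            *acc += *x;
        }
    }
    vec![sum; inputs.len()]
}

/// AllGather: every rank ends up with all inputs concatenated in rank order.
pub fn all_gather(inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let gathered: Vec<f32> = inputs.iter().flatten().copied().collect();
    vec![gathered; inputs.len()]
}

/// ReduceScatter(sum): rank `r` receives chunk `r` of the element-wise sum.
/// Assumes the buffer length is divisible by the number of ranks.
pub fn reduce_scatter_sum(inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = inputs.len();
    if n == 0 {
        return Vec::new();
    }
    let reduced = all_reduce_sum(inputs).swap_remove(0);
    assert_eq!(reduced.len() % n, 0, "length must divide evenly across ranks");
    let chunk = reduced.len() / n;
    (0..n)
        .map(|r| reduced[r * chunk..(r + 1) * chunk].to_vec())
        .collect()
}

/// PreMulSum: each rank's buffer is scaled by that rank's scalar, then summed.
pub fn pre_mul_sum(inputs: &[Vec<f32>], scalars: &[f32]) -> Vec<Vec<f32>> {
    assert_eq!(inputs.len(), scalars.len(), "one scalar per rank");
    let scaled: Vec<Vec<f32>> = inputs
        .iter()
        .zip(scalars)
        .map(|(buf, s)| buf.iter().map(|x| x * s).collect())
        .collect();
    all_reduce_sum(&scaled)
}
```

With a scalar of `1 / world_size` on every rank, `pre_mul_sum` yields an average, which is a common reason to reach for a pre-multiplied sum instead of a plain sum followed by a separate scaling pass.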

Each CollectiveActor is bound to one specific crate::device::DeviceState (one rank in the NCCL world). The parent NcclWorldActor spawns N of these (one per device) and routes messages to all of them in a group_start/group_end pair where appropriate.
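
The one-actor-per-rank layout described above can be sketched with standard-library threads and channels. This is an illustration under stated assumptions, not the crate's actor runtime: `RankMsg`, `RankActor`, `spawn_rank`, and `fan_out` are hypothetical names, and the "work" each rank does is a placeholder multiplication rather than an NCCL call.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type; the real CollectiveMsg is richer.
enum RankMsg {
    /// Ask the rank to reply with `(rank, rank * factor)`.
    Scale { factor: u32, reply: Sender<(usize, u32)> },
    Shutdown,
}

/// Handle the parent keeps for each spawned rank.
struct RankActor {
    tx: Sender<RankMsg>,
    handle: JoinHandle<()>,
}

/// Spawn one actor bound to a single rank; it owns its mailbox.
fn spawn_rank(rank: usize) -> RankActor {
    let (tx, rx) = channel::<RankMsg>();
    let handle = thread::spawn(move || {
        for msg in rx {
            match msg {
                RankMsg::Scale { factor, reply } => {
                    // Placeholder for rank-local work on this rank's device.
                    let _ = reply.send((rank, rank as u32 * factor));
                }
                RankMsg::Shutdown => break,
            }
        }
    });
    RankActor { tx, handle }
}

/// Stand-in for the parent world actor: spawn N ranks, fan one request
/// out to all of them, and collect the replies indexed by rank.
pub fn fan_out(world_size: usize, factor: u32) -> Vec<u32> {
    let actors: Vec<RankActor> = (0..world_size).map(spawn_rank).collect();
    let (reply_tx, reply_rx) = channel();
    for actor in &actors {
        actor
            .tx
            .send(RankMsg::Scale { factor, reply: reply_tx.clone() })
            .expect("rank actor should be alive");
    }
    // Drop the parent's sender so the reply loop ends once every rank has answered.
    drop(reply_tx);
    let mut out = vec![0; world_size];
    for (rank, value) in reply_rx {
        out[rank] = value;
    }
    for actor in actors {
        let _ = actor.tx.send(RankMsg::Shutdown);
        actor.handle.join().expect("rank thread panicked");
    }
    out
}
```

Replies may arrive in any order, so the sketch writes each one into the slot for its rank rather than relying on arrival order.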

Re-exports§

pub use all_to_all::AllToAllRequest;
pub use all_to_all::AllToAllvRequest;
pub use allgather::AllGatherRequest;
pub use allreduce::AllReduceRequest;
pub use broadcast::BroadcastRequest;
pub use capabilities::probe_capabilities;
pub use capabilities::NcclCapabilities;
pub use custom_op::PreMulSumOp;
pub use group::GroupGuard;
pub use p2p::RecvRequest;
pub use p2p::SendRequest;
pub use reduce::ReduceRequest;
pub use reduce_scatter::ReduceScatterRequest;

Modules§

all_to_all
Typed AllToAll / AllToAllv requests.
allgather
Typed AllGather request — generic over T: NcclReduceSupported.
allreduce
Typed AllReduce request. Generic over T: NcclReduceSupported (any of f32/f64/i8/u8/i32/u32/i64/u64; f16/bf16 when the f16 feature is enabled).
broadcast
Typed Broadcast request — generic over T: NcclReduceSupported.
capabilities
Runtime probe for NCCL capabilities.
custom_op
Custom reduce ops — currently ncclRedOpCreatePreMulSum.
group
Typed scope guard around ncclGroupStart / ncclGroupEnd.
p2p
Typed point-to-point Send / Recv requests.
reduce
Typed Reduce request: reduce-to-root variant of AllReduce.
reduce_scatter
Typed ReduceScatter request — generic over T: NcclReduceSupported.
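
The scope-guard idea behind the group module can be sketched as an RAII type that opens a group when constructed and closes it when dropped. The types below are hypothetical stand-ins: `FakeNccl` only records call names, and `ScopedGroup` is not the crate's `GroupGuard`, whose actual API is not shown on this page.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the NCCL group API; it only records calls.
#[derive(Default)]
pub struct FakeNccl {
    pub log: RefCell<Vec<&'static str>>,
}

impl FakeNccl {
    fn group_start(&self) {
        self.log.borrow_mut().push("group_start");
    }
    fn group_end(&self) {
        self.log.borrow_mut().push("group_end");
    }
    pub fn all_reduce(&self) {
        self.log.borrow_mut().push("all_reduce");
    }
}

/// Scope guard: opens a group on construction and closes it on drop,
/// so the start/end pair cannot be left unbalanced by an early return.
pub struct ScopedGroup<'a> {
    nccl: &'a FakeNccl,
}

impl<'a> ScopedGroup<'a> {
    pub fn new(nccl: &'a FakeNccl) -> Self {
        nccl.group_start();
        ScopedGroup { nccl }
    }
}

impl Drop for ScopedGroup<'_> {
    fn drop(&mut self) {
        self.nccl.group_end();
    }
}

/// Issue `n` collectives inside one group and return the recorded call order.
pub fn grouped_calls(n: usize) -> Vec<&'static str> {
    let nccl = FakeNccl::default();
    {
        let _group = ScopedGroup::new(&nccl);
        for _ in 0..n {
            nccl.all_reduce();
        }
    } // `_group` is dropped here, which records "group_end".
    nccl.log.into_inner()
}
```

One design caveat: ncclGroupEnd returns a status code, and `Drop` cannot propagate an error. A real guard would likely need an explicit, fallible way to close the group in addition to the drop path; how this crate handles that is not stated here.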

Structs§

CollectiveActor

Enums§

CollectiveMsg
Public message surface for the CollectiveActor. Hot path goes through CollectiveMsg::Op which carries a boxed CollectiveDispatch; the legacy AllReduceF32 / BroadcastF32 variants remain for back-compat and route through the same machinery.
ReduceOp
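
The CollectiveMsg description suggests a pattern in which typed requests are erased behind a boxed trait object and legacy variants are converted into that same form. The sketch below illustrates that pattern only. The trait signature, `RankState`, `AllReduceSketch`, `Msg`, and `handle` are all assumptions made for illustration; the real CollectiveDispatch trait and CollectiveMsg variants may differ in shape.

```rust
/// Hypothetical rank-local state that a request runs against.
pub struct RankState {
    pub rank: usize,
    pub log: Vec<String>,
}

/// Assumed shape of the dispatch trait: a boxed request that knows how
/// to execute itself against one rank.
pub trait CollectiveDispatch: Send {
    fn dispatch(self: Box<Self>, state: &mut RankState);
}

/// Hypothetical typed request, generic over the element type.
pub struct AllReduceSketch<T> {
    pub data: Vec<T>,
}

impl<T: Send + 'static> CollectiveDispatch for AllReduceSketch<T> {
    fn dispatch(self: Box<Self>, state: &mut RankState) {
        // A real implementation would launch the collective here.
        state
            .log
            .push(format!("rank {}: all_reduce over {} elements", state.rank, self.data.len()));
    }
}

/// Hypothetical message enum with a generic hot path and one legacy variant.
pub enum Msg {
    Op(Box<dyn CollectiveDispatch>),
    AllReduceF32(Vec<f32>),
}

/// Legacy variants are rewrapped as typed requests, so both paths
/// end up in the same `dispatch` call.
pub fn handle(state: &mut RankState, msg: Msg) {
    let op: Box<dyn CollectiveDispatch> = match msg {
        Msg::Op(op) => op,
        Msg::AllReduceF32(data) => Box::new(AllReduceSketch { data }),
    };
    op.dispatch(state);
}
```

Boxing keeps the message enum free of type parameters: the mailbox carries a single concrete message type while each request stays generic over its dtype.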

Traits§

NcclReduceSupported
Marker for dtypes carried in NCCL collectives. It mirrors the NcclReduceSupported marker that the Phase 0 dtype.rs will host, and is defined locally here so the NCCL slice can ship before Phase 0 fully lands. The set matches NCCL’s reduce-supported types: f32, f64, f16, bf16, i8, u8, i32, u32, i64, u64. fp8 e4m3/e5m2 are gated behind nccl-fp8 and require NCCL >= 2.20.
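
A marker trait of this kind can be sketched as follows. `ReduceDtype` and `describe` are hypothetical names, and the associated `NAME` constant is an addition for the example; the real NcclReduceSupported trait may carry different items or none at all. The sketch covers only the primitive types available in the standard library, so f16, bf16, and the fp8 types are omitted.

```rust
/// Hypothetical marker trait in the spirit of NcclReduceSupported.
pub trait ReduceDtype: Copy + Send + 'static {
    /// Human-readable dtype name (an assumption added for this sketch).
    const NAME: &'static str;
}

/// Implement the marker for a list of primitive types.
macro_rules! impl_reduce_dtype {
    ($($t:ty),* $(,)?) => {$(
        impl ReduceDtype for $t {
            const NAME: &'static str = stringify!($t);
        }
    )*};
}

impl_reduce_dtype!(f32, f64, i8, u8, i32, u32, i64, u64);

/// A function bounded by the marker accepts only the listed dtypes.
pub fn describe<T: ReduceDtype>(buf: &[T]) -> String {
    format!(
        "{} x {} ({} bytes)",
        buf.len(),
        T::NAME,
        std::mem::size_of_val(buf)
    )
}
```

Because the bound is checked at compile time, passing a slice of an unsupported type such as `&str` to `describe` is rejected by the compiler rather than failing at run time.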