CollectiveActor: wraps a cudarc::nccl::Comm for one rank
within an NcclWorldActor group.
Phase 2 NCCL slice: full collective surface (AllReduce, AllGather,
ReduceScatter, AllToAll(v), Reduce, Broadcast), point-to-point
Send/Recv, typed group scope guard, NVLS/SHARP/fp8 capability
probe, and a custom PreMulSum reduce op. dtype-generic via the
NcclReduceSupported marker (defined here until Phase 0 lands).
Each CollectiveActor is bound to one specific
crate::device::DeviceState (one rank in the NCCL world). The
parent NcclWorldActor spawns N of these (one per device) and
routes messages to all of them in a group_start/group_end
pair where appropriate.
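The group_start/group_end pairing described above lends itself to a scope guard. The following is a minimal sketch of that pattern, not this crate's GroupGuard: MockNccl, ScopeGuard, and enqueue_for_all_ranks are hypothetical names, and counters stand in for the real ncclGroupStart/ncclGroupEnd calls.

```rust
use std::cell::Cell;

// Stand-in for the NCCL group API (assumption): a real guard would call
// ncclGroupStart / ncclGroupEnd instead of updating counters.
#[derive(Default)]
struct MockNccl {
    depth: Cell<u32>,
    completed_groups: Cell<u32>,
}

// Hypothetical scope guard: opens a group on creation, closes it on drop.
struct ScopeGuard<'a> {
    nccl: &'a MockNccl,
}

impl<'a> ScopeGuard<'a> {
    fn begin(nccl: &'a MockNccl) -> Self {
        nccl.depth.set(nccl.depth.get() + 1);
        ScopeGuard { nccl }
    }
}

impl Drop for ScopeGuard<'_> {
    fn drop(&mut self) {
        // Closing the group in Drop keeps start/end balanced on every exit path.
        self.nccl.depth.set(self.nccl.depth.get() - 1);
        self.nccl
            .completed_groups
            .set(self.nccl.completed_groups.get() + 1);
    }
}

// Enqueue one operation per rank inside a single group scope.
fn enqueue_for_all_ranks(nccl: &MockNccl, ranks: usize) -> Vec<usize> {
    let _guard = ScopeGuard::begin(nccl);
    let mut enqueued = Vec::with_capacity(ranks);
    for rank in 0..ranks {
        // A real implementation would issue the per-rank collective here.
        enqueued.push(rank);
    }
    enqueued
}
```

Tying the closing call to Drop means an early return or a panic inside the scope cannot leave a group open, which is presumably the motivation for a typed guard over manual start/end calls.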
Re-exports§
pub use all_to_all::AllToAllRequest;
pub use all_to_all::AllToAllvRequest;
pub use allgather::AllGatherRequest;
pub use allreduce::AllReduceRequest;
pub use broadcast::BroadcastRequest;
pub use capabilities::probe_capabilities;
pub use capabilities::NcclCapabilities;
pub use custom_op::PreMulSumOp;
pub use group::GroupGuard;
pub use p2p::RecvRequest;
pub use p2p::SendRequest;
pub use reduce::ReduceRequest;
pub use reduce_scatter::ReduceScatterRequest;
Modules§
- all_to_all: Typed AllToAll / AllToAllv requests.
- allgather: Typed AllGather request, generic over T: NcclReduceSupported.
- allreduce: Typed AllReduce request. Generic over T: NcclReduceSupported (any of f32/f64/i8/u8/i32/u32/i64/u64; f16/bf16 with f16).
- broadcast: Typed Broadcast request, generic over T: NcclReduceSupported.
- capabilities: Runtime probe for NCCL capabilities.
- custom_op: Custom reduce ops, currently ncclRedOpCreatePreMulSum.
- group: Typed scope guard around ncclGroupStart/ncclGroupEnd.
- p2p: Typed point-to-point Send / Recv requests.
- reduce: Typed Reduce request: reduce-to-root variant of AllReduce.
- reduce_scatter: Typed ReduceScatter request, generic over T: NcclReduceSupported.
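To make the custom_op entry concrete, here is a CPU reference for what a pre-multiplied sum computes: each rank's contribution is scaled before the sum is taken. This is a sketch for checking results, not the crate's PreMulSumOp. The function name premul_sum_reference is hypothetical, and it assumes every rank uses the same scalar.

```rust
// CPU reference for a pre-multiplied sum across ranks (illustrative only):
// out[i] = sum over ranks r of (scalar * inputs[r][i]).
// Assumes the same scalar on every rank and equal-length buffers.
fn premul_sum_reference(inputs: &[Vec<f32>], scalar: f32) -> Vec<f32> {
    let len = inputs.first().map_or(0, |v| v.len());
    let mut out = vec![0.0f32; len];
    for rank_buf in inputs {
        assert_eq!(
            rank_buf.len(),
            len,
            "all ranks must contribute equal-length buffers"
        );
        for (acc, x) in out.iter_mut().zip(rank_buf) {
            // Scale first, then accumulate: the "pre-multiply" in PreMulSum.
            *acc += scalar * *x;
        }
    }
    out
}
```

With a scalar of 1/N over N ranks this yields an average, a common reason to fold the scaling into the reduction rather than running a separate kernel afterwards.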
Structs§
Enums§
- CollectiveMsg: Public message surface for the CollectiveActor. The hot path goes through CollectiveMsg::Op, which carries a boxed CollectiveDispatch; the legacy AllReduceF32/BroadcastF32 variants remain for back-compat and route through the same machinery.
- ReduceOp
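One way to read the CollectiveMsg description is that legacy variants are first normalized into the boxed form, so only one execution path exists. The sketch below illustrates that shape under assumptions: Dispatch, AllReduce, Msg, and handle are hypothetical stand-ins, not the crate's CollectiveDispatch or CollectiveMsg definitions, and run only reports an element count instead of launching a collective.

```rust
// Hypothetical type-erased operation (stand-in for a dispatch trait).
trait Dispatch {
    fn name(&self) -> &'static str;
    // Consumes the boxed op; returns the number of elements it covered.
    fn run(self: Box<Self>) -> usize;
}

// Hypothetical typed request, generic over the element type.
struct AllReduce<T> {
    buf: Vec<T>,
}

impl<T: 'static> Dispatch for AllReduce<T> {
    fn name(&self) -> &'static str {
        "all_reduce"
    }
    fn run(self: Box<Self>) -> usize {
        // A real implementation would enqueue the NCCL call here.
        self.buf.len()
    }
}

// Hypothetical message enum: one generic variant plus a legacy one.
enum Msg {
    Op(Box<dyn Dispatch>),
    AllReduceF32(Vec<f32>),
}

fn handle(msg: Msg) -> (&'static str, usize) {
    // Normalize the legacy variant into the boxed form so both
    // variants share a single execution path.
    let op: Box<dyn Dispatch> = match msg {
        Msg::Op(op) => op,
        Msg::AllReduceF32(buf) => Box::new(AllReduce { buf }),
    };
    let name = op.name();
    (name, op.run())
}
```

Boxing costs one allocation per message, but it keeps the enum small and lets new dtypes or collectives be added without new variants.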
Traits§
- NcclReduceSupported: Marker for dtypes carried in NCCL collectives. Mirrors the NcclReduceSupported marker that the Phase 0 dtype.rs will host; it is defined locally here so the NCCL slice can ship before Phase 0 fully lands. The set matches NCCL's reduce-supported types: f32, f64, f16, bf16, i8, u8, i32, u32, i64, u64. fp8 e4m3/e5m2 are behind nccl-fp8 and require NCCL >= 2.20.
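A marker trait of this kind can be sketched as follows. This is an illustration of the pattern rather than the crate's definition: ReduceDtype, impl_reduce_dtype, and dtype_name are hypothetical names, the NAME constant is an assumption added to make the marker observable, and f16/bf16/fp8 are omitted because they would need an external crate and feature gates.

```rust
// Hypothetical marker trait: implemented only for element types that
// may appear in a reduce-style collective.
trait ReduceDtype: Copy + 'static {
    // Assumed associated constant, used here only to identify the dtype.
    const NAME: &'static str;
}

// Implement the marker for a list of primitive types in one place.
macro_rules! impl_reduce_dtype {
    ($($t:ty => $name:literal),* $(,)?) => {
        $(impl ReduceDtype for $t {
            const NAME: &'static str = $name;
        })*
    };
}

impl_reduce_dtype!(
    f32 => "f32", f64 => "f64",
    i8 => "i8", u8 => "u8",
    i32 => "i32", u32 => "u32",
    i64 => "i64", u64 => "u64",
);

// Generic entry point: compiles only for buffers of supported dtypes.
fn dtype_name<T: ReduceDtype>(_buf: &[T]) -> &'static str {
    T::NAME
}
```

Calling dtype_name with an unsupported element type such as bool fails at compile time, which is the point of bounding request types on a marker.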