Custom reduce ops — currently ncclRedOpCreatePreMulSum.
cudarc 0.19.4 does not expose the raw ncclComm_t from
cudarc::nccl::Comm — the field is private. PreMulSum creation
needs that pointer. We rely on the documented layout
(comm: ncclComm_t is the first field of the pub struct Comm)
and read it via a layout-fragile pointer cast, gated behind a
#[repr(C)] shadow type guarded with a static assertion on
offset and size.
If a cudarc upgrade breaks this assumption, the static assertions will fail at compile time, and we’ll move to a vendored Comm constructor.
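A minimal sketch of the shadow-type cast described above, not the crate's actual code. `upstream::Comm`, `CommShadow`, `RawComm` and `raw_comm` are hypothetical stand-ins so the example compiles without cudarc or NCCL. The stand-in `Comm` is marked `#[repr(C)]` so the cast is well-defined here; whether the real `cudarc::nccl::Comm` gives the same guarantee is an assumption this sketch does not verify.

```rust
use std::ffi::c_void;
use std::mem::{align_of, offset_of, size_of};

// Stand-in for ncclComm_t (an opaque pointer in the real bindings).
type RawComm = *mut c_void;

// Hypothetical stand-in for the upstream crate: the handle is a private first field.
mod upstream {
    use super::RawComm;

    // Assumption: fixed field order. A default-repr struct gives no such guarantee.
    #[repr(C)]
    #[allow(dead_code)]
    pub struct Comm {
        comm: RawComm,
        rank: usize,
        world_size: usize,
    }

    impl Comm {
        pub fn new(comm: RawComm, rank: usize, world_size: usize) -> Self {
            Comm { comm, rank, world_size }
        }
    }
}

// Shadow type mirroring the assumed layout: handle first, then opaque padding.
#[repr(C)]
#[allow(dead_code)]
struct CommShadow {
    comm: RawComm,
    _rest: [usize; 2],
}

// Compile-time guards: any mismatch in offset, size or alignment fails the build.
const _: () = assert!(offset_of!(CommShadow, comm) == 0);
const _: () = assert!(size_of::<CommShadow>() == size_of::<upstream::Comm>());
const _: () = assert!(align_of::<CommShadow>() == align_of::<upstream::Comm>());

// Reads the private handle by reinterpreting the reference as the shadow type.
fn raw_comm(comm: &upstream::Comm) -> RawComm {
    // SAFETY: sound only while the layout assumptions checked above hold.
    unsafe { (*(comm as *const upstream::Comm as *const CommShadow)).comm }
}

fn main() {
    let mut backing = 0u8;
    let handle = &mut backing as *mut u8 as RawComm;
    let comm = upstream::Comm::new(handle, 0, 1);
    assert_eq!(raw_comm(&comm), handle);
}
```

Note that size and alignment checks can only catch some layout changes: they cannot observe the offset of a private upstream field directly, which is why the approach is described as layout-fragile.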
Structs
- PreMulSumOp - PreMulSum custom reduce op: AllReduce-equivalent with a per-tensor
scalar premultiplier living in device memory. Construct via
PreMulSumOp::new; destroy via PreMulSumOp::destroy before the comm goes away.
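For intuition about what a PreMulSum reduction computes, here is a CPU-only reference sketch. It models each rank's buffer as a `Vec<f32>` and each rank's premultiplier as a plain `f32`; the function name `premul_sum_reference` is hypothetical and nothing here calls NCCL or cudarc.

```rust
/// CPU reference for a PreMulSum all-reduce: each rank's buffer is scaled by
/// that rank's scalar, then the scaled buffers are summed elementwise.
fn premul_sum_reference(inputs: &[Vec<f32>], scalars: &[f32]) -> Vec<f32> {
    assert_eq!(inputs.len(), scalars.len(), "one scalar per rank");
    let len = inputs.first().map_or(0, |buf| buf.len());
    let mut out = vec![0.0f32; len];
    for (buf, &scale) in inputs.iter().zip(scalars) {
        assert_eq!(buf.len(), len, "all ranks must contribute equal-length buffers");
        for (acc, &x) in out.iter_mut().zip(buf) {
            *acc += scale * x;
        }
    }
    out
}

fn main() {
    // Two simulated ranks, each premultiplying by 0.5: this yields an average.
    let inputs = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let scalars = [0.5, 0.5];
    assert_eq!(premul_sum_reference(&inputs, &scalars), vec![2.0, 3.0]);
}
```

With every scalar set to `1.0 / world_size`, the result is an elementwise mean across ranks, which is a common reason to use a premultiplied sum instead of a plain sum followed by a separate scaling pass.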