
Module custom_op

Custom NCCL reduce ops. Currently the only one is ncclRedOpCreatePreMulSum.

cudarc 0.19.4 does not expose the raw ncclComm_t held by cudarc::nccl::Comm because the field is private, yet ncclRedOpCreatePreMulSum takes that handle as an argument. We therefore rely on the documented layout (comm: ncclComm_t is the first field of pub struct Comm) and read the handle through a layout-fragile pointer cast to a #[repr(C)] shadow type, guarded by static assertions on field offset and struct size.
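A minimal sketch of the shadow-type read, under stated assumptions: foreign::Comm is a hypothetical stand-in for cudarc::nccl::Comm whose fields after the first are invented, NcclComm models ncclComm_t as an opaque pointer, and ShadowComm and raw_comm are illustrative names rather than this module's actual items. The stand-in is marked #[repr(C)] only so the example has a fixed layout; the real cudarc type is not assumed to carry that guarantee.

```rust
use core::ffi::c_void;
use core::mem::{align_of, offset_of, size_of};

/// Models ncclComm_t, which NCCL declares as an opaque `struct ncclComm*`.
pub type NcclComm = *mut c_void;

/// Hypothetical stand-in for cudarc::nccl::Comm: the handle is a private
/// first field. Only that first field mirrors the assumed layout; the rest
/// are invented, and #[repr(C)] is added so this demo has a fixed layout.
pub mod foreign {
    use super::NcclComm;

    #[repr(C)]
    pub struct Comm {
        #[allow(dead_code)] // only ever read through the shadow type
        comm: NcclComm,
        rank: usize,
        world_size: usize,
    }

    impl Comm {
        pub fn new(comm: NcclComm, rank: usize, world_size: usize) -> Self {
            Self { comm, rank, world_size }
        }
        pub fn rank(&self) -> usize {
            self.rank
        }
        pub fn world_size(&self) -> usize {
            self.world_size
        }
    }
}

/// Shadow type mirroring the assumed layout of foreign::Comm.
#[repr(C)]
struct ShadowComm {
    comm: NcclComm,
    _rank: usize,
    _world_size: usize,
}

// Compile-time guards: if an upgrade changes the size or alignment of Comm,
// the build fails instead of performing a bad read at runtime.
const _: () = assert!(offset_of!(ShadowComm, comm) == 0);
const _: () = assert!(size_of::<ShadowComm>() == size_of::<foreign::Comm>());
const _: () = assert!(align_of::<ShadowComm>() == align_of::<foreign::Comm>());

/// Reads the raw handle out of a Comm by reinterpreting it as ShadowComm.
pub fn raw_comm(comm: &foreign::Comm) -> NcclComm {
    // SAFETY: sound only if foreign::Comm really begins with the handle.
    // The assertions above catch size/alignment drift, not field reordering.
    let shadow = unsafe { &*(comm as *const foreign::Comm).cast::<ShadowComm>() };
    shadow.comm
}
```

Note that offset_of! cannot name a private field of a type from another crate, so in this sketch the offset check applies to the shadow type, while the foreign type is checked only by size and alignment.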

If a cudarc upgrade breaks this assumption, the static assertions will fail at compile time and we will move to a vendored Comm constructor.

Structs

PreMulSumOp
PreMulSum custom reduce op: behaves like an AllReduce with a sum, except that each input is first multiplied by a per-tensor scalar premultiplier living in device memory. Construct via PreMulSumOp::new; call PreMulSumOp::destroy before the communicator is destroyed.
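When testing a PreMulSum AllReduce, a CPU reference of the expected result can be handy. The sketch below assumes each rank supplies its own scalar and that every rank ends up with the sum over ranks of scalar times input; premulsum_reference is a hypothetical helper, not part of this module. NCCL may sum in a different order, so compare real GPU output against it with a floating-point tolerance.

```rust
/// Hypothetical CPU reference for a PreMulSum AllReduce:
/// out[i] = sum over ranks r of scalars[r] * inputs[r][i].
pub fn premulsum_reference(inputs: &[Vec<f32>], scalars: &[f32]) -> Vec<f32> {
    assert_eq!(inputs.len(), scalars.len(), "one scalar per rank");
    let len = inputs.first().map_or(0, |v| v.len());
    let mut out = vec![0.0f32; len];
    for (input, &s) in inputs.iter().zip(scalars) {
        assert_eq!(input.len(), len, "all ranks must contribute equal-length buffers");
        // Premultiply this rank's contribution, then accumulate.
        for (o, &x) in out.iter_mut().zip(input) {
            *o += s * x;
        }
    }
    out
}
```

With every scalar set to 1/world_size, the result is the element-wise mean across ranks.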