Thread-block cluster launches + Distributed Shared Memory (DSM) helpers.
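Hopper (compute capability 9.0) added an optional level to the thread
hierarchy between the block and the grid: the thread-block cluster.
Blocks within a cluster can synchronise via `cluster.sync()` and read
and write each other's shared memory through Distributed Shared Memory.
To choose the cluster size at launch time, the host has to launch with
`cudaLaunchKernelExC`, whose launch attributes carry the cluster
dimensions; the older `cudaLaunchKernel` has no field for them.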
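As a rough mental model of how a grid is carved into clusters, the sketch below maps a block index to the cluster it belongs to and to its rank inside that cluster. `Dim3` and `cluster_of_block` are hypothetical helpers, not part of this module, and the x-fastest rank order is an assumption modelled on how CUDA linearises `dim3` indices.

```rust
/// Hypothetical stand-in for CUDA's `dim3`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

/// Returns (cluster index, rank within the cluster) for the block at
/// `block_idx`, assuming the grid is tiled by clusters of size `cluster`.
/// Panics if any cluster extent is zero (division by zero).
pub fn cluster_of_block(block_idx: Dim3, cluster: Dim3) -> (Dim3, u32) {
    // Which cluster the block falls into along each axis.
    let cluster_idx = Dim3 {
        x: block_idx.x / cluster.x,
        y: block_idx.y / cluster.y,
        z: block_idx.z / cluster.z,
    };
    // The block's coordinates inside its own cluster.
    let (lx, ly, lz) = (
        block_idx.x % cluster.x,
        block_idx.y % cluster.y,
        block_idx.z % cluster.z,
    );
    // Assumed x-fastest linearisation into a rank in [0, x * y * z).
    let rank = lx + ly * cluster.x + lz * cluster.x * cluster.y;
    (cluster_idx, rank)
}

fn main() {
    // An 8x2x1 grid tiled by 4x2x1 clusters yields two clusters along x.
    let cluster = Dim3 { x: 4, y: 2, z: 1 };
    let (cid, rank) = cluster_of_block(Dim3 { x: 5, y: 1, z: 0 }, cluster);
    println!("block (5, 1, 0) -> cluster {:?}, rank {}", cid, rank);
}
```

Here block (5, 1, 0) lands in cluster (1, 0, 0) at local position (1, 1, 0), giving rank 1 + 1 * 4 = 5.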
This module ships:
- `ClusterDim`: an `(x, y, z)` cluster size, validated against the 8-block portable limit (Hopper) or the 16-block limit (Blackwell, `cudaLaunchAttributeNonPortableClusterSizeAllowed`).
- `LaunchSpec`: grid + block + cluster + shared-memory bytes + stream, plus an optional non-portable opt-in.
- [`launch_with_cluster`] (gated on `hopper`): a safe wrapper around `cudaLaunchKernelExC`.
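The validation described for `ClusterDim` can be sketched as a check on the total block count. This is an illustrative stand-in, not this module's definition: the constructor signature, the `ClusterDimError` enum, and the constant names are all hypothetical; only the 8 and 16 limits come from the text above.

```rust
/// Portable cluster-size limit (blocks) quoted in the module docs for Hopper.
pub const PORTABLE_MAX_BLOCKS: u32 = 8;
/// Limit (blocks) quoted in the module docs for the Blackwell non-portable opt-in.
pub const NON_PORTABLE_MAX_BLOCKS: u32 = 16;

/// Illustrative stand-in for `ClusterDim`; not this module's definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ClusterDim {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClusterDimError {
    /// Some extent was zero.
    ZeroExtent,
    /// x * y * z exceeds the applicable limit.
    TooManyBlocks { blocks: u32, limit: u32 },
}

impl ClusterDim {
    /// Validates the total block count against the portable or non-portable limit.
    pub fn new(x: u32, y: u32, z: u32, non_portable: bool) -> Result<Self, ClusterDimError> {
        if x == 0 || y == 0 || z == 0 {
            return Err(ClusterDimError::ZeroExtent);
        }
        let limit = if non_portable { NON_PORTABLE_MAX_BLOCKS } else { PORTABLE_MAX_BLOCKS };
        // Saturate on overflow so absurd inputs are still rejected.
        let blocks = x.saturating_mul(y).saturating_mul(z);
        if blocks > limit {
            return Err(ClusterDimError::TooManyBlocks { blocks, limit });
        }
        Ok(Self { x, y, z })
    }

    /// Total number of blocks in one cluster.
    pub fn blocks(&self) -> u32 {
        self.x * self.y * self.z
    }
}

fn main() {
    println!("{:?}", ClusterDim::new(2, 2, 2, false)); // Ok: 8 blocks
    println!("{:?}", ClusterDim::new(4, 4, 1, false)); // Err: 16 > 8
    println!("{:?}", ClusterDim::new(4, 4, 1, true)); // Ok with the opt-in
}
```

Checking the product rather than each axis reflects that the limits above are stated per cluster, not per dimension.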
Structs
- `ClusterDim` - Cluster dimensions. Hopper supports up to 8 blocks per cluster (portable). Blackwell allows 16 with the non-portable opt-in.
- `LaunchSpec` - Full launch specification for a cluster-aware kernel.
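One check a launch specification like this plausibly needs is that the grid tiles evenly into clusters: CUDA requires each grid extent to be a multiple of the matching cluster extent. The sketch below assumes field names and types for a stand-in `LaunchSpec` (the `stream` field is a placeholder `usize`, not a real `cudaStream_t`); none of it is this module's actual definition.

```rust
/// Hypothetical stand-in for CUDA's `dim3`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Dim3 {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

/// Illustrative stand-in for `LaunchSpec`; field names are assumptions.
#[derive(Debug)]
pub struct LaunchSpec {
    pub grid: Dim3,              // grid size, in blocks
    pub block: Dim3,             // block size, in threads
    pub cluster: Dim3,           // cluster size, in blocks
    pub shared_mem_bytes: usize, // dynamic shared memory per block
    pub stream: usize,           // placeholder for an opaque cudaStream_t handle
    pub non_portable: bool,      // opt in to cluster sizes above the portable limit
}

impl LaunchSpec {
    /// Each grid extent must be a whole multiple of the matching cluster extent.
    pub fn check_grid_tiling(&self) -> Result<(), String> {
        let axes = [
            ("x", self.grid.x, self.cluster.x),
            ("y", self.grid.y, self.cluster.y),
            ("z", self.grid.z, self.cluster.z),
        ];
        for (axis, g, c) in axes {
            // `c == 0` is checked first so the modulo never divides by zero.
            if c == 0 || g % c != 0 {
                return Err(format!("grid.{axis} = {g} is not a multiple of cluster.{axis} = {c}"));
            }
        }
        Ok(())
    }

    /// Number of clusters in the launch; call only after `check_grid_tiling` succeeds.
    pub fn cluster_count(&self) -> u32 {
        (self.grid.x / self.cluster.x) * (self.grid.y / self.cluster.y) * (self.grid.z / self.cluster.z)
    }
}

fn main() {
    let spec = LaunchSpec {
        grid: Dim3 { x: 8, y: 2, z: 1 },
        block: Dim3 { x: 128, y: 1, z: 1 },
        cluster: Dim3 { x: 4, y: 2, z: 1 },
        shared_mem_bytes: 48 * 1024,
        stream: 0,
        non_portable: false,
    };
    match spec.check_grid_tiling() {
        Ok(()) => println!("{} clusters", spec.cluster_count()), // 2 clusters
        Err(e) => println!("invalid launch: {e}"),
    }
}
```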
Functions
- `dsm_total_bytes` - Distributed-shared-memory helper: byte count needed to allocate `per_block` bytes in every block of a cluster of size `cluster`.
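The helper's arithmetic is presumably the per-block byte count times the number of blocks in the cluster. The sketch below assumes exactly that and ignores any alignment or padding the real `dsm_total_bytes` might apply; the name `dsm_total_bytes_sketch` and its tuple parameter are hypothetical.

```rust
/// Hypothetical re-derivation of the `dsm_total_bytes` arithmetic: every block in
/// an (x, y, z) cluster allocates `per_block` bytes, so the cluster as a whole spans
/// per_block * x * y * z bytes of distributed shared memory. Returns None on overflow.
pub fn dsm_total_bytes_sketch(per_block: usize, cluster: (u32, u32, u32)) -> Option<usize> {
    let (x, y, z) = cluster;
    let blocks = (x as usize).checked_mul(y as usize)?.checked_mul(z as usize)?;
    per_block.checked_mul(blocks)
}

fn main() {
    // 48 KiB per block across a 2x2x2 cluster (8 blocks).
    let total = dsm_total_bytes_sketch(48 * 1024, (2, 2, 2));
    println!("{:?}", total); // Some(393216), i.e. 384 KiB
}
```

Each block still allocates only `per_block` bytes of its own shared memory; the total is the size of the address range a block can reach across the whole cluster.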