Module cluster

Thread-block cluster launches + Distributed Shared Memory (DSM) helpers.

Hopper introduced a new level in the launch hierarchy: the thread-block cluster, which sits between the grid and the block. Blocks within a cluster can synchronise via cluster.sync() and access each other's shared memory through the DSM unit. The host has to launch with cudaLaunchKernelExC, because the older cudaLaunchKernel has no way to pass the cluster dimensions.

This module ships:

  • ClusterDim: an (x, y, z) cluster size, validated against the 8-block portable limit (Hopper) or the 16-block limit available on Blackwell with the non-portable opt-in (cudaFuncAttributeNonPortableClusterSizeAllowed).
  • LaunchSpec: grid + block + cluster + shared-memory bytes + stream, plus the optional non-portable opt-in.
  • launch_with_cluster (gated on hopper): a safe wrapper around cudaLaunchKernelExC.
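
The size check described for ClusterDim can be sketched roughly as follows. This is a minimal illustration, not this module's actual code: the names SketchClusterDim and SketchClusterError, the constructor signature, and the error variants are all assumptions. Only the 8-block and 16-block limits come from the description above.

```rust
// Hypothetical sketch: type names, fields, and error variants are assumptions,
// not this crate's actual definitions.

/// Portable cluster-size limit in blocks, as stated in the module description.
const PORTABLE_MAX_BLOCKS: u32 = 8;
/// Limit with the non-portable opt-in, as stated in the module description.
const NON_PORTABLE_MAX_BLOCKS: u32 = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchClusterDim {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SketchClusterError {
    /// At least one of x, y, z was zero.
    ZeroDimension,
    /// x * y * z exceeds the limit that applies to this launch.
    TooManyBlocks { blocks: u128, limit: u32 },
}

impl SketchClusterDim {
    /// Validates x * y * z against the portable limit, or against the larger
    /// limit when `allow_non_portable` is set.
    pub fn new(
        x: u32,
        y: u32,
        z: u32,
        allow_non_portable: bool,
    ) -> Result<Self, SketchClusterError> {
        if x == 0 || y == 0 || z == 0 {
            return Err(SketchClusterError::ZeroDimension);
        }
        // u128 cannot overflow for a product of three u32 values.
        let blocks = u128::from(x) * u128::from(y) * u128::from(z);
        let limit = if allow_non_portable {
            NON_PORTABLE_MAX_BLOCKS
        } else {
            PORTABLE_MAX_BLOCKS
        };
        if blocks > u128::from(limit) {
            return Err(SketchClusterError::TooManyBlocks { blocks, limit });
        }
        Ok(Self { x, y, z })
    }
}
```

Under these assumptions, SketchClusterDim::new(2, 2, 2, false) succeeds with 8 blocks, while SketchClusterDim::new(4, 4, 1, false) is rejected and only passes once the non-portable flag is set. Validating at construction time means a bad cluster size surfaces as a Rust error before any launch is attempted.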

Structs§

ClusterDim
Cluster dimensions. Hopper supports up to 8 blocks per cluster (portable). Blackwell allows 16 with the non-portable opt-in.
LaunchSpec
Full launch specification for a cluster-aware kernel.

Enums§

ClusterError

Functions§

dsm_total_bytes
Distributed-shared-memory helper: the total byte count across the cluster when every block of a cluster with dimensions cluster allocates per_block bytes.
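
The arithmetic behind such a helper might look like the sketch below. The function name sketch_dsm_total_bytes, the tuple parameter, and the Option return type are assumptions made for illustration, and the real dsm_total_bytes signature may differ. The sketch multiplies the per-block byte count by the number of blocks in the cluster, using checked arithmetic so that overflow is reported rather than wrapped.

```rust
// Hypothetical sketch: the name, parameter types, and return type are
// assumptions, not the actual signature of dsm_total_bytes.

/// Returns per_block * (x * y * z), or None if the product overflows usize.
pub fn sketch_dsm_total_bytes(per_block: usize, cluster: (u32, u32, u32)) -> Option<usize> {
    let (x, y, z) = cluster;
    // Number of blocks in the cluster, computed with overflow checks.
    let blocks = (x as usize)
        .checked_mul(y as usize)?
        .checked_mul(z as usize)?;
    // Total bytes of shared memory across all blocks of the cluster.
    per_block.checked_mul(blocks)
}
```

For example, with 48 KiB per block and a 2 x 2 x 1 cluster, the sketch yields 4 * 49152 = 196608 bytes.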