A synthetic micro-benchmark that measures peak compute, bandwidth, and matrix throughput of GPUs and CPUs
clpeak — "Compute Latency PEAK". A synthetic micro-benchmark that measures the peak achievable performance of GPU compute devices. It exercises tight vector / MAD / MMA loops and vendor-SDK GEMM libraries (cuBLASLt on NVIDIA, MPS on Apple) to expose what the hardware is capable of — from raw ALU peaks to near-vendor-advertised matrix throughput.
clpeak began as an OpenCL-only tool and is now a multi-backend benchmark — OpenCL, Vulkan, CUDA, ROCm/HIP, Metal, oneAPI/SYCL, plus a native CPU backend — run back-to-back on the same hardware, so cross-stack differences (driver lowering, instruction scheduling, extension exposure) surface alongside the raw peak numbers.