This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
HPCToolkit: GPU Profiling
For general advice on performance analysis on Jean Zay, see the best practices for code profiling.
Description
HPCToolkit profiles GPU (CUDA) applications by sampling; the results are then analysed with hpcviewer.
For a timeline-based analysis focused on CUDA kernels on Jean Zay, see also:
Installed Versions
The module command provides access to the versions of HPCToolkit (including the -cuda variants) and HPCViewer.
To display the available versions:
$ module avail hpctoolkit hpcviewer
hpctoolkit/2020.08.03 hpctoolkit/2024.01.1 hpctoolkit/2024.01.1-python3.9
hpctoolkit/2020.08.03-cuda hpctoolkit/2024.01.1-cuda hpctoolkit/2024.01.1-python3.10
hpcviewer/2020.07 hpcviewer/2024.02
To profile GPU code, load a -cuda version, for example:
$ module load hpctoolkit/2024.01.1-cuda
$ module load hpcviewer/2024.02
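After loading the modules, it can be useful to confirm that the commands used on this page are actually on your PATH. The following is a minimal sketch; the helper name check_tools is hypothetical and not part of HPCToolkit.

```shell
# Hypothetical helper: report where each command resolves, or that it is missing.
# Returns the number of missing commands (0 means everything was found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $path"
        else
            echo "$tool: not found"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Commands used in the rest of this page.
check_tools hpcrun hpcstruct hpcprof hpcviewer \
    || echo "some tools are missing: check the loaded modules"
```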
Execution and Collection
Example of a Slurm script for an MPI + CUDA code:
#!/bin/bash
#SBATCH --job-name=hpctoolkit_gpu
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=10
#SBATCH --hint=nomultithread
#SBATCH --time=00:20:00
module purge
module load ...
module load hpctoolkit/2024.01.1-cuda
set -x
# Collect measurements
srun hpcrun ./my_gpu_exe
After execution, a measurement directory named hpctoolkit-*-measurements is created.
The choice of events/metrics depends on the version of HPCToolkit and your application. Consult hpcrun --help to adapt the collection.
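As an illustration of adapting the collection, the sketch below selects a GPU event for hpcrun. It assumes that the loaded HPCToolkit version accepts the events gpu=nvidia (GPU operation-level measurement) and gpu=nvidia,pc (instruction-level PC sampling, with higher overhead); check this against hpcrun --help before relying on it. The helper gpu_event and its switch argument are hypothetical names introduced for this example.

```shell
# Hypothetical helper: choose the hpcrun event string.
#   gpu_event 0 -> GPU operation-level measurement (assumed event name)
#   gpu_event 1 -> adds PC sampling inside kernels (assumed event name)
gpu_event() {
    if [ "${1:-0}" = "1" ]; then
        echo "gpu=nvidia,pc"
    else
        echo "gpu=nvidia"
    fi
}

# In the Slurm script, the collection line would then read, for example:
#   srun hpcrun -e "$(gpu_event 0)" ./my_gpu_exe
# Here the command is only printed, so the sketch runs without HPCToolkit.
echo "srun hpcrun -e $(gpu_event 0) ./my_gpu_exe"
```

Starting with the lighter event and enabling PC sampling only for the kernels of interest is usually a reasonable way to limit overhead.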
Building the Analysis Database
Once the collection is complete, build the analysis database:
$ hpcstruct ./my_gpu_exe
$ hpcprof -S ./my_gpu_exe.hpcstruct -o hpctoolkit-my_gpu_exe-database ./hpctoolkit-my_gpu_exe-measurements
Adjust the names according to the directories/files actually generated during your execution.
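To help adjust the names, the sketch below lists the measurement directories present in the current directory and proposes a matching database name for each. It assumes, without guarantee, that a directory name may carry a suffix (for example a job identifier) after -measurements; the helper db_name_for is a hypothetical name.

```shell
# Hypothetical helper: derive a database name from a measurement directory name
# by replacing "-measurements" with "-database" (a trailing slash is dropped).
db_name_for() {
    printf '%s\n' "${1%/}" | sed 's/-measurements/-database/'
}

# List each measurement directory with the database name it would be given.
for meas in hpctoolkit-*-measurements*; do
    [ -d "$meas" ] || continue   # skip the literal pattern when nothing matches
    echo "$meas -> $(db_name_for "$meas")"
done
```

The printed names can then be passed to the -o option of hpcprof and to hpcviewer.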
Visualisation with HPCViewer
Using hpcviewer requires an SSH connection with X11 forwarding (ssh -X).
Launch the graphical interface on the analysis database:
$ module load hpcviewer/2024.02
$ hpcviewer hpctoolkit-my_gpu_exe-database
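If the interface does not open, a first thing to check is whether X11 forwarding is active in your session. A minimal sketch, assuming that an empty DISPLAY variable means the SSH connection was opened without -X (the helper name have_display is hypothetical):

```shell
# Hypothetical helper: succeeds when the DISPLAY variable is set and non-empty.
have_display() {
    [ -n "${DISPLAY:-}" ]
}

if have_display; then
    echo "DISPLAY=$DISPLAY: X11 forwarding looks active"
else
    echo "DISPLAY is not set: reconnect with ssh -X before launching hpcviewer"
fi
```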
You can adjust the JVM heap size if necessary:
$ hpcviewer -jh 3g hpctoolkit-my_gpu_exe-database
The graphical interface may be slow with X11 forwarding from a login node. You can use a visualisation node or transfer the analysis database to your local machine.
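For the transfer option, one possible approach is to pack the database into a single archive before copying it. The sketch below only creates the archive; the helper name pack_database is hypothetical, and the copy command in the comments uses placeholders that you must replace with your own login and host.

```shell
# Hypothetical helper: pack a database directory into <name>.tar.gz next to it
# and print the archive path. Fails if the directory does not exist.
pack_database() {
    db="${1%/}"
    if [ ! -d "$db" ]; then
        echo "no such directory: $db" >&2
        return 1
    fi
    # -C keeps only the directory name (not its full path) inside the archive.
    tar -czf "$db.tar.gz" -C "$(dirname "$db")" "$(basename "$db")"
    echo "$db.tar.gz"
}

# Example, on Jean Zay:
#   pack_database hpctoolkit-my_gpu_exe-database
# Then, from your local machine (placeholders to replace):
#   scp <login>@<host>:<path>/hpctoolkit-my_gpu_exe-database.tar.gz .
#   tar -xzf hpctoolkit-my_gpu_exe-database.tar.gz
```

The extracted directory can then be opened with an hpcviewer installed locally, preferably in a version matching the one used to build the database.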