This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Nsight Systems
We invite you to consult the best practices for code profiling for general advice on performance analysis on Jean Zay.
Description
Nsight Systems is a performance analysis tool from NVIDIA.
It provides a global timeline view of the execution to correlate CPU and GPU activity: computation phases, communications, memory transfers, synchronisations and idle periods.
It is particularly useful for identifying imbalances between processes, kernel launch latencies, or costly host-device transfers in MPI/CUDA/OpenACC applications.
Nsight Systems complements Nsight Compute: Nsight Systems is used to locate problematic areas in the application, then Nsight Compute allows detailed analysis of the targeted kernels.
It has a graphical user interface (GUI) but can also be used from the command line.
Installed Versions
The module command provides access to the various versions of Nsight Systems.
To display the available versions:
$ module avail nvidia-nsight-systems
nvidia-nsight-systems/2021.1.1 nvidia-nsight-systems/2021.4.1 nvidia-nsight-systems/2022.5.1
nvidia-nsight-systems/2021.2.1 nvidia-nsight-systems/2022.1.1
Usage
To use, for example, version 2022.1.1 of Nsight Systems, you need to load the corresponding module:
$ module load nvidia-nsight-systems/2022.1.1
Once the appropriate module is loaded, using Nsight Systems involves two steps:
- Running your program in Nsight Systems (from the command line);
- Visualising/Analysing the results with the graphical interface.
Execution
The simplest way is to launch the execution from the command line in your Slurm scripts: just add the command nsys profile before the name of your executable (with options to select the type of sampling to be performed).
- For the command nsys profile to be recognised, the appropriate module must be loaded beforehand (see above), either in the environment of your interactive session or in the environment of your job.
- For help with the options of the command nsys profile, simply type nsys profile --help.
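As a minimal sketch of the first point, a job script can check that nsys is reachable before attempting to profile. The helper name nsys_available is hypothetical and not part of Nsight Systems; the check relies only on the shell built-in command -v.

```shell
#!/bin/bash
# Sketch: detect early whether nsys is on the PATH (i.e. the module is loaded).
# nsys_available is a hypothetical helper name used for illustration only.
nsys_available() {
    command -v nsys >/dev/null 2>&1
}

if nsys_available; then
    echo "nsys found at: $(command -v nsys)"
else
    echo "nsys not found: load an nvidia-nsight-systems module first" >&2
fi
```

In a real job script, the else branch would typically end with exit 1 so that the job stops before srun is called.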
During execution, Nsight Systems writes its files to the current directory. By default, these files are named report#.qdrep, where # is a number incremented to avoid overwriting existing files. The file name can be set with the option -o <report_file> and may contain markers of the form %q{ENVIRONMENT_VARIABLE}, which are replaced by the value of the named environment variable.
- If the file already exists, the execution will fail to avoid overwriting previous results. Therefore, before launching the execution, you must ensure that the file specified by the option -o does not exist, or use (with caution) the option -f to force the overwriting of existing files.
- By default, Nsight Systems stores its temporary data in the system directory /tmp, which is very limited in size. To provide Nsight Systems with a larger workspace, it is essential to define the TMPDIR variable. For example, to use the JOBSCRATCH directory (specific to each job and destroyed at the end of it):
export TMPDIR=$JOBSCRATCH
# To work around a bug in current versions of Nsight Systems,
# it is also necessary to create a symbolic link so that
# the /tmp/nvidia directory points to TMPDIR
ln -sfn $JOBSCRATCH /tmp/nvidia
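The two commands above silently misbehave if JOBSCRATCH is empty, for instance when they are tried outside a Slurm job. A guarded variant could look like the sketch below. The function name setup_nsys_tmpdir and its optional argument are assumptions made for illustration; the argument exists only so the function can be tried without touching /tmp/nvidia.

```shell
#!/bin/bash
# Sketch: set TMPDIR for Nsight Systems only if JOBSCRATCH is usable.
# setup_nsys_tmpdir is a hypothetical helper, not an Nsight Systems command.
setup_nsys_tmpdir() {
    # Link location; defaults to the /tmp/nvidia path used in the workaround.
    local link="${1:-/tmp/nvidia}"
    if [ -z "${JOBSCRATCH:-}" ] || [ ! -d "$JOBSCRATCH" ]; then
        echo "JOBSCRATCH is not set or is not a directory" >&2
        return 1
    fi
    export TMPDIR="$JOBSCRATCH"
    # Same workaround as above: make the link point to the workspace.
    ln -sfn "$JOBSCRATCH" "$link"
}

# Typical use in a job script:
# setup_nsys_tmpdir || exit 1
```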
Here is an example of a submission script for an MPI + OpenACC code launching 4 processes:
#!/bin/bash
#SBATCH --job-name=nsight_systems # Arbitrary name of the Slurm job
#SBATCH --output=%x.%j.out # Standard output file of the job
#SBATCH --error=%x.%j.err # Standard error file of the job
#SBATCH --ntasks=4 # Number of MPI processes requested
#SBATCH --ntasks-per-node=4 # Number of MPI tasks per node (= number of GPUs per node)
#SBATCH --gres=gpu:4 # Number of GPUs per node
#SBATCH --cpus-per-task=10 # Number of CPU cores per task (a quarter of the node here)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
# /!\ Caution: the following line is misleading, but in Slurm
# terminology "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:20:00 # Job time limit hh:mm:ss (20 min here)
# Load the modules of your choice
module load ...
# Load Nsight Systems
module load nvidia-nsight-systems/2021.2.1
# Echo commands
set -x
# To avoid using /tmp
export TMPDIR=$JOBSCRATCH
# To work around a bug in current versions of Nsight Systems,
# it is also necessary to create a symbolic link so that
# the /tmp/nvidia directory points to TMPDIR
ln -sfn $JOBSCRATCH /tmp/nvidia
# Profiling in OpenACC mode, generating one results file
# per process ("report_rank0.qdrep", "report_rank1.qdrep", etc.)
srun nsys profile -t openacc -o "report_rank%q{SLURM_PROCID}" ./my_bin_exe
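Because an existing report makes the execution fail, it can be convenient to check for leftover per-rank reports before the srun line. The sketch below is one possible pre-flight check. The function name check_reports_free is hypothetical, and testing the .nsys-rep extension in addition to .qdrep is an assumption about more recent Nsight Systems releases.

```shell
#!/bin/bash
# Sketch: verify that no per-rank report from a previous run is in the way.
# check_reports_free is a hypothetical helper; the extension list is an
# assumption (.qdrep as described on this page, .nsys-rep for newer releases).
check_reports_free() {
    local prefix="$1" ntasks="$2" rank ext
    for (( rank = 0; rank < ntasks; rank++ )); do
        for ext in qdrep nsys-rep; do
            if [ -e "${prefix}${rank}.${ext}" ]; then
                echo "Existing report: ${prefix}${rank}.${ext}" >&2
                return 1
            fi
        done
    done
    return 0
}

# In the job script, just before the srun line:
# check_reports_free "report_rank" "$SLURM_NTASKS" || exit 1
```

The check runs once in the batch shell rather than in each MPI process, so the job stops before any rank starts profiling.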
Visualisation/Analysis of Results
The results are visualised with the command nsys-ui <report_file> by replacing <report_file> with the name of a previously generated analysis report.
Using nsys-ui on Jean Zay requires an SSH connection with X11 forwarding (e.g. ssh -X).
The graphical interface may be slow when run on a Jean Zay login node over X11 forwarding (ssh -X). Alternatively, you can use a Jean Zay visualisation node, or install Nsight Systems on your own machine and transfer the reports to it for analysis.
Documentation
The complete documentation is available on the NVIDIA website.