⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Nsight Systems

We invite you to consult the best practices for code profiling for general advice on performance analysis on Jean Zay.

Description

Nsight Systems is a performance analysis tool from NVIDIA.

It provides a global timeline view of the execution to correlate CPU and GPU activity: computation phases, communications, memory transfers, synchronisations and idle periods.

It is particularly useful for identifying imbalances between processes, kernel launch latencies, or costly host-device transfers in MPI/CUDA/OpenACC applications.

Nsight Systems and Nsight Compute are complementary: Nsight Systems is used to locate the problematic areas of an application, and Nsight Compute then provides a detailed analysis of the targeted kernels.

It has a graphical user interface (GUI) but can also be used from the command line.

Installed Versions

The module command provides access to the various versions of Nsight Systems.

To display the available versions:

$ module avail nvidia-nsight-systems
nvidia-nsight-systems/2021.1.1 nvidia-nsight-systems/2021.4.1 nvidia-nsight-systems/2022.5.1
nvidia-nsight-systems/2021.2.1 nvidia-nsight-systems/2022.1.1

Usage

For example, to use version 2022.1.1 of Nsight Systems, load the corresponding module:

$ module load nvidia-nsight-systems/2022.1.1

Once the appropriate module is loaded, using Nsight Systems involves two steps:

  1. Running your program in Nsight Systems (from the command line);
  2. Visualising/Analysing the results with the graphical interface.

Execution

The simplest approach is to run the profiling from the command line in your Slurm scripts: just prefix the name of your executable with the nsys profile command (adding options to select what is traced or sampled).

Important
  • For the nsys profile command to be recognised, the appropriate module must first be loaded (see above), either in your interactive session or in your job's environment.
  • To list the options of the nsys profile command, type nsys profile --help.
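Since nsys profile only works once the module is loaded, a job script can check for the command before launching anything. The guard below is a minimal sketch; the function name require_nsys is hypothetical and is provided neither by the module nor by Nsight Systems.

```shell
# Hypothetical guard: fail early with a clear message if the nsys command
# cannot be found, typically because no nvidia-nsight-systems module is loaded.
require_nsys() {
  if ! command -v nsys > /dev/null 2>&1; then
    echo "error: nsys not found in PATH; load an nvidia-nsight-systems module first" >&2
    return 1
  fi
}

# Example use in a job script, right after the module load commands:
# require_nsys || exit 1
```

Stopping the job at this point avoids consuming GPU hours on a run that would then fail at the srun line.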

During execution, Nsight Systems writes its report files to the current directory. By default, these files are named report#.qdrep (more recent versions use the .nsys-rep extension instead), where # is a number incremented so that existing files are not overwritten. The file name can be set with the option -o <report_file> and may contain markers of the form %q{ENVIRONMENT_VARIABLE}, which are replaced by the value of the named environment variable.
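The %q{...} marker is expanded by Nsight Systems itself, separately in each process, not by the batch shell. Writing $SLURM_PROCID instead would be expanded once by the batch shell before srun starts the processes, so every rank would receive the same file name. As an illustration only, the hypothetical helper below mimics this substitution in plain bash to preview the name each rank would get; it is not part of Nsight Systems, which may treat edge cases (such as unset variables) differently. Nsight Systems then appends the report extension to the resulting name.

```shell
# Hypothetical helper (not part of Nsight Systems): replaces every
# %q{NAME} marker in a report name template with the value of the
# environment variable NAME, to preview the names nsys would produce.
expand_nsys_name() {
  local out=$1 var
  local re='%q[{]([A-Za-z_][A-Za-z0-9_]*)[}]'
  while [[ $out =~ $re ]]; do
    var=${BASH_REMATCH[1]}
    # Replace the matched marker (taken literally) with the variable's value
    out=${out/"${BASH_REMATCH[0]}"/"${!var}"}
  done
  printf '%s\n' "$out"
}

# Preview the base name each of 4 ranks would use (SLURM_PROCID = 0..3)
for rank in 0 1 2 3; do
  SLURM_PROCID=$rank expand_nsys_name 'report_rank%q{SLURM_PROCID}'
done
```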

Attention
  • If the report file already exists, the run fails so as not to overwrite previous results. Before launching, make sure that the file specified with the -o option does not exist, or use the -f option (with caution) to force existing files to be overwritten.

  • By default, Nsight Systems stores its temporary data in the system directory /tmp, whose size is very limited. To give Nsight Systems a larger workspace, you must set the TMPDIR variable. For example, to use the JOBSCRATCH directory (specific to each job and deleted when the job ends):

export TMPDIR=$JOBSCRATCH
# To work around a bug in current versions of Nsight Systems,
# you also need to create a symbolic link so that the
# /tmp/nvidia directory points to TMPDIR
ln -sfn $JOBSCRATCH /tmp/nvidia
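Both precautions above can be turned into pre-flight checks that stop the job early instead of failing during profiling. The sketch below is one possible way to do it, assuming bash; the function names are hypothetical, the symlink path defaults to the /tmp/nvidia workaround described above, and the report check looks for both the .qdrep extension and the .nsys-rep extension used by more recent versions.

```shell
# Hypothetical helper: point TMPDIR at a per-job scratch directory and
# apply the /tmp/nvidia symlink workaround.
#   $1: scratch directory (e.g. "$JOBSCRATCH")
#   $2: symlink path (defaults to /tmp/nvidia)
setup_nsys_tmpdir() {
  local scratch=$1 link=${2:-/tmp/nvidia}
  if [[ -z $scratch || ! -d $scratch ]]; then
    echo "error: scratch directory '$scratch' does not exist" >&2
    return 1
  fi
  export TMPDIR=$scratch
  ln -sfn "$scratch" "$link"
}

# Hypothetical helper: succeed only if no report with this base name exists,
# since nsys refuses to overwrite an existing report by default.
report_is_free() {
  local base=$1 ext
  for ext in qdrep nsys-rep; do
    if [[ -e $base.$ext ]]; then
      echo "error: $base.$ext already exists" >&2
      return 1
    fi
  done
}

# Example use in a job script (my_report is an illustrative name):
# setup_nsys_tmpdir "$JOBSCRATCH" || exit 1
# report_is_free my_report || exit 1
# nsys profile -o my_report ./my_bin_exe
```

For per-rank names built with %q{...}, report_is_free would have to be called once per expected name.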

Here is an example submission script for an MPI + OpenACC code running 4 processes:

job_nsys_mpi.slurm
#!/bin/bash
#SBATCH --job-name=nsight_systems # Arbitrary name of the Slurm job
#SBATCH --output=%x.%j.out # Standard output file of the job
#SBATCH --error=%x.%j.err # Standard error file of the job
#SBATCH --ntasks=4 # Number of MPI processes requested
#SBATCH --ntasks-per-node=4 # Number of MPI tasks per node (= number of GPUs per node)
#SBATCH --gres=gpu:4 # Number of GPUs per node
#SBATCH --cpus-per-task=10 # Number of CPU cores per task (a quarter of the node here)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
# /!\ Warning: the following line is misleading, but in Slurm
# vocabulary "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:20:00 # Job time limit hh:mm:ss (20 min here)

# Load the modules of your choice
module load ...
# Load Nsight Systems
module load nvidia-nsight-systems/2021.2.1

# Echo the commands
set -x

# Avoid using /tmp
export TMPDIR=$JOBSCRATCH
# To work around a bug in current versions of Nsight Systems,
# you also need to create a symbolic link so that the
# /tmp/nvidia directory points to TMPDIR
ln -sfn $JOBSCRATCH /tmp/nvidia

# Profile in OpenACC mode, generating one report file
# per process ("report_rank0.qdrep", "report_rank1.qdrep", etc.)
srun nsys profile -t openacc -o "report_rank%q{SLURM_PROCID}" ./my_bin_exe
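Once srun returns, it can be worth checking that every rank actually wrote its report before starting the analysis. The sketch below assumes the naming scheme of the script above (report_rank<N>.qdrep, or .nsys-rep with more recent versions) and takes the number of ranks from SLURM_NTASKS, which Slurm sets in the job environment when --ntasks is given; the function name is hypothetical.

```shell
# Hypothetical helper: verify that ranks 0..N-1 each produced a report
# named <prefix><rank>.qdrep or <prefix><rank>.nsys-rep.
#   $1: report name prefix (e.g. report_rank)
#   $2: number of ranks (e.g. "$SLURM_NTASKS")
check_rank_reports() {
  local prefix=$1 ntasks=$2 rank missing=0
  for ((rank = 0; rank < ntasks; rank++)); do
    if [[ ! -e $prefix$rank.qdrep && ! -e $prefix$rank.nsys-rep ]]; then
      echo "missing report for rank $rank" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if no report is missing
  (( missing == 0 ))
}

# Example use at the end of the job script:
# check_rank_reports report_rank "$SLURM_NTASKS" || echo "some reports are missing" >&2
```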

Visualisation/Analysis of Results

The results are visualised with the command nsys-ui <report_file>, where <report_file> is the name of a previously generated report. Using nsys-ui on Jean Zay requires an SSH connection with X11 forwarding (e.g. ssh -X).

Attention

The graphical interface can be slow when run from a Jean Zay login node with X11 forwarding (ssh -X). Alternatively, you can use a Jean Zay visualisation node, or install Nsight Systems on your own machine and transfer the reports there for analysis.
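When several reports are produced (one per rank, for example), packing them into a single archive means only one file has to be copied to your own machine. The helper below is a hypothetical sketch based on the standard tar command; the archive name is illustrative.

```shell
# Hypothetical helper: pack report files into a single compressed archive.
#   $1: archive name (e.g. nsys_reports.tar.gz)
#   $2...: report files to include
bundle_reports() {
  local archive=$1
  shift
  if (( $# == 0 )); then
    echo "error: no report files given" >&2
    return 1
  fi
  tar -czf "$archive" -- "$@"
}

# Example use on Jean Zay (report names from the job script above):
# bundle_reports nsys_reports.tar.gz report_rank*.qdrep
```

From your own machine, the archive can then be fetched with something like scp <your_login>@<jean_zay_host>:<path>/nsys_reports.tar.gz . (the placeholders stand for your own login, host and path), extracted with tar -xzf, and the reports opened in your local Nsight Systems graphical interface.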

Documentation

The complete documentation is available on the NVIDIA website.
