⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Nsight Compute

For general advice on performance analysis on Jean Zay, see the best practices for code profiling.

Description

Nsight Compute is an NVIDIA tool dedicated to the fine-grained profiling of CUDA kernels.

Unlike Nsight Systems, which gives a global timeline view of the application, Nsight Compute works at the level of individual kernels and helps pinpoint GPU bottlenecks.

It provides detailed metrics on:

  • SM occupancy and warp efficiency;
  • memory usage (global memory, L2 cache, shared memory);
  • stalls and throughput limitations (compute-bound vs memory-bound).

It can be used via the command line (ncu) to collect measurements, and then via a graphical interface (ncu-ui) to explore reports, compare executions, and guide kernel optimisations.
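
As a minimal sketch of how the metric families above map onto a collection, the snippet below restricts ncu to a few report sections with the --section option. The section identifiers are the ones recent Nsight Compute releases report through ncu --list-sections; check that list for the version you load, since older releases may differ. The binary ./my_bin_exe and the report name ncu_report_sections are placeholders, and the module described under "Usage" must be loaded first.

```shell
# Sketch: restrict collection to a few report sections.
# Assumptions: ./my_bin_exe is a placeholder binary and ncu_report_sections
# a placeholder report name; verify identifiers with `ncu --list-sections`.
NCU_SECTIONS="--section SpeedOfLight"                            # compute vs memory throughput
NCU_SECTIONS="${NCU_SECTIONS} --section Occupancy"               # SM occupancy
NCU_SECTIONS="${NCU_SECTIONS} --section MemoryWorkloadAnalysis"  # memory usage
NCU_SECTIONS="${NCU_SECTIONS} --section WarpStateStats"          # warp stall reasons

# Show the command that would be run.
echo "ncu ${NCU_SECTIONS} -f -o ncu_report_sections ./my_bin_exe"

# Run only where ncu and the binary are actually available.
if command -v ncu >/dev/null 2>&1 && [ -x ./my_bin_exe ]; then
    # Intentionally unquoted so the options split into separate arguments.
    ncu ${NCU_SECTIONS} -f -o ncu_report_sections ./my_bin_exe
fi
```

Collecting fewer sections generally reduces profiling overhead, which helps keep the slowdown manageable.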

Installed Versions

The module command provides access to the various versions of Nsight Compute.

To display the available versions:

$ module avail nvidia-nsight-compute
nvidia-nsight-compute/2020.3.1 nvidia-nsight-compute/2022.1.0 nvidia-nsight-compute/2023.3.1.0
nvidia-nsight-compute/2021.3.0 nvidia-nsight-compute/2022.4.0 nvidia-nsight-compute/2024.3.2.0

Usage

For example, to use version 2024.3.2.0 of Nsight Compute, load the corresponding module:

$ module load nvidia-nsight-compute/2024.3.2.0

Once the module is loaded, using the tool takes two steps:

  • Run your application with ncu (data collection);
  • Visualise/analyse the reports with ncu-ui.

Command Line Execution (CLI)

To collect data, run your application under ncu in your Slurm scripts.

Important
  • For the ncu command to be recognised, the appropriate module must be loaded beforehand.
  • Nsight Compute profiling can significantly slow down execution. Profile small, targeted cases rather than full runs.
  • To see all options, use ncu --help.

Here is an example of a submission script for a multi-GPU MPI code (4 processes, 4 GPUs), with one report file per MPI rank:

job_ncu_mpi.slurm
#!/bin/bash
#SBATCH --job-name=nsight_compute
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=10
##SBATCH --exclusive -C prof
#SBATCH --hint=nomultithread
#SBATCH --time=00:20:00

module load ...
module load nvidia-nsight-compute/2024.3.2.0

set -x

# Example: one .ncu-rep report per MPI process
srun bash -c 'ncu -f -o ncu_report_rank${SLURM_PROCID} ./my_bin_exe'
Warning

Avoid profiling all iterations of a large production case. Instead, target a representative portion (reduced dataset, critical phase, targeted kernels).
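
One way to target specific kernels is to combine the ncu options --kernel-name, --launch-skip and --launch-count. The sketch below builds such a command for the srun line of the script above. The kernel name my_kernel and the launch window (skip 10, collect 5) are hypothetical values to adapt to your code; check ncu --help for the exact semantics in the version you load.

```shell
# Sketch: profile only a few launches of one kernel.
# All three values below are hypothetical; adapt them to your application.
KERNEL_NAME="my_kernel"   # placeholder kernel name
LAUNCH_SKIP=10            # launches to skip before profiling starts (e.g. warm-up)
LAUNCH_COUNT=5            # number of launches to collect

# \${SLURM_PROCID} is kept literal here so that each MPI rank expands it
# itself inside the `bash -c` started by srun.
NCU_CMD="ncu --kernel-name ${KERNEL_NAME} --launch-skip ${LAUNCH_SKIP} --launch-count ${LAUNCH_COUNT} -f -o ncu_report_rank\${SLURM_PROCID} ./my_bin_exe"
echo "${NCU_CMD}"

# Launch only inside a Slurm job where ncu is available.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v ncu >/dev/null 2>&1; then
    srun bash -c "${NCU_CMD}"
fi
```

Skipping the first launches avoids measuring warm-up iterations, and a small launch count keeps both the run time and the report size down.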

Visualisation/Analysis of Results (GUI)

The reports generated by ncu are .ncu-rep files. Running ncu-ui on Jean Zay requires an SSH connection with X11 forwarding (e.g. ssh -X).

With the module loaded, open a report with:

$ ncu-ui ncu_report_rank0.ncu-rep
Warning

The graphical interface may be slow with X11 forwarding from a login node. You can use a visualisation node or transfer the reports to your local machine.
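
For the transfer option, a possible approach is to bundle the per-rank reports into a single archive on Jean Zay and copy that archive. The helper bundle_reports below is a hypothetical function written for this illustration, not a tool provided on Jean Zay. The scp line in the comments uses placeholder login and path values, and it assumes that Nsight Compute is also installed on your local machine.

```shell
# Sketch: bundle all per-rank reports into one archive before transfer.
# Run on Jean Zay, in the directory that holds the .ncu-rep files.
bundle_reports() {
    archive="${1:-ncu_reports.tar.gz}"   # optional archive name
    # Expand the glob; if nothing matches, "$1" stays the literal pattern.
    set -- ncu_report_rank*.ncu-rep
    if [ ! -e "$1" ]; then
        echo "no ncu_report_rank*.ncu-rep files found" >&2
        return 1
    fi
    tar -czf "${archive}" "$@"
    echo "${archive}"                    # print the archive name on success
}

# Usage on Jean Zay:
#   bundle_reports
#
# Then, from your local machine (login and path are placeholders):
#   scp <login>@jean-zay.idris.fr:<path>/ncu_reports.tar.gz .
#   tar -xzf ncu_reports.tar.gz
#   ncu-ui ncu_report_rank0.ncu-rep
```

Opening the reports locally avoids X11 forwarding altogether.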

Documentation

The complete documentation is available on the NVIDIA website.
