⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

VTune

We invite you to consult the best practices for code profiling for general advice on performance analysis on Jean Zay.

Description

VTune Amplifier is a performance analysis tool that is part of the Intel Parallel Studio XE suite. It has a graphical user interface (GUI) but can also be used from the command line.

Installed Versions

  • Several versions of VTune are available depending on the versions of Intel Parallel Studio XE installed on the machine.
  • VTune Amplifier is well suited to analysing hybrid MPI + OpenMP applications.

Usage

The module command gives access to the various versions of VTune.

To display the available versions, you have two options:

  • via the corresponding modulefiles:
$ module av intel-vtune
intel-vtune/18.0.1 intel-vtune/18.0.5 intel-vtune/19.0.2
intel-vtune/19.0.4(default) intel-vtune/19.0.5
  • or via the Intel environments:
$ module av intel-all
intel-all/16.0.4 intel-all/18.0.1 intel-all/18.0.5 intel-all/19.0.2
intel-all/19.0.4(default) intel-all/19.0.5

To use version 19.0.4 of VTune, for example, you can load:

  • either the corresponding modulefile:
$ module load intel-vtune/19.0.4
  • or the corresponding Intel environment:
$ module load intel-all/19.0.4
INFO

Note that the intel-all/XX.Y.Z modules contain the complete Intel environments with compilers, the MKL math library, the MPI library, and code analysis tools.

Once the appropriate module is loaded, using VTune involves two steps:

  • Execution of your program via VTune (from the command line);
  • Visualisation/Analysis of the results with the graphical interface.

Execution

The simplest way is to launch the execution from the command line in your Slurm scripts: just add the amplxe-cl command right before the name of your executable, together with the options that select the type of sampling to perform.

Attention

We strongly recommend using the same version of the Intel environment for compiling, executing, and analysing your code.

Remarks:

  • For the command amplxe-cl to be recognised, the appropriate modulefile must have been loaded beforehand (see above) either in the environment of your interactive session or in the environment of your job.
  • To get help regarding the options of the command amplxe-cl, simply type amplxe-cl --help.
Attention

The installed versions of VTune do not use the Intel sampling drivers, which are not installed on the compute nodes; they rely on the Linux tool perf instead. As a result, some analyses that can be selected with the -collect option are not available on Jean Zay, for example -collect memory-access, which measures memory bandwidth.
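Before adding amplxe-cl to a job, it can help to check interactively that the command is on your PATH and to see which analysis types your version lists. The sketch below is a hedged example: the function name vtune_available is hypothetical, while amplxe-cl -help collect is the VTune option that prints the available analysis types. Keep in mind that a type appearing in this list may still be unusable on Jean Zay, for the reason given above.

```shell
#!/bin/bash
# Sketch: check that the VTune command-line tool is available before using it.
# amplxe-cl is provided by the intel-vtune/XX.Y.Z or intel-all/XX.Y.Z modules.

# Hypothetical helper: succeeds only if amplxe-cl is found on the PATH
vtune_available() {
  command -v amplxe-cl >/dev/null 2>&1
}

if vtune_available; then
  # Print the analysis types known to this VTune version; on Jean Zay some
  # of them (e.g. memory-access) cannot run because VTune relies on perf
  amplxe-cl -help collect
else
  echo "amplxe-cl not found: load an intel-vtune or intel-all module first" >&2
fi
```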

During execution, VTune writes its files to a directory whose name can be specified with the option -r directory_name. Make sure this directory is located in one of your permanent (such as WORK) or semi-temporary (such as SCRATCH) disk spaces. Note that VTune suffixes the directory name with the name of the execution node, so there will be as many directories as there are execution nodes.

Attention
  • If the directory already exists, the execution will fail to avoid overwriting previous results. Therefore, you must ensure, before launching the execution, that the directory specified by the option -r does not exist.
  • By default, VTune creates its data in the system directory /tmp, which is very limited in size. To give VTune a larger workspace, it is advisable to define the TMPDIR variable, for example to use the SCRATCH directory:
export TMPDIR=$SCRATCH
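The two precautions above can be checked in the job script before amplxe-cl is called. This is a minimal sketch, not a script provided for Jean Zay: the profiling. prefix is a hypothetical naming choice, the shell PID stands in for SLURM_JOB_ID outside Slurm, and the <directory>.<node> form used to detect node-suffixed directories is an assumption about how VTune appends the node name.

```shell
#!/bin/bash
# Sketch: prepare a VTune run so that it neither fails on an existing
# result directory nor fills up /tmp.
# Assumption: SCRATCH is defined (as on Jean Zay); otherwise fall back to
# the current directory so the sketch also runs elsewhere.
BASE="${SCRATCH:-$PWD}"

# Hypothetical naming scheme: one new directory per job (or per shell PID
# when running outside Slurm)
DIR_PROF="$BASE/profiling.${SLURM_JOB_ID:-$$}"

# VTune appends the node name to the -r directory; the ".<node>" form
# checked here is an assumption about that suffix
shopt -s nullglob
suffixed=("$DIR_PROF".*)
shopt -u nullglob

if [ -e "$DIR_PROF" ] || [ "${#suffixed[@]}" -gt 0 ]; then
  echo "VTune result directory already exists: $DIR_PROF" >&2
  exit 1
fi

# Give VTune a larger working area than /tmp
export TMPDIR="$BASE"

echo "Results will go to $DIR_PROF (TMPDIR=$TMPDIR)"
```

The directory name can then be passed to amplxe-cl with -r "$DIR_PROF".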

Here is an example of a job submission script for a pure MPI code launching 4 MPI processes:

job_vtune_mpi.slurm
#!/bin/bash
#SBATCH --job-name=vtune_run_mpi # Arbitrary name of the Slurm job
#SBATCH --output=%x.%j.out # Standard output file of the job
#SBATCH --error=%x.%j.err # Standard error file of the job
#SBATCH --ntasks=4 # Number of MPI processes requested
# /!\ Caution: the following line is misleading, but in Slurm
# terminology "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
#SBATCH --time=0:20:00 # Job time hh:mm:ss (0h20mn here)

## Choose an Intel environment that provides VTune
module load intel-all/19.0.4

## Echo the commands
set -x

## Run in the SCRATCH
cd $SCRATCH

# Use a different directory for each run
export DIR_PROF=profiling.$SLURM_NTASKS.$SLURM_JOB_ID

# Avoid using /tmp
export TMPDIR=$SCRATCH

## Hotspots sampling of the binary ./my_bin_exe, writing the results
## to the directory $SCRATCH/$DIR_PROF and limiting the amount of
## collected data (here 4000 MB)
srun amplxe-cl -q -collect hotspots -r $DIR_PROF -data-limit=4000 -- ./my_bin_exe

Analysis/Visualisation of Results

The results are analysed with the graphical interface amplxe-gui, given the name of the directory containing the files written by amplxe-cl during the execution.

To use the same version (19.0.4) as in the job example above, run the following commands:

$ module load intel-vtune/19.0.4
$ amplxe-gui resultdir
Attention

The Intel environment used at this stage must be the same as the one used for sampling.
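Since VTune writes one result directory per execution node, you first need to find the directory to open. The helper below is a hypothetical sketch (list_vtune_results is not a VTune command) that assumes the node name is appended as <directory>.<node>; it prints the matching directories, one of which can then be passed to amplxe-gui.

```shell
#!/bin/bash
# Sketch: list the per-node result directories written by one VTune run.
# Assumption: VTune names them "<-r directory>.<node name>"; the
# unsuffixed name is also accepted in case no suffix was added.
list_vtune_results() {
  local base="$1"
  local d
  for d in "$base" "$base".*; do
    # Keep only existing directories (an unmatched glob stays literal)
    [ -d "$d" ] && printf '%s\n' "$d"
  done
  return 0
}
```

For the job above, you would call it with the name given to -r, for example list_vtune_results "$SCRATCH/profiling.4.<job ID>", then open one of the printed directories with amplxe-gui.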

Documentation
