This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
VTune
For general advice on performance analysis on Jean Zay, please consult the best practices for code profiling.
Description
VTune Amplifier is a performance analysis tool that is part of the Intel Parallel Studio XE suite. It has a graphical user interface (GUI) but can also be used from the command line.
Installed Versions
- Several versions of VTune are available depending on the versions of Intel Parallel Studio XE installed on the machine.
- VTune Amplifier is perfectly suited for hybrid MPI + OpenMP applications.
Usage
The module command allows access to the various versions of VTune.
To display the available versions, you have two options:
- via the corresponding modulefiles:
$ module av intel-vtune
intel-vtune/18.0.1 intel-vtune/18.0.5 intel-vtune/19.0.2
intel-vtune/19.0.4(default) intel-vtune/19.0.5
- or via the Intel environments:
$ module av intel-all
intel-all/16.0.4 intel-all/18.0.1 intel-all/18.0.5 intel-all/19.0.2
intel-all/19.0.4(default) intel-all/19.0.5
For example, to use version 19.0.4 of VTune, you can load:
- either the corresponding modulefile:
$ module load intel-vtune/19.0.4
- or the corresponding Intel environment:
$ module load intel-all/19.0.4
Note that the intel-all/XX.Y.Z modules contain the complete Intel environments with compilers, the MKL math library, the MPI library, and code analysis tools.
Once the appropriate module is loaded, using VTune involves two steps:
- Execution of your program via VTune (from the command line);
- Visualisation/Analysis of the results with the graphical interface.
Execution
The simplest way is to launch the execution from the command line in your Slurm scripts: just add the command amplxe-cl right before the name of your executable (with options to select the type of sampling to be performed).
We strongly recommend using the same version of the Intel environment for compiling, executing, and analysing your code.
Remarks:
- For the command amplxe-cl to be recognised, the appropriate modulefile must have been loaded beforehand (see above), either in the environment of your interactive session or in the environment of your job.
- To get help regarding the options of the command amplxe-cl, simply type amplxe-cl --help.
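The first remark can be turned into a guard at the top of a job script. The following is only a minimal sketch: the function name check_vtune_loaded is hypothetical, and it relies solely on the standard shell builtin command -v to test whether amplxe-cl is on the PATH.

```shell
# Hypothetical helper: succeed only if amplxe-cl can be found on the PATH,
# i.e. if an intel-vtune or intel-all module has been loaded beforehand.
check_vtune_loaded() {
    if command -v amplxe-cl >/dev/null 2>&1; then
        return 0
    fi
    echo "amplxe-cl not found: load an intel-vtune or intel-all module first" >&2
    return 1
}
```

Placed after the module load line of a job, a call such as check_vtune_loaded || exit 1 stops the job before the profiled run is launched.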
Because the Intel sampling drivers are not installed on the compute nodes, the various versions of VTune rely on the Perf tool instead. As a result, some analyses that can be selected via the option -collect are not available on Jean Zay: for example, -collect memory-access, which measures memory bandwidth.
During execution, VTune writes its files to a directory whose name can be specified via the option -r directory_name. Make sure to specify a directory located in one of your permanent (such as WORK) or semi-temporary (such as SCRATCH) disk spaces. Note that VTune will suffix the directory name with the name of the execution node: there will be as many directories as there are execution nodes.
- If the directory already exists, the execution will fail to avoid overwriting previous results. Therefore, before launching the execution, you must ensure that the directory specified by the option -r does not exist.
- By default, VTune creates its data in the system directory /tmp, which is very limited in size. To ensure that VTune has a larger workspace, it is advisable to define the TMPDIR variable. For example, to use the SCRATCH directory:
export TMPDIR=$SCRATCH
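The check on the result directory described in the first remark above can be automated. The following is a minimal sketch under stated assumptions: the function name result_dir_is_free is hypothetical, and a wildcard is used because VTune appends the node name to the name given to -r, so the exact directory names are not known in advance.

```shell
# Hypothetical helper: succeed only if no existing path starts with the
# given prefix. The wildcard accounts for the node name that VTune appends
# to the directory name passed to -r.
result_dir_is_free() {
    for existing in "$1"*; do
        # When the pattern matches nothing, the shell usually leaves it
        # unexpanded, so an explicit existence test is still required.
        if [ -e "$existing" ]; then
            return 1
        fi
    done
    return 0
}
```

In a job script, a call such as result_dir_is_free "$DIR_PROF" || exit 1 placed just before the profiled run avoids spending allocated time on an execution that would fail anyway.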
Here is an example of a submission job for a purely MPI code initiating 4 MPI processes:
#!/bin/bash
#SBATCH --job-name=vtune_run_mpi # Arbitrary name of the Slurm job
#SBATCH --output=%x.%j.out # Standard output file of the job
#SBATCH --error=%x.%j.err # Standard error file of the job
#SBATCH --ntasks=4 # Number of MPI processes requested
# /!\ Caution: the following line is misleading, but in Slurm vocabulary
# "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
#SBATCH --time=0:20:00 # Job time limit hh:mm:ss (here 0h20mn)
## Select an Intel environment that provides VTune
module load intel-all/19.0.4
## Echo the commands
set -x
## Run in the SCRATCH
cd $SCRATCH
# Select a different directory for each run.
export DIR_PROF=profiling.$SLURM_NTASKS.$SLURM_JOB_ID
# To avoid using /tmp
export TMPDIR=$SCRATCH
## Hotspots sampling of the binary ./my_bin_exe, with
## results written to the directory $SCRATCH/$DIR_PROF
## and a limit on the amount of data collected (here 4000 MB)
srun amplxe-cl -q -collect hotspots -r $DIR_PROF -data-limit=4000 -- ./my_bin_exe
Analysis/Visualisation of Results
Results are analysed with the graphical interface amplxe-gui, which takes as argument the name of the directory containing the files written during the execution of amplxe-cl.
To use the same version 19.0.4 as in the job example above, you should use the following commands:
$ module load intel-vtune/19.0.4
$ amplxe-gui resultdir
The Intel environment used at this stage must be the same as the one used for sampling.
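Because VTune appends the node name to the name given to -r, a multi-node run leaves several result directories to inspect. The sketch below lists them and, as an alternative to the GUI, prints a text summary for each one with amplxe-cl -report summary. The function names list_result_dirs and summarise_results are hypothetical, and the second function assumes that the VTune module used for sampling has been loaded.

```shell
# Hypothetical helper: print every directory whose name starts with the
# prefix that was passed to -r (VTune adds the node name as a suffix).
list_result_dirs() {
    for d in "$1"*; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
        fi
    done
    return 0
}

# Hypothetical helper: print a text summary of each result directory.
# Requires the same VTune module as the one used for sampling.
summarise_results() {
    list_result_dirs "$1" | while IFS= read -r d; do
        echo "== $d =="
        # stdin is redirected so that amplxe-cl cannot consume the list
        amplxe-cl -report summary -r "$d" </dev/null
    done
}
```

For the job example above, the prefix would be the value of DIR_PROF, for instance summarise_results "$SCRATCH/profiling.4.<job_id>" with the actual job identifier substituted.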
Documentation
- The Getting Started Guide for VTune at Intel
- All the documentation on Intel's website