Jean Zay: VTune

Introduction

The VTune Amplifier is a performance analysis tool which is part of the Intel Parallel Studio XE suite.
It has a graphical user interface (GUI) but can also be used from the command line.

Installed versions

  • Several versions of VTune are accessible depending on the versions of Intel Parallel Studio XE installed on the machine.
  • VTune Amplifier is well suited to hybrid MPI + OpenMP applications.

Usage

The module command allows access to the different versions of VTune.

To display the different versions available, you have two possibilities:

  • Via the corresponding modulefiles
    $ module av intel-vtune
      intel-vtune/18.0.1          intel-vtune/18.0.5          intel-vtune/19.0.2
      intel-vtune/19.0.4(default) intel-vtune/19.0.5

    or

  • Via the Intel environments
    $ module av intel-all
      intel-all/16.0.4          intel-all/18.0.1          intel-all/18.0.5          intel-all/19.0.2
      intel-all/19.0.4(default) intel-all/19.0.5

For example, to use the VTune 19.0.4 version, you can use either:

  • The corresponding modulefile
    $ module load intel-vtune/19.0.4

    or

  • The corresponding Intel environment
    $ module load intel-all/19.0.4

Note that the intel-all/XX.Y.Z modules provide the complete Intel environment, including the compilers, the MKL mathematical library, the MPI library and the code analysis tools.
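
After loading one of these modules, you can quickly check that the VTune command line tool is available in your environment (a minimal sketch; the exact path and version string depend on the module loaded, and the -version option is expected to simply print the installed version):

$ module load intel-vtune/19.0.4
$ which amplxe-cl
$ amplxe-cl -version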

After the chosen module is loaded, the usage of VTune is done in two steps:

  1. Execution of your program via VTune (on the command line).
  2. Visualisation/Analysis of results with the graphical interface.

Execution

The easiest way to launch the execution is from the command line in your Slurm script: simply add the amplxe-cl command just before the name of your executable (with the appropriate options to select the type of sampling to be carried out).
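
For example, a minimal hotspots collection on an MPI executable could look like the following line (a sketch; ./my_bin_exe and the result directory name are placeholders to adapt to your case):

srun amplxe-cl -collect hotspots -r $SCRATCH/my_results -- ./my_bin_exe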

Important: We strongly recommend that you use the same Intel environment version for the compilation, execution and analysis of your code.

Comments:

  • For the amplxe-cl command to be recognized, the appropriate modulefile must first be loaded (see above) either in the environment of your interactive session or in your job environment.
  • To obtain help concerning the options of the amplxe-cl command, simply type amplxe-cl --help.
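
    For example (assuming the appropriate modulefile is loaded; the second form, which restricts the help to the collect action and lists the available analysis types, should be accepted by the VTune versions installed on the machine):

    $ amplxe-cl --help
    $ amplxe-cl -help collect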

Important: The installed VTune versions do not use the Intel sampling drivers (which are not installed on the compute nodes) but rather the Perf tool. Consequently, certain analyses selected via the -collect option are not available on Jean Zay: for example, the -collect memory-access option, which would allow measuring the memory bandwidth.

During the execution, VTune writes its files in a directory whose name can be specified via the -r directory_name option. Remember to specify a directory located in one of your permanent disk spaces (such as the WORK) or a temporary space (such as the SCRATCH). Note that VTune adds a suffix to the directory name corresponding to the name of the execution node: there will therefore be as many result directories as there are execution nodes.
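
For instance, after a run spread over two nodes with -r profiling_run, you could obtain something like the following (a sketch; the node names are purely hypothetical and the exact form of the suffix may vary with the VTune version):

$ ls -d profiling_run*
profiling_run.r3i1n4    profiling_run.r3i1n5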

Caution: If the directory already exists, the execution will fail in order to avoid overwriting the previous results. Before launching the execution, therefore, you must make sure that the directory specified with the -r option does not already exist.

Caution: By default, VTune uses the system directory /tmp, which is very limited in size, to store its working data. In order for VTune to have a larger workspace, it is advisable to define the TMPDIR variable. For example, to use the SCRATCH directory:

export TMPDIR=$SCRATCH

Here is a job submission example for a purely MPI code initiating 4 MPI processes:

job_vtune_mpi.slurm
#!/bin/bash
#SBATCH --job-name=vtune_run_mpi    # Arbitrary name of Slurm job
#SBATCH --output=%x.%j.out          # Standard job output file
#SBATCH --error=%x.%j.err           # Standard job error file
#SBATCH --ntasks=4                  # Number of requested MPI processes
# /!\ Caution: In Slurm vocabulary, "multithread" (in following line) refers to hyperthreading.
#SBATCH --hint=nomultithread    # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=0:20:00              # Job time hh:mm:ss (0h20mn here)
 
## Choice of Intel environment having VTune
module load intel-all/19.0.4
 
## Echo of commands
set -x
 
## Execution in the SCRATCH
cd $SCRATCH
 
# Selection of a different directory at each run
export DIR_PROF=profiling.$SLURM_NTASKS.$SLURM_JOB_ID
 
# To not use the /tmp
export TMPDIR=$SCRATCH
 
## Hotspot type sampling of the ./my_bin_exe binary with
## writing of the results in the $SCRATCH/$DIR_PROF directory
## and limiting the quantity of collected data (here 4000 MB)
srun amplxe-cl -q -collect hotspots -r $DIR_PROF -data-limit=4000 -- ./my_bin_exe
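
As mentioned in the introduction, VTune also handles hybrid MPI + OpenMP applications. The following variant is only a sketch, assuming a hypothetical hybrid binary ./my_hybrid_exe run with 4 MPI processes and 10 OpenMP threads per process (adapt the values to your code and to the node geometry):

job_vtune_hybrid.slurm
#!/bin/bash
# Hypothetical hybrid MPI + OpenMP example (adapt to your code)
#SBATCH --job-name=vtune_run_hybrid # Arbitrary name of Slurm job
#SBATCH --output=%x.%j.out          # Standard job output file
#SBATCH --error=%x.%j.err           # Standard job error file
#SBATCH --ntasks=4                  # Number of requested MPI processes
#SBATCH --cpus-per-task=10          # Number of OpenMP threads per MPI process
#SBATCH --hint=nomultithread        # 1 thread per physical core (no hyperthreading)
#SBATCH --time=0:20:00              # Job time hh:mm:ss (0h20mn here)

## Choice of Intel environment having VTune
module load intel-all/19.0.4

## Echo of commands
set -x

## Execution in the SCRATCH
cd $SCRATCH

# Number of OpenMP threads per MPI process
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Selection of a different directory at each run
export DIR_PROF=profiling.$SLURM_NTASKS.$SLURM_JOB_ID

# To not use the /tmp
export TMPDIR=$SCRATCH

## Hotspot type sampling of the ./my_hybrid_exe binary with
## writing of the results in the $SCRATCH/$DIR_PROF directory
srun amplxe-cl -q -collect hotspots -r $DIR_PROF -data-limit=4000 -- ./my_hybrid_exe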

Analysis/visualisation of results

The results analysis is done via the amplxe-gui graphical interface by giving it the name of the directory containing the files written during the execution of amplxe-cl.

To use the same 19.0.4 version as in the job example above, you must use the following commands:

$ module load intel-vtune/19.0.4
$ amplxe-gui resultdir

Important: The Intel environment used in this step must be the same as the one used to do the sampling.
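
For example, with the job script above, the results collected on a given node could be opened as follows (a sketch; the job ID 123456 and the node name r3i1n4 are hypothetical, and the exact directory suffix may vary with the VTune version):

$ cd $SCRATCH
$ module load intel-all/19.0.4
$ amplxe-gui profiling.4.123456.r3i1n4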

Documentation