Jean Zay: SCALASCA

Description

SCALASCA is a graphical performance analysis tool for parallel applications, developed by JSC (Jülich Supercomputing Centre). It analyses the behaviour of an application and helps identify its most time-consuming parts. It is particularly well suited to studying parallel executions.

Installed versions

  • SCALASCA 2.4 with the GCC 8.2.0 compiler
  • SCALASCA 2.4 with the GCC 9.1.0 compiler
  • SCALASCA 2.4 with the Intel 2019.4 compiler

Important: SCALASCA supports hybrid MPI + multithreaded/OpenMP applications, up to the MPI_THREAD_MULTIPLE level for a profile and the MPI_THREAD_FUNNELED level for a trace.

Usage

The module command provides access to SCALASCA. Because SCALASCA also requires Score-P, load both modules with the following commands:

$ module load scalasca
$ module load scorep
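
After loading the modules, a quick sanity check can confirm that the commands used in the rest of this document are in your PATH. This is a minimal sketch: the function name check_tools is hypothetical, and it assumes that the loaded modules are what provide the skin, scan and square commands.

```shell
#!/bin/bash
# Report which of the given commands are available in PATH.
# Returns the number of missing commands (0 means everything was found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: missing"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Commands used in the three steps described in this document
check_tools skin scan square || echo "Some commands are missing: check the loaded modules."
```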

SCALASCA is used in three steps:

  1. Instrumentation of the application
  2. Execution of the instrumented application
  3. Analysis/visualisation of the results

These three steps are explained in detail below.

Instrumentation

SCALASCA works by modifying your application during compilation to insert its own measurement procedures.

An application can be instrumented either automatically or manually. This document only covers the automatic instrumentation of a pure MPI application (without OpenMP). For the manual instrumentation procedure, refer to the SCALASCA PDF manual.

To instrument your application automatically, add the skin command in front of the compiler name, separated by a space:

$ skin mpif90 my_code.f90
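
When compilation and linking are separate steps, the same prefix would normally be applied to both. The sketch below shows one possible way to switch instrumentation on and off from a build script. The INSTRUMENT variable and the file names are hypothetical and only serve the example.

```shell
#!/bin/bash
# INSTRUMENT is a hypothetical switch used only in this sketch:
# 1 = build with SCALASCA instrumentation, 0 = plain build.
INSTRUMENT="${INSTRUMENT:-1}"

if [ "$INSTRUMENT" = "1" ]; then
    FC="skin mpif90"     # instrumented compile and link
else
    FC="mpif90"          # uninstrumented reference build
fi
echo "Fortran compiler command: $FC"

# Build only when the source file is present (file names are examples).
# The same prefix is used for the compile step and for the link step.
if [ -f my_code.f90 ]; then
    $FC -c my_code.f90 && $FC -o my_appli my_code.o
fi
```

Keeping an uninstrumented build available makes it easier to estimate the overhead added by the measurement.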

Important: Performance is measured between the calls to MPI_Init and MPI_Finalize. Any other operations will not be included in the measurements.

Caution: Using SCALASCA adds overhead in execution time, memory and disk usage.

Execution

To run the instrumented application, add the scan command in front of the srun command in your Slurm scripts.

By default, only a profile, i.e. a summary of the execution, is collected.

To obtain a full trace of the events rather than just a profile, use the -t option. Tracing allows SCALASCA to identify various performance problems, which are then highlighted when the results are visualised. Be aware, however, that this option greatly increases the disk space needed.
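
As an illustration, the execution line of a job script could select profile or trace mode from a single variable. This is only a sketch: SCAN_MODE is a hypothetical variable name, and my_appli and my_args stand for your executable and its arguments.

```shell
#!/bin/bash
# SCAN_MODE is a hypothetical variable for this sketch: "profile" (default) or "trace".
SCAN_MODE="${SCAN_MODE:-profile}"

SCAN_OPTS=""
if [ "$SCAN_MODE" = "trace" ]; then
    SCAN_OPTS="-t"       # full event trace: needs much more disk space
fi

echo "Launching: scan ${SCAN_OPTS:+$SCAN_OPTS }srun ./my_appli my_args"

# Run only where the scan command is available (after loading the modules).
# SCAN_OPTS is left unquoted on purpose so that an empty value adds no argument.
if command -v scan >/dev/null 2>&1; then
    scan $SCAN_OPTS srun ./my_appli my_args
fi
```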

At each execution, SCALASCA writes its files to a directory whose name is generated as follows:

scorep_NOMAPPLI_RANKSPERNODEpNPROCxNTHR_TYPE

with

  • NOMAPPLI: name of the executable file
  • RANKSPERNODE: number of processes per node
  • NPROC: total number of processes
  • NTHR: number of threads per process
  • TYPE: sum for a profile, trace for a trace

Caution: If the directory already exists, the execution fails in order to avoid overwriting the previous results. Before launching a run, therefore, make sure that the corresponding directory does not already exist.
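
The naming pattern above can be reproduced in a job script to move a previous result directory aside before a new run. The following is a minimal sketch. Both function names and the ".old.<timestamp>" suffix are hypothetical choices, and the value 1 for the number of threads of a pure MPI run is an assumption.

```shell
#!/bin/bash
# Build the expected directory name from the pattern described above.
# Arguments: executable name, processes per node, total processes,
#            threads per process, type (sum or trace)
scalasca_dir_name() {
    printf 'scorep_%s_%sp%sx%s_%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Rename an existing result directory so that the next run can start.
# The ".old.<timestamp>" suffix is an arbitrary choice for this sketch.
protect_previous_results() {
    if [ -d "$1" ]; then
        mv "$1" "$1.old.$(date +%Y%m%d_%H%M%S)"
    fi
}

# Example: 40 MPI processes on one node, assumed 1 thread each, profile mode
exp_dir=$(scalasca_dir_name my_appli 40 40 1 sum)
echo "Expected result directory: $exp_dir"
protect_previous_results "$exp_dir"
```

Renaming rather than deleting keeps the earlier measurements available for comparison.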

The following is an example of a job submission:

scalasca_mpi.slurm
#!/bin/bash
#SBATCH --job-name=scalasca_run    # Name of the job
#SBATCH --ntasks=40                # Total number of MPI processes
#SBATCH --ntasks-per-node=40       # Number of MPI processes per node
# /!\ Caution: In Slurm vocabulary, "multithread" in the following line refers to hyperthreading.
#SBATCH --hint=nomultithread       # 1 thread per physical core (no hyperthreading)
#SBATCH --time=01:00:00            # Maximum execution time requested (HH:MM:SS)
#SBATCH --output=scalasca%j.out    # Name of the output file
#SBATCH --error=scalasca%j.out     # Name of the error file (here the same file as the output)
 
# go to the submission directory
cd "${SLURM_SUBMIT_DIR}"
 
# clean out the modules loaded interactively and inherited by default
module purge
 
# load modules
module load ...
module load scalasca
module load scorep
 
# echo of launched commands
set -x
 
# execution of the code
scan srun ./my_appli my_args

Submit this script via the sbatch command:

$ sbatch scalasca_mpi.slurm

Analysis/visualisation of results

The results are analysed with the Cube graphical interface. To launch it, type the following commands in an interactive session:

$ module load scalasca
$ module load scorep
$ module load cube
$ square scalasca_output_directory/profile.cubex

The interface is divided into three panels: the measured metrics on the left, the call tree in the middle, and the topology on the right.

By selecting which entries of the left panel to include or exclude, you can obtain an overall summary of the performance. The selections made in this panel are propagated to the two other panels, which makes it possible to identify the most time-consuming parts of the application. If the execution was carried out in trace mode, SCALASCA can also identify certain behaviours causing performance loss (messages sent in the wrong order, load imbalance, …).

Because the visualisation relies on a graphical application, it is not always practical to use SCALASCA directly from a Jean Zay front end. To avoid this problem, you can install the Cube visualisation tool (downloadable from the official SCALASCA web site) on a Linux PC or any UNIX machine, and run the graphical interface from that machine.
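
Copying a result directory to the local machine could be sketched as follows. The login, host name and paths are placeholders to adapt. The sketch assumes that rsync is available on both machines and that the locally installed Cube interface is started with the cube command. By default it only prints the copy command; set the hypothetical DO_COPY variable to 1 to actually run it.

```shell
#!/bin/bash
# All three values below are placeholders for this sketch; adapt them.
REMOTE="my_login@jean-zay-host"                     # hypothetical login and host name
REMOTE_DIR="path/to/scorep_my_appli_40p40x1_sum"    # result directory on Jean Zay
LOCAL_DIR="./scalasca_results"                      # destination on the local machine

# Print the copy command, or run it when DO_COPY=1 is set.
fetch_results() {
    if [ "${DO_COPY:-0}" = "1" ]; then
        mkdir -p "$LOCAL_DIR" && rsync -av "$REMOTE:$REMOTE_DIR" "$LOCAL_DIR/"
    else
        echo "rsync -av $REMOTE:$REMOTE_DIR $LOCAL_DIR/"
    fi
}

fetch_results

# Then, on the local machine, open the copied profile with the Cube interface:
#   cube ./scalasca_results/scorep_my_appli_40p40x1_sum/profile.cubex
```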

Documentation