⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

SCALASCA

For general advice on performance analysis on Jean Zay, see the best practices for code profiling.

Description

SCALASCA is a graphical performance analysis tool for parallel applications, developed by the Jülich Supercomputing Centre (JSC). It lets you analyse the behaviour of an application and identify its critical parts. It is particularly suited to studying massively parallel runs.

Installed versions

The module command provides access to the various versions of SCALASCA.

To display the available versions:

$ module avail scalasca
scalasca/2.4-mpi  scalasca/2.5-mpi  scalasca/2.6-mpi

Attention

SCALASCA supports hybrid MPI and multithreaded/OpenMP applications up to thread level MPI_THREAD_MULTIPLE for a profile and up to thread level MPI_THREAD_FUNNELED for a trace.

Usage

The module command provides access to SCALASCA. Proper use of SCALASCA also requires Score-P, so load both modules before working with the tool:

$ module load scalasca
$ module load scorep

Using SCALASCA involves three steps:

Instrumentation of the application;
Execution of the instrumented application;
Analysis/visualisation of the results.

Instrumentation

SCALASCA works by modifying your application to insert its own measurement procedures during compilation.

Any application can be instrumented either automatically or manually. Only the automatic instrumentation of a "pure" MPI application (without OpenMP) will be covered in this document. For the manual procedure, refer to the SCALASCA PDF manual.

To instrument your application automatically, prefix the compiler command with skin, separated by a space:

  $ skin mpif90 my_code.f90
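
In a build script, it can be convenient to switch instrumentation on and off without editing the compile line. The sketch below assumes a hypothetical environment variable INSTRUMENT and a hypothetical helper compile_cmd; neither is part of SCALASCA. The helper only prints the command, so it can be checked before being run.

```shell
#!/bin/bash
# Hypothetical helper (not part of SCALASCA): print the compile command,
# prefixed with "skin" only when INSTRUMENT=1.
compile_cmd() {
    local prefix=""
    if [ "${INSTRUMENT:-0}" = "1" ]; then
        prefix="skin "
    fi
    # "$*" joins the compiler name and its arguments with single spaces
    echo "${prefix}$*"
}

# Show both variants for the example from this page
INSTRUMENT=0 compile_cmd mpif90 my_code.f90   # plain build
INSTRUMENT=1 compile_cmd mpif90 my_code.f90   # instrumented build
```

Keeping an uninstrumented build available makes it easy to compare run times and estimate the measurement overhead.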

Attention

Using SCALASCA incurs additional costs in execution time, memory usage, and disk space.
Performance is measured between calls to MPI_Init and MPI_Finalize: any operation performed outside these calls will not be taken into account.

Execution

To run the instrumented application, add the scan command immediately before the srun command in your Slurm scripts.

A profile is a summary of the execution. By default, only a profile is collected.

To obtain a complete trace of events rather than a profile, use the -t option. With a trace, SCALASCA can identify various performance issues, which are then highlighted during visualisation.

Attention

The -t option significantly increases the disk space required by a run.
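
The two collection modes differ only by the -t flag on the launch line. As a minimal sketch, the hypothetical helper launch_cmd below (not part of SCALASCA) prints the launch line for either mode; my_appli and my_args are placeholder names.

```shell
#!/bin/bash
# Hypothetical helper: print the launch line for a profile (default mode)
# or for a trace (scan -t).
launch_cmd() {
    local mode="$1"
    shift
    case "$mode" in
        profile) echo "scan srun $*" ;;
        trace)   echo "scan -t srun $*" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

launch_cmd profile ./my_appli my_args
launch_cmd trace ./my_appli my_args
```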

For each execution, SCALASCA writes its files to a directory whose name is generated as follows:

scorep_APPNAME_RANKSPERNODEpNPROCxNTHR_TYPE

where

APPNAME is the name of the executable,
RANKSPERNODE is the number of processes per node,
NPROC is the total number of processes,
NTHR is the number of threads per process,
TYPE is sum for a profile or trace for a trace.
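
The naming pattern above can be reproduced in a script, for instance to know in advance where the results will be written. This is a minimal sketch using a hypothetical helper expdir_name (not part of SCALASCA). The example values assume a pure MPI run with 40 processes on one node and one thread per process.

```shell
#!/bin/bash
# Hypothetical helper: rebuild the result directory name from the pattern above.
# Arguments: executable name, processes per node, total processes,
#            threads per process, type (sum or trace)
expdir_name() {
    echo "scorep_${1}_${2}p${3}x${4}_${5}"
}

# Profile of my_appli, 40 processes per node, 40 processes, 1 thread each
expdir_name my_appli 40 40 1 sum   # prints scorep_my_appli_40p40x1_sum
```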

Attention

If the directory already exists, the run aborts rather than overwrite previous results. Make sure the directory does not exist before launching the job.
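
One way to guard against this failure is to move any existing result directory aside before the run. The sketch below uses a hypothetical helper preserve_expdir (not part of SCALASCA). The ".old.<timestamp>.<pid>" suffix is an arbitrary choice made for this example.

```shell
#!/bin/bash
# Hypothetical helper: if the result directory exists, rename it so that
# the next run can create a fresh one. Prints the backup name when a
# rename happened, and prints nothing otherwise.
preserve_expdir() {
    local dir="$1"
    if [ -d "$dir" ]; then
        local backup
        backup="${dir}.old.$(date +%Y%m%d_%H%M%S).$$"
        mv "$dir" "$backup" && echo "$backup"
    fi
}

# Typical use in a Slurm script, just before the scan line:
#   preserve_expdir scorep_my_appli_40p40x1_sum
```

Renaming rather than deleting keeps earlier measurements available for comparison, at the cost of extra disk space.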

Here is an example submission script:

scalasca_mpi.slurm
#!/bin/bash
#SBATCH --job-name=scalasca_run    # job name
#SBATCH --ntasks=40                # total number of MPI processes
#SBATCH --ntasks-per-node=40       # number of MPI processes per node
# /!\ Caution: the following line is misleading, but in Slurm
# vocabulary "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread       # 1 thread per physical core (no hyperthreading)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
#SBATCH --time=01:00:00            # maximum requested run time (HH:MM:SS)
#SBATCH --output=scalasca%j.out    # output file name
#SBATCH --error=scalasca%j.out     # error file name (here shared with the output)

# move to the submission directory
cd ${SLURM_SUBMIT_DIR}

# clean out modules loaded interactively and inherited by default
module purge

# load the modules
module load ...
module load scalasca
module load scorep

# echo the commands as they are run
set -x

# run the code
scan srun ./my_appli my_args

Submit this script with the sbatch command:

$ sbatch scalasca_mpi.slurm

Analysis/visualisation of results

Results are analysed with the Cube graphical interface. To launch it, run the following commands interactively:

$ module load scalasca
$ module load scorep
$ module load cube
$ square scalasca_output_directory/profile.cubex

The interface is divided into three panels:

On the left, the different measurements made;
In the middle, the call tree;
On the right, the topology.

By expanding or collapsing the entries in the left panel, you can get a more or less aggregated view of the performance. The selections made in this panel are reflected in the other two, which lets you identify the critical points of the application.

If the run was done in trace mode, SCALASCA can identify certain behaviours responsible for performance losses, such as messages sent out of order or load imbalance.

Because visualisation relies on a graphical application, it is not always convenient to run it directly from a Jean Zay login node. As an alternative, you can install the Cube visualisation tool (downloadable from the official SCALASCA website) on a Linux PC or any other UNIX machine and run it there.
