This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
SCALASCA
We invite you to consult the best practices for code profiling for general advice on performance analysis on Jean Zay.
Description
SCALASCA is a graphical performance analysis tool for parallel applications; it was developed by the JSC (Jülich Supercomputing Centre). It allows you to analyse the behaviour of an application and easily identify its critical parts. This tool is particularly suitable for studying massively parallel executions.
Installed versions
The module command provides access to the various versions of SCALASCA.
To display the available versions:
$ module avail scalasca
scalasca/2.4-mpi scalasca/2.5-mpi scalasca/2.6-mpi
SCALASCA is perfectly suited to hybrid MPI and multithreaded/OpenMP applications up to level MPI_THREAD_MULTIPLE for a profile and up to level MPI_THREAD_FUNNELED for a trace.
Usage
The module command provides access to SCALASCA, which also requires Score-P to be loaded.
Before working with this tool, execute the following commands:
$ module load scalasca
$ module load scorep
Using SCALASCA involves three steps:
- Instrumentation of the application;
- Execution of the instrumented application;
- Analysis/visualisation of the results.
Instrumentation
SCALASCA works by modifying your application to insert its own measurement procedures during compilation.
Any application can be instrumented either automatically or manually. Only the automatic instrumentation of a "pure" MPI application (without OpenMP) will be covered in this document. For the manual procedure, refer to the SCALASCA PDF manual.
To automatically instrument your application, simply add the command skin (leaving a space) before the compiler name:
$ skin mpif90 my_code.f90
- Using SCALASCA incurs additional costs in execution time, memory usage, and disk space.
- Performance is measured between the calls to MPI_Init and MPI_Finalize: any operation performed outside these calls will not be taken into account.
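When the build is driven by a script or a Makefile, the skin prefix can be applied through the compiler variable rather than by editing each compile line. The following is a minimal sketch only: the function name instrumented_compiler and the USE_SCALASCA switch are hypothetical conventions, not part of SCALASCA.

```shell
# Hypothetical helper: prints the compiler command, prefixed with "skin"
# when instrumentation is requested (USE_SCALASCA=1 is an assumed convention).
instrumented_compiler() {
    if [ "${USE_SCALASCA:-0}" = "1" ]; then
        echo "skin $1"
    else
        echo "$1"
    fi
}

# Print the compile line that would be used; nothing is compiled here.
USE_SCALASCA=1
echo "$(instrumented_compiler mpif90) my_code.f90"
```

Assuming a Makefile that reads its Fortran compiler from a variable such as FC, the result could be passed as make FC="$(instrumented_compiler mpif90)".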
Execution
To run the instrumented application, add the scan command just before the srun command in your Slurm scripts.
A profile is a summary of the execution: by default, only a profile is collected.
To obtain a complete trace of events rather than a simple profile, use the option -t. This option is very useful as it allows SCALASCA to identify various performance issues that will be highlighted during visualisation.
This option significantly increases the disk space requirements of an execution.
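The difference between the two modes comes down to the presence of -t on the launch line. The sketch below only prints the line that would be placed in the Slurm script; the function name scan_launch_line and its mode keywords are hypothetical.

```shell
# Hypothetical helper: prints the launch line for a given measurement mode.
# "profile" is the default collection; "trace" adds the -t option to scan.
scan_launch_line() {
    mode="$1"
    shift
    case "$mode" in
        profile) echo "scan srun $*" ;;
        trace)   echo "scan -t srun $*" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Print the trace-mode launch line; nothing is executed here.
scan_launch_line trace ./my_appli my_args
```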
For each execution, SCALASCA writes its files to a directory whose name is generated as follows:
scorep_NOMAPPLI_RANKSPERNODEpNPROCxNTHR_TYPE
with
- NOMAPPLI, the name of the executable;
- RANKSPERNODE, the number of processes per node;
- NPROC, the total number of processes;
- NTHR, the number of threads per process;
- TYPE, sum for a profile, trace for a trace.
If the directory already exists, the execution will fail to avoid overwriting previous results. You must ensure that it does not exist beforehand.
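One way to avoid this failure is to move any previous results aside before launching the job. The sketch below rebuilds the directory name from the pattern described above and renames an existing directory with a timestamp suffix. The function names experiment_dir and archive_if_exists are hypothetical, and since the actual name is generated by SCALASCA itself, it is worth checking it against a real run.

```shell
# Builds the directory name following the pattern described above.
# $1 executable name, $2 processes per node, $3 total processes,
# $4 threads per process, $5 type (sum or trace)
experiment_dir() {
    echo "scorep_${1}_${2}p${3}x${4}_${5}"
}

# Renames an existing directory with a timestamp suffix so that a new
# execution does not fail; does nothing if the directory is absent.
archive_if_exists() {
    if [ -d "$1" ]; then
        mv "$1" "$1.$(date +%Y%m%d_%H%M%S)"
    fi
}

# Example: 40 MPI processes on one node, 1 thread each, profile mode.
dir=$(experiment_dir my_appli 40 40 1 sum)
echo "$dir"
archive_if_exists "$dir"
```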
Here is an example of a submission job:
#!/bin/bash
#SBATCH --job-name=scalasca_run # job name
#SBATCH --ntasks=40 # Total number of MPI processes
#SBATCH --ntasks-per-node=40 # Number of MPI processes per node
# /!\ Caution: the following line is misleading, but in Slurm
# vocabulary "multithread" does refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 thread per physical core (no hyperthreading)
# The line below can be uncommented to switch to exclusive mode and get
# full access to the hardware counters (see the best practices for profiling)
##SBATCH --exclusive -C prof
#SBATCH --time=01:00:00 # Maximum requested run time (HH:MM:SS)
#SBATCH --output=scalasca%j.out # Name of the output file
#SBATCH --error=scalasca%j.out # Name of the error file (here shared with the output)
# move to the submission directory
cd ${SLURM_SUBMIT_DIR}
# clean out the modules loaded interactively and inherited by default
module purge
# load the modules
module load ...
module load scalasca
module load scorep
# echo the launched commands
set -x
# run the code
scan srun ./my_appli my_args
Submit this script with the sbatch command:
$ sbatch scalasca_mpi.slurm
Analysis/visualisation of results
The analysis of the results is done using the graphical interface Cube. To launch it, simply type the following commands interactively:
$ module load scalasca
$ module load scorep
$ module load cube
$ square scalasca_output_directory/profile.cubex
The interface is divided into three panels:
- On the left, the different measurements made;
- In the middle, the call tree;
- On the right, the topology.
By expanding or collapsing the different entries in the left panel, you can get a more or less condensed view of the performance. The choices made in this panel are reflected in the other two and thus allow you to identify the critical points of the application.
If the execution was done in trace mode, SCALASCA can identify certain behaviours responsible for performance losses (messages sent out of order, load imbalance...).
As the visualisation is done through a graphical application, it is not always convenient to use it directly from a Jean Zay login node. To circumvent this problem, you can install the Cube visualisation tool (downloadable from the official SCALASCA website) on a Linux PC or any UNIX machine and run it there.
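Before copying results to another machine, it can help to list which output directories actually contain a profile. The sketch below is an assumption-laden helper, not a SCALASCA command: the function name list_profiles is hypothetical, and it relies on the scorep_ directory prefix and the profile.cubex file name shown earlier on this page.

```shell
# Hypothetical helper: lists the profile.cubex files found in the
# scorep_* directories located directly under the given directory
# ($1, defaults to the current directory).
list_profiles() {
    find "${1:-.}" -maxdepth 2 -type f -name profile.cubex -path '*/scorep_*' | sort
}

# List the profiles available under the current directory.
list_profiles .
```

The directory holding the selected file can then be transferred with your usual copy tool and opened with a locally installed Cube.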