Scalasca is a graphical performance analysis tool for parallel applications, developed by JSC (Jülich Supercomputing Centre). It analyses the behaviour of an application and identifies the parts that consume the most time.

Versions installed

  • SCALASCA 1.4.3
  • SCALASCA 2.0
  • SCALASCA 2.1 (default version)


  • Scalasca supports MPI applications and hybrid MPI with multithreading/OpenMP applications, up to the MPI_THREAD_MULTIPLE level for a profile and the MPI_THREAD_FUNNELED level for a trace.
  • Files generated by SCALASCA 1.4.3 and 2.x are not compatible. Traces and profiles have to be analysed with the versions that created them.


Scalasca is made available through the module command. Before working with Scalasca, execute the following command:

module load scalasca

Using Scalasca consists of three steps:

  1. Application instrumentation
  2. Execution of the instrumented application
  3. Analysis/visualisation of the results


Scalasca works by modifying your application to insert its measurement routines.

All applications can be instrumented either automatically or manually. We will only discuss the automatic instrumentation of a pure MPI application (without OpenMP) in this document. For the manual procedure, refer to the Scalasca manual.

To instrument your application, put the skin command (or scalasca -instrument) in front of the compiler name, separated by a space:

skin mpiifort my_code.f95


  • Only the part of the execution between the calls to MPI_Init and MPI_Finalize is measured. Operations outside this interval are not included in the measurement.
  • Using Scalasca adds overhead in execution time, memory and disk space.
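
In a build script, the skin prefix can be made optional so that the same script produces either a normal or an instrumented executable. The following is a minimal sketch only: the run_compiler function and its mode argument (plain or instrumented) are hypothetical names chosen for illustration and are not part of Scalasca.

```shell
# Hypothetical helper (not part of Scalasca): run a compiler command,
# prefixed with "skin" only when the first argument is "instrumented".
run_compiler() {
    mode=$1
    shift
    if [ "$mode" = "instrumented" ]; then
        # Instrumented build: skin wraps the unchanged compiler command.
        skin "$@"
    else
        # Normal build: run the compiler command as given.
        "$@"
    fi
}
```

With this helper, run_compiler instrumented mpiifort my_code.f95 executes skin mpiifort my_code.f95, while run_compiler plain mpiifort my_code.f95 executes the compiler alone. The Scalasca module must be loaded for the instrumented case.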


To run the instrumented application, put the command scan poe (or scalasca -analyze poe) just in front of the name of your executable file in your LoadLeveler script. For the command to be found, the Scalasca module must also be loaded in the job.

By default, only a profile is collected. A profile is a summary of the execution.

To obtain a trace of the events instead of a simple profile, use the -t option. A trace allows Scalasca to identify various performance problems, which are then highlighted when the results are visualised. Be aware, however, that this option greatly increases the disk space required.
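
The choice between the two modes can be kept in one place in the job script. The sketch below only prints the command line for each mode and does not launch anything; the scan_command function and the mode names profile and trace are hypothetical, and ./my_appli stands for your own executable.

```shell
# Hypothetical helper: print the scan command line for a given mode.
# "profile" gives the default summary run, "trace" adds the -t option.
scan_command() {
    mode=$1
    shift
    case "$mode" in
        profile) echo "scan poe $*" ;;
        trace)   echo "scan -t poe $*" ;;
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

scan_command profile ./my_appli   # prints: scan poe ./my_appli
scan_command trace ./my_appli     # prints: scan -t poe ./my_appli
```

A common approach is to start with a profile run, which is cheaper, and to request a trace only once the profile shows that a more detailed analysis is needed.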

Each time the instrumented application is executed, its files are written to a directory whose name is built from the following elements:

  • APPNAME = the name of the executable file
  • NPROC = the number of processes (in OpenMP or hybrid mode, this equals the number of processes × the number of threads)
  • TYPE = sum for a profile, trace for a trace

Attention: If the directory already exists, the execution fails; this prevents the previous results from being overwritten. Before launching a run, make sure that the target directory does not already exist.
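
One way to avoid this failure without losing earlier results is to rename an existing directory before the run. This is a minimal sketch under that assumption: the archive_old_experiment function is a hypothetical helper, and the directory name must be supplied by you as it is not computed here.

```shell
# Hypothetical helper: if the given experiment directory already exists,
# rename it with a timestamp suffix so a new run can create it again.
archive_old_experiment() {
    dir=$1
    if [ -d "$dir" ]; then
        backup="${dir}.$(date +%Y%m%d_%H%M%S)"
        mv "$dir" "$backup"
        echo "Moved existing $dir to $backup"
    fi
}
```

In a job script, the function would be called with the expected directory name just before the scan line. If the directory does not exist, the function does nothing.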

The following is an example of a job submission:

# @ job_name = scalasca_run
# Standard output of the job
# @ output = $(job_name).$(jobid)
# Standard error of the job
# @ error = $(output)
# @ job_type = parallel
# @ total_tasks = 8
# Elapsed time maximum request
# @ wall_clock_limit = 1:00:00
# @ queue

set -x

module load scalasca
scan -t poe ./my_appli

Analysis/visualization of results

The results are analysed with the graphical interface square (or scalasca -examine). To launch it, type:

module load scalasca
square scalasca_output_directory

The interface is divided into three panels: the left panel shows the measurements carried out, the middle panel shows the call tree, and the right panel shows the topology.

By expanding or collapsing the entries of the left panel, you can obtain a more or less condensed view of the performance. The selections made in this panel are carried over to the two other panels, which makes it possible to identify the most time-consuming parts of the application. If the execution was carried out in trace mode, Scalasca can also identify certain causes of performance loss (messages sent in the wrong order, load imbalance, …).

Because the visualisation relies on a graphical application, it is not always practical to run it directly from Ada. To get around this problem, you can install the visualisation tool CUBE (downloadable from the official SCALASCA site) on a PC running Linux (or on any UNIX machine).
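
To examine the results on another machine, the output directory first has to be transferred there. As a sketch of one possible approach, the directory can be packed into a single compressed archive before copying it; the pack_experiment function is a hypothetical helper and the transfer method itself is left to you.

```shell
# Hypothetical helper: pack an experiment directory into
# <directory_name>.tar.gz in the current working directory.
pack_experiment() {
    dir=${1%/}    # drop a trailing slash, if any
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    name=$(basename "$dir")
    # -C switches to the parent directory so the archive holds only "$name/..."
    tar -czf "${name}.tar.gz" -C "$(dirname "$dir")" "$name"
}
```

After copying the archive to the local machine, tar -xzf restores the directory, which can then be opened with the locally installed tool. Remember that the results must be read with a version compatible with the one that produced them.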