Turing: Memory measurement tools

Introduction

On Turing, the memory limit is 16 GiB per node. This limit is divided equally among the processes of a node, that is, by the value given to the --ranks-per-node parameter. For example, if you launch 32 MPI processes per node, each process has at most 512 MiB of memory.
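The division described above can be sketched in C as follows. This is only an illustration of the arithmetic: mem_per_rank_bytes and NODE_MEM_BYTES are hypothetical names introduced here, not part of any Turing tool or library.

```c
#include <assert.h>
#include <stdint.h>

/* Node memory limit on Turing, as stated above: 16 GiB. */
#define NODE_MEM_BYTES (16ULL * 1024 * 1024 * 1024)

/* Hypothetical helper: upper bound, in bytes, on the memory of one
 * process for a given --ranks-per-node value. */
uint64_t mem_per_rank_bytes(unsigned ranks_per_node)
{
    assert(ranks_per_node > 0);   /* guard against division by zero */
    return NODE_MEM_BYTES / ranks_per_node;
}
```

Calling mem_per_rank_bytes(32) gives 536870912 bytes, which is the 512 MiB quoted above.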

The idrmemmpi module

The idrmemmpi library displays the maximum memory consumption of an MPI program over the whole of its execution. The value is printed when the execution ends. You must load the idrmemmpi module before compiling your program:

$ module load idrmemmpi
$ mpixlf90_r main.f90 -o main

No change to the submission script is necessary. At the end of the execution, the elapsed time, the maximum memory consumption (HeapMax) and the rank of the process on which this maximum was reached are written to the standard output:

IdrisMemMPI v1.4 : elapsed = 0.32 s, HeapMax = 497 mB on rank 1  
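If you collect this line from many job outputs, it can be parsed automatically. The following sketch assumes that the line always has exactly the layout of the sample above; this has not been checked against other versions of idrmemmpi. The function parse_idrmem_line is a hypothetical helper written for this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: extracts the elapsed time, the HeapMax value and
 * the rank from a line shaped like the sample output above.
 * Returns 1 if all three values were read, 0 otherwise. */
int parse_idrmem_line(const char *line, double *elapsed_s,
                      double *heap_max, int *rank)
{
    assert(line != NULL && elapsed_s != NULL);
    assert(heap_max != NULL && rank != NULL);

    /* %*s skips the version number; the unit is matched literally. */
    int n = sscanf(line,
                   "IdrisMemMPI v%*s : elapsed = %lf s, HeapMax = %lf mB on rank %d",
                   elapsed_s, heap_max, rank);
    return n == 3;
}
```

A line that does not match the expected layout makes the function return 0, so unrelated output lines can simply be skipped.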

To display the current consumption at a specific point in your program, insert a call to the idr_print_mem routine at that point:

  CALL idr_print_mem()

Using the libhpcidris library

IDRIS has developed the libhpcidris library, which lets you measure the memory usage of an application by instrumenting it.

A module is available. It sets the paths for the library, the include files and the Fortran modules.

$ module load libhpcidris

The following is an example of what you obtain on the standard output when using the subroutine HPCIDRIS_F03_MPI_memusage_print in Fortran or the function HPCIDRIS_MPI_memusage_print in C:

-------------------------------------------------------------------------------------
                         MEMORY USAGE (libhpcidris version 4.0)
                       (C)2009-2013 Philippe WAUTELET, IDRIS-CNRS
-------------------------------------------------------------------------------------
                       |  Max (position)  |  Min (position)  |  Average  |    Sum
-------------------------------------------------------------------------------------
Used by application    |  800.3MiB( 1023) |   19.1MiB(    1) |  410.1MiB |  410.1GiB
Available              | 1011.0MiB(    2) |  147.2MiB( 1008) |  613.1MiB |  613.1GiB
-------------------------------------------------------------------------------------
Text                   |    6.3MiB(    0) |    6.3MiB(    0) |    6.3MiB |  401.1MiB
Data                   |  946.1KiB(    0) |  946.1KiB(    0) |  946.1KiB |  946.1MiB
BSS                    |  901.9KiB(    0) |  901.9KiB(    0) |  901.9KiB |  901.9MiB
Stack                  |   19.4KiB(    0) |   19.4KiB(    0) |   19.4KiB |   19.4MiB
Heap                   |  792.3MiB( 1023) |   11.0MiB(    1) |  402.0MiB |  402.0GiB
Reserved by kernel     |   16.0MiB(    0) |   16.0MiB(    0) |   16.0MiB |    1.0GiB
Shared                 |   64.0MiB(    0) |   64.0MiB(    0) |   64.0MiB |    4.0GiB
Max heap               |  796.3MiB( 1023) |   15.0MiB(    1) |  406.0MiB |  406.0GiB
-------------------------------------------------------------------------------------

Measurements made and their explanations

Memory used by the application

Gives an estimation of the memory used by the various processes of the executed application. It corresponds to the sum of the text, data, BSS, stack and heap zones.
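The sum described above can be written as a small C sketch. The reduced struct below is a stand-in introduced for this illustration: its field names follow struct HPCIDRIS_memusage (described later on this page), but zone_sizes and used_by_application are hypothetical names, and the values are assumed to be byte counts.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in holding only the five zones named above, in bytes.
 * Assumption: this mirrors the matching fields of struct
 * HPCIDRIS_memusage; it is not the library's own type. */
struct zone_sizes {
    long long text;
    long long data;
    long long bss;
    long long stack;
    long long heap;
};

/* Hypothetical helper: memory used by one process, computed as the
 * sum of its text, data, BSS, stack and heap zones. */
long long used_by_application(const struct zone_sizes *z)
{
    assert(z != NULL);
    return z->text + z->data + z->bss + z->stack + z->heap;
}
```

As a rough check against the sample output above, the Min column gives 6.3 MiB + 946.1 KiB + 901.9 KiB + 19.4 KiB + 11.0 MiB, which is about 19.1 MiB, the value shown on the "Used by application" line.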

Available memory

Estimation of the amount of memory remaining for each process. It corresponds to the total memory to which a process has access minus the memory used.

Attention: The memory actually available may be larger because the heap can contain holes.

The total memory to which a process has access is not necessarily the same for all the processes because a part of the memory space of each node is reserved for particular usages (zones reserved by kernel, shared or text).

Text, data and BSS zones

The text, data and BSS zones correspond respectively to the memory used to store the executable code, the global variables initialised to non-zero values, and the global variables that are zero-initialised or uninitialised at the start of the application.

Each of these zones occupies its own memory pages.

Note: The text zone is shared by all the processes running on the same compute node.

Stack

The stack is the memory typically allocated for function or subroutine calls and their non-persistent local variables.

Heap

The heap is the memory which corresponds to dynamic memory allocations (allocate calls in Fortran and calloc/malloc in C).

The size given is the total heap size. However, the heap can contain holes (memory fragmentation), so some memory may still be available inside this space.

Note: The MPI library uses the heap. If the values obtained seem to be overestimated, this is probably the reason.

Memory reserved by the kernel

The memory reserved by the kernel corresponds to what the CNK operating system needs. This is always 16 MiB per compute node.

Shared memory

The shared memory is a memory space which can be used with the SHMEM programming approach. Its default size is 64 MiB. This can be changed with the environment variable BG_SHAREDMEMSIZE (set with the --envs option of the runjob command; the value is given in megabytes).

The shared memory is available to all the processes of the compute node.
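A program may want to know which shared memory size was requested for the job. The sketch below reads BG_SHAREDMEMSIZE and falls back to the default of 64 stated above. It assumes that the variable is visible in the environment of the process; shared_mem_size_mb is a hypothetical helper and the value it returns is the requested setting, not a measurement.

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_SHARED_MB 64L   /* default size stated above */

/* Hypothetical helper: requested shared memory size in megabytes.
 * Returns the value of BG_SHAREDMEMSIZE when it is a positive
 * integer, and the default otherwise. */
long shared_mem_size_mb(void)
{
    const char *s = getenv("BG_SHAREDMEMSIZE");
    if (s == NULL || *s == '\0')
        return DEFAULT_SHARED_MB;

    char *end = NULL;
    long v = strtol(s, &end, 10);
    assert(end != NULL);            /* strtol always sets end here */
    if (*end != '\0' || v <= 0)
        return DEFAULT_SHARED_MB;   /* not a clean positive integer */
    return v;
}
```

The actual size in use is the one reported on the "Shared" line of the libhpcidris output.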

Max heap

The value of the maximum heap size reached before or at the moment of the call. This can give an idea of the maximum memory which was used by the application. (Attention: The stack is not taken into account in the Max heap.)

Using libhpcidris in a Fortran program

All the memory measurement functionalities are available through the hpcidris Fortran module. In your Fortran source files, add the following line in every program unit where you use this library:

use hpcidris

The available subroutines are the following:

  • HPCIDRIS_F03_MPI_memusage_print(comm,level): Displays memory occupation information for all the processes. It takes 2 arguments: the communicator to use (the call is collective and must be made by every process of the communicator) and the level of detail:
    • 0: Summary with max, min, average and total values and the process ranks having the max and min values
    • 1: Details of memory occupation according to allocation type (text, data, BSS, heap, …)
    • 2: Details for each process
  • HPCIDRIS_F03_memusage_print: Details the memory usage of the process which calls it. (Attention: This subroutine does not use MPI. Each process calling it prints its own values, so the outputs of different processes may be interleaved.)
  • HPCIDRIS_F03_memusage_get(mu): Retrieves the memory usage in a HPCIDRIS_F03_memusage data structure (see below).
  • HPCIDRIS_F03_memusage_print_at_exit(): Displays the memory usage of the calling process when that process finishes running (provided it terminates normally). This call can be made at any time in your application. (Attention: This subroutine does not use MPI. Each process calling it prints its own values, so the outputs of different processes may be interleaved.)

The HPCIDRIS_F03_memusage data structure, obtained by the subroutine HPCIDRIS_F03_memusage_get, is detailed here:

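  type :: HPCIDRIS_F03_memusage
    integer         :: ppn         ! Processes per node
    integer(kind=8) :: totalmem    ! Total memory
    integer(kind=8) :: text        ! Text zone
    integer(kind=8) :: data        ! Data
    integer(kind=8) :: bss         ! BSS
    integer(kind=8) :: stack       ! Stack
    integer(kind=8) :: heap        ! Heap usage
    integer(kind=8) :: reserved    ! Reserved by kernel
    integer(kind=8) :: persist     ! Persistent memory
    integer(kind=8) :: guard       ! Heap guardpage
    integer(kind=8) :: shared      ! Shared memory (shmem)
    integer(kind=8) :: stackavail  ! Stack available to the process
    integer(kind=8) :: heapavail   ! Heap available to the process
    integer(kind=8) :: totalused   ! Used memory (without reserved by kernel)
    integer(kind=8) :: heapmax     ! Maximum heap
    !integer(kind=8) :: maxrss
  end type HPCIDRIS_F03_memusage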

Using libhpcidris in a C program

All the memory measurement functionalities are available by including the hpcidris.h or hpcidris_mem.h file. In your C program, add the following line in every source file where you use this library:

#include "hpcidris.h"

The available functions are the following:

  • HPCIDRIS_MPI_memusage_print(comm,level): Displays memory occupation information for all the processes. It takes 2 arguments: the communicator to use (the call is collective and must be made by every process of the communicator) and the level of detail:
    • 0: Summary with max, min, average and total values and the process ranks having the max and min values
    • 1: Details of memory occupation according to allocation type (text, data, BSS, heap, …)
    • 2: Details for each process
  • HPCIDRIS_memusage_print: Details the memory usage of the process which calls it. (Attention: This function does not use MPI. Each process calling it prints its own values, so the outputs of different processes may be interleaved.)
  • HPCIDRIS_memusage_get(mu): Retrieves the memory usage in a struct HPCIDRIS_memusage data structure (see below).
  • HPCIDRIS_memusage_print_at_exit(): Displays the memory usage of the calling process when that process finishes running (provided it terminates normally). This call can be made at any time in your application. (Attention: This function does not use MPI. Each process calling it prints its own values, so the outputs of different processes may be interleaved.)

The struct HPCIDRIS_memusage data structure, obtained by the function HPCIDRIS_memusage_get, is detailed here:

struct HPCIDRIS_memusage {
  int ppn;              // Processes per node

  long long totalmem;   // Total memory

  long long text;       // Text zone
  long long data;       // Data
  long long bss;        // BSS
  long long stack;      // Stack
  long long heap;       // Heap usage

  long long reserved;   // Reserved by kernel
  long long persist;    // Persistent memory
  long long guard;      // Heap guardpage
  long long shared;     // Shared memory (shmem)

  long long stackavail; // Stack available to the process
  long long heapavail;  // Heap available to the process
  long long totalused;  // Used memory (without reserved by kernel)

  long long heapmax;    // Maximum heap
};
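Once the structure has been filled, its fields can be used directly in your own checks. The sketch below shows two small helpers that operate on heap-related values. To stay compilable without the library it uses a reduced stand-in struct: heap_info, to_mib and fits_in_heap are hypothetical names, and the fields are assumed to hold byte counts, which should be checked against hpcidris.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with two fields named after struct HPCIDRIS_memusage.
 * Assumption: values are byte counts. In a real program, use the
 * structure filled by HPCIDRIS_memusage_get instead. */
struct heap_info {
    long long heapavail;  /* Heap available to the process */
    long long heapmax;    /* Maximum heap */
};

/* Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
double to_mib(long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Conservative check: returns 1 if a further allocation of `request`
 * bytes fits in the heap reported as available, 0 otherwise. */
int fits_in_heap(const struct heap_info *h, long long request)
{
    assert(h != NULL && request >= 0);
    return request <= h->heapavail;
}
```

The check is conservative because, as noted in the Heap section, holes inside the heap can make more memory available than the reported value suggests.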