FPMPI2: MPI Profiling Tool

Description

FPMPI2 is a library for profiling MPI calls, developed at Argonne National Laboratory (ANL). Version 2.2 is available on Ada.

It collects the following information into a single text file:

  • The list of MPI subroutines called
  • The sizes of the messages
  • The time spent in each MPI subroutine, as well as the time lost to desynchronisation between processes
  • The amount of data exchanged between each pair of processes
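Message sizes are reported as histogram strings with one character per size bin: 0 means exactly 0%, a period means more than 0 but less than 10%, a digit gives the tens of percent, and * means 100% (this encoding is explained in the header of the profile file). The following Python sketch decodes such a string; it is not part of FPMPI2, and decode_char, decode_histogram and BIN_LABELS are hypothetical names. The bin labels are copied from the list printed in the profile header. With that list, position 2 gives 5-8 bytes and position 16 gives 131073-262144 bytes, which agrees with the "By bin" lines of the example profile, but the labels of the bins beyond those confirmed by the example are an assumption.

```python
# Hypothetical helper (not part of FPMPI2): decodes the one-character-per-bin
# message-size histograms found in fpmpi_profile.txt.
# Assumption: bins appear in the order listed in the profile header.
BIN_LABELS = [
    "0", "1-4", "5-8", "9-16", "17-32", "33-64", "-128", "-256", "-512",
    "-1024", "-4K", "-8K", "-16K", "-32K", "-64K", "-128K", "-256K",
    "-512K", "-1M", "-4M", "-8M", "-16M", "-32M", "-64M", "-128M",
    "-256M", "-512M", "-1GB", ">1GB",
]


def decode_char(c):
    """Return the (low, high) percentage range encoded by one character."""
    if c == "0":
        return (0.0, 0.0)        # precisely 0%
    if c == ".":
        return (0.0, 10.0)       # more than 0% but less than 10%
    if c == "*":
        return (100.0, 100.0)    # every message falls in this bin
    if c in "123456789":
        tens = int(c) * 10.0
        return (tens, tens + 10.0)  # the digit gives the tens of percent
    raise ValueError(f"unexpected histogram character {c!r}")


def decode_histogram(hist):
    """List (bin label, low %, high %) for every non-empty bin."""
    result = []
    for i, c in enumerate(hist):
        low, high = decode_char(c)
        if high > 0:
            label = BIN_LABELS[i] if i < len(BIN_LABELS) else f"bin {i}"
            result.append((label, low, high))
    return result


# Same layout as the MPI_Sendrecv histogram of the example profile.
print(decode_histogram("0" * 16 + "*" + "0" * 11))
# → [('-256K', 100.0, 100.0)]
```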

Usage

  • Load the FPMPI2 module: module load fpmpi2
  • Compile your application with the -g option.
  • Run the application as usual; a file named fpmpi_profile.txt is written at the end of the execution.

Example

$ cat fpmpi_profile.txt

MPI Routine Statistics (FPMPI2 Version 2.2)
Options: FPMPI enabled, Collective sync, Collect destinations,
Explanation of data:
Times are the time to perform the operation, e.g., the time for MPI_Send
Average times are the average over all processes, e.g., sum (time on each
process) / number of processes
Min and max values are over all processes
(Data is always average/min/max)
Amount of data is computed in bytes.  For point-to-point operations,
it is the data sent or received.  For collective operations, it is the
data contributed to the operation.  E.g., for an MPI_Bcast, the amount of
data is the number of bytes provided by the root, counted only at the root.
For synchronizing collective operations, the average, min, and max time
spent synchronizing is shown next.
Calls by message size shows the fraction of calls that sent messages of a
particular size.  The bins are
0 bytes, 1-4 bytes, 5-8 bytes, 9-16, 17-32, 33-64, -128, -256, -512, -1024
 -4K, -8K, -16K, -32K, -64K, -128K, -256K, -512K, -1M, -4M, -8M, -16M,
 -32M, -64M, -128M, -256M, -512M, -1GB, >1GB.
Each bin is represented by a single digit, representing the 10's of percent
of messages within this bin.  A 0 represents precisely 0, a . (period)
represents more than 0 but less than 10%.  A * represents 100%.
Messages by message size shows similar information, but for the total
message size.

The experimental topology information shows the 1-norm distance that the
longest point-to-point message travelled, by process.

MPI_Pcontrol may be used to control the collection of data.  Use the values
defined in fpmpi.h, such as FPMPI_PROF_COLLSYNC, to control what data is
collected or reported by FPMPI2.
Command: ...

Date:           Tue Jan 22 11:16:31 2013
Processes:      16
Execute time:   37.37
Timing Stats: [seconds] [min/max]       [min rank/max rank]
  wall-clock: 37.37 sec 37.365183 / 37.387304   14 / 2
        user: 36.63 sec 36.608434 / 36.649428   11 / 6
         sys: 0.2375 sec        0.216967 / 0.262960     6 / 8

Memory Usage Stats (RSS) [min/max KB]:  225192/233804

                  Average of sums over all processes
Routine                 Calls       Time Msg Length    %Time by message length
                                                    0.........1........1........
                                                              K        M
MPI_Allreduce       :       5   0.000242         40 00*0000000000000000000000000
MPI_Gather          :       2   0.000596         16 00*0000000000000000000000000
MPI_Sendrecv        :      40       1.36    6.4e+06 0000000000000000*00000000000

Details for each MPI routine
                  Average of sums over all processes
                                                   % by message length
                                (max over          0.........1........1........
                                 processes [rank])           K        M
MPI_Allreduce:
        Calls     :          5            5 [   0] 00*0000000000000000000000000
        Time      :   0.000242     0.000279 [  15] 00*0000000000000000000000000
        Data Sent :         40           40 [   0]
        SyncTime  :       1.68         3.19 [   3] 00*0000000000000000000000000
        By bin    : 5-8 [5,5]   [  0.000197,  0.000279] [     0.188,      3.19]
MPI_Gather:
        Calls     :          2            2 [   0] 00*0000000000000000000000000
        Time      :   0.000596      0.00416 [   8] 00*0000000000000000000000000
        Data Sent :         16           16 [   0]
        By bin    : 5-8 [2,2]   [  1.48e-05,   0.00416]
MPI_Sendrecv:
        Calls     :         40           40 [   0] 0000000000000000*00000000000
        Time      :       1.36         2.92 [  10] 0000000000000000*00000000000
        Data Sent :    6.4e+06      6400000 [   0]
        By bin    : 131073-262144       [40,40] [     0.164,      2.92]
        Partners  :          3 max 4(at 5) min 2(at 0)

Summary of target processes for point-to-point communication:
1-norm distance of point-to-point with an assumed 2-d topology
(Maximum distance for point-to-point communication from each process)
  1  1  1  1
  1  1  1  1
  1  1  1  1
  1  1  1  1
Data volume for each rank:   source     dest       bytes,...
0       1       1600000,        4       1600000,
1       0       1600000,        2       1600000,        5       1600000,
2       1       1600000,        3       1600000,        6       1600000,
3       2       1600000,        7       1600000,
4       0       1600000,        5       1600000,        8       1600000,
5       1       1600000,        4       1600000,        6       1600000,        9       1600000,
6       2       1600000,        5       1600000,        7       1600000,        10      1600000,
7       3       1600000,        6       1600000,        11      1600000,
8       4       1600000,        9       1600000,        12      1600000,
9       5       1600000,        8       1600000,        10      1600000,        13      1600000,
10      6       1600000,        9       1600000,        11      1600000,        14      1600000,
11      7       1600000,        10      1600000,        15      1600000,
12      8       1600000,        13      1600000,
13      9       1600000,        12      1600000,        14      1600000,
14      10      1600000,        13      1600000,        15      1600000,
15      11      1600000,        14      1600000,
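The "Partners" line of MPI_Sendrecv and the 1-norm distance map can be cross-checked against the per-rank data volume. The following Python sketch (not part of FPMPI2; data_volume_lines, parse_data_volume, partner_counts and max_1norm_distance are hypothetical names) parses the "Data volume for each rank" section into a dictionary keyed by (source, dest), counts the partners of each rank, and recomputes the maximum 1-norm distance. The rank-to-grid mapping used for the distance (ranks placed row by row on a grid with ncols columns) is an assumption; FPMPI2's own mapping for its assumed 2-d topology may differ.

```python
from collections import defaultdict

# Hypothetical helpers (not part of FPMPI2) for the
# "Data volume for each rank" section of fpmpi_profile.txt.
# Assumed line layout:  <source> <dest> <bytes>, <dest> <bytes>, ...


def data_volume_lines(text):
    """Return the lines that follow the 'Data volume for each rank' header."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Data volume for each rank"):
            return lines[i + 1:]
    return []


def parse_data_volume(lines):
    """Return {(source, dest): bytes}; lines that are not all integers are skipped."""
    volume = {}
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields or not all(f.isdigit() for f in fields):
            continue
        if len(fields) % 2 == 0:  # source + (dest, bytes) pairs is always odd
            raise ValueError(f"unpaired dest/bytes field in {line!r}")
        source = int(fields[0])
        for dest, nbytes in zip(fields[1::2], fields[2::2]):
            volume[(source, int(dest))] = int(nbytes)
    return volume


def partner_counts(volume):
    """Number of distinct destination ranks for each source rank."""
    partners = defaultdict(set)
    for source, dest in volume:
        partners[source].add(dest)
    return {rank: len(dests) for rank, dests in sorted(partners.items())}


def max_1norm_distance(volume, ncols):
    """Largest |row difference| + |column difference| per source rank.

    Assumption: ranks are placed row by row on a 2-d grid with ncols
    columns; FPMPI2's own rank-to-grid mapping may differ.
    """
    dist = defaultdict(int)
    for source, dest in volume:
        d = (abs(source // ncols - dest // ncols)
             + abs(source % ncols - dest % ncols))
        dist[source] = max(dist[source], d)
    return dict(sorted(dist.items()))
```

On the example profile, this yields 48 (source, dest) pairs of 1600000 bytes each, an average of 3 partners per rank with a maximum of 4 (first reached at rank 5) and a minimum of 2 (at rank 0), which matches the Partners line, and, with ncols=4, a maximum distance of 1 for every rank, which matches the distance map.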

Documentation