Turing : Call profiling tools

GNU gprof

Description

GNU gprof is an application profiling tool.

Its functions are :

  • Identification of the most time consuming functions or subroutines through a sampling technique
  • Listing the functions and subroutines in descending order by number of times called and by their execution times
  • Print a call tree

Utilisation

  • Compile with the option -pg (compilation and linking)
  • Execute the application normally (introduces a little overhead)
  • The execution generates a gmon.out.X file by process
  • The analysis is done via the command /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gprof
  • The environment variable BG_GMON_RANK_SUBSET allows choosing the processes to follow (ranks 0 to 31 by default). It is in format M, M:N or M:N:S to either choose the process M, an interval going from M to N, or an interval going from M to N by step S.
  • The environment variable BG_GMON_START_THREAD_TIMERS allows following the threads for multithreaded applications (default: no following of threads, all to follow all of them, nocomm for all except the MPI threads).

Example

Here is an example of using GNU gprof on a parallel application:

$ mpixlf90_r -pg profiling_code.f90
$ runjob --envs BG_GMON_RANK_SUBSET="1:2048:2" -np 20148 --rank-per-node 32 --exe a.out
$ powerpc64-bgq-linux-gprof nom_executable gmon.out.0
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 52.31     45.36    45.36   400000     0.00     0.00  .__hydro_utils_NMOD_riemann
 13.27     56.87    11.51      640     0.02     0.13  .__hydro_principal_NMOD_godunov
  9.51     65.12     8.25   400000     0.00     0.00  .__hydro_utils_NMOD_trace
  6.50     70.76     5.64   400000     0.00     0.00  .__hydro_utils_NMOD_slope
  6.01     75.97     5.22   400000     0.00     0.00  .__hydro_utils_NMOD_constoprim
  4.99     80.30     4.33   500000     0.00     0.00  .__hydro_utils_NMOD_eos
  2.24     82.24     1.94   400000     0.00     0.00  .__hydro_utils_NMOD_cmpflx
  1.42     83.47     1.23      160     0.01     0.01  .__hydro_principal_NMOD_cmpdt
  0.74     84.11     0.64                             .__mcount_internal
  0.42     84.47     0.36                             .PAMI_Context_trylock_advancev
  0.28     84.71     0.24                             .SpinWaitTaskSwitchBGQ
  0.27     84.94     0.23                             .PAMI::Interface::Context::advance(...
...
...
...
granularity: each sample hit covers 2 byte(s) for 0.01% of 86.71 seconds

index % time    self  children    called     name
                0.00   10.45       1/8           ._xlsmpParRegionSetup_TPO [6]
                0.00   73.13       7/8           .ThdCode [3]
[1]     96.4    0.00   83.58       8         .hydro_main$$OL$$1 [1]
               11.51   69.92     640/640         .__hydro_principal_NMOD_godunov [2]
                1.23    0.87     160/160         .__hydro_principal_NMOD_cmpdt [10]
                0.05    0.00       8/8           .__hydro_principal_NMOD_init_hydro [30]
                0.00    0.00       8/8           .__omp_domain_decomposition_NMOD_init_omp [56]
-----------------------------------------------
               11.51   69.92     640/640         .hydro_main$$OL$$1 [1]
[2]     93.9   11.51   69.92     640         .__hydro_principal_NMOD_godunov [2]
               45.36    0.00  400000/400000      .__hydro_utils_NMOD_riemann [4]
                8.25    5.64  400000/400000      .__hydro_utils_NMOD_trace [5]
                5.22    3.46  400000/400000      .__hydro_utils_NMOD_constoprim [7]
                1.94    0.00  400000/400000      .__hydro_utils_NMOD_cmpflx [11]
                0.05    0.00  200000/200000      .__omp_domain_decomposition_NMOD_sync_x [29]
                0.01    0.00     640/640         .__hydro_utils_NMOD_make_boundary [37]
                0.00    0.00  200000/200000      .__omp_domain_decomposition_NMOD_sync_y [47]
                0.00    0.00     640/640         .allocate_work_space [48]
                0.00    0.00     640/640         .deallocate_work_space [49]
-----------------------------------------------

[3]     84.3    0.00   73.13                 .ThdCode [3]
                0.00   73.13       7/8           .hydro_main$$OL$$1 [1]
-----------------------------------------------
               45.36    0.00  400000/400000      .__hydro_principal_NMOD_godunov [2]
[4]     52.3   45.36    0.00  400000         .__hydro_utils_NMOD_riemann [4]
-----------------------------------------------
                8.25    5.64  400000/400000      .__hydro_principal_NMOD_godunov [2]
[5]     16.0    8.25    5.64  400000         .__hydro_utils_NMOD_trace [5]
                5.64    0.00  400000/400000      .__hydro_utils_NMOD_slope [8]
-----------------------------------------------
...
...
...

The results are divided into two sections: Section 1 shows the routines/functions ranked in decreasing order of consumed computation time (excluding the time consumed by the routines/functions called by them).

Section 2 shows the functions ranked in decreasing order of time consumed both by themselves and by the functions called by them (inclusive time). This section also gives the sum number of calls made to a function as well as the number of calls made from this function, the functions which were called by it and how many times. For each called function, it shows the total number of times it was called.

You will find detailed explanations about the numbers given by this command in the outputs which it generates.

Attention : The numbers given are only approximate as the measurements are done by a sampling technique.