⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Best practices for code profiling

This page lists best practices for profiling a code in order to analyse its performance. Some of the advice is general and applies on any machine, while some is specific to Jean Zay.

Choosing a test case

The bottlenecks of a code can vary widely with the configuration and size of the test case, and with the number of MPI processes used for the run.

  • Use a test case that is representative of the jobs you run (or plan to run) on Jean Zay as part of your project.
  • If your configurations differ widely, it can be worthwhile to profile several of them to see how the test case affects performance.

Compilation

Most profiling tools need debugging symbols in the analysed executable in order to work. In some cases profiling is possible without them, but the results are difficult to interpret (no function names, no line numbers, etc.).

  • Use the same compilers and compilation options as your production jobs.
  • Enable debugging mode at compile time (usually with the -g option), and explicitly specify the desired optimisation level as well (e.g. -g -O3), because many compilers disable optimisations by default when debugging mode is enabled. The NVIDIA compilers support the -gopt option, which enables debugging mode without changing the optimisation level.
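
The two rules above can be checked automatically before a profiling build. The following is a minimal Python sketch, not an IDRIS tool: the function name check_profiling_flags, the file name my_app.c and the flag-matching heuristic are all assumptions made for illustration. It only inspects the text of a compiler command line and does not run any compiler.

```python
import shlex


def check_profiling_flags(command_line):
    """Return a list of warnings for a compiler command line.

    Heuristic sketch: it only recognises -g, -g<digit>, -gopt and -O*.
    """
    args = shlex.split(command_line)

    # Plain debugging flags such as -g or -g3.
    has_g = any(a == "-g" or (a.startswith("-g") and a[2:].isdigit()) for a in args)
    # NVIDIA compilers: debugging mode without changing the optimisation level.
    has_gopt = "-gopt" in args
    # Any explicit optimisation level (-O0, -O2, -O3, -Ofast, ...).
    has_opt = any(a.startswith("-O") for a in args)

    warnings = []
    if not (has_g or has_gopt):
        warnings.append("no debugging symbols requested (-g)")
    if has_g and not has_opt:
        warnings.append("-g without an explicit -O level: optimisations may be disabled")
    return warnings


if __name__ == "__main__":
    # my_app.c is a hypothetical source file used only for illustration.
    for cmd in ("gcc -g -O3 -o my_app my_app.c", "gcc -g -o my_app my_app.c"):
        print(cmd, "->", check_profiling_flags(cmd) or "OK")
```

An empty list means that the command line requests debugging symbols and either states its optimisation level explicitly or uses -gopt.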

Execution

  • If your job uses less than a full node, consider running it in exclusive mode (Slurm option --exclusive) to rule out any effects of node sharing.
  • For security reasons, access to certain hardware performance counters is restricted by default. When your job is exclusive, you can obtain more complete profiles by adding the Slurm constraint prof (option -C prof), which grants full access to the performance counters (kernel parameter perf_event_paranoid=-1).
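
To confirm from inside a job that the counters are fully accessible, the kernel parameter mentioned above can be read from /proc/sys/kernel/perf_event_paranoid, which is where Linux exposes it. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes a Linux compute node where this file is readable.

```python
from pathlib import Path

# Standard Linux location of the kernel parameter.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def parse_paranoid_level(text):
    """Convert the file content to an int, or None if it is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_paranoid_level(path=PARANOID_PATH):
    """Return the current level, or None if the file cannot be read."""
    try:
        return parse_paranoid_level(Path(path).read_text())
    except OSError:
        return None


def has_full_counter_access(level):
    """A level of -1 corresponds to full access to the performance counters."""
    return level is not None and level <= -1


if __name__ == "__main__":
    level = read_paranoid_level()
    if has_full_counter_access(level):
        print(f"perf_event_paranoid={level}: full access to performance counters")
    else:
        print(f"perf_event_paranoid={level}: access is restricted")
```

Running this script at the start of a job submitted with --exclusive -C prof is one way to check that the constraint took effect before launching the profiler.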
