Ada: Running an OpenMP/multithreaded parallel code in batch

Jobs are managed on all the nodes by the LoadLeveler software. They are assigned to classes mainly according to the elapsed time, the number of cores, and the memory requested.

You can consult the structure of classes on Ada.

To submit a multithreaded job in batch, proceed as follows:

  • Create a submission script. Here is an example saved in the file openmp.ll:
  $ more openmp.ll
  
  # Arbitrary name of the LoadLeveler job
  # @ job_name = OpenMP
  # Standard output file of the job
  # @ output   = $(job_name).$(jobid)
  # Output error file of the job
  # @ error    = $(job_name).$(jobid)
  # Type of job
  # @ job_type = serial
  # Number of threads requested (here 4)
  # @ parallel_threads = 4
  # Maximum elapsed time in hh:mm:ss (here 1 h 30 min)
  # @ wall_clock_limit = 1:30:00
  # @ queue
  # Echo each command as it is executed
  set -x
  # Temporary job directory
  cd $TMPDIR
  # The variable LOADL_STEP_INITDIR is automatically set by
  # LoadLeveler to the directory in which you type the command llsubmit
  cp $LOADL_STEP_INITDIR/a.out .
  # Maximum STACK memory available to the private variables
  # of each thread (4 MB by default, 16 MB here).
  export KMP_STACKSIZE=16m
  # It is also possible to use OMP_STACKSIZE
  #export OMP_STACKSIZE=16M
  # Execution
  ./a.out
  • Submit this script (only from Ada) with the llsubmit command:
  $ llsubmit openmp.ll
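
The submission step can be wrapped in a small helper, shown here as a minimal sketch. The function name submit_job and its return codes are hypothetical and are not part of LoadLeveler; only llsubmit comes from this page. The commands llq and llcancel in the usage comments are standard LoadLeveler commands that this page does not describe, so check the local documentation before relying on them.

```shell
# Hypothetical helper (not part of LoadLeveler): check the script, then submit it.
submit_job() {
    script="$1"
    # Refuse to submit if the submission script does not exist.
    if [ ! -f "$script" ]; then
        echo "error: submission script '$script' not found" >&2
        return 1
    fi
    # llsubmit is only available where LoadLeveler is installed (here, on Ada).
    if ! command -v llsubmit >/dev/null 2>&1; then
        echo "error: llsubmit not found; submit from Ada" >&2
        return 2
    fi
    llsubmit "$script"
}

# Usage (on Ada):
#   submit_job openmp.ll
#   llq -u "$USER"        # list your queued and running jobs
#   llcancel <job id>     # cancel a job if needed
```

Checking for the script before calling llsubmit gives a clear message when the command is typed from the wrong directory, which matters here because the example script copies a.out from the submission directory.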

Remarks:

  • This example assumes that the executable file a.out is located in the submission directory, that is, the directory from which the llsubmit command is entered (LoadLeveler automatically sets the variable LOADL_STEP_INITDIR to this directory).
  • The job output file, OpenMP.job_number, is also located in the submission directory. It is created at the beginning of the job execution; editing or modifying this file while the job is running can disrupt the job.
  • The keyword parallel_threads indicates the number of reserved cores: one thread per core. The name parallel_threads makes full sense for hybrid programs (MPI+OpenMP/Pthreads), for which both the number of MPI processes and the number of threads per MPI process must be specified.
  • The default memory reserved per core (that is, per thread) is 3.5 GB. This value can be raised to a maximum of 7 GB. The keyword # @ as_limit specifies a per-process limit. If the process spawns 4 threads and needs 7 GB per thread, the memory limit must therefore be set to 4 * 7 GB: # @ as_limit = 28gb.
  • Private OpenMP variables are stored in the STACK memory associated with each thread. Each thread's stack is limited to 4 MB by default. To exceed this limit and, for example, allow up to 16 MB per thread, set the environment variable KMP_STACKSIZE (=16m) or OMP_STACKSIZE (=16M). Note that OMP_STACKSIZE is automatically set to the same value as KMP_STACKSIZE when the latter has already been set.
  • The keyword # @ environment sets environment variables for LoadLeveler. However, it should not be used to set certain OpenMP or multithreading variables (such as OMP_NUM_THREADS), because LoadLeveler determines and sets these automatically at the beginning of the job execution.
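
The memory arithmetic in the remarks above can be sketched as follows. All variable names are hypothetical, and the private array size is an assumed value chosen only for illustration; the 4 MB default stack and the 7 GB per-thread maximum are the figures stated above.

```shell
# Illustrative values only; these variable names are not LoadLeveler keywords.
threads=4              # same value as "# @ parallel_threads"
mem_per_thread_gb=7    # memory needed per thread (7 GB is the stated maximum)

# as_limit is a per-process limit, so multiply by the number of threads.
as_limit_gb=$(( threads * mem_per_thread_gb ))
as_limit_line="# @ as_limit = ${as_limit_gb}gb"
echo "$as_limit_line"

# Stack check: a private array of N doubles needs at least N * 8 bytes
# of stack in each thread.
private_doubles=1000000                      # assumed private array size
private_bytes=$(( private_doubles * 8 ))
default_stack_bytes=$(( 4 * 1024 * 1024 ))   # 4 MB default per thread
if [ "$private_bytes" -gt "$default_stack_bytes" ]; then
    stack_advice="raise KMP_STACKSIZE or OMP_STACKSIZE"
else
    stack_advice="default stack is sufficient"
fi
echo "private data per thread: ${private_bytes} bytes -> ${stack_advice}"
```

With these values the first line printed is the directive # @ as_limit = 28gb, and the assumed private array (8,000,000 bytes) exceeds the 4 MB default, which is the situation where a setting such as KMP_STACKSIZE=16m is needed.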