Ada: Execution of a hybrid MPI/OpenMP parallel code in batch

The job scheduler LoadLeveler is responsible for scheduling batch jobs on all of the nodes.  Jobs are assigned to a batch class principally according to the elapsed time, number of cores and memory requested. You can consult the structure of Ada classes here.

Attention: Since March 4, 2014, the variable MP_USE_BULK_XFER has been set to yes by default to enable RDMA. This feature improves the performance of collective MPI communications and of computation-communication overlap. However, some codes may be less efficient when this variable is set to yes. You can disable RDMA for your code by setting the variable to no just before the execution of your binary (export MP_USE_BULK_XFER=no or setenv MP_USE_BULK_XFER no).
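As a minimal sketch of where this setting belongs in a job script (sh/bash syntax, matching the export form given above), the variable is set after the LoadLeveler directives and before the line that launches the binary. The echo line is only an illustrative check and is not required:

```shell
# Disable RDMA for this job only (the default on Ada is "yes").
export MP_USE_BULK_XFER=no
# Illustrative check: the value appears in the job output file.
echo "MP_USE_BULK_XFER=$MP_USE_BULK_XFER"
# The binary is launched after this point, for example with: poe ./a.out
```

Comparing the elapsed time of the same run with and without this line is a simple way to decide which setting suits a given code.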

In order to submit a hybrid job in batch, you must:

  • Create a job script following the example hybrid.ll below:
hybrid.ll
# Arbitrary name of LoadLeveler job
# @ job_name = Hybrid
# Job standard output file
# @ output   = $(job_name).$(jobid)
# Job standard error file
# @ error    = $(job_name).$(jobid)
# Type of job
# @ job_type = parallel
# Number of MPI processes requested
# @ total_tasks = 8
# Number of OpenMP or POSIX threads requested per MPI process
# @ parallel_threads = 4
# Job time in hh:mm:ss (1h 30mn here)
# @ wall_clock_limit = 01:30:00
# @ queue
# Echo each command as it is executed
set -x
# Temporary job folder where program runs
cd $TMPDIR
# The LOADL_STEP_INITDIR variable is automatically set by LoadLeveler
# to the directory from which the llsubmit command is issued.
cp $LOADL_STEP_INITDIR/a.out .
# Set the maximum stack memory used by the private variables
# of each thread to 16MB (default is 4MB).
export KMP_STACKSIZE=16m
# It is also possible to use the OMP_STACKSIZE variable
#export OMP_STACKSIZE=16M
# Execution of a hybrid parallel program (MPI + threads).
poe ./a.out
  • Submit this script via the llsubmit command:
 $ llsubmit hybrid.ll

Comments:

  • This example assumes that the executable file a.out is located in the submission directory.  This is the directory from which you enter the llsubmit command; it is automatically referenced by the LoadLeveler variable LOADL_STEP_INITDIR.
  • The output file of the computation, Hybrid.$(jobid), is also found in the submission directory; it is created at the beginning of the job execution.  You should not edit or modify this file while the job is running.
  • The keyword parallel_threads indicates to the job scheduler the desired number of threads per MPI process.
  • CAUTION:  The number of MPI processes (total_tasks) and the number of threads per MPI process (parallel_threads) must result in a total number of reserved cores (total_tasks * parallel_threads) less than or equal to 2048.
  • Memory:  The default value is 3.5 GB per reserved core (therefore, per thread).  If you request more than 64 cores (total_tasks * parallel_threads > 64), you cannot go beyond this memory limit.  Otherwise, the maximum value that you can request is 7.0 GB per core, specified via the keyword as_limit.  Note that the limit must be specified per MPI process, so its maximum is parallel_threads * 7.0 GB.  For example, if each MPI process generates 4 threads:  # @ as_limit = 28.00gb.
  • Private OpenMP variables are stored in the stack memory associated with each thread.  This stack is limited by default to 4MB per thread.  To go beyond this limit, for example up to 16MB per thread, use the environment variable KMP_STACKSIZE (=16m) or OMP_STACKSIZE (=16M).  Note that OMP_STACKSIZE is automatically set to the same value as KMP_STACKSIZE when the latter has already been set.
  • The keyword # @ environment allows the setting of environment variables for LoadLeveler.  However, this keyword should not be used to set certain OpenMP or multithreading variables (such as OMP_NUM_THREADS) because these are automatically determined and set by LoadLeveler at the beginning of the job execution.
  • If your job contains relatively long sequential commands (such as pre- and post-processing and transfers or archiving of large files), then the use of multistep jobs may be justified.
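
The core and memory rules in the comments above can be checked by hand before submission. The following is a small sketch of that arithmetic in sh/bash, using the values from hybrid.ll; the variable names (max_cores, max_gb_per_core, as_limit_gb) are hypothetical helpers for this illustration and are not LoadLeveler keywords:

```shell
# Values taken from the hybrid.ll example.
total_tasks=8
parallel_threads=4

# Limits stated in the comments above (hypothetical variable names).
max_cores=2048        # upper bound on total_tasks * parallel_threads
max_gb_per_core=7     # largest per-core request, only for jobs of 64 cores or fewer

# Total number of reserved cores.
cores=$(( total_tasks * parallel_threads ))
if [ "$cores" -gt "$max_cores" ]; then
    echo "Too many cores requested: $cores > $max_cores" >&2
    exit 1
fi

if [ "$cores" -le 64 ]; then
    # as_limit is given per MPI process: threads per process * GB per core.
    as_limit_gb=$(( parallel_threads * max_gb_per_core ))
    echo "# @ as_limit = ${as_limit_gb}.00gb"
else
    echo "More than 64 cores: the default of 3.5 GB per core cannot be exceeded"
fi
```

With 8 MPI processes of 4 threads each, 32 cores are reserved and the script prints the directive # @ as_limit = 28.00gb, which matches the example given in the Memory comment.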