Adapp : Execution of an MPI parallel code in batch

Jobs on all of the nodes are managed by the LoadLeveler software. They are dispatched into the classes mainly according to the requested Elapsed time, number of cores and memory. You can consult the overall class structure on Adapp.

Attention: Since 4 March 2014, the MP_USE_BULK_XFER variable has been set to yes by default in order to activate RDMA. This functionality improves collective communication performance as well as computation/communication overlapping. However, some codes may run more slowly when this variable is set to yes. You can deactivate RDMA for your code by setting the variable to no just before the execution of your binary (export MP_USE_BULK_XFER=no or setenv MP_USE_BULK_XFER no).
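A minimal sketch of where this deactivation fits: the export must appear in the submission script before the poe launch line so that the setting is inherited by the MPI tasks (the script and binary names mpi.ll and a.out are those of the example below).

```shell
# Deactivate RDMA for this run only: add this line to mpi.ll
# just before the poe launch so the MPI tasks inherit it.
export MP_USE_BULK_XFER=no
echo "MP_USE_BULK_XFER=$MP_USE_BULK_XFER"
# poe ./a.out   # then launch as usual; RDMA is now deactivated
```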

To submit an MPI job in batch from Adapp, you must:

  • Create a submission script. Here is an example stored in the file mpi.ll :

    mpi.ll
    # Arbitrary name for the LoadLeveler job
    # @ job_name    = Mpi
    # Standard output file for the job
    # @ output      = $(job_name).$(jobid)
    # Output error file for the job
    # @ error       = $(job_name).$(jobid)
    # Job type
    # @ job_type    = parallel
    # Specific to Adapp
    # @ requirements = (Feature == "prepost")
    # Number of MPI tasks requested (here 16)
    # @ total_tasks = 16
    # Max. Elapsed time for the entire job in hh:mm:ss (here 10 min)
    # @ wall_clock_limit = 00:10:00
    # @ queue
     
    # to print an echo of each command in the output file
    set -x
    # Temporary directory for execution
    cd $TMPDIR
    # The LOADL_STEP_INITDIR variable is automatically set by
    # LoadLeveler, its value is the directory where the llsubmit command was typed
    cp $LOADL_STEP_INITDIR/a.out .
    cp $LOADL_STEP_INITDIR/input.data .
    # Execution of the MPI program
    poe ./a.out
  • Submit this script via the llsubmit command:

    $ llsubmit mpi.ll

Comments:

  • Do not forget the keyword # @ requirements = (Feature == "prepost"); otherwise, your computation will be carried out on the "normal" Ada compute nodes.
  • In this example, the executable file a.out is assumed to be in the submission directory, i.e. the directory from which the llsubmit command was entered (the LOADL_STEP_INITDIR variable, set automatically by LoadLeveler, contains this path).
  • The job output file Mpi.jobid_number is also written to the submission directory. It is created as soon as the job execution begins; editing or modifying it while the job is running can disrupt the execution.
  • Memory: The default value is 3.5 GB per reserved core (therefore, per MPI task). The maximum value that you can request via the as_limit keyword is 30.0 GB per core. For example, to request 7 GB per core: # @ as_limit = 7.0gb.
  • The Elapsed time limit associated with the keyword wall_clock_limit applies to the entire job. Its maximum is 20 hours in the Adapp classes.
  • Attention : Specific to Adapp, you cannot go beyond the limit of 32 cores for the pre- and post-processing classes.
  • If your job contains relatively long sequential commands (multiple processes, transfers or archiving of large files, …), the use of multistep jobs may be justified.
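To illustrate the last point, here is a hedged sketch of a two-step LoadLeveler job: a parallel compute step followed by a serial archiving step that starts only if the compute step exits successfully. The step names (compute, archive), the result file output.data and the archive destination are illustrative assumptions, not prescribed names; adapt the directives (tasks, limits) to your own needs.

```shell
# mpi_multistep.ll -- sketch of a multistep job (step names are arbitrary)
# ---- step 1: the parallel computation ----
# @ job_name    = MultiStep
# @ step_name   = compute
# @ job_type    = parallel
# @ requirements = (Feature == "prepost")
# @ total_tasks = 16
# @ wall_clock_limit = 00:10:00
# @ queue
# ---- step 2: serial archiving, run only if "compute" exited with code 0 ----
# @ step_name   = archive
# @ job_type    = serial
# @ dependency  = (compute == 0)
# @ wall_clock_limit = 00:30:00
# @ queue

set -x
# LoadLeveler runs this script once per step; the LOADL_STEP_NAME
# variable tells us which step is currently executing.
case $LOADL_STEP_NAME in
  compute)
    cd $TMPDIR
    cp $LOADL_STEP_INITDIR/a.out .
    poe ./a.out
    # bring the results back to the submission directory for step 2
    cp output.data $LOADL_STEP_INITDIR/
    ;;
  archive)
    # sequential post-processing: archive the (hypothetical) result
    # file; this step starts in the submission directory
    tar cf results.tar output.data
    ;;
esac
```

Each step has its own directives terminated by its own # @ queue; the dependency keyword makes the archive step conditional on the compute step's exit code.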