Turing:  Code execution in multi-step jobs

Attention:  If your job includes file transfers with Ergon, we advise you to consult the page multi-step jobs with file transfers.

A job often includes a preparation phase for the data, with file transfers between different directories (for transfers with Ergon, see above).  This is followed by a parallel computing phase and, finally, a post-processing phase for the results.

It would be wasteful to monopolize hundreds of Blue Gene/Q cores during phases which are purely sequential.  Therefore, we use the LoadLeveler notion of steps:  The three phases described above can be chained together in the same job.  The two sequential phases are carried out on the front-end machine (Power7) and the computing phase on the Blue Gene/Q compute nodes.

Important notes:

  • On Turing, the TMPDIR directory is retained between the different steps of the same job.
  • The front-end is not intended for heavy pre- and post-processing (Adapp is dedicated to that purpose); consequently, the resources allotted to the sequential classes are deliberately limited.
  • The sequential classes of Turing are not billed.

Below is an example of a job chaining together the preparation of the data, the execution of an MPI programme with 2048 tasks on 1024 cores, and finally the archiving of the results.  The submission file, which we will call job_multi.ll, is the following:

  #=========== Global directives ===========
  #@ shell    = /bin/bash
  #@ job_name = test_multi-steps
  #@ output   = $(job_name).$(step_name).$(jobid)
  #@ error    = $(output)
  #=========== Step 1 directives ===========
  #======= Sequential preprocessing ========
  #@ step_name = sequential_preprocessing
  #@ job_type  = serial
  #@ wall_clock_limit = 0:15:00
  #@ queue
  #=========== Step 2 directives ===========
  #============= Parallel step =============
  #@ step_name  = parallel_step
  #@ dependency = (sequential_preprocessing == 0)
  # (executed only if previous step completed without error)
  #@ job_type   = bluegene
  #@ bg_size    = 64
  #@ wall_clock_limit = 1:00:00
  #@ queue
  #=========== Step 3 directives ===========
  #======= Sequential postprocessing =======
  #@ step_name  = sequential_postprocessing
  #@ dependency = (parallel_step >= 0)
  # (executed even if previous step completed with an error)
  #@ job_type   = serial
  #@ wall_clock_limit = 0:15:00
  #@ queue
  case $LOADL_STEP_NAME in
    #============ Step 1 commands ============
    #======= Sequential preprocessing ========
    sequential_preprocessing )
      set -ex
      cd $TMPDIR
      cp $LOADL_STEP_INITDIR/test/src/coucou_MPI.f .
      mpixlf90_r coucou_MPI.f -o coucou_MPI.exe
      tar xvf $WORKDIR/mydata/data.tar
      ls -l
      ;;
    #============ Step 2 commands ============
    #============= Parallel step =============
    parallel_step )
      set -x
      cd $TMPDIR
      runjob --ranks-per-node 32 --np 2048 --mapping ABCDET : ./coucou_MPI.exe my_args
      ;;
    #============ Step 3 commands ============
    #======= Sequential postprocessing =======
    sequential_postprocessing )
      set -x
      cd $TMPDIR
      tar cvf $WORKDIR/results/result.tar *.dat
      ;;
  esac

This job is divided into two main sections: 

The first section (Global directives) contains all of the directives intended for the LoadLeveler queue manager; the second contains the commands to be executed in the different steps.  These two sections could be interleaved, but this would impair the legibility and comprehension of the job.

The LoadLeveler directives for each step must finish with the directive #@ queue.

The #@ dependency directive of the second step indicates that it should only be executed if Step 1 finished correctly (return value equal to zero).  In Step 3, the chosen dependency is such that the step will be executed even if there was an error in Step 2.  This can be very useful for recovering partial results after an application crash or after the Elapsed time limit (wall_clock_limit) has been exceeded.  The dependency directives are necessary if you want to ensure that the different steps are executed one after the other; without them, all the steps can begin independently of each other as soon as the needed resources are available.
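Other conditions can be expressed with the same syntax.  As an illustration only (this hypothetical error_handling step is not part of the example job), the standard LoadLeveler comparison operators also allow a step to run only when a previous step failed:

```
#=========== Illustrative extra step ===========
#@ step_name  = error_handling
#@ dependency = (parallel_step != 0)
# (executed only if parallel_step returned a non-zero code)
#@ job_type   = serial
#@ wall_clock_limit = 0:05:00
#@ queue
```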

As further explanation, the submission of this job actually creates three “sub-jobs” (or steps), each having the same shell script but different directives.  In order for each sub-job to execute different commands, the script must branch on the sub-job name (stored in the variable LOADL_STEP_NAME).  This is done by using a case statement.  The usage is simple; you just need to follow the example.  Don't forget to add ;; (a double semi-colon) to terminate each list of commands, and esac at the end.
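The branching mechanism can be tried outside LoadLeveler with a minimal standalone sketch.  Here LOADL_STEP_NAME is assigned by hand purely for demonstration; in a real job, LoadLeveler sets it to the name of the current step:

```shell
#!/bin/bash
# Simulate the value that LoadLeveler would provide for the second step.
LOADL_STEP_NAME="parallel_step"

# Branch on the step name, exactly as in the job script.
case $LOADL_STEP_NAME in
  sequential_preprocessing )
    echo "preparing data"
    ;;
  parallel_step )
    echo "running the parallel computation"
    ;;
  sequential_postprocessing )
    echo "archiving results"
    ;;
esac
```

Each branch ends with ;; and the whole construct is closed by esac, as required in the job script above.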

In order to submit such a job containing three steps, go into the directory containing job_multi.ll and type:

llsubmit job_multi.ll

Note the usage in Step 1 of the command set -ex.  The -e option interrupts the step while it is running as soon as a command returns a code other than zero (the -x option echoes each command as it is executed).  In this way, given the dependency relationship #@ dependency = (sequential_preprocessing == 0), Step 2 will not be executed unless all of the commands of Step 1 completed correctly.  Without set -e, the dependency test of the following step is done only on the return code of the last command of the preceding step.
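The effect of set -e can be checked with a small sketch outside any job (the inner script here is illustrative, not part of the job file).  The script stops at the first failing command, and its return code is precisely what the dependency test of a following step would see:

```shell
#!/bin/bash
# Run a short script with `set -e`: it aborts at the first non-zero
# return code, and that code becomes the script's own return code.
bash -c 'set -e
echo "step is running"
false                 # returns 1: with set -e, the script exits here
echo "never reached"' # this line is not executed
echo "inner script exited with code $?"
```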

It should also be noted that at the start of each step, the default directory is the one from which the job was submitted; it is there that the output files are written.  Furthermore, it is indispensable to specify a different output file for each step; otherwise, the output of the last step overwrites the preceding ones (see the output line in the submission file).  Details of the different options of the runjob command are given on this page.
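With the output directive of the example ($(job_name).$(step_name).$(jobid)), each step therefore writes to its own file.  For a purely hypothetical job id such as 123456, the three steps would produce files named along these lines:

```
test_multi-steps.sequential_preprocessing.123456
test_multi-steps.parallel_step.123456
test_multi-steps.sequential_postprocessing.123456
```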