Jean Zay: Execution of multi-step and cascade jobs

Using the step notion with Slurm

Some users have developed complex processing chains (workflows) that consist of stringing together jobs which may have different characteristics (number of cores, computation time and memory needed). The output files of one job are often used as the input files of the following job, which creates interdependency relationships between the jobs. Slurm manages this situation in a simple and effective way: each step is defined as a Slurm job with its own resources (number of cores, memory, time). A multi-step job consists of defining as many steps as there are jobs to execute, along with the interdependency relationships between these steps. In this way, the reserved resources correspond exactly to the resources used at each step.

Job chaining

To submit a multi-step job on Jean Zay, you must:

  • Create a Bash script which submits several Slurm jobs (one job per step): When each computing step is submitted, you recover its JOB_ID and pass it to the submission of the following step. The JOB_ID is the fourth field of the message returned by the sbatch command (extracted here with the cut command).
    In the following example, four steps are submitted; each step (except the first one) depends on the preceding step and will not execute unless the preceding step has completely finished without any problem (--dependency=afterok).

    JID_JOB1=$(sbatch job1.slurm | cut -d " " -f 4)
    JID_JOB2=$(sbatch --dependency=afterok:$JID_JOB1 job2.slurm | cut -d " " -f 4)
    JID_JOB3=$(sbatch --dependency=afterok:$JID_JOB2 job3.slurm | cut -d " " -f 4)
    sbatch --dependency=afterok:$JID_JOB3 job4.slurm

    Important: This script is not a Slurm job. It is a Bash script to be launched in the following way:

    $ chmod +x multi_steps.bash
    $ ./multi_steps.bash
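The field extraction works because, on success, sbatch prints a line of the form "Submitted batch job <id>". A minimal check of the parsing, simulated here without a scheduler (the job ID below is made up):

```shell
# Simulate the message printed by sbatch on success; the job ID is
# the fourth space-separated field of that message.
output="Submitted batch job 123456"
JID=$(echo "$output" | cut -d " " -f 4)
echo "$JID"
```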

  • Write each step (jobN.slurm) as if it were an independent job: Each step submitted via the sbatch command is a classic Slurm job like those described in the documentation available in the index sections: Execution/commands of a CPU code or Execution/commands of a GPU code. In this way, you can independently specify the partition, the QoS, the CPU time and the number of nodes necessary for each step.
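As a sketch, a step script such as job1.slurm might look like the following; the directive values, job name and executable name are illustrative, not prescriptions:

```shell
#!/bin/bash
#SBATCH --job-name=step1        # illustrative job name
#SBATCH --ntasks=40             # resources for this step only
#SBATCH --time=01:00:00         # time limit for this step only
#SBATCH --output=step1_%j.out   # %j is replaced by the job ID

# Commands for this step; the executable name is a placeholder.
srun ./my_step1_executable
```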


  • As these are independent jobs, the value of the $JOBSCRATCH variable will be different in each step. Files that need to be shared between two steps should therefore not be saved in this JOBSCRATCH space, but in the semi-permanent SCRATCH directory, or in a permanent directory such as WORK (which has a slower bandwidth): See all the characteristics of the disk spaces.
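A minimal sketch of passing a file between two steps, with illustrative path and file names. On Jean Zay, $JOBSCRATCH and $SCRATCH are set by the environment; they are given fallback values here only so the snippet is self-contained:

```shell
# Per-job temporary space: different in each step (fallback for illustration).
JOBSCRATCH="${JOBSCRATCH:-$(mktemp -d)}"
# Shared semi-permanent space: visible to every step (fallback for illustration).
SCRATCH="${SCRATCH:-$(mktemp -d)}"

# End of step 1: copy the result out of JOBSCRATCH before the job ends.
echo "step1 result" > "$JOBSCRATCH/results.dat"
cp "$JOBSCRATCH/results.dat" "$SCRATCH/results.dat"

# Start of step 2: read the shared file from SCRATCH.
cat "$SCRATCH/results.dat"
```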
  • In case one of the chained jobs fails, the remaining jobs will stay pending with the reason “DependencyNeverSatisfied”: They will never be executed and you need to cancel them manually using the scancel command. If you want those jobs to be automatically cancelled, you need to add the --kill-on-invalid-dep=yes option when submitting them.
    In the following example, this option is used to run a job (job2.slurm) only if the previous one (job1.slurm) failed (--dependency=afternotok:$JID_JOB1) and to prevent it from remaining pending if the previous job ended well (--kill-on-invalid-dep=yes). In addition, the last job (job3.slurm) will be executed if either of the previous two (job1.slurm or job2.slurm) has completed successfully (--dependency=afterok:$JID_JOB1?afterok:$JID_JOB2):

    JID_JOB1=$(sbatch job1.slurm | cut -d " " -f 4)
    JID_JOB2=$(sbatch --dependency=afternotok:$JID_JOB1 --kill-on-invalid-dep=yes job2.slurm | cut -d " " -f 4)
    sbatch --dependency=afterok:$JID_JOB1?afterok:$JID_JOB2 job3.slurm