⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Running Multi-Stage Jobs

Using the Concept of Stages with Slurm

Some users have developed complex processing chains (data flows) that chain jobs with different characteristics (number of cores, computation time, required memory). The output files of one job are often the input files of the next, which creates dependencies between jobs. Slurm handles this simply and efficiently: each stage is defined in its own file and runs with its own resources (number of cores, memory, time). A multi-stage job consists of defining as many stages as there are jobs to run, together with the dependencies between them. This way, the resources reserved at each stage match the resources actually used.

Chaining Jobs

To submit a multi-stage job on Jean Zay, you need to:

  • Create a bash script that submits multiple Slurm jobs (one job per stage). When submitting each stage, retrieve its JOB_ID and pass it to the submission of the next stage. The JOB_ID is the fourth field of the message returned by the sbatch command ("Submitted batch job <JOB_ID>"), which is why the cut command is used.

    In the following example, four stages are submitted. Each stage (except the first) depends on the previous one and runs only if the previous one has completed successfully (--dependency=afterok).

    multi_steps.bash
    #!/bin/bash
    JID_JOB1=`sbatch job1.slurm | cut -d " " -f 4`
    JID_JOB2=`sbatch --dependency=afterok:$JID_JOB1 job2.slurm | cut -d " " -f 4`
    JID_JOB3=`sbatch --dependency=afterok:$JID_JOB2 job3.slurm | cut -d " " -f 4`
    sbatch --dependency=afterok:$JID_JOB3 job4.slurm
    Attention

    This script is not a Slurm job!

    It is a bash script to be run as follows:

    # Add the Unix execute permission
    chmod +x multi_steps.bash
    # Run the script
    ./multi_steps.bash
  • Write all stages (jobN.slurm with N = 1, 2, 3, 4) as independent jobs: each stage submitted with the sbatch command is a standard Slurm job, as described in the Execution/Job Control sections of the documentation. You can therefore specify the partition, QoS, CPU time, and number of nodes independently for each stage.
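As an alternative to parsing the sbatch message with cut, the job ID can be obtained with the --parsable option of sbatch, which prints only the job ID (followed by ";" and the cluster name on multi-cluster setups). The following is a minimal sketch under that assumption, not an official recipe: the function names submit_stage and submit_chain are hypothetical helpers introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper: submit one script and print only its job ID.
# Relies on "sbatch --parsable", which prints "jobid" or "jobid;cluster".
submit_stage() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    # Drop the optional ";cluster" suffix.
    printf '%s\n' "${out%%;*}"
}

# Hypothetical helper: submit the given scripts as a linear chain.
# Each script after the first depends on the success (afterok) of the previous one.
# Prints the job ID of the last stage on standard output.
submit_chain() {
    local prev="" jid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jid=$(submit_stage "$script") || return 1
        else
            jid=$(submit_stage --dependency=afterok:"$prev" "$script") || return 1
        fi
        echo "Submitted $script as job $jid" >&2
        prev=$jid
    done
    printf '%s\n' "$prev"
}

# Usage (on a login node):
#   submit_chain job1.slurm job2.slurm job3.slurm job4.slurm
```

Called with the four scripts of the example above, submit_chain should submit the same chain as multi_steps.bash. Stopping at the first failed submission avoids submitting later stages with an empty dependency.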

Attention
  • Since these are independent jobs, the $JOBSCRATCH variable has a different value in each stage. Files that must be shared between two stages should therefore not be saved in the JOBSCRATCH space but in the semi-temporary SCRATCH directory, or in a permanent directory such as WORK, which has lower bandwidth: see the characteristics of the disk spaces.

  • In case of failure of one of the jobs in the chain, the following jobs will remain pending by default with the reason "DependencyNeverSatisfied" but will never be able to execute. You must then delete them using the scancel command. If you want these jobs to be automatically cancelled in case of failure, you must specify the --kill-on-invalid-dep=yes option when submitting them.

    In the following example, this option is used to run a job (job2.slurm) only if the previous one (job1.slurm) has failed (--dependency=afternotok:$JID_JOB1), and to prevent it from remaining pending if the previous job has completed successfully (--kill-on-invalid-dep=yes). In addition, the last job (job3.slurm) runs if either of the two previous jobs (job1.slurm or job2.slurm) has completed successfully (--dependency=afterok:$JID_JOB1?afterok:$JID_JOB2):

    #!/bin/bash
    JID_JOB1=`sbatch job1.slurm | cut -d " " -f 4`
    JID_JOB2=`sbatch --dependency=afternotok:$JID_JOB1 --kill-on-invalid-dep=yes job2.slurm | cut -d " " -f 4`
    # The dependency is quoted so that the shell does not treat "?" as a glob character
    sbatch --dependency="afterok:$JID_JOB1?afterok:$JID_JOB2" job3.slurm
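When jobs were submitted without --kill-on-invalid-dep=yes, the ones left pending with the reason "DependencyNeverSatisfied" can be found with squeue and removed with scancel. The sketch below is one possible way to do this, assuming the standard squeue format fields %i (job ID) and %r (reason); the function names list_stuck_jobs and cancel_stuck_jobs are hypothetical and introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the IDs of the current user's pending jobs
# whose dependency can never be satisfied.
list_stuck_jobs() {
    squeue --noheader --user="${USER:-$(id -un)}" --states=PENDING --format="%i %r" |
        awk '$2 == "DependencyNeverSatisfied" { print $1 }'
}

# Hypothetical helper: cancel those jobs (does nothing if there are none).
cancel_stuck_jobs() {
    local ids
    ids=$(list_stuck_jobs)
    [ -n "$ids" ] || return 0
    # Intentional word splitting: one job ID per argument.
    scancel $ids
}

# Usage (on a login node):
#   list_stuck_jobs      # review the list first
#   cancel_stuck_jobs
```

Reviewing the output of list_stuck_jobs before cancelling is advisable, since the filter applies to all of the user's pending jobs and not only to one chain.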

Examples of Chained Jobs Using the STORE

Data Extraction from the STORE Before a Computation

  • Submission script extract_data.slurm for the data preparation phase:
#!/bin/bash
#SBATCH --job-name=Extraction # job name
#SBATCH --partition=archive # use the "archive" or "prepost" partition, which has access to the STORE
#SBATCH --ntasks=1 # sequential job
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time 02:00:00 # maximum requested execution time (HH:MM:SS)
#SBATCH --output=%x_%j.out # output and error file (%x = job name, %j = job number)
# If you have several projects or several types of hours,
# you must specify the accounting to use, even though the hours
# will not be charged for the "archive" and "prepost" partitions.
##SBATCH --account=...

# Extract data from the STORE to the SCRATCH
cd $SCRATCH/mon_calcul
tar -xvf $STORE/monarchive.tar
  • Submission script compute.slurm for the computation phase:
#!/bin/bash
#SBATCH --job-name=Calcul # job name
# Slurm directives to complete according to your computation
...
# Computation using the data previously extracted from the STORE to the SCRATCH
cd $SCRATCH/mon_calcul
srun ...
  • Commands to execute to chain the jobs:
#!/bin/bash
# Submit the extraction job and retrieve its job ID
JOB_ID_EXTRACT_DATA=`sbatch extract_data.slurm | cut -d " " -f 4`
# Then run the computation only if the data extraction succeeded
sbatch --dependency=afterok:$JOB_ID_EXTRACT_DATA compute.slurm

Data Archiving to the STORE After a Computation

  • Submission script compute.slurm for the computation phase:
#!/bin/bash
#SBATCH --job-name=Calcul # job name
# Slurm directives to complete according to your computation
...
# Computation producing data to be archived for the long term
cd $SCRATCH/mon_calcul
srun ...
  • Submission script archive_data.slurm for the data archiving phase:
#!/bin/bash
#SBATCH --job-name=Archivage # job name
#SBATCH --partition=archive # use the "archive" or "prepost" partition, which has access to the STORE
#SBATCH --ntasks=1 # sequential job
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time 02:00:00 # maximum requested execution time (HH:MM:SS)
#SBATCH --output=%x_%j.out # output and error file (%x = job name, %j = job number)
# If you have several projects or several types of hours,
# you must specify the accounting to use, even though the hours
# will not be charged for the "archive" and "prepost" partitions.
##SBATCH --account=...

# Archive data from the SCRATCH to the STORE
cd $SCRATCH/mon_calcul
tar -cvf $STORE/monarchive.tar resultats*.h5
  • Commands to execute to chain the jobs:
#!/bin/bash
# Submit the computation and retrieve its job ID
JOB_ID_COMPUTE=`sbatch compute.slurm | cut -d " " -f 4`
# Then run the archiving only if the computation completed successfully
sbatch --dependency=afterok:$JOB_ID_COMPUTE archive_data.slurm
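The two examples above can in principle be combined into a single extraction, computation, and archiving chain. The following is only a sketch under several assumptions: the function name submit_pipeline is hypothetical, a single compute.slurm both reads the extracted data and produces the results, and archive_data.slurm writes to an archive name different from the one that was extracted (otherwise the input archive would be overwritten). The --kill-on-invalid-dep=yes option is added so that later stages are cancelled automatically if an earlier one fails.

```shell
#!/bin/bash
# Hypothetical helper: submit extraction, computation and archiving as one chain.
# Script names are those used in the examples above.
# Prints the job ID of the final (archiving) stage.
submit_pipeline() {
    local extract_id compute_id archive_id

    # Stage 1: extract data from the STORE.
    extract_id=$(sbatch --parsable extract_data.slurm) || return 1
    extract_id=${extract_id%%;*}   # drop optional ";cluster" suffix

    # Stage 2: compute only if the extraction succeeded.
    compute_id=$(sbatch --parsable --kill-on-invalid-dep=yes \
        --dependency=afterok:"$extract_id" compute.slurm) || return 1
    compute_id=${compute_id%%;*}

    # Stage 3: archive only if the computation succeeded.
    archive_id=$(sbatch --parsable --kill-on-invalid-dep=yes \
        --dependency=afterok:"$compute_id" archive_data.slurm) || return 1
    printf '%s\n' "${archive_id%%;*}"
}

# Usage (on a login node):
#   submit_pipeline
```

With this layout, only the computation stage consumes compute resources, while the two STORE stages run on the "archive" or "prepost" partition as in the scripts above.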
