This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Running Multi-Stage Jobs
Using the Concept of Stages with Slurm
Some users build complex processing chains (data flows) that link jobs with different characteristics (number of cores, computation time, required memory). The output files of one job often serve as input files for the next, which creates dependencies between jobs. Slurm handles this simply and efficiently: each stage is defined in its own file and runs with its own resources (number of cores, memory, time). A multi-stage job is defined by one stage per job to be executed, plus the dependencies between those stages. This way, the resources reserved at each stage match the resources that stage actually uses.
Chaining Jobs
To submit a multi-stage job on Jean Zay, you need to:
- Create a bash script that submits multiple Slurm jobs (one job per stage). Each time you submit a stage, retrieve its JOB_ID so that it can be passed to the submission of the next stage. The JOB_ID is the fourth field of the output of the sbatch command (hence the cut command).
  In the following example, four stages are submitted. Each stage (except the first) depends on the previous one and only runs if the previous one has completed successfully (--dependency=afterok).
  multi_steps.bash
    #!/bin/bash
    JID_JOB1=$(sbatch job1.slurm | cut -d " " -f 4)
    JID_JOB2=$(sbatch --dependency=afterok:$JID_JOB1 job2.slurm | cut -d " " -f 4)
    JID_JOB3=$(sbatch --dependency=afterok:$JID_JOB2 job3.slurm | cut -d " " -f 4)
    sbatch --dependency=afterok:$JID_JOB3 job4.slurm
  Attention: this script is not a Slurm job! It is a bash script to be run as follows:
    # Add the Unix execute permission
    chmod +x multi_steps.bash
    # Run the script
    ./multi_steps.bash
- Write all stages (jobN.slurm with N = 1, 2, 3, 4) as independent jobs: each stage submitted via the sbatch command is a standard Slurm job, as described in the documentation available in the sections Execution/Job Control. This way, you can independently specify the partition, QoS, CPU time, and number of nodes required for each stage.
- Since these are independent jobs, the $JOBSCRATCH variable has a different value in each stage. Files that must be shared between two stages should therefore not be saved in the JOBSCRATCH space but in the semi-temporary SCRATCH directory, or in a permanent directory such as WORK, which has lower bandwidth: see all the characteristics of the disk spaces.
- If one of the jobs in the chain fails, the following jobs remain pending by default with the reason "DependencyNeverSatisfied" and will never run. You must then delete them with the scancel command. If you want these jobs to be cancelled automatically when a dependency fails, specify the --kill-on-invalid-dep=yes option when submitting them.
  In the following example, this option is used to run a job (job2.slurm) only if the previous one (job1.slurm) has failed (--dependency=afternotok:$JID_JOB1) and to prevent it from remaining pending if the previous job has completed successfully (--kill-on-invalid-dep=yes). In addition, the last job (job3.slurm) runs if either of the two previous jobs (job1.slurm or job2.slurm) has completed successfully (--dependency=afterok:$JID_JOB1?afterok:$JID_JOB2). The ? separator means that satisfying any one of the listed dependencies is enough; the expression is quoted in the script so that the shell does not interpret ? as a filename wildcard:
    #!/bin/bash
    JID_JOB1=$(sbatch job1.slurm | cut -d " " -f 4)
    JID_JOB2=$(sbatch --dependency=afternotok:$JID_JOB1 --kill-on-invalid-dep=yes job2.slurm | cut -d " " -f 4)
    sbatch --dependency="afterok:${JID_JOB1}?afterok:${JID_JOB2}" job3.slurm
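Dependency expressions such as the ones above can also be assembled by a small helper. The following is only a minimal sketch and not part of the Jean Zay tooling: the function name build_dependency is hypothetical, and the job IDs in the example call are made up. It joins the job IDs with "," (all dependencies must be satisfied) or "?" (any one is enough), the two separators accepted by the --dependency option.

```shell
#!/bin/bash
# build_dependency TYPE SEP ID...   (hypothetical helper, for illustration only)
#   TYPE: a Slurm dependency type such as afterok or afternotok
#   SEP : "," if all dependencies must be satisfied, "?" if any one suffices
build_dependency() {
    local dep_type=$1 sep=$2 expr="" id
    shift 2
    for id in "$@"; do
        # Every term except the first is preceded by the separator
        expr+="${expr:+$sep}${dep_type}:${id}"
    done
    printf '%s\n' "$expr"
}

# Example with made-up job IDs
build_dependency afterok "?" 1001 1002    # prints afterok:1001?afterok:1002
```

In a submission script, the result would be passed quoted, for example sbatch --dependency="$(build_dependency afterok "?" "$JID_JOB1" "$JID_JOB2")" job3.slurm, so that the shell never expands the ? character.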
Examples of Chained Jobs Using the STORE
Data Extraction from the STORE Before a Computation
- Submission script extract_data.slurm for the data preparation phase:
    #!/bin/bash
    #SBATCH --job-name=Extraction      # job name
    #SBATCH --partition=archive        # use the "archive" or "prepost" partition, both of which have access to the STORE
    #SBATCH --ntasks=1                 # sequential job
    #SBATCH --hint=nomultithread       # 1 MPI process per physical core (no hyperthreading)
    #SBATCH --time=02:00:00            # maximum requested execution time (HH:MM:SS)
    #SBATCH --output=%x_%j.out         # output and error file (%x = job name, %j = job number)
    # If you have several projects or several types of hours,
    # you must specify the accounting to use, even though the hours
    # are not charged for the "archive" and "prepost" partitions.
    ##SBATCH --account=...
    # Extract the data from the STORE to the SCRATCH
    cd $SCRATCH/mon_calcul
    tar -xvf $STORE/monarchive.tar
- Submission script compute.slurm for the computation phase:
    #!/bin/bash
    #SBATCH --job-name=Calcul          # job name
    # Slurm directives to be completed according to your computation
    ...
    # Computation using the data previously extracted from the STORE to the SCRATCH
    cd $SCRATCH/mon_calcul
    srun ...
- Commands to execute to chain the jobs:
    #!/bin/bash
    # Submit the extraction job and retrieve its job ID
    JOB_ID_EXTRACT_DATA=$(sbatch extract_data.slurm | cut -d " " -f 4)
    # Then run the computation only if the data extraction succeeded
    sbatch --dependency=afterok:$JOB_ID_EXTRACT_DATA compute.slurm
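The same pattern extends to any number of stages. The following is a minimal sketch, assuming sbatch prints its default "Submitted batch job <id>" message. The function name submit_chain and the SBATCH_CMD variable are hypothetical names introduced here; SBATCH_CMD only exists so that the function can be dry-run with a stand-in command instead of sbatch.

```shell
#!/bin/bash
# submit_chain SCRIPT...   (hypothetical helper, for illustration only)
# Submits each script in order; every script after the first only runs if
# the previous one completed successfully (afterok).
# SBATCH_CMD is a hypothetical override for dry runs; it defaults to sbatch.
submit_chain() {
    local submit=${SBATCH_CMD:-sbatch}
    local prev="" out jid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            out=$("$submit" "$script") || return 1
        else
            out=$("$submit" --dependency="afterok:${prev}" "$script") || return 1
        fi
        # The job ID is the fourth field of "Submitted batch job <id>"
        jid=$(printf '%s\n' "$out" | cut -d " " -f 4)
        case $jid in
            ''|*[!0-9]*)
                echo "Could not read a job ID for $script: $out" >&2
                return 1
                ;;
        esac
        echo "$script submitted as job $jid"
        prev=$jid
    done
}
```

With this function defined, the two-stage chain would be submitted with submit_chain extract_data.slurm compute.slurm. The function stops at the first submission that fails or returns an unreadable job ID, so no later stage is submitted without a valid dependency.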
Data Archiving to the STORE After a Computation
- Submission script compute.slurm for the computation phase:
    #!/bin/bash
    #SBATCH --job-name=Calcul          # job name
    # Slurm directives to be completed according to your computation
    ...
    # Computation producing data to be archived for the long term
    cd $SCRATCH/mon_calcul
    srun ...
- Submission script archive_data.slurm for the data archiving phase:
    #!/bin/bash
    #SBATCH --job-name=Archivage       # job name
    #SBATCH --partition=archive        # use the "archive" or "prepost" partition, both of which have access to the STORE
    #SBATCH --ntasks=1                 # sequential job
    #SBATCH --hint=nomultithread       # 1 MPI process per physical core (no hyperthreading)
    #SBATCH --time=02:00:00            # maximum requested execution time (HH:MM:SS)
    #SBATCH --output=%x_%j.out         # output and error file (%x = job name, %j = job number)
    # If you have several projects or several types of hours,
    # you must specify the accounting to use, even though the hours
    # are not charged for the "archive" and "prepost" partitions.
    ##SBATCH --account=...
    # Archive the data from the SCRATCH to the STORE
    cd $SCRATCH/mon_calcul
    tar -cvf $STORE/monarchive.tar resultats*.h5
- Commands to execute to chain the jobs:
    #!/bin/bash
    # Submit the computation and retrieve its job ID
    JOB_ID_COMPUTE=$(sbatch compute.slurm | cut -d " " -f 4)
    # Then run the archiving only if the computation completed successfully
    sbatch --dependency=afterok:$JOB_ID_COMPUTE archive_data.slurm
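The two examples can be combined into a single extraction, computation, archiving chain. The sketch below is an illustration under stated assumptions rather than a documented Jean Zay procedure. Instead of cut, it uses the --parsable option of sbatch, which prints only the job ID (followed by ";" and the cluster name on multi-cluster configurations). The names job_id_only and run_pipeline are hypothetical, and the submissions are wrapped in a function so that nothing is submitted until run_pipeline is called.

```shell
#!/bin/bash
# job_id_only OUTPUT   (hypothetical helper, for illustration only)
# Keeps the job ID from `sbatch --parsable` output, which is either
# "<jobid>" or "<jobid>;<cluster>".
job_id_only() {
    printf '%s\n' "${1%%;*}"
}

# run_pipeline (hypothetical): submit extraction, computation and archiving
# as one chain. Each stage only runs if the previous one succeeded (afterok).
run_pipeline() {
    local out jid_extract jid_compute

    out=$(sbatch --parsable extract_data.slurm) || return 1
    jid_extract=$(job_id_only "$out")

    out=$(sbatch --parsable --dependency="afterok:${jid_extract}" compute.slurm) || return 1
    jid_compute=$(job_id_only "$out")

    sbatch --dependency="afterok:${jid_compute}" archive_data.slurm
}

# To submit the whole chain, call:
#   run_pipeline
```

Because the three stages are still independent Slurm jobs, the data passed between them must live in SCRATCH (as in the scripts above) and not in JOBSCRATCH.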