This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Running a collection of similar jobs (job array)
Principle of the job array
Job arrays allow a user to submit a collection of similar jobs at once.
At IDRIS, job arrays are only allowed for jobs submitted via the sbatch command because they can generate a very large number of jobs.
They are distinguished by the use of the Slurm directive #SBATCH --array, which allows you to specify the indices, or a range of indices, for the jobs in the collection as indicated below:
- For a collection of jobs with consecutive indices from 0 to NB_JOBS (i.e. 0, 1, 2, ... , NB_JOBS, which makes NB_JOBS+1 jobs):
#SBATCH --array=0-NB_JOBS
- For a collection of jobs with indices varying from 0 to at most NB_JOBS in steps of STEP_SIZE (i.e. 0, STEP_SIZE, 2*STEP_SIZE, etc.):
#SBATCH --array=0-NB_JOBS:STEP_SIZE
- For a collection of N jobs with predefined indices J_1, J_2, ... , J_N:
#SBATCH --array=J_1,J_2,...,J_N
To avoid saturating the supercomputer, the maximum number of jobs executed simultaneously from a job array can be specified using the separator %.
For example, use the following syntax to ensure that no more than NB_MAX_RUNNING_JOBS jobs are running at any given time (where NB_MAX_RUNNING_JOBS is chosen by you):
#SBATCH --array=0-NB_JOBS%NB_MAX_RUNNING_JOBS
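To check which indices a given --array value produces before submitting, the syntax described above can be expanded by hand. The following Python sketch is only an illustration: the function name parse_array_spec is hypothetical, it is not part of Slurm, and it only covers the forms shown on this page (ranges, steps, explicit lists and the % limit).

```python
def parse_array_spec(spec):
    """Expand an --array value such as "0-19%5", "0-11:2" or "1,3,8".

    Returns a tuple (sorted list of indices, simultaneous job limit or None).
    """
    max_running = None
    if "%" in spec:
        # Everything after % is the maximum number of simultaneous jobs.
        spec, limit_text = spec.split("%", 1)
        max_running = int(limit_text)

    indices = set()
    for part in spec.split(","):
        step = 1
        if ":" in part:
            # A range may carry a step, e.g. 0-11:2.
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            # The upper bound is included, hence last + 1.
            indices.update(range(first, last + 1, step))
        else:
            indices.add(int(part))
    return sorted(indices), max_running


for example in ("0-19%5", "1,3,8", "0-11:2"):
    expanded, limit = parse_array_spec(example)
    print(example, "->", len(expanded), "jobs, limit:", limit)
```

Note that the upper bound is inclusive: 0-19 yields 20 indices, and 0-11:2 stops at 10 because 12 would exceed the bound.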
Variables specific to job arrays
When using job arrays, certain Slurm environment variables can be used in the shell script to customise the various jobs in the same collection, for example, so that each job in the collection uses different input and/or output directories.
The following environment variables are automatically set by Slurm:
- SLURM_JOB_ID: the internal job identifier
- SLURM_ARRAY_JOB_ID: the job array identifier (shared by all jobs in the collection)
- SLURM_ARRAY_TASK_ID: the index specific to each job in the collection (can be seen as a job counter)
- SLURM_ARRAY_TASK_COUNT: the total number of jobs in the collection that will be executed
- SLURM_ARRAY_TASK_MIN: the smallest index of all jobs in the collection
- SLURM_ARRAY_TASK_MAX: the largest index of all jobs in the collection
In addition, job arrays provide two extra substitution patterns for naming the output and error files of each job in the #SBATCH --output=... and #SBATCH --error=... directives:
- %A, which is automatically replaced by the value of SLURM_ARRAY_JOB_ID
- %a, which is automatically replaced by the value of SLURM_ARRAY_TASK_ID
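To anticipate the file names that a given pattern will produce, the substitution can be imitated in a few lines of Python. This is a sketch only: the function expand_filename_pattern is hypothetical, and it handles just %A, %a and %x (the job name pattern used in the submission scripts on this page), not the other patterns Slurm supports.

```python
def expand_filename_pattern(pattern, job_name, array_job_id, array_task_id):
    """Imitate the replacement of %x, %A and %a in an output file name."""
    replacements = {
        "%x": job_name,            # job name (--job-name)
        "%A": str(array_job_id),   # value of SLURM_ARRAY_JOB_ID
        "%a": str(array_task_id),  # value of SLURM_ARRAY_TASK_ID
    }
    for token, value in replacements.items():
        pattern = pattern.replace(token, value)
    return pattern


# Illustrative values: job array 305813, job of index 2, named "job-array".
print(expand_filename_pattern("slurm-%A_%a.out", "job-array", 305813, 2))
print(expand_filename_pattern("%x_%A_%a.out", "job-array", 305813, 2))
```

With these illustrative values, the two calls print slurm-305813_2.out and job-array_305813_2.out.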
Remarks:
- By default, the output file name format for a job array is slurm-%A_%a.out.
- In bash, the variables specific to job arrays can be retrieved as follows:
echo ${SLURM_JOB_ID}
echo ${SLURM_ARRAY_JOB_ID}
echo ${SLURM_ARRAY_TASK_ID}
echo ${SLURM_ARRAY_TASK_COUNT}
echo ${SLURM_ARRAY_TASK_MIN}
echo ${SLURM_ARRAY_TASK_MAX}
- For Python scripts, the variables specific to job arrays can be retrieved as follows:
import os
slurm_job_id=int(os.environ["SLURM_JOB_ID"])
slurm_array_job_id=int(os.environ["SLURM_ARRAY_JOB_ID"])
slurm_array_task_id=int(os.environ["SLURM_ARRAY_TASK_ID"])
slurm_array_task_count=int(os.environ["SLURM_ARRAY_TASK_COUNT"])
slurm_array_task_min=int(os.environ["SLURM_ARRAY_TASK_MIN"])
slurm_array_task_max=int(os.environ["SLURM_ARRAY_TASK_MAX"])
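A common use of these variables is to let each job of the collection pick its own parameter from a shared list. The sketch below shows one possible way to do this; the function select_parameter, the learning_rates list and the fake_env dictionary are hypothetical, and the code assumes the array was submitted with contiguous indices (for example --array=0-3), one per list entry.

```python
import os


def select_parameter(parameters, environ=os.environ):
    """Return the entry of `parameters` assigned to the current job.

    Assumption: contiguous array indices, one per entry of `parameters`.
    """
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    task_min = int(environ["SLURM_ARRAY_TASK_MIN"])
    # Position of this job within the collection, starting at 0.
    position = task_id - task_min
    if not 0 <= position < len(parameters):
        raise IndexError(f"no parameter for array index {task_id}")
    return parameters[position]


# Stand-in for the variables Slurm would set inside a job of the array.
fake_env = {"SLURM_ARRAY_TASK_ID": "2", "SLURM_ARRAY_TASK_MIN": "0"}
learning_rates = [0.1, 0.01, 0.001, 0.0001]
print(select_parameter(learning_rates, fake_env))
```

Inside a real job, the second argument can simply be omitted so that the values are read from the environment. Passing a dictionary makes the function easy to try outside Slurm.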
Examples of use
The examples below concern executions on the CPU partition. The principle remains the same for executions on the GPU partitions.
- Example of a submission script for 20 identical jobs with a maximum of 5 jobs running simultaneously (execution in batches of 5 jobs):
#!/bin/bash
#SBATCH --job-name=job-array # job name
#SBATCH --ntasks=1 # total number of MPI processes
#SBATCH --ntasks-per-node=1 # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00 # maximum requested execution time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out # output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out # error file name (here shared with the output)
#SBATCH --array=0-19%5 # 20 jobs in total, at most 5 running at the same time
# go to the submission directory
cd ${SLURM_SUBMIT_DIR}
# clean out the modules loaded interactively and inherited by default
module purge
# load the modules
module load ...
# echo the commands as they are run
set -x
# Run the "mon_exe" binary with different data for each job
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out
- Example of a submission script for 3 identical jobs with indices 1, 3 and 8 respectively:
#!/bin/bash
#SBATCH --job-name=job-array # job name
#SBATCH --ntasks=1 # total number of MPI processes
#SBATCH --ntasks-per-node=1 # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00 # maximum requested execution time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out # output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out # error file name (here shared with the output)
#SBATCH --array=1,3,8 # 3 jobs in total, with indices 1, 3 and 8
# go to the submission directory
cd ${SLURM_SUBMIT_DIR}
# clean out the modules loaded interactively and inherited by default
module purge
# load the modules
module load ...
# echo the commands as they are run
set -x
# Run the "mon_exe" binary with different data for each job
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out
- Example of a submission script for 6 identical jobs with indices between 0 and 11 in steps of 2:
#!/bin/bash
#SBATCH --job-name=job-array # job name
#SBATCH --ntasks=1 # total number of MPI processes
#SBATCH --ntasks-per-node=1 # number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00 # maximum requested execution time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out # output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out # error file name (here shared with the output)
#SBATCH --array=0-11:2 # 6 jobs with indices 0, 2, 4, 6, 8 and 10
# go to the submission directory
cd ${SLURM_SUBMIT_DIR}
# clean out the modules loaded interactively and inherited by default
module purge
# load the modules
module load ...
# echo the commands as they are run
set -x
# Run the "mon_exe" binary with different data for each job
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out
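Once a job array such as the ones above has finished, it can be useful to find which indices did not produce a result and to resubmit only those, using the explicit index list form of --array. The following Python sketch is one possible approach and not an IDRIS tool: the functions missing_indices and array_option are hypothetical, and treating an absent or empty fichier<index>.out as a failure is only a heuristic, since the shell redirection creates the file even if mon_exe fails.

```python
import tempfile
from pathlib import Path


def missing_indices(indices, directory=".", template="fichier{index}.out"):
    """Return the array indices whose output file is absent or empty."""
    directory = Path(directory)
    missing = []
    for index in indices:
        path = directory / template.format(index=index)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(index)
    return missing


def array_option(indices):
    """Format indices as an explicit list for the --array option."""
    return "--array=" + ",".join(str(index) for index in indices)


# Demonstration in a temporary directory: indices 0-11:2 were expected,
# but only 0, 2, 4 and 8 wrote a non-empty output file.
with tempfile.TemporaryDirectory() as tmp:
    for index in (0, 2, 4, 8):
        (Path(tmp) / f"fichier{index}.out").write_text("done\n")
    todo = missing_indices(range(0, 12, 2), tmp)
    print(todo, array_option(todo))
```

In this demonstration the remaining indices are 6 and 10, so the suggested option is --array=6,10.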
Job control command
A job array must be submitted with the sbatch command because of the large number of jobs it can generate:
sbatch job_array.slurm
These jobs are monitored with the squeue command, which reports information specific to job arrays. For example, for a job array consisting of 7 jobs executed in batches of 2 jobs on the cpu_p1 partition:
- The first call to squeue returns:
$ squeue -J 305813
JOBID           PARTITION NAME      USER    ST TIME NODES NODELIST(REASON)
305813_[2-6%2]  cpu_p1    job-array mylogin PD 0:00 1     (JobArrayTaskLimit)
305813_0        cpu_p1    job-array mylogin R  0:00 1     r7i1n0
305813_1        cpu_p1    job-array mylogin R  0:00 1     r8i6n3
Here, we can see that the first 2 jobs are running and the other 5 are pending.
- When the first 2 jobs are finished, a second call to squeue returns:
$ squeue -J 305813
JOBID           PARTITION NAME      USER    ST TIME NODES NODELIST(REASON)
305813_[4-6%2]  cpu_p1    job-array mylogin PD 0:00 1     (JobArrayTaskLimit)
305813_2        cpu_p1    job-array mylogin R  0:05 1     r7i1n0
305813_3        cpu_p1    job-array mylogin R  0:05 1     r8i6n3
Now, we can see that the next 2 jobs are running and there are only 3 left pending. Note that there is no longer any trace of the first 2 completed jobs.
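The JOBID column shown above combines the job array identifier with either a single index or a bracketed set of pending indices. If this column needs to be interpreted in a script, a sketch along the following lines may help. The function parse_array_jobid is hypothetical, and it only handles the forms visible in the listings above (a single index, or ranges and comma-separated lists with an optional % limit).

```python
import re

# <array id>_<index>  or  <array id>_[<ranges>%<limit>]
JOBID_PATTERN = re.compile(
    r"^(?P<array_id>\d+)_"
    r"(?:\[(?P<spec>[^\]%]+)(?:%(?P<limit>\d+))?\]|(?P<index>\d+))$"
)


def parse_array_jobid(jobid):
    """Split a squeue JOBID field into (array job ID, list of indices)."""
    match = JOBID_PATTERN.match(jobid)
    if match is None:
        raise ValueError(f"not a job array JOBID: {jobid!r}")
    array_id = int(match["array_id"])
    if match["index"] is not None:
        # Form such as 305813_0: a single job of the collection.
        return array_id, [int(match["index"])]
    indices = []
    for part in match["spec"].split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            indices.extend(range(int(first), int(last) + 1))
        else:
            indices.append(int(part))
    return array_id, indices


for field in ("305813_[2-6%2]", "305813_0"):
    print(field, "->", parse_array_jobid(field))
```

Applied to the first listing, this gives indices 2 to 6 for the pending entry and index 0 for the first running job.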
To delete a job array, you must use the scancel command. There are several ways to proceed:
- To cancel the entire collection, indicate its identifier ${SLURM_ARRAY_JOB_ID}. With the example above, this gives:
$ scancel 305813
- To cancel the execution of a particular job, indicate the collection identifier ${SLURM_ARRAY_JOB_ID} and the job index ${SLURM_ARRAY_TASK_ID}. With the example above, this gives:
$ scancel 305813_2
- To cancel the execution of a series of jobs, indicate the collection identifier ${SLURM_ARRAY_JOB_ID} and a range of indices (here from 4 to 6). With the example above, this gives:
$ scancel 305813_[4-6]
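The three forms of the scancel argument described above can be built programmatically, for instance when cancellation is driven from a Python script. The sketch below only constructs the argument list and does not run anything; the function scancel_args is hypothetical.

```python
def scancel_args(array_job_id, first=None, last=None):
    """Build the scancel command line for a whole array, one job or a range.

    - scancel_args(305813)        -> whole collection
    - scancel_args(305813, 2)     -> job of index 2
    - scancel_args(305813, 4, 6)  -> jobs of indices 4 to 6
    """
    if first is None:
        target = str(array_job_id)
    elif last is None or last == first:
        target = f"{array_job_id}_{first}"
    else:
        if last < first:
            raise ValueError("last index must not be smaller than first")
        target = f"{array_job_id}_[{first}-{last}]"
    return ["scancel", target]


print(scancel_args(305813))
print(scancel_args(305813, 2))
print(scancel_args(305813, 4, 6))
```

The returned list could then be passed to subprocess.run on a machine where scancel is available. Passing the arguments as a list avoids any shell interpretation of the square brackets.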
Documentation
Slurm documentation on job arrays