⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Running a collection of similar jobs (job array)

Principle of the job array

Job arrays allow a user to submit a collection of similar jobs at once.

Attention

At IDRIS, job arrays are only allowed for jobs submitted via the sbatch command because they can generate a very large number of jobs.

A job array is declared with the Slurm directive #SBATCH --array, which specifies the indices, or a range of indices, of the jobs in the collection as indicated below:

For a collection of jobs with consecutive indices from 0 to NB_JOBS inclusive (i.e. 0, 1, 2, ... , NB_JOBS, which makes NB_JOBS + 1 jobs):
```
#SBATCH --array=0-NB_JOBS
```
For a collection of jobs with indices varying from 0 to at most NB_JOBS in steps of STEP_SIZE (i.e. 0, STEP_SIZE, 2*STEP_SIZE, etc.):
```
#SBATCH --array=0-NB_JOBS:STEP_SIZE
```
For a collection of N jobs with predefined indices J_1, J_2, ... , J_N:
```
#SBATCH --array=J_1,J_2,...,J_N
```

Remark

To avoid saturating the supercomputer, the maximum number of jobs executed simultaneously from a job array can be specified using the separator %.

For example, use the following syntax to ensure that no more than NB_MAX_RUNNING_JOBS jobs are running at any given time (where NB_MAX_RUNNING_JOBS is chosen by you):

#SBATCH --array=0-NB_JOBS%NB_MAX_RUNNING_JOBS
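
To check which indices a given --array value will generate before submitting, the rules above can be sketched in Python. The helper expand_array_spec is hypothetical (it is not part of Slurm, which does this parsing itself) and only covers the forms shown on this page: single indices, ranges, ranges with a step, comma-separated lists and the optional % limit.

```python
def expand_array_spec(spec):
    """Return (indices, max_running) for an --array value such as "0-19%5".

    Hypothetical helper for illustration only; it mirrors the syntax
    described on this page and performs no validation beyond int().
    """
    max_running = None
    if "%" in spec:
        # Everything after % is the limit on simultaneously running jobs.
        spec, limit = spec.split("%", 1)
        max_running = int(limit)

    indices = []
    for part in spec.split(","):
        step = 1
        if ":" in part:
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        # The upper bound of a range is inclusive.
        indices.extend(range(first, last + 1, step))
    return indices, max_running


print(expand_array_spec("0-11:2"))
print(expand_array_spec("1,3,8"))
print(expand_array_spec("0-3%2"))
```

Note that a range such as 0-11:2 stops at 10: the upper bound is a limit, not necessarily an index of the collection.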

Variables specific to job arrays

When using job arrays, certain Slurm environment variables can be used in the shell script to customise each job in the collection, for example so that each job uses its own input and/or output directories.

The following environment variables are automatically set by Slurm:

SLURM_JOB_ID: the job identifier, unique to each job in the collection
SLURM_ARRAY_JOB_ID: the job array identifier, common to all jobs in the collection
SLURM_ARRAY_TASK_ID: the index specific to each job in the collection (can be seen as a job counter)
SLURM_ARRAY_TASK_COUNT: the total number of jobs in the collection
SLURM_ARRAY_TASK_MIN: the smallest index of all jobs in the collection
SLURM_ARRAY_TASK_MAX: the largest index of all jobs in the collection

With job arrays, two additional filename patterns are available to name the output and error files of each job in the #SBATCH --output=... and #SBATCH --error=... directives:

%A, which is automatically replaced by the value of SLURM_ARRAY_JOB_ID
%a, which is automatically replaced by the value of SLURM_ARRAY_TASK_ID
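
To make the substitution concrete, here is a minimal Python sketch that mimics it for these two patterns only. The function expand_output_pattern is hypothetical, and the identifier 305813 and the index 2 are example values; Slurm performs the real substitution itself and supports other patterns that this sketch ignores.

```python
def expand_output_pattern(pattern, array_job_id, array_task_id):
    """Mimic the %A and %a substitution for illustration (not Slurm code)."""
    # %A -> SLURM_ARRAY_JOB_ID, %a -> SLURM_ARRAY_TASK_ID
    return (pattern
            .replace("%A", str(array_job_id))
            .replace("%a", str(array_task_id)))


# Example values: array job 305813, job index 2.
print(expand_output_pattern("slurm-%A_%a.out", 305813, 2))  # slurm-305813_2.out
```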

Remarks:

By default, the output file name format for a job array is slurm-%A_%a.out.
In bash, the variables specific to job arrays can be retrieved as follows:

echo ${SLURM_JOB_ID}
echo ${SLURM_ARRAY_JOB_ID}
echo ${SLURM_ARRAY_TASK_ID}
echo ${SLURM_ARRAY_TASK_COUNT}
echo ${SLURM_ARRAY_TASK_MIN}
echo ${SLURM_ARRAY_TASK_MAX}

For Python scripts, the variables specific to job arrays can be retrieved as follows:

import os

# These variables are only defined inside a job of a job array.
slurm_job_id = int(os.environ["SLURM_JOB_ID"])
slurm_array_job_id = int(os.environ["SLURM_ARRAY_JOB_ID"])
slurm_array_task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
slurm_array_task_count = int(os.environ["SLURM_ARRAY_TASK_COUNT"])
slurm_array_task_min = int(os.environ["SLURM_ARRAY_TASK_MIN"])
slurm_array_task_max = int(os.environ["SLURM_ARRAY_TASK_MAX"])
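
A common use of these variables is to let each job pick its own input from a shared list. The sketch below assumes a collection submitted with indices 0, 1 and 2; the file names in INPUT_FILES and the function select_input are hypothetical and only serve as an illustration.

```python
import os

# Hypothetical inputs: one entry per array index (0, 1, 2).
INPUT_FILES = ["case_a.in", "case_b.in", "case_c.in"]


def select_input(files, environ=os.environ):
    """Return the entry of `files` matching SLURM_ARRAY_TASK_ID."""
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(files):
        raise IndexError(f"array index {task_id} has no matching input file")
    return files[task_id]


# Only meaningful inside a job array, where Slurm sets the variable.
if "SLURM_ARRAY_TASK_ID" in os.environ:
    print(select_input(INPUT_FILES))
```

Passing the environment as an argument makes the function easy to try outside Slurm, for example select_input(INPUT_FILES, {"SLURM_ARRAY_TASK_ID": "1"}).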

Examples of use

Preliminary remark

The examples below concern executions on the CPU partition. The principle remains the same for executions on the GPU partitions.

Example of a submission script for 20 identical jobs with a maximum of 5 jobs running simultaneously (execution in batches of 5 jobs):

#!/bin/bash
#SBATCH --job-name=job-array   # Job name
#SBATCH --ntasks=1             # Total number of MPI processes
#SBATCH --ntasks-per-node=1    # Number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread   # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00        # Maximum requested run time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out  # Output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out   # Error file name (here shared with the output)
#SBATCH --array=0-19%5         # 20 jobs in total, but at most 5 running at once

# Move to the submission directory
cd ${SLURM_SUBMIT_DIR}

# Clean out the modules loaded interactively and inherited by default
module purge

# Load the modules
module load ...

# Echo the commands as they are run
set -x

# Run the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out

Example of a submission script for 3 identical jobs with indices 1, 3 and 8 respectively:

#!/bin/bash
#SBATCH --job-name=job-array   # Job name
#SBATCH --ntasks=1             # Total number of MPI processes
#SBATCH --ntasks-per-node=1    # Number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread   # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00        # Maximum requested run time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out  # Output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out   # Error file name (here shared with the output)
#SBATCH --array=1,3,8          # 3 jobs in total, with indices 1, 3 and 8

# Move to the submission directory
cd ${SLURM_SUBMIT_DIR}

# Clean out the modules loaded interactively and inherited by default
module purge

# Load the modules
module load ...

# Echo the commands as they are run
set -x

# Run the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out

Example of a submission script for 6 identical jobs with indices between 0 and 11 in steps of 2:

#!/bin/bash
#SBATCH --job-name=job-array   # Job name
#SBATCH --ntasks=1             # Total number of MPI processes
#SBATCH --ntasks-per-node=1    # Number of MPI processes per node
# In Slurm vocabulary, "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread   # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:01:00        # Maximum requested run time (HH:MM:SS)
#SBATCH --output=%x_%A_%a.out  # Output file name containing the ID and the index
#SBATCH --error=%x_%A_%a.out   # Error file name (here shared with the output)
#SBATCH --array=0-11:2         # 6 jobs with indices 0, 2, 4, 6, 8 and 10

# Move to the submission directory
cd ${SLURM_SUBMIT_DIR}

# Clean out the modules loaded interactively and inherited by default
module purge

# Load the modules
module load ...

# Echo the commands as they are run
set -x

# Run the "mon_exe" binary with different data for each job.
# The value of ${SLURM_ARRAY_TASK_ID} is different for each job.
srun ./mon_exe < fichier${SLURM_ARRAY_TASK_ID}.in > fichier${SLURM_ARRAY_TASK_ID}.out
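
Once a collection has finished, it can be useful to find the indices whose result is missing and resubmit only those, using the --array=J_1,J_2,...,J_N form. The following Python sketch assumes the fichier<index>.out naming used in the scripts above and treats an absent or empty file as a failed job; both the helpers and that success criterion are assumptions to adapt to your own case.

```python
from pathlib import Path


def missing_indices(indices, directory=".", template="fichier{index}.out"):
    """Return the array indices whose output file is absent or empty.

    Assumption: an empty or missing output file means the job failed.
    """
    missing = []
    for index in indices:
        path = Path(directory) / template.format(index=index)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(index)
    return missing


def array_directive(indices):
    """Build an --array value listing explicit indices, e.g. "1,3,8"."""
    return ",".join(str(i) for i in indices)


# Check the 20 jobs of the first example (indices 0 to 19) in the current directory.
todo = missing_indices(range(0, 20))
if todo:
    print(f"#SBATCH --array={array_directive(todo)}")
```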

Job control command

A job array must be submitted with the sbatch command because of the large number of jobs it can generate:

sbatch job_array.slurm

These jobs are monitored with the squeue command. For example, for a job array consisting of 7 jobs executed in batches of 2 jobs on the cpu_p1 partition:

The first call to squeue returns:

$ squeue -J 305813
               JOBID PARTITION      NAME     USER ST       TIME  NODES NODELIST(REASON)
      305813_[2-6%2]    cpu_p1 job-array  mylogin PD       0:00      1 (JobArrayTaskLimit)
            305813_0    cpu_p1 job-array  mylogin  R       0:00      1 r7i1n0
            305813_1    cpu_p1 job-array  mylogin  R       0:00      1 r8i6n3

Here, we can see that the first 2 jobs are running and the other 5 are pending.

When the first 2 jobs are finished, a second call to squeue returns:

$ squeue -J 305813
             JOBID PARTITION      NAME     USER ST       TIME  NODES NODELIST(REASON)
    305813_[4-6%2]    cpu_p1 job-array  mylogin PD       0:00      1 (JobArrayTaskLimit)
          305813_2    cpu_p1 job-array  mylogin  R       0:05      1 r7i1n0
          305813_3    cpu_p1 job-array  mylogin  R       0:05      1 r8i6n3

Now, we can see that the next 2 jobs are running and there are only 3 left pending. Note that there is no longer any trace of the first 2 completed jobs.

To cancel a job array, use the scancel command. There are several ways to proceed:

To cancel the entire collection, indicate its identifier ${SLURM_ARRAY_JOB_ID}. With the example above, this gives:
```
$ scancel 305813
```
To cancel the execution of a particular job, indicate the collection identifier ${SLURM_ARRAY_JOB_ID} and the job index ${SLURM_ARRAY_TASK_ID}. With the example above, this gives:
```
$ scancel 305813_2
```
To cancel the execution of a series of jobs, indicate the collection identifier ${SLURM_ARRAY_JOB_ID} and a range of indices (here from 4 to 6). With the example above, this gives:
```
$ scancel 305813_[4-6]
```
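
When cancellation is driven from a script, the three forms above can be built programmatically. This is a minimal sketch: scancel_target is a hypothetical helper that only formats the argument, and the call to scancel itself is left as a comment.

```python
def scancel_target(array_job_id, first=None, last=None):
    """Format the scancel argument for a whole array, one job or a range."""
    if first is None:
        return str(array_job_id)               # entire collection
    if last is None or last == first:
        return f"{array_job_id}_{first}"       # a single job
    return f"{array_job_id}_[{first}-{last}]"  # a range of indices


print(scancel_target(305813))        # 305813
print(scancel_target(305813, 2))     # 305813_2
print(scancel_target(305813, 4, 6))  # 305813_[4-6]

# To actually cancel, the target could be passed to scancel, for example:
#   import subprocess
#   subprocess.run(["scancel", scancel_target(305813, 4, 6)], check=True)
```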

Documentation

Slurm documentation on job arrays
