⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Running a parallel code in batch

Jobs are managed on all nodes by the Slurm workload manager.

If your code requires CUDA-aware MPI, refer to the page Running a batch MPI CUDA-aware and GPUDirect multi-GPU job.

To submit a parallel job in batch on Jean Zay, you need to:

  1. Create a submission script:
job_mpi.slurm
#!/bin/bash
#SBATCH --job-name=TravailMPI # job name
#SBATCH --nodes=2 # total number of nodes
#SBATCH --ntasks-per-node=40 # number of MPI processes per node
# /!\ Caution: the following line is misleading, but in Slurm's vocabulary
# "multithread" does indeed refer to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:10:00 # maximum requested run time (HH:MM:SS)
#SBATCH --output=TravailMPI%j.out # output file name (%j is replaced by the job ID)
#SBATCH --error=TravailMPI%j.out # error file name (here the same as the output file)

# move to the submission directory
cd "${SLURM_SUBMIT_DIR}"

# unload the modules loaded interactively and inherited by default
module purge

# load the modules
module load ... # for example intel-all/19.0.4

# echo the commands as they are executed
set -x

# run the code
srun ./exec_mpi
Warning

The current machine configuration does not allow hyperthreading (running 80 MPI processes on the 40 physical cores of a compute node) with a pure MPI code.

  2. Submit this script with the sbatch command:
sbatch job_mpi.slurm
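The directives in job_mpi.slurm request --nodes × --ntasks-per-node MPI processes (2 × 40 = 80 here), and the warning above means a node should never receive more MPI processes than it has physical cores. The sketch below is a hypothetical pre-launch check (the function name and the 40-core default, taken from the example, are assumptions and not an IDRIS tool); inside a job it can be fed the SLURM_NNODES and SLURM_NTASKS_PER_NODE variables that Slurm sets from the requested options.

```shell
#!/bin/bash
# Hypothetical pre-launch check for a pure MPI job (not an IDRIS tool).
# Assumes 40 physical cores per node, as in the example script.

# Print the total number of MPI processes, or fail if a node would
# receive more processes than physical cores (i.e. hyperthreading).
check_mpi_layout() {
    local nodes=$1 tasks_per_node=$2 cores_per_node=${3:-40}
    if (( tasks_per_node > cores_per_node )); then
        echo "error: ${tasks_per_node} tasks per node > ${cores_per_node} physical cores" >&2
        return 1
    fi
    echo $(( nodes * tasks_per_node ))
}

# Inside the job script, before srun, one could write:
#   check_mpi_layout "${SLURM_NNODES}" "${SLURM_NTASKS_PER_NODE}" || exit 1
check_mpi_layout 2 40    # values of job_mpi.slurm: prints 80
```

Failing early in the script avoids spending hours on a run whose layout Slurm would reject or distribute differently than intended.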

Remarks:

  • We recommend compiling and running your code in the same software environment by loading the same modules.
  • The --hint=nomultithread option ensures the reservation of physical cores (no hyperthreading).
  • The memory allocated to the job is proportional to the number of CPU cores requested. For example, if you request 1/4 of the physical CPU cores of a node, you have access to 1/4 of its memory. Size your request consistently with the configuration of the nodes used, so that you are not charged for more hours than necessary while still getting the memory you are entitled to. See our documentation on this subject: Memory allocation with Slurm.
  • These examples assume that the executable (or Python script) is located in the submission directory, i.e. the directory from which you run the sbatch command; Slurm automatically sets the $SLURM_SUBMIT_DIR variable to this directory.
  • The output file of the computation is also created in the submission directory, at the start of the job. Editing or modifying it while the job is running may disrupt the execution.
  • The module purge command is essential because of Slurm's default behaviour: the modules loaded in your environment when you run sbatch are inherited by the submitted job.
  • The srun command is required whenever you request a multi-task execution. We advise against using mpirun on Jean Zay: only srun guarantees a distribution of processes that matches the resources requested in your submission file.
  • The /lustre/fshomisc/sup/pub/idrtools/bind_gpu.sh script associates a different GPU with each process. You do not need it if your code explicitly manages the association of processes with GPUs. Caution: this script is very basic and only handles the simple case of 1 GPU per MPI task, so it only works when the number of tasks per node is less than or equal to the number of GPUs per node.
  • Every job has resource limits defined by a partition and a Quality of Service (QoS) that Slurm sets by default. You can change these limits by specifying a partition and/or a QoS, as described in our documentation on the partitions and QoS for CPU and GPU.
  • For multi-project accounts, as well as accounts with both CPU and GPU hours, you must specify the hour allocation from which the job's computing hours are deducted, as described in our documentation on the management of computing hours, so that the hours consumed by your jobs are charged to the correct allocation.
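The proportional memory rule in the remarks above can be made concrete with simple arithmetic. This sketch assumes, for illustration only, a node with 40 physical cores (as in the example script) and a hypothetical 160000 MB of memory available to jobs; the real per-node figures are given in the Memory allocation with Slurm documentation. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of the proportional memory rule: a job gets the same fraction
# of a node's memory as the fraction of its physical cores it requests.
# Node figures used below are illustrative assumptions, not Jean Zay specs.

# Usage: memory_share_mb <cores_requested> <cores_per_node> <node_mem_mb>
memory_share_mb() {
    local cores=$1 cores_per_node=$2 node_mem_mb=$3
    echo $(( node_mem_mb * cores / cores_per_node ))
}

# Smallest number of cores whose memory share covers <needed_mb>
# (integer ceiling division). Hours are charged for all these cores.
# Usage: cores_for_memory <needed_mb> <cores_per_node> <node_mem_mb>
cores_for_memory() {
    local needed_mb=$1 cores_per_node=$2 node_mem_mb=$3
    echo $(( (needed_mb * cores_per_node + node_mem_mb - 1) / node_mem_mb ))
}

memory_share_mb 10 40 160000     # 1/4 of the cores: prints 40000
cores_for_memory 50000 40 160000 # prints 13
```

The second function shows the trade-off mentioned in the remark: a code needing 50000 MB on such a node must request 13 cores (52000 MB) and is charged for them, even if it runs fewer processes.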
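As noted in the remarks above, the output file is created in the submission directory, and the --output=TravailMPI%j.out directive of the example script replaces %j with the job ID. When sbatch accepts a script, it prints a line of the form "Submitted batch job <job ID>". The hypothetical helpers below (names are assumptions, not IDRIS tools) extract that ID and derive the corresponding output file name.

```shell
#!/bin/bash
# Hypothetical helpers around the sbatch step (not IDRIS tools).

# Read sbatch's confirmation line on stdin and print the job ID,
# e.g. "Submitted batch job 123456" -> 123456 (prints nothing otherwise).
parse_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Expand the %j placeholder of an --output pattern for a given job ID.
output_file_for() {
    local pattern=$1 job_id=$2
    printf '%s\n' "${pattern}" | sed "s/%j/${job_id}/g"
}

# Typical use from the submission directory (requires Slurm):
#   job_id=$(sbatch job_mpi.slurm | parse_job_id)
#   squeue -j "${job_id}"                                  # follow the job
#   cat "$(output_file_for 'TravailMPI%j.out' "${job_id}")"
```

Alternatively, sbatch --parsable prints only the job ID (followed by the cluster name on multi-cluster installations), which removes the need for parse_job_id. Reading the output file with cat is harmless; editing it while the job runs is what the remark warns against.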
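The bind_gpu.sh remark above describes a one-GPU-per-task association. The wrapper below is a hypothetical illustration of that idea, not the content of the IDRIS script: it assumes the GPUs visible to the job are numbered from 0 on each node, and gives each task the GPU whose index equals its node-local rank, SLURM_LOCALID (set by srun for every task), through CUDA_VISIBLE_DEVICES before running the real command.

```shell
#!/bin/bash
# Hypothetical 1-GPU-per-task wrapper (NOT the IDRIS bind_gpu.sh).
# Each task only sees the GPU whose index is its node-local rank, so
# this only works if tasks per node <= GPUs per node.
# Usage in a job (file made executable): srun ./gpu_wrapper.sh ./exec_mpi

bind_one_gpu() {
    local local_rank=${SLURM_LOCALID:?SLURM_LOCALID is not set (run under srun)}
    export CUDA_VISIBLE_DEVICES=${local_rank}
    exec "$@"   # replace the wrapper with the real program
}

# Only bind when a command is given, so the file can also be sourced.
if (( $# > 0 )); then
    bind_one_gpu "$@"
fi
```

As with the IDRIS script, this is only appropriate for the simple case; codes that select their GPU themselves do not need any wrapper.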
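The last two remarks translate into extra #SBATCH directives in the header of the submission script, before its first command. This fragment uses Slurm's standard --partition, --qos and --account options; all values are placeholders (assumptions), and the actual partition, QoS and allocation names must be taken from the IDRIS documentation pages cited above.

```shell
# Extra directives for the header of job_mpi.slurm (placeholders only):
# replace the values with the names given in the IDRIS documentation
# on partitions/QoS and on the management of computing hours.
#SBATCH --partition=my_partition # hypothetical partition name
#SBATCH --qos=my_qos # hypothetical QoS name
#SBATCH --account=my_allocation # hypothetical hour allocation to charge
```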
