Jean Zay: Execution of an MPI parallel code in batch

Jobs are managed on all of the nodes by the Slurm software.

To submit an MPI job in batch on Jean Zay, you must:

  • Create a submission script. Here is an example saved in the intel_mpi.slurm file:
    #!/bin/bash
    #SBATCH --job-name=TravailMPI      # name of job
    #SBATCH --ntasks=80                # total number of MPI processes
    #SBATCH --ntasks-per-node=40       # number of MPI processes per node
    # /!\ Caution, "multithread" in Slurm vocabulary refers to hyperthreading.
    #SBATCH --hint=nomultithread       # 1 MPI process per physical core (no hyperthreading)
    #SBATCH --time=00:10:00            # maximum execution time requested (HH:MM:SS)
    #SBATCH --output=TravailMPI%j.out  # name of output file
    #SBATCH --error=TravailMPI%j.out   # name of error file (here, in common with output)
     
    # go into the submission directory
    cd "${SLURM_SUBMIT_DIR}"
     
    # purge the modules loaded in the interactive session and inherited by default
    module purge
     
    # loading modules
    module load intel-all/19.0.4
     
    # echo each command as it is launched
    set -x
     
    # code execution
    srun ./exec_mpi
  • Submit this script via the sbatch command:
    $ sbatch intel_mpi.slurm
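
As a minimal sketch of what follows the submission: sbatch normally answers with a line of the form "Submitted batch job 123456", and the %j in the --output option of the script is replaced by that job number. The helper functions and the job number 123456 below are hypothetical and serve only as an illustration; they are not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: extract the job number from the line printed by sbatch.
# Assumes the usual format "Submitted batch job <number>" (4th word).
job_id_from_sbatch_output() {
    printf '%s\n' "$1" | awk '{print $4}'
}

# Hypothetical helper: name of the file produced by --output=TravailMPI%j.out
# for a given job number.
output_file_for_job() {
    printf 'TravailMPI%s.out\n' "$1"
}

# Illustration with a made-up job number (123456 is not a real job).
sample="Submitted batch job 123456"
job_id=$(job_id_from_sbatch_output "$sample")
output_file_for_job "$job_id"
```

In a real session, the sample string would be replaced by the actual output of sbatch intel_mpi.slurm.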

Caution: The current configuration of the machine does not allow hyperthreading (running 80 MPI processes on the 40 physical cores of a compute node) with a pure MPI code.
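
The arithmetic behind this constraint can be sketched as follows. This is an illustration only, assuming 40 physical cores per compute node as stated above; nodes_needed is a hypothetical helper, not a Slurm command. It rejects a per-node task count that would require hyperthreading and otherwise derives the number of nodes from the --ntasks and --ntasks-per-node values.

```shell
#!/bin/bash
# Physical cores per compute node, as stated in the caution above.
CORES_PER_NODE=40

# Hypothetical helper: print the number of nodes needed for a pure MPI job,
# or fail if the per-node task count exceeds the physical cores of a node.
nodes_needed() {
    local ntasks="$1" ntasks_per_node="$2"
    if [ "$ntasks_per_node" -gt "$CORES_PER_NODE" ]; then
        echo "ERROR: ${ntasks_per_node} tasks per node exceeds ${CORES_PER_NODE} physical cores" >&2
        return 1
    fi
    # Round up so that a partially filled last node is still counted.
    echo $(( (ntasks + ntasks_per_node - 1) / ntasks_per_node ))
}

# Values taken from intel_mpi.slurm: 80 tasks, 40 per node.
nodes_needed 80 40   # prints 2
```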

Comments:

  • We recommend that you compile and execute your codes in the same Intel environment: use exactly the same module load intel/… command at compilation and at execution.
  • The module purge command is necessary because of the Slurm default behaviour: any modules loaded in your environment at the moment you launch sbatch are passed on to the submitted job.
  • In this example, we assume that the exec_mpi executable file is located in the submission directory, that is, the directory from which the sbatch command is entered. Slurm automatically sets the SLURM_SUBMIT_DIR variable to this directory.
  • The computation output file TravailMPI%j.out, where %j is replaced by the job number, is also written to the submission directory. It is created at the start of the job execution; editing or modifying it while the job is running can disrupt the execution.
  • To avoid errors in the automatic task distribution, we recommend that you use srun instead of mpirun to execute your code. This guarantees a distribution that conforms to the resources requested in your submission file.
  • By default, the resources available to a job are defined in Slurm per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as shown in our documentation detailing the partitions and QoS.
  • Multi-project users, and users having both CPU and GPU hours, must specify the project accounting (the hours allocation of the project) to which the computing hours of the job are charged, as indicated in our documentation detailing the computing hours accounting.
  • We strongly recommend that you consult our documentation detailing the computing hours accounting to ensure that the hours consumed by your jobs are deducted from the correct accounting.
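
One of the assumptions in the comments above, that exec_mpi is located in the submission directory, can be verified before calling sbatch. The following is a minimal sketch; check_executable is a hypothetical helper written for this illustration and is not provided by Slurm or by the machine.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named file exists in the given
# directory, is a regular file, and is executable.
check_executable() {
    local dir="$1" exe="$2"
    if [ -f "${dir}/${exe}" ] && [ -x "${dir}/${exe}" ]; then
        echo "OK: ${dir}/${exe} is executable"
    else
        echo "ERROR: ${dir}/${exe} is missing or not executable" >&2
        return 1
    fi
}

# Run from the directory where sbatch will be entered, since this is
# the directory that SLURM_SUBMIT_DIR will point to inside the job.
check_executable "$PWD" exec_mpi || echo "Fix this before submitting."
```

Running the check in the directory from which the job is submitted mirrors what the script does with cd before calling srun ./exec_mpi.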