Jean Zay: Execution of a hybrid MPI/OpenMP job in batch

On all of the nodes, batch jobs are managed by the Slurm software.

To submit a hybrid MPI + OpenMP batch job on Jean Zay, it is necessary to:

  • Create a submission script. The following is an example saved in the intel_mpi_omp.slurm file:
    intel_mpi_omp.slurm
    #!/bin/bash
    #SBATCH --job-name=Hybrid          # name of job
    #SBATCH --ntasks=8             # number of MPI tasks (= number of MPI processes)
    #SBATCH --cpus-per-task=10     # number of OpenMP threads per MPI task
    # /!\ Caution, "multithread" in Slurm vocabulary refers to hyperthreading.
    #SBATCH --hint=nomultithread   # 1 thread per physical core (no hyperthreading)
    #SBATCH --time=00:10:00            # maximum execution time requested (HH:MM:SS)
    #SBATCH --output=Hybride%j.out     # name of output file
    #SBATCH --error=Hybride%j.out      # name of error file (here, common with the output file)
     
    # go into the submission directory
    cd ${SLURM_SUBMIT_DIR}
     
    # clean out the modules loaded in interactive mode and inherited by default
    module purge
     
    # loading modules
    module load intel-all/19.0.4
     
    # echo of launched commands
    set -x
     
    # number of OpenMP threads
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK 
    # OpenMP binding
    export OMP_PLACES=cores
     
    # code execution
    srun ./exec_mpi_omp
  • Submit this script via the sbatch command:
    $ sbatch intel_mpi_omp.slurm
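
At submission, sbatch prints the ID assigned to the job (Submitted batch job <job_number>). As a minimal sketch, with an illustrative job ID, the standard Slurm commands below can then be used to follow the job and to check it after completion:

    # list your pending and running jobs
    squeue -u $USER

    # after completion, display the elapsed time, state and exit code of the job
    # (the job ID 123456 is illustrative)
    sacct -j 123456 --format=JobID,JobName,Elapsed,State,ExitCode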

Comments:

  • We recommend that you compile and execute your code in the same Intel environment: use exactly the same module load intel… command at compilation and at execution (a minimal compilation sketch is given after these comments).
  • The module purge is made necessary by the default Slurm behaviour: any modules loaded in your environment at the moment you launch sbatch are passed on to the submitted job.
  • In this example, we assume that the exec_mpi_omp executable file is located in the submission directory, i.e. the directory in which the sbatch command is entered: the SLURM_SUBMIT_DIR variable is automatically set by Slurm.
  • The computation output file Hybride<job_number>.out is also located in the submission directory. It is created at the start of the job execution: editing or modifying it while the job is running can disrupt the execution.
  • To avoid errors in the automatic task distribution, we recommend using srun to execute your code instead of mpirun: this guarantees a distribution that conforms to the resources requested in your submission file.
  • By default, all jobs have resources defined in Slurm per partition and per QoS (Quality of Service). You can modify the limits by specifying another partition and/or QoS, as shown in our documentation detailing the partitions and QoS (the header sketch after these comments shows where such directives go).
  • For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project accounting (hours allocation of the project) on which to charge the computing hours of the job, as indicated in our documentation detailing project hours management.
  • We strongly recommend that you consult our documentation detailing the project hours management to ensure that the hours consumed by your jobs are deducted from the correct accounting.
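
As an illustration of the first comment above, the sketch below loads the same Intel environment as in intel_mpi_omp.slurm before compiling; the source file name and compiler options are assumptions chosen for the example, not part of the original script:

    # load the same Intel environment as in the submission script
    module purge
    module load intel-all/19.0.4

    # compile a hybrid MPI + OpenMP code with the Intel MPI wrapper and OpenMP enabled
    # (my_code_mpi_omp.f90 is a hypothetical source file)
    mpiifort -qopenmp -O2 my_code_mpi_omp.f90 -o exec_mpi_omp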
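
For the comments on partitions, QoS and project accounting, the lines below show where such directives would be added to the header of intel_mpi_omp.slurm; the values between angle brackets are placeholders, and the valid names are those given in the documentation referenced above:

    #SBATCH --partition=<partition>          # partition other than the default one
    #SBATCH --qos=<qos>                      # QoS other than the default one
    #SBATCH --account=<project_accounting>   # accounting on which the hours of the job are counted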