Jean Zay: GPU Slurm partitions
The partitions available
All DARI and Dynamic Access projects with GPU hours have Slurm partitions available to them on Jean Zay.
Since 8 December 2020, projects with V100 GPU hours have had access by default to a new partition which permits using all types of four-GPU accelerated nodes with 160 GB of memory (corresponding to the merger of the former `gpu_p1` and `gpu_p3` partitions). The default execution time is 10 minutes and cannot exceed 100 hours (`--time=HH:MM:SS` ≤ 100:00:00; see below).
This new partition includes both the Nvidia V100 GPUs with 16 GB of memory and the Nvidia V100 GPUs with 32 GB of memory. If you wish to restrict your job to a single type of GPU, you must specify it by adding one of the following Slurm directives to your scripts:

- `#SBATCH -C v100-16g` to select nodes having GPUs with 16 GB of memory (i.e. the former `gpu_p3`)
- `#SBATCH -C v100-32g` to select nodes having GPUs with 32 GB of memory (i.e. the former `gpu_p1`)
If you previously specified one of the `gpu_p1` or `gpu_p3` partitions explicitly in your submission scripts, you must replace the corresponding Slurm directive `#SBATCH --partition=...` with one of the two directives above.
Important note: If your job can run on either type of node, we recommend not specifying a node type (neither `-C v100-16g` nor `-C v100-32g`) in order to limit the waiting time of your jobs in the queue.
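As a minimal sketch, a submission script targeting this default V100 partition and restricting the job to 32 GB GPUs could look as follows. The job name, resource counts and the `./my_app` executable are placeholders, not values prescribed by this page:

```shell
#!/bin/bash
#SBATCH --job-name=v100_test      # hypothetical job name
#SBATCH --nodes=1                 # one four-GPU node
#SBATCH --ntasks-per-node=4       # one task per GPU (illustrative layout)
#SBATCH --gres=gpu:4              # request the node's 4 GPUs
#SBATCH -C v100-32g               # restrict to nodes with 32 GB V100 GPUs
#SBATCH --time=01:00:00           # explicit limit (the default is only 10 minutes)

srun ./my_app                     # placeholder executable
```

Omitting the `-C` line would let the job run on either GPU type, which is the recommended default.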
Other available partitions:
- The gpu_p2 partition is accessible to all researchers. It allows jobs to be launched on the Jean Zay eight-GPU accelerated nodes, which are equipped with Nvidia V100 GPUs with 32 GB of memory. The default execution time is 10 minutes and cannot exceed 100 hours (`--time=HH:MM:SS` ≤ 100:00:00; see below).
- The gpu_p5 partition is accessible only to researchers who have requested A100 GPU hours via Dynamic Access (AD) or Regular Access (DARI projects). It allows jobs to be launched on the 52 Jean Zay octo-GPU accelerated nodes, which are equipped with Nvidia A100 SXM4 GPUs with 80 GB of memory per GPU. The default execution time is 10 minutes and cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; note that, as shown below, the Quality of Service `qos_gpu-t4` therefore cannot be used with this partition). To use this partition, you must specify the Slurm directive `#SBATCH -C a100` in your scripts.
  Warning: These nodes include AMD EPYC 7543 Milan processors (64 cores per node), unlike the other nodes, which feature Intel processors. You must therefore load the `cpuarch/amd` module first (`module load cpuarch/amd`) in order to access the modules compatible with this partition, and recompile your codes.
- The prepost partition allows a job to be launched on one of the Jean Zay pre-/post-processing nodes, `jean-zay-pp`. These calculations are not deducted from your allocation. The default execution time is 2 hours and cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
- The visu partition allows a job to be launched on one of the Jean Zay visualization nodes, `jean-zay-visu`. These calculations are not deducted from your allocation. The default execution time is 10 minutes and cannot exceed 4 hours (`--time=HH:MM:SS` ≤ 4:00:00; see below).
- The archive partition is dedicated to data management (copying or moving files, creating archive files). The computing hours are not deducted from your allocation. The default execution time is 2 hours and cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
- The compil partition is dedicated to library and binary compilations which cannot be done on the front end because they require too much CPU time. The computing hours are not deducted from your allocation. The default execution time is 2 hours and cannot exceed 20 hours (`--time=HH:MM:SS` ≤ 20:00:00; see below).
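For the gpu_p5 partition, the `-C a100` constraint and the `module load cpuarch/amd` step described above can be combined in one script. This is a sketch; the job name, resource counts and executable are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=a100_test      # hypothetical job name
#SBATCH -C a100                   # target the A100 (gpu_p5) nodes
#SBATCH --nodes=1
#SBATCH --gres=gpu:8              # the 8 GPUs of one octo-GPU node
#SBATCH --time=05:00:00           # must remain ≤ 20:00:00 on this partition

module load cpuarch/amd           # required first: AMD-compatible module tree
srun ./my_app                     # placeholder, recompiled after loading cpuarch/amd
```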
Summary table: Accessing GPU compute partitions

| Node type desired (CPU) | Node type desired (GPU) | Corresponding Slurm option |
|---|---|---|
| 40 CPUs + 160 GB usable RAM | 4 V100 GPUs + 16 or 32 GB RAM | default (no option) |
| 40 CPUs + 160 GB usable RAM | 4 V100 GPUs + 16 GB RAM | `-C v100-16g` |
| 40 CPUs + 160 GB usable RAM | 4 V100 GPUs + 32 GB RAM | `-C v100-32g` |
| 24 CPUs + 360 or 720 GB usable RAM | 8 V100 GPUs + 32 GB RAM | `--partition=gpu_p2` |
| 24 CPUs + 360 GB usable RAM | 8 V100 GPUs + 32 GB RAM | `--partition=gpu_p2s` |
| 24 CPUs + 720 GB usable RAM | 8 V100 GPUs + 32 GB RAM | `--partition=gpu_p2l` |
| 64 CPUs + 468 GB usable RAM | 8 A100 GPUs + 80 GB RAM | `-C a100` |
Important: Be careful of the partitions' default time limits, which are intentionally low. For a long execution, you should specify a time limit, which must remain below the maximum time authorised for the partition and the Quality of Service (QoS) used. To specify a time limit, use either:

- The Slurm directive `#SBATCH --time=HH:MM:SS` in your job, or
- The `--time=HH:MM:SS` option of the `sbatch`, `salloc` or `srun` commands.
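For example, the command-line form overrides the 10-minute default without editing the script (the script name here is a placeholder):

```shell
# Set a 24-hour limit at submission time for a batch job
sbatch --time=24:00:00 my_job.slurm

# The same option works for an interactive allocation
salloc --time=02:00:00 --gres=gpu:1
```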
The default GPU partition does not need to be specified: it is used by all jobs requesting GPUs unless another partition is requested. All the other partitions, however, must be specified explicitly to be used. For example, to specify the prepost partition, you can use either:

- The Slurm directive `#SBATCH --partition=prepost` in your job, or
- The `--partition=prepost` option of the `sbatch`, `salloc` or `srun` commands.
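As an illustration, a pre-/post-processing job could be submitted to prepost in either form (the script name is a placeholder):

```shell
# Command-line form: the partition is given directly to sbatch
sbatch --partition=prepost my_copy_job.slurm

# Script form: the same request as a directive inside my_copy_job.slurm
#   #SBATCH --partition=prepost
```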
Warning: Since 11 October 2019, any job requiring more than one node runs in exclusive mode: the nodes are not shared. This implies that the invoiced hours are calculated on the basis of all the requisitioned nodes, including those which were only partially used.

For example, reserving 41 CPU cores (i.e. 1 node + 1 core) on the cpu_p1 partition results in the invoicing of 80 CPU cores (i.e. 2 nodes). Likewise, reserving 5 GPUs (i.e. 1 four-GPU node + 1 GPU) on the default GPU partition results in the invoicing of 8 GPUs (i.e. 2 four-GPU nodes). However, the total memory of the reserved nodes remains available in both cases (approximately 160 GB of usable memory per node).
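The rounding above is simply a ceiling to the next whole node. A small shell sketch of the invoicing arithmetic for the default partition (4 GPUs per node):

```shell
#!/bin/sh
# GPUs billed = requested GPUs rounded up to a whole number of 4-GPU nodes
gpus_requested=5
gpus_per_node=4
nodes_billed=$(( (gpus_requested + gpus_per_node - 1) / gpus_per_node ))
gpus_billed=$(( nodes_billed * gpus_per_node ))
echo "$gpus_requested GPUs requested -> $gpus_billed GPUs billed"
# prints: 5 GPUs requested -> 8 GPUs billed
```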
Available QoS
For each job submitted on a compute partition (other than archive, compil, prepost and visu), you may specify a Quality of Service (QoS). The QoS determines the time/node limits and priority of your job.
- The default QoS for all GPU jobs: `qos_gpu-t3`
  - Maximum duration: 20h00 of elapsed time
  - 512 GPUs maximum per job
  - 512 GPUs maximum per user (all projects combined)
  - 512 GPUs maximum per project (all users combined)
- A QoS for longer executions, available only on the V100 partitions, which must be specified to be used (see below): `qos_gpu-t4`
  - Maximum duration: 100h00 of elapsed time
  - 16 GPUs maximum per job
  - 96 GPUs maximum per user (all projects combined)
  - 96 GPUs maximum per project (all users combined)
  - 256 GPUs maximum for all jobs requesting this QoS
- A QoS reserved for short executions carried out within the framework of code development or execution tests, which must be specified to be used (see below): `qos_gpu-dev`
  - A maximum of 10 jobs (running or pending) simultaneously per user
  - Maximum duration: 2h00 of elapsed time
  - 32 GPUs maximum per job
  - 32 GPUs maximum per user (all projects combined)
  - 32 GPUs maximum per project (all users combined)
  - 512 GPUs maximum for all jobs requesting this QoS
To specify a QoS different from the default one, you can either:

- Use the Slurm directive `#SBATCH --qos=qos_gpu-dev` (for example) in your job, or
- Specify the `--qos=qos_gpu-dev` option of the `sbatch`, `salloc` or `srun` commands.
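For instance, a long V100 run can combine this QoS with an extended time limit. This is a sketch; the job name and executable are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=long_run       # hypothetical job name
#SBATCH --gres=gpu:4
#SBATCH --qos=qos_gpu-t4          # long-execution QoS (V100 partitions only)
#SBATCH --time=48:00:00           # allowed here: qos_gpu-t4 permits up to 100h

srun ./my_app                     # placeholder executable
```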
Summary table: GPU QoS limits

| QoS | Elapsed time limit | Limit per job | Limit per user (all projects combined) | Limit per project (all users combined) | Limit per QoS |
|---|---|---|---|---|---|
| qos_gpu-t3 (default) | 20h | 512 GPUs | 512 GPUs | 512 GPUs | |
| qos_gpu-t4 (V100 only) | 100h | 16 GPUs | 96 GPUs | 96 GPUs | 256 GPUs |
| qos_gpu-dev | 2h | 32 GPUs | 32 GPUs, max. of 10 jobs (running or pending) simultaneously | 32 GPUs | 512 GPUs |