This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
CPU Partitions
Available CPU partitions
All DARI projects with CPU hours (dynamic allocations (DA), regular allocations (RA), ...) have access to the following Slurm partitions defined on Jean Zay:
- The cpu_p1 partition is used automatically when no partition is specified, for all jobs requesting CPU hours. By default, the execution time is 10 minutes and cannot exceed 100 hours (i.e. --time=HH:MM:SS ≤ 100:00:00, see below).
- The prepost partition allows you to run a job on one of Jean Zay's pre/post-processing nodes (jean-zay-pp), where computing hours are not deducted from your allocation. By default, the execution time is 2 hours and cannot exceed 20 hours (i.e. --time=HH:MM:SS ≤ 20:00:00, see below).
- The visu partition allows you to run a job on one of Jean Zay's visualisation nodes (jean-zay-visu), where computing hours are not deducted from your allocation. By default, the execution time is 10 minutes and cannot exceed 4 hours (i.e. --time=HH:MM:SS ≤ 4:00:00, see below).
- The archive partition is dedicated to data management commands (copying or moving files, creating archives): computing hours are not deducted from your allocation. By default, the execution time is 2 hours and cannot exceed 20 hours (i.e. --time=HH:MM:SS ≤ 20:00:00, see below).
- The compil partition is dedicated to compiling codes and libraries that cannot be built on the login node because they require too many resources (CPU time or memory): computing hours are not deducted from your allocation. By default, the execution time is 2 hours and cannot exceed 20 hours (i.e. --time=HH:MM:SS ≤ 20:00:00, see below).
- The compil_H100 partition is dedicated to compiling codes and libraries that will run on the Eviden H100 partition (gpu_p6). It contains a node equipped with an Intel processor identical to those in the Eviden H100 partition. It has the same access characteristics as the compil partition.
The technical characteristics of each partition are presented here.
The default time limits for the partitions are deliberately low. For long executions, you must specify an execution time limit, which must remain below the maximum allowed for the partition and the QoS used (see below). You must then use:
- either the Slurm directive #SBATCH --time=HH:MM:SS in your job,
- or the option --time=HH:MM:SS of the commands sbatch, salloc or srun.
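Before submitting, it can help to check a requested duration against the maximum of the target partition or QoS. The following is a minimal sketch, not an IDRIS tool: the function names hms_to_seconds and time_within_limit are hypothetical, and only the HH:MM:SS form used on this page is handled (Slurm accepts other time formats that this helper does not parse).

```shell
#!/bin/bash
# Convert an HH:MM:SS string to seconds. The 10# prefix forces base 10
# so that fields such as "08" are not rejected as invalid octal numbers.
hms_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed (exit status 0) if the requested time does not exceed the maximum.
# Arguments: requested time, maximum time (both HH:MM:SS).
time_within_limit() {
    local requested=$1 maximum=$2
    [ "$(hms_to_seconds "$requested")" -le "$(hms_to_seconds "$maximum")" ]
}

# Example: a 4h30 request against the 4-hour maximum of the visu partition.
time_within_limit "04:30:00" "04:00:00" \
    || echo "04:30:00 exceeds the 4-hour maximum of the visu partition"
```

A request that passes this check for the partition may still be refused if it exceeds the time limit of the QoS in use.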
Since the cpu_p1 partition is the default partition, it does not need to be requested. However, all other partitions must be explicitly specified to be used.
For example, for the prepost partition, you can use:
- either the Slurm directive #SBATCH --partition=prepost in your job,
- or the option --partition=prepost of the commands sbatch, salloc or srun.
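As an illustration, a job script for the prepost partition might look like the sketch below. The job name, output file name and payload are placeholders chosen for this example, and the 5-hour duration is an arbitrary value between the 2-hour default and the 20-hour maximum of that partition.

```shell
#!/bin/bash
#SBATCH --job-name=postproc         # hypothetical job name
#SBATCH --partition=prepost         # must be explicit: only cpu_p1 is implicit
#SBATCH --ntasks=1
#SBATCH --time=05:00:00             # above the 2h default, below the 20h maximum
#SBATCH --output=postproc_%j.out    # %j is replaced by the job ID

# Placeholder payload: replace with your own post-processing command.
echo "Post-processing job ${SLURM_JOB_ID:-unknown} started"
```

The same choices can be made at submission time instead, for example with sbatch --partition=prepost --time=05:00:00 followed by the name of your script.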
Any job requesting more than one node runs in exclusive mode: nodes are not shared. Using part of a node results in billing for the entire node. For example, reserving 41 cores (i.e. 1 node + 1 core) results in billing for 80 cores (i.e. 2 nodes). However, the full memory of the reserved nodes is then available (around 160 GB usable per node).
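The rounding described above can be reproduced with a short calculation. This is a sketch under one assumption: the value of 40 cores per node is inferred from the example (41 cores = 1 node + 1 core). The function names are hypothetical, and the calculation only applies to jobs spanning more than one node.

```shell
#!/bin/bash
# Cores per cpu_p1 node, inferred from the 41-core example (assumption).
CORES_PER_NODE=40

# Number of nodes needed to hold a given number of cores (ceiling division).
nodes_for_cores() {
    local cores=$1
    echo $(( (cores + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# Cores billed for a multi-node job: every reserved node is charged in full.
billed_cores() {
    local nodes
    nodes=$(nodes_for_cores "$1")
    echo $(( nodes * CORES_PER_NODE ))
}

billed_cores 41    # 1 node + 1 core, billed as 2 full nodes: prints 80
```

Requesting a multiple of 40 cores therefore avoids paying for cores that the job does not use.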
Available CPU QoS
For each job submitted in the cpu_p1 partition, you can specify a QoS (Quality of Service) that will determine the limits and priority of your job:
- the default QoS for all CPU jobs: qos_cpu-t3
  - maximum duration: 20h00 of elapsed time,
  - maximum of 10240 physical cores (256 nodes) per job,
  - maximum of 20480 physical cores (512 nodes) per user (all projects combined),
  - maximum of 20480 physical cores (512 nodes) per project (all users combined).
- a QoS for longer executions, which must be specified to be used (see below): qos_cpu-t4
  - maximum duration: 100h00 of elapsed time,
  - maximum of 160 physical cores (4 nodes) per job,
  - maximum of 640 physical cores (16 nodes) per user (all projects combined),
  - maximum of 640 physical cores (16 nodes) per project (all users combined),
  - maximum of 5120 physical cores (128 nodes) for all jobs requesting this QoS.
- a QoS reserved solely for brief executions carried out as part of code development or execution tests, which must be specified to be used (see below): qos_cpu-dev
  - maximum of 10 jobs (running or pending) simultaneously per user,
  - maximum duration: 2h00 of elapsed time,
  - maximum of 5120 physical cores (128 nodes) per job,
  - maximum of 5120 physical cores (128 nodes) per user (all projects combined),
  - maximum of 5120 physical cores (128 nodes) per project (all users combined),
  - maximum of 10240 physical cores (256 nodes) for all jobs requesting this QoS.
To specify a QoS different from the default, you can choose to:
- use the Slurm directive #SBATCH --qos=qos_cpu-dev in your job, for example,
- or specify the option --qos=qos_cpu-dev to the commands sbatch, salloc or srun.
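A run longer than the 20 hours allowed by the default QoS has to combine an explicit QoS with an explicit time limit. The sketch below requests qos_cpu-t4 for a 60-hour job on 2 nodes. The job name and payload are placeholders, and the value of 40 tasks per node assumes 40 cores per cpu_p1 node, as implied by the node counts listed above.

```shell
#!/bin/bash
#SBATCH --job-name=long_run         # hypothetical job name
#SBATCH --qos=qos_cpu-t4            # required for runs longer than 20 hours
#SBATCH --nodes=2                   # qos_cpu-t4 allows at most 4 nodes per job
#SBATCH --ntasks-per-node=40        # assumes 40 physical cores per node
#SBATCH --time=60:00:00             # above 20h, below the 100h maximum

# Placeholder payload: replace with your own launch line,
# for example: srun ./my_program   (hypothetical executable)
echo "Long job ${SLURM_JOB_ID:-unknown} started"
```

Since the partition is not specified, this job runs in the default cpu_p1 partition.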
Summary table of CPU QoS limits
| QoS | Time limit | Resource limit per job | Limit per user (all projects combined) | Limit per project (all users combined) | Limit per QoS |
|---|---|---|---|---|---|
| qos_cpu-t3 (default) | 20h | 10240 physical cores (256 nodes) | 20480 physical cores (512 nodes) | 20480 physical cores (512 nodes) | |
| qos_cpu-t4 | 100h | 160 physical cores (4 nodes) | 640 physical cores (16 nodes) | 640 physical cores (16 nodes) | 5120 physical cores (128 nodes) |
| qos_cpu-dev | 2h | 5120 physical cores (128 nodes) | 5120 physical cores (128 nodes), 10 jobs maximum (running or pending) simultaneously | 5120 physical cores (128 nodes) | 10240 physical cores (256 nodes) |
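To see which QoS can accommodate a given request, the per-job limits of the table can be encoded in a small lookup. This is only an illustrative sketch: the function name matching_qos is hypothetical, durations are given in whole hours, and the per-user, per-project and per-QoS limits are ignored. Keep in mind that qos_cpu-dev is reserved for development and test runs even when a request fits its limits.

```shell
#!/bin/bash
# Per-job limits copied from the summary table: "name max_hours max_cores".
QOS_LIMITS="qos_cpu-dev 2 5120
qos_cpu-t3 20 10240
qos_cpu-t4 100 160"

# Print every QoS whose per-job limits accommodate the request.
# Arguments: whole hours requested, physical cores requested.
matching_qos() {
    local hours=$1 cores=$2 name max_hours max_cores
    while read -r name max_hours max_cores; do
        if [ "$hours" -le "$max_hours" ] && [ "$cores" -le "$max_cores" ]; then
            echo "$name"
        fi
    done <<< "$QOS_LIMITS"
}

matching_qos 48 80    # a 48-hour job on 2 nodes: prints qos_cpu-t4
```

An empty result means that no single QoS covers the request, in which case the job has to be shortened or split.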