Jean Zay: Memory allocation with Slurm on GPU partitions
The Slurm options --mem, --mem-per-cpu and --mem-per-gpu are currently disabled on Jean Zay because they do not allow you to properly configure the memory allocation per node of your job. The memory allocated per node is automatically determined by the number of CPUs reserved per node.
To adjust the amount of memory per node allocated to your job, you must adjust the number of CPUs reserved per task/process (in addition to the number of tasks and/or GPUs) by specifying the following option in your batch scripts, or when using salloc in interactive mode:
--cpus-per-task=... # --cpus-per-task=1 by default
Be careful: by default, --cpus-per-task=1. If you do not modify this value as explained below, you will not have access to all of the memory potentially available per reserved task/GPU, and in particular you risk quickly running into memory overflows in the processes running on the CPUs.
The maximum value that can be specified in --cpus-per-task=... depends on the number of processes/tasks requested per node (--ntasks-per-node=...) and on the profile of the nodes used (the total number of cores per node differs), which itself depends on the partition used.
Note that memory overflows can also occur at the GPU level, since each GPU has its own memory whose size varies depending on the partition used.
On the default gpu partition
Each node of the default gpu partition offers 156 GB of usable memory and 40 CPU cores. The memory allocation is therefore computed automatically on the basis of:
- 156/40 = 3.9 GB per reserved CPU core if hyperthreading is deactivated (Slurm option --hint=nomultithread).
Each compute node of the default gpu partition is composed of 4 GPUs and 40 CPU cores: you can therefore reserve 1/4 of the node memory per GPU by requesting 10 CPUs (i.e. 1/4 of 40 cores) per GPU:
--cpus-per-task=10 # reserves 1/4 of the node memory per GPU (default gpu partition)
In this way, you have access to 3.9*10 = 39 GB of node memory per GPU if hyperthreading is deactivated (if not, half of that memory).
Note that you can request more than 39 GB of memory per GPU if necessary (i.e. if a process needs more memory). However, this results in overbilling of the job: Slurm then allocates additional GPU resources that are not used, so the GPU hours consumed by the job are calculated as if you had reserved more GPUs, without any benefit for the computation time (see the comments at the bottom of the page).
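As an illustration, the batch header for a single-GPU job on the default gpu partition could be sketched as follows (a minimal sketch; the job name, time limit and executable are hypothetical placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=gpu_mono        # hypothetical job name
#SBATCH --ntasks=1                 # 1 task/process
#SBATCH --gres=gpu:1               # 1 GPU requested
#SBATCH --cpus-per-task=10         # 1/4 of the 40 cores, i.e. 1/4 of the node memory for this GPU
#SBATCH --hint=nomultithread       # physical cores only (hyperthreading deactivated)
#SBATCH --time=01:00:00            # placeholder time limit

srun ./my_gpu_program              # placeholder executable
```

With --hint=nomultithread, the 10 reserved cores give access to 10 x 3.9 = 39 GB, the per-GPU share of the node memory.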
On the gpu_p2 partition
The gpu_p2 partition is divided into two subpartitions:
- the gpu_p2s subpartition, with 360 GB of usable memory per node
- the gpu_p2l subpartition, with 720 GB of usable memory per node
As each node of this partition contains 24 CPU cores, the memory allocation is automatically determined on the basis of:
- 360/24 = 15 GB per reserved CPU core on the gpu_p2s subpartition if hyperthreading is deactivated (Slurm option --hint=nomultithread)
- 720/24 = 30 GB per reserved CPU core on the gpu_p2l subpartition if hyperthreading is deactivated
Each compute node of the gpu_p2 partition contains 8 GPUs and 24 CPU cores: you can reserve 1/8 of the node memory per GPU by reserving 3 CPUs (i.e. 1/8 of 24 cores) per GPU:
--cpus-per-task=3 # reserves 1/8 of the node memory per GPU (gpu_p2 partition)
In this way, if hyperthreading is deactivated (if not, half of that memory), you have access to:
- 15*3 = 45 GB of node memory per GPU on the gpu_p2s subpartition
- 30*3 = 90 GB of node memory per GPU on the gpu_p2l subpartition
Note that you can request more than 45 GB (with gpu_p2s) or 90 GB (with gpu_p2l) of memory per GPU if necessary (i.e. if a process needs more memory). However, this results in overbilling of the job: Slurm then allocates additional GPU resources that are not used, so the GPU hours consumed by the job are calculated as if you had reserved more GPUs, without any benefit for the computation time (see the comments at the bottom of the page).
On the gpu_p5 partition
Each node of the gpu_p5 partition offers 468 GB of usable memory and 64 CPU cores. The memory allocation is therefore computed automatically on the basis of:
- 468/64 = 7.3 GB per reserved CPU core if hyperthreading is deactivated (Slurm option --hint=nomultithread).
Each compute node of the gpu_p5 partition is composed of 8 GPUs and 64 CPU cores: you can therefore reserve 1/8 of the node memory per GPU by requesting 8 CPUs (i.e. 1/8 of 64 cores) per GPU:
--cpus-per-task=8 # reserves 1/8 of the node memory per GPU (gpu_p5 partition)
In this way, you have access to 8*7.3 = 58 GB of node memory per GPU if hyperthreading is deactivated (if not, half of that memory).
Note that you can request more than 58 GB of memory per GPU if necessary (i.e. if a process needs more memory). However, this results in overbilling of the job: Slurm then allocates additional GPU resources that are not used, so the GPU hours consumed by the job are calculated as if you had reserved more GPUs, without any benefit for the computation time (see the comments at the bottom of the page).
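The same reasoning applies in interactive mode with salloc, as mentioned at the top of the page. A sketch for one GPU on gpu_p5 (the -C a100 constraint shown here is an assumption; check how gpu_p5 is actually targeted before using it):

```shell
# Interactive request: 1 GPU with 1/8 of the node memory (8 x 7.3 = 58 GB) on gpu_p5
salloc -C a100 --ntasks=1 --gres=gpu:1 --cpus-per-task=8 --hint=nomultithread --time=01:00:00
```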
Comments
- You can ask for more memory per GPU by increasing the value of --cpus-per-task, as long as the request does not exceed the total amount of memory available on the node. Be careful: the computing hours are billed proportionally. For example, if you ask for 1 GPU on the default gpu partition by specifying --ntasks=1 --gres=gpu:1 --cpus-per-task=20, the job will be billed the same as a job running on 2 GPUs, because --cpus-per-task=20 corresponds to half of the 40 cores of the node, i.e. the CPU share of 2 of its 4 GPUs, even though the second GPU is never used and brings no benefit for the computation time.
- If you reserve a node in exclusive mode, you have access to the entire memory capacity of the node, regardless of the value of --cpus-per-task. The job will be billed as if it were running on an entire node.
- The amount of memory allocated to your job can be seen by running the command:
$ scontrol show job $JOBID # searches for value of the "mem" variable
Important: While the job is in the wait queue (PENDING), Slurm estimates the memory allocated to the job on the basis of logical cores. Therefore, if you have reserved physical cores (with --hint=nomultithread), the indicated value can be two times lower than the expected value; it is updated and becomes correct when the job starts.
- To reserve resources on the prepost partition, you may refer to: Memory allocation with Slurm on CPU partitions. The GPU available on each node of the prepost partition is allocated to you automatically, without needing to specify the --gres=gpu:1 option.