Memory Allocation with Slurm

The Slurm options --mem, --mem-per-cpu and --mem-per-gpu are disabled on Jean Zay because they do not allow the per-node memory allocation of a job to be configured properly. The memory is therefore determined automatically from the number of CPUs reserved per node.

The amount of memory allocated to your job is proportional to the number of reserved CPU cores. To adjust it, you must therefore adjust the number of CPUs reserved per task (or process) by specifying the following option in your batch scripts, or when running salloc in interactive mode:

```
--cpus-per-task=...     # --cpus-per-task=1 by default
```

The maximum value that can be indicated in --cpus-per-task depends on the number of processes/tasks requested per node (--ntasks-per-node) and the profile of the nodes used (total number of cores per node) which depends on the partition used.
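
As a rough illustration of this constraint, the sketch below divides the cores of a node by the number of tasks placed on it. The helper name `max_cpus_per_task` is hypothetical, and the calculation assumes that all tasks on the node reserve the same number of physical cores (hyperthreading disabled).

```shell
#!/bin/sh
# max_cpus_per_task is a hypothetical helper, not a Slurm command.
# Assumption: every task on the node reserves the same number of
# physical cores, so the bound is cores_per_node / ntasks_per_node.
max_cpus_per_task() {
    cores_per_node=$1
    ntasks_per_node=$2
    echo $(( cores_per_node / ntasks_per_node ))
}

max_cpus_per_task 40 4   # 40-core node, 4 tasks per node: prints 10
max_cpus_per_task 24 8   # 24-core node, 8 tasks per node: prints 3
```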

Attention

On the CPU partition, the amount of memory allocated by default with --cpus-per-task=1 is sufficient for most launched jobs. Most users do not need to modify the value of this option. This page is for users who have a higher memory requirement.

For GPU partitions, since --cpus-per-task=1 by default, if you do not change this value as explained below, you will not be able to access the full amount of memory potentially accessible per task/GPU reserved. In particular, you risk quickly running out of memory on the CPU side.

Remark

Note that you can also run out of memory on the GPU side: each GPU has its own memory, whose size depends on the type of GPU and therefore on the partition used. One solution is then to use more GPUs.

On the default CPU partition (cpu_p1)

The nodes of the default CPU partition cpu_p1 provide access to 156 GB of useful memory, for 40 CPU cores. The memory allocation is determined automatically at:

3.9 GB per CPU core when hyperthreading is disabled (Slurm option --hint=nomultithread)

For example, a job specifying --ntasks=1 --cpus-per-task=5 on the partition cpu_p1 will have access to 1 x 5 x 3.9 GB = 19.5 GB of memory if hyperthreading is disabled (count half otherwise).
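
The arithmetic in this example can be checked with a small helper. This is only a sketch: `job_mem_gb` is a hypothetical name, and the result assumes hyperthreading is disabled with the 3.9 GB per core figure quoted above.

```shell
#!/bin/sh
# job_mem_gb is a hypothetical helper:
#   memory (GB) = ntasks x cpus_per_task x GB per core
# Assumes hyperthreading is disabled (--hint=nomultithread).
job_mem_gb() {
    LC_ALL=C awk -v n="$1" -v c="$2" -v m="$3" \
        'BEGIN { printf "%.1f\n", n * c * m }'
}

job_mem_gb 1 5 3.9    # example from the text: prints 19.5
job_mem_gb 4 10 3.9   # 4 tasks x 10 cores, a full cpu_p1 node: prints 156.0
```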

Note that if your code is purely sequential or purely MPI, you can request more memory per task in this way if a process needs it. However, the job will then be overcharged: Slurm allocates additional CPU cores that remain unused, and the CPU hours consumed are calculated as if you had reserved those extra CPUs, with no benefit to computation time (see the remarks at the bottom of the page).
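
Going the other way, from a memory requirement to a value for --cpus-per-task, means rounding up to a whole number of cores. The sketch below assumes the 3.9 GB per core of cpu_p1 with hyperthreading disabled; `cpus_for_mem` is a hypothetical helper name.

```shell
#!/bin/sh
# cpus_for_mem is a hypothetical helper: smallest number of cores whose
# memory share covers the requested amount (rounded up to a whole core).
cpus_for_mem() {
    LC_ALL=C awk -v need="$1" -v per_core="$2" 'BEGIN {
        n = int(need / per_core)
        if (n * per_core < need) n++
        print n
    }'
}

cpus_for_mem 30 3.9   # 30 GB on cpu_p1: prints 8 (8 x 3.9 = 31.2 GB)
```

A sequential code needing 30 GB would therefore use --cpus-per-task=8 and be charged for 8 cores.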

On the prepost partition

The nodes of the prepost partition provide access to 2.88 TB of useful memory, for 48 CPU cores. The memory allocation is determined automatically at:

60 GB per CPU core when hyperthreading is disabled (Slurm option --hint=nomultithread)

For example, a job specifying --ntasks=1 --cpus-per-task=12 on the partition prepost will have access to 1 x 12 x 60 GB = 720 GB of memory if hyperthreading is disabled (count half otherwise).
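
A batch script header matching this example might look like the sketch below. The job name and the commented-out program are placeholders, and selecting the partition with --partition=prepost is an assumption about how this partition is requested. The body only echoes the expected memory so that the arithmetic can be checked.

```shell
#!/bin/bash
#SBATCH --job-name=prepost_mem     # hypothetical job name
#SBATCH --partition=prepost        # assumed way of selecting the partition
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12         # 12 x 60 GB = 720 GB
#SBATCH --hint=nomultithread       # reserve physical cores

# Slurm sets SLURM_CPUS_PER_TASK inside the job; fall back to 12 when the
# script is run outside Slurm so the arithmetic can still be checked.
cpus=${SLURM_CPUS_PER_TASK:-12}
expected_mem_gb=$(( cpus * 60 ))
echo "Expected memory for this job: ${expected_mem_gb} GB"

# ./my_postprocessing   # placeholder for the actual program
```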

On the default quadri-GPU V100 partition (gpu_p13)

The nodes of the default quadri-GPU V100 partition gpu_p13 have 156 GB of useful memory and 40 CPU cores. The memory allocation is therefore determined automatically at:

156/40 = 3.9 GB per reserved CPU core when hyperthreading is disabled (Slurm option --hint=nomultithread)

Each compute node of the default GPU partition contains 4 GPUs and 40 CPU cores. You can therefore reserve 1/4 of the node's memory per GPU by requesting 10 CPUs (i.e. 1/4 of the 40 cores) per GPU:

```
--cpus-per-task=10     # reserving 1/4 of the memory per GPU
```

You thus reserve 3.9*10 = 39 GB of the node's memory per GPU, when hyperthreading is disabled (count half otherwise).

Note that you can request more than 39 GB of memory per GPU if a process needs it. However, the job will then be overcharged: Slurm allocates additional GPU resources that remain unused, and the GPU hours consumed are calculated as if you had reserved more GPUs, with no benefit to computation time (see the remarks at the bottom of the page).
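
The remarks at the bottom of the page give an example in which 1 GPU reserved with --cpus-per-task=20 is charged as 2 GPUs. The sketch below reproduces that ratio. It is not an official accounting tool: `gpu_equivalent` is a hypothetical helper, and this page does not say how ratios that are not whole numbers are rounded.

```shell
#!/bin/sh
# gpu_equivalent is a hypothetical helper, not an official accounting tool.
# It prints the larger of the GPUs requested and the GPU share implied by
# the reserved CPU cores (reserved cores / cores per GPU).
gpu_equivalent() {
    LC_ALL=C awk -v gpus="$1" -v cpus="$2" -v cpus_per_gpu="$3" 'BEGIN {
        share = cpus / cpus_per_gpu
        if (share < gpus) share = gpus
        printf "%.1f\n", share
    }'
}

gpu_equivalent 1 10 10   # 1 GPU, 10 cores on gpu_p13: prints 1.0
gpu_equivalent 1 20 10   # 1 GPU, 20 cores on gpu_p13: prints 2.0
```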

On the octo-GPU V100 partition (gpu_p2)

The octo-GPU V100 partition gpu_p2 is subdivided into two sub-partitions:

the sub-partition gpu_p2s with 360 GB of useful memory per node
the sub-partition gpu_p2l with 720 GB of useful memory per node

As each node of this partition contains 24 CPU cores, the memory allocation is determined automatically at:

360/24 = 15 GB per reserved CPU core on the partition gpu_p2s, when hyperthreading is disabled (Slurm option --hint=nomultithread)
720/24 = 30 GB per reserved CPU core on the partition gpu_p2l, when hyperthreading is disabled

Each compute node of the gpu_p2 partition contains 8 GPUs and 24 CPU cores. You can therefore reserve 1/8 of the node's memory per GPU by requesting 3 CPUs (i.e. 1/8 of the 24 cores) per GPU:

```
--cpus-per-task=3     # reserving 1/8 of the memory per GPU
```

You thus reserve, when hyperthreading is disabled (count half otherwise):

45 GB of memory per GPU on the sub-partition gpu_p2s
90 GB of memory per GPU on the sub-partition gpu_p2l
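
One way to use these two figures is to check which sub-partition covers a given memory requirement per GPU without reserving extra cores. The sketch below assumes --cpus-per-task=3, hyperthreading disabled and a requirement given in whole GB. `pick_gpu_p2` is a hypothetical helper name.

```shell
#!/bin/sh
# pick_gpu_p2 is a hypothetical helper. With --cpus-per-task=3 (one GPU's
# share of cores), it reports the smallest gpu_p2 sub-partition giving at
# least the requested memory per GPU (whole GB, hyperthreading disabled).
pick_gpu_p2() {
    need_gb=$1
    if [ "$need_gb" -le 45 ]; then
        echo "gpu_p2s"      # 3 x 15 GB = 45 GB per GPU
    elif [ "$need_gb" -le 90 ]; then
        echo "gpu_p2l"      # 3 x 30 GB = 90 GB per GPU
    else
        echo "none"         # would need more than 3 cores per GPU
    fi
}

pick_gpu_p2 40    # prints gpu_p2s
pick_gpu_p2 60    # prints gpu_p2l
pick_gpu_p2 120   # prints none
```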

Note that you can request more than 45 GB (on gpu_p2s) or 90 GB (on gpu_p2l) of memory per GPU if a process needs it. However, the job will then be overcharged: Slurm allocates additional GPU resources that remain unused, and the GPU hours consumed are calculated as if you had reserved more GPUs, with no benefit to computation time (see the remarks at the bottom of the page).

On the octo-GPU A100 partition (gpu_p5)

The nodes of the octo-GPU A100 partition gpu_p5 have 468 GB of useful memory and 64 CPU cores. The memory allocation is therefore determined automatically at:

468/64 = 7.3 GB per reserved CPU core when hyperthreading is disabled (Slurm option --hint=nomultithread)

Each compute node of the gpu_p5 partition contains 8 GPUs and 64 CPU cores. You can therefore reserve 1/8 of the node's memory per GPU by requesting 8 CPUs (i.e. 1/8 of the 64 cores) per GPU:

```
--cpus-per-task=8     # reserving 1/8 of the memory per GPU
```

You thus reserve 8*7.3 = 58 GB of the node's memory per GPU, when hyperthreading is disabled (count half otherwise).

Note that you can request more than 58 GB of memory per GPU if a process needs it. However, the job will then be overcharged: Slurm allocates additional GPU resources that remain unused, and the GPU hours consumed are calculated as if you had reserved more GPUs, with no benefit to computation time (see the remarks at the bottom of the page).

On the quadri-GPU H100 partition (gpu_p6)

The nodes of the quadri-GPU H100 partition gpu_p6 have 468 GB of useful memory and 96 CPU cores. The memory allocation is therefore determined automatically at:

468/96 = 4.87 GB per reserved CPU core when hyperthreading is disabled (Slurm option --hint=nomultithread)

Each compute node of the partition gpu_p6 contains 4 GPUs and 96 CPU cores. You can therefore reserve 1/4 of the node's memory per GPU by requesting 24 CPUs (i.e. 1/4 of the 96 cores) per GPU:

```
--cpus-per-task=24    # reserving 1/4 of the memory per GPU
```

You thus reserve 24*4.87 = 117 GB of the node's memory per GPU, when hyperthreading is disabled (count half otherwise).
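
The per-GPU figures for gpu_p13, gpu_p5 and gpu_p6 all follow the same rule: divide the cores and the useful memory of a node by its number of GPUs. The sketch below recomputes them from the node characteristics quoted on this page. `per_gpu_share` is a hypothetical helper. For gpu_p5 it prints 58.5 GB, because it divides the node total directly, whereas the 58 GB quoted above comes from the rounded 7.3 GB per core.

```shell
#!/bin/sh
# per_gpu_share is a hypothetical helper. It reads lines of the form
#   partition  useful_memory_GB  cpu_cores  gpus
# and prints the cores and memory corresponding to one GPU
# (hyperthreading disabled).
per_gpu_share() {
    LC_ALL=C awk '{
        printf "%s: --cpus-per-task=%d, %.1f GB per GPU\n", $1, $3 / $4, $2 / $4
    }'
}

# Node characteristics as quoted on this page.
per_gpu_share <<'EOF'
gpu_p13 156 40 4
gpu_p5 468 64 8
gpu_p6 468 96 4
EOF
```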

Note that you can request more than 117 GB of memory per GPU if a process needs it. However, the job will then be overcharged: Slurm allocates additional GPU resources that remain unused, and the GPU hours consumed are calculated as if you had reserved more GPUs, with no benefit to computation time (see the remarks at the bottom of the page).

Remarks

You can reserve more memory by increasing the value of --cpus-per-task, as long as the request does not exceed the total memory available on the node. Note that the computing hours are charged in proportion. For example:
- on the CPU partition, by specifying the options --ntasks=10 --cpus-per-task=2, 20 CPU cores will be reserved and therefore charged to your job.
- if you reserve 1 GPU on the default GPU partition with the options --ntasks=1 --gres=gpu:1 --cpus-per-task=20, the equivalent of a job on 2 GPUs will be charged to you because of --cpus-per-task=20.
If you reserve a compute node exclusively, you have access to the entire memory of the node, regardless of the value given to --cpus-per-task. The equivalent of a job on the entire node is then charged to you.
For OpenMP codes: if the value given to the option --cpus-per-task does not coincide with the number of threads on which you wish to run your code, you must specify the environment variable: export OMP_NUM_THREADS=...
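
A minimal sketch of this situation is shown below: 10 cores are reserved mainly for their memory, while the code runs on 4 threads. The thread count and the commented-out executable are illustrative placeholders.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10        # reserved mainly for the memory
#SBATCH --hint=nomultithread      # reserve physical cores

# Run fewer threads than reserved cores: 4 is an illustrative value.
export OMP_NUM_THREADS=4
echo "Reserved cores: ${SLURM_CPUS_PER_TASK:-10}, OpenMP threads: ${OMP_NUM_THREADS}"

# ./my_openmp_code   # placeholder for the actual executable
```
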
The amount of memory allocated for your job is visible by launching the Slurm command:
```
scontrol show job $JOBID # search for the value of the mem variable
```
Be careful: as long as the job is in the queue (status PENDING), Slurm estimates the memory allocated to a job based on logical cores. If you have reserved physical cores (with --hint=nomultithread), the value indicated may therefore be half the expected value. It is updated and becomes correct when the job starts.
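
To avoid searching the output by hand, the value can be filtered out. The sketch below assumes the memory appears as a `mem=<value>` field in the scontrol output. The exact layout depends on the Slurm version, and `extract_mem` is a hypothetical helper.

```shell
#!/bin/sh
# extract_mem is a hypothetical helper: it prints the first "mem=..."
# field found on standard input.
extract_mem() {
    grep -o 'mem=[^, ]*' | head -n 1
}

# Only query Slurm when scontrol exists and we are inside a job
# (Slurm sets SLURM_JOB_ID in the job environment).
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    scontrol show job "$SLURM_JOB_ID" | extract_mem
fi
```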
