Flash Info No 2024-27
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
[English version below]
Hello,
Following today's maintenance (Tuesday 1st October), several changes may impact you.
- QoS name changes for the A100 partition
To better manage resource sharing on the machine, specific QoS have been defined for the A100 partition. If you explicitly specified the QoS "qos_gpu-t3" or "qos_gpu-dev" in jobs targeting this partition, you must now use "qos_gpu_a100-t3" or "qos_gpu_a100-dev", respectively. "qos_gpu_a100-t3" is the default QoS for this partition and can be omitted.
The CPU and V100 partitions are not affected by this change.
The documentation has been updated accordingly: http://www.idris.fr/jean-zay/gpu/jean-zay-gpu-exec_partition_slurm.html#les_qos_disponibles.
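For illustration, the header of a short A100 test job that requests the development QoS under its new name might look like the sketch below. The account suffix "xyz@a100", the "-C a100" constraint and the resource values are assumptions modelled on the H100 example further down this page; please check them against the documentation linked above before use.

```bash
#!/bin/bash
#SBATCH --job-name=my_a100_test # hypothetical job name
#SBATCH -A xyz@a100 # assumed A100 accounting, with xyz the 3-letter code of your project
#SBATCH -C a100 # assumed constraint selecting the A100 nodes
#SBATCH --qos=qos_gpu_a100-dev # formerly qos_gpu-dev; omit --qos to get the default qos_gpu_a100-t3
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=1 # one task on the node
#SBATCH --gres=gpu:1 # one GPU
#SBATCH --time=00:30:00 # short run, as expected for a development QoS
```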
- Use of QoS via JupyterHub
If you wish to specify a QoS when using the Slurm launcher on JupyterHub, you must now enter it manually in the "Extra #SBATCH directives" field.
- JupyterHub IP address change
The IP address of our JupyterHub instance has been modified. It is now 130.84.132.56. This change may impact you if your organisation applies IP address filtering for outgoing connections. If you encounter difficulties connecting to JupyterHub, we suggest contacting your IT service to inform them of this change.
As a reminder, the range of IP addresses used for IDRIS machines and services is as follows: 130.84.132.0/23. We recommend authorising the entire range rather than specific IP addresses to avoid being affected by future internal changes to our infrastructure.
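The /23 suffix means the first 23 bits of the address are fixed, so the range covers 2^(32-23) = 512 addresses, from 130.84.132.0 to 130.84.133.255, which includes the new JupyterHub address. If you want to check an address against the range yourself, a minimal Bash sketch could look like this; the functions "ip_to_int", "int_to_ip", "prefix_mask", "in_cidr" and "cidr_range" are illustrative helpers, not IDRIS tools.

```bash
#!/bin/bash
# Convert a dotted-quad IPv4 address (a.b.c.d) to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    # 10# forces base 10 so that an octet such as "08" is not read as octal
    echo "$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))"
}

# Convert a 32-bit integer back to dotted-quad notation.
int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Netmask for a prefix length, e.g. 23 -> 0xFFFFFE00.
prefix_mask() {
    echo "$(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))"
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
in_cidr() {
    local ip=$1 net=${2%/*} prefix=${2#*/}
    local mask
    mask=$(prefix_mask "$prefix")
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Print the first and last address of CIDR block $1.
cidr_range() {
    local net=${1%/*} prefix=${1#*/}
    local mask first last
    mask=$(prefix_mask "$prefix")
    first=$(( $(ip_to_int "$net") & mask ))
    last=$(( first | (~mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip "$first") - $(int_to_ip "$last")"
}

cidr_range 130.84.132.0/23 # prints: 130.84.132.0 - 130.84.133.255
in_cidr 130.84.132.56 130.84.132.0/23 && echo "JupyterHub address is inside the IDRIS range"
```

Authorising the whole block, as recommended above, is equivalent to accepting every address for which "in_cidr" succeeds.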
- Opening of the H100 partition
Users who have already obtained H100 hours can now use them. You can refer to the example below:
#!/bin/bash
#SBATCH --job-name=my_job # job name
#SBATCH -A xyz@h100 # accounting to use, with xyz the 3-letter code of your project
#SBATCH -C h100 # to target the H100 nodes
# Here, reservation of 3x24=72 CPUs (for 3 tasks) and 3 GPUs (1 GPU per task) on a single node:
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=3 # number of MPI tasks per node (= here, number of GPUs per node)
#SBATCH --gres=gpu:3 # number of GPUs per node (max. 4 for the H100 nodes)
# Since only one GPU per task is reserved here (i.e. 1/4 of the GPUs),
# the ideal is to reserve 1/4 of the node's CPUs for each task:
#SBATCH --cpus-per-task=24 # number of CPUs per task (1/4 of the CPUs here)
# /!\ Caution: "multithread" refers to hyperthreading in Slurm terminology
#SBATCH --hint=nomultithread # hyperthreading disabled
Note that the default modules are not compatible with the H100 partition. To use the software environment specific to this partition, you must load the "arch/h100" module: http://www.idris.fr/jean-zay/cpu/jean-zay-cpu-doc_module.html#modules_compatibles_avec_la_partition_gpu_p6. This applies both in your submission scripts and in your terminal whenever you compile code.
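After the #SBATCH directives, the body of the H100 job above would typically start from a clean environment and load "arch/h100" before any other module. The sketch below assumes a hypothetical executable "./my_program"; the commented module line is a placeholder to adapt to your own software stack.

```bash
# Job body, placed after the #SBATCH directives of the H100 script above.
module purge # start from a clean module environment
module load arch/h100 # H100-specific software environment, loaded first
# module load <your_modules> # placeholder: modules available for the H100 partition
srun ./my_program # hypothetical executable; srun starts one instance per task (3 here)
```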
If you do not yet have H100 hours, your project manager can request supplementary hours on the eDARI portal if necessary.
Do not hesitate to contact assist@idris.fr if needed.
Best regards, The IDRIS support team
Dear Jean Zay user,
Several changes might affect you after today's maintenance operations (Tuesday October 1st):
- QoS name changes for the A100 partition
To manage the sharing of the machine's resources more precisely, specific QoS have been defined for the A100 partition. If you used to explicitly specify "qos_gpu-t3" or "qos_gpu-dev" in your Slurm jobs targeting the A100 partition, you now have to use "qos_gpu_a100-t3" or "qos_gpu_a100-dev", respectively. Note that "qos_gpu_a100-t3" is the default QoS and may be omitted.
The CPU and V100 partitions are not affected by these changes.
The online documentation has been updated: http://www.idris.fr/eng/jean-zay/gpu/jean-zay-gpu-exec_partition_slurm-eng.html#available_qos
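If you have many A100 scripts to update, a small sketch such as the following can rename the old QoS values in place. It assumes GNU sed (as found on Linux systems) and that your A100 scripts select the nodes with "-C a100" or "--constraint=a100"; the function name "migrate_a100_qos" and the "*.slurm" file naming are hypothetical. Scripts for the CPU and V100 partitions are left untouched, and sed keeps a ".bak" copy of each file it rewrites.

```bash
#!/bin/bash
# Rename the old A100 QoS values in one Slurm submission script.
# Only scripts that request the A100 nodes are touched: qos_gpu-t3 and
# qos_gpu-dev remain valid for the CPU and V100 partitions.
migrate_a100_qos() {
    local script=$1
    # Assumption: A100 jobs are selected with "-C a100" or "--constraint=a100".
    if grep -Eq '^#SBATCH[[:space:]]+(-C|--constraint)[[:space:]=]*a100' "$script"; then
        # GNU sed: edit in place, keeping the original as "$script.bak".
        sed -i.bak -E 's/qos_gpu-(t3|dev)\b/qos_gpu_a100-\1/g' "$script"
    fi
}

# Example: process every *.slurm file in the current directory.
for f in ./*.slurm; do
    if [ -e "$f" ]; then
        migrate_a100_qos "$f"
    fi
done
```

Comparing each rewritten script with its ".bak" copy (for instance with diff) before submitting is a cheap way to confirm that only the QoS names changed.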
- Use of QoS through JupyterHub
If you wish to specify a QoS when using Slurm on JupyterHub, you now have to do it manually in the "Extra #SBATCH directives" field.
- JupyterHub IP address change
The IP address of our JupyterHub instance has changed; it is now 130.84.132.56. This change might affect you if your institution filters outgoing connections by IP address. If you run into difficulties when connecting to JupyterHub, please contact your local administrator and inform them of this change.
As a reminder, the IDRIS machines and services use the IP address range 130.84.132.0/23. We recommend authorising the complete range rather than specific IP addresses, so as not to be affected by future internal changes to our infrastructure.
- Opening of the H100 partition
Users who were already granted H100 computing hours may now use them. An example submission script is as follows:
#!/bin/bash
#SBATCH --job-name=my_job # job name
#SBATCH -A xyz@h100 # account to use, with xyz the 3-letter code of your project
#SBATCH -C h100 # to target H100 nodes
# Example reservation of 3x24=72 CPUs (for 3 tasks) and 3 GPUs (1 GPU per task) on one node:
#SBATCH --nodes=1 # number of nodes
#SBATCH --ntasks-per-node=3 # number of MPI tasks per node (= number of GPUs requested per node here)
#SBATCH --gres=gpu:3 # number of GPUs requested per node (max. 4 for H100 nodes)
# Since only one GPU per task is requested here (i.e., 1/4 of the node's GPUs),
# the best way to proceed is to reserve 1/4 of the node's CPUs for each task:
#SBATCH --cpus-per-task=24 # number of CPUs per task (1/4 of the CPUs here)
# /!\ Caution: "multithread" in Slurm vocabulary refers to hyperthreading.
#SBATCH --hint=nomultithread # hyperthreading deactivated
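The CPU figures in this script follow a proportional rule: each task gets the same fraction of the node's CPUs as of its GPUs. Since 24 CPUs are 1/4 of the node, an H100 node has 4 x 24 = 96 CPUs. The snippet below is a small sketch of that arithmetic; "cpus_per_task" is a hypothetical helper, and the 96-CPU figure is inferred from the comments above rather than taken from the hardware documentation.

```bash
#!/bin/bash
# Proportional CPU share per task on an H100 node.
# Assumption: 96 CPUs per node, inferred from "24 CPUs = 1/4 of the CPUs" above.
CPUS_PER_NODE=96
GPUS_PER_NODE=4 # "max. 4 for H100 nodes"

# Print the --cpus-per-task value matching a given number of GPUs per task.
cpus_per_task() {
    local gpus_per_task=$1
    echo "$(( CPUS_PER_NODE * gpus_per_task / GPUS_PER_NODE ))"
}

NTASKS=3
PER_TASK=$(cpus_per_task 1)
echo "cpus-per-task=${PER_TASK}, total CPUs reserved: $(( NTASKS * PER_TASK ))"
# prints: cpus-per-task=24, total CPUs reserved: 72
```

With 2 GPUs per task, the same rule gives 48 CPUs per task.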
Note that the default modules are not compatible with the H100 partition. In order to use the software environment dedicated to this partition, you need to load the "arch/h100" module: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-doc_module-eng.html#modules_compatible_with_gpu_p6_partition. This is needed in your submission scripts, and also in your shell when compiling code.
If you do not have H100 computing hours yet, your project manager may ask for supplementary hours ("au fil de l'eau") on the eDARI portal if necessary.
Do not hesitate to contact assist@idris.fr if needed.
Best regards, The IDRIS support team