Jean Zay: Eviden BullSequana XH3000 and HPE SGI 8600 supercomputer (copyright Photothèque CNRS / Cyril Frésillon)
Jean Zay is a supercomputer composed of an Eviden BullSequana XH3000 part and an HPE SGI 8600 part, forming a total of five partitions: one partition of scalar nodes (CPUs only) and four partitions of accelerated nodes (hybrid nodes equipped with both CPUs and GPUs). All the HPE SGI 8600 compute nodes are interconnected by an Intel Omni-Path (OPA) network and all the Eviden BullSequana XH3000 compute nodes are interconnected by an InfiniBand network. All nodes access a parallel file system with very high bandwidth.
Following three successive extensions, the cumulated peak performance of Jean Zay has reached 125.9 Pflop/s since July 2024.
For more information, please refer to our documentation concerning the usage of Jean Zay resources.
Hardware description
Access to the various hardware partitions of the machine depends on the type of job submitted (CPU or GPU) and the Slurm partition requested for its execution (see the details of the Slurm CPU partitions and the Slurm GPU partitions).
Scalar partition (or CPU partition)
Without specifying a CPU partition, or with the cpu_p1 partition, you will have access to the following resources (an illustrative submission sketch follows this list):
- 720 scalar compute nodes with:
- 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), or 40 cores per node
- 192 GB of memory per node
Note: Following the decommissioning of 808 CPU nodes on 5 February 2024, this partition went from 1528 nodes to 720 nodes.
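As an illustrative sketch only (assuming standard Slurm directives; the job name, project account string, time limit and executable are placeholders, and the exact submission options are detailed in the Slurm CPU partition documentation linked above), a full-node job on this partition could be requested as follows:

```bash
#!/bin/bash
#SBATCH --job-name=cpu_job          # placeholder job name
#SBATCH --partition=cpu_p1          # scalar (CPU) partition described above
#SBATCH --nodes=1                   # one 40-core Cascade Lake node
#SBATCH --ntasks-per-node=40        # one MPI task per physical core
#SBATCH --hint=nomultithread        # bind tasks to physical cores only
#SBATCH --time=01:00:00             # illustrative wall-clock limit
#SBATCH --account=my_project@cpu    # placeholder: CPU-hour accounting of your project

srun ./my_cpu_executable            # placeholder executable
```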
Accelerated partitions (or GPU partitions)
Without indicating a GPU partition, or with the v100-16g or v100-32g constraint, you will have access to the following resources (an illustrative submission sketch follows this list):
- 396 four-GPU accelerated compute nodes with:
- 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), or 40 cores per node
- 192 GB of memory per node
- 126 nodes with 4 Nvidia Tesla V100 SXM2 16 GB GPUs (with v100-16g)
- 270 nodes with 4 Nvidia Tesla V100 SXM2 32 GB GPUs (with v100-32g)
Note: Following the decommissioning of 220 4-GPU V100 16 GB nodes (v100-16g) on 5 February 2024, this partition went from 616 nodes to 396 nodes.
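As a sketch under the same assumptions (placeholder job name, account string and executable), a four-GPU job can pin one node family or the other through the Slurm constraint option:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_v100         # placeholder job name
#SBATCH --constraint=v100-32g       # or v100-16g for the 16 GB V100 nodes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4         # one MPI task per GPU
#SBATCH --gres=gpu:4                # the 4 V100 GPUs of the node
#SBATCH --cpus-per-task=10          # 40 cores / 4 GPUs
#SBATCH --hint=nomultithread        # bind tasks to physical cores only
#SBATCH --time=01:00:00             # illustrative wall-clock limit
#SBATCH --account=my_project@v100   # placeholder: V100 GPU-hour accounting

srun ./my_gpu_executable            # placeholder executable
```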
With the gpu_p2, gpu_p2s or gpu_p2l partitions, you will have access to the following resources:
- 31 eight-GPU accelerated compute nodes with:
- 2 Intel Cascade Lake 6226 processors (12 cores at 2.7 GHz), or 24 cores per node
- 20 nodes with 384 GB of memory (with gpu_p2 or gpu_p2s)
- 11 nodes with 768 GB of memory (with gpu_p2 or gpu_p2l)
- 8 Nvidia Tesla V100 SXM2 32 GB GPUs
With the gpu_p5 partition (extension of June 2022, accessible only with A100 GPU hours), you will have access to the following resources (an illustrative submission sketch follows this list):
- 52 eight-GPU accelerated compute nodes with:
- 2 AMD Milan EPYC 7543 processors (32 cores at 2.80 GHz), or 64 cores per node
- 512 GB of memory per node
- 8 Nvidia A100 SXM4 80 GB GPUs
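As a sketch (same placeholders as above), a full-node job on these A100 nodes follows the same pattern, with the core-per-GPU ratio adjusted to the 64-core / 8-GPU topology; the gpu_p2* and gpu_p6 partitions are requested the same way, with their own GPU and core counts:

```bash
#!/bin/bash
#SBATCH --job-name=gpu_a100         # placeholder job name
#SBATCH --partition=gpu_p5          # A100 partition
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8         # one MPI task per A100 GPU
#SBATCH --gres=gpu:8                # the 8 A100 GPUs of the node
#SBATCH --cpus-per-task=8           # 64 AMD Milan cores / 8 GPUs
#SBATCH --hint=nomultithread        # bind tasks to physical cores only
#SBATCH --time=01:00:00             # illustrative wall-clock limit
#SBATCH --account=my_project@a100   # placeholder: A100 GPU-hour accounting

srun ./my_gpu_executable            # placeholder executable
```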
With the gpu_p6 partition (extension of summer 2024 and accessible only with H100 GPU hours), you will have access to the following resources:
- 364 four-GPU accelerated compute nodes with:
- 2 Intel Xeon Platinum 8468 processors (48 cores at 2.10 GHz), or 96 cores per node
- 512 GB of memory per node
- 4 Nvidia H100 SXM5 80 GB GPUs
Pre- and post-processing
With the prepost partition, you will have access to the following resources (an illustrative interactive example follows this list):
- 4 pre- and post-processing large memory nodes with:
- 4 Intel Skylake 6132 processors (12 cores at 3.2 GHz), or 48 cores per node
- 3 TB of memory per node
- 1 Nvidia Tesla V100 GPU
- A 1.5 TB internal NVMe disk
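As an illustration (the time limit and resource counts are arbitrary), an interactive session on one of these large-memory nodes can be opened with srun:

```bash
# Open an interactive shell on a large-memory pre/post-processing node
srun --partition=prepost --nodes=1 --ntasks=1 --cpus-per-task=4 \
     --time=02:00:00 --pty bash
```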
Visualization
With the visu partition, you will have access to the following resources:
- 5 scalar-type visualization nodes with:
- 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), or 40 cores per node
- 192 GB of memory per node
- 1 Nvidia Quadro P6000 GPU
Compilation
With the compil partition, you will have access to the following resources:
- 4 pre- and post-processing large memory nodes (see above)
- 3 compilation nodes with:
- 1 Intel Xeon Silver 4114 processor (10 cores at 2.20 GHz)
- 96 GB of memory per node
Archiving
With the archive partition, you will have access to the following resources:
- 4 pre- and post-processing nodes (see above)
Additional characteristics
- Cumulated peak performance of 36.85 Pflop/s (until 5 February 2024)
- Omni-Path 100 Gb/s interconnection network: 1 link per scalar node and 4 links per converged node
- IBM Spectrum Scale parallel file system (formerly GPFS)
- Parallel storage device with a 2.5 PB SSD disk capacity (GridScaler GS18K SSD) following the summer 2020 extension
- Parallel storage device with a disk capacity of more than 30 PB
- 5 front-end (login) nodes with:
- 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), or 40 cores per node
- 192 GB of memory per node
Basic software description
Operating environment
- Red Hat version 8.6 (since 22/11/2023)
- Slurm version 23.02.6 (since 24/10/2023)
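These versions can be checked from a login node with standard commands, for example:

```bash
cat /etc/redhat-release   # reports the Red Hat release of the node
sinfo --version           # reports the installed Slurm version
```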