Jean Zay: HPE SGI 8600 supercomputer

[Photo: the Jean Zay supercomputer (jean-zay-annonce-01.jpg), copyright Photothèque CNRS/Cyril Frésillon]

Jean Zay is an HPE SGI 8600 computer composed of two partitions: a partition of scalar nodes, and a partition of accelerated nodes, i.e. hybrid nodes equipped with both CPUs and GPUs. All the compute nodes are interconnected by an Intel Omni-Path (OPA) network and access a parallel file system with very high bandwidth.

Following two successive extensions, the cumulative peak performance of Jean Zay reached 36.85 Pflop/s in June 2022.

For more information, please refer to our documentation concerning the usage of Jean Zay resources.

Hardware description

Access to the various hardware partitions of the machine depends on the type of job submitted (CPU or GPU) and the Slurm partition requested for its execution (see the details of the Slurm CPU partitions and the Slurm GPU partitions).
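
As a quick orientation, the partitions visible to your account and their limits can be listed directly with Slurm. The command below is a generic Slurm example rather than anything specific to Jean Zay; the output columns are chosen here purely for illustration:

  # List Slurm partitions with time limit, node count,
  # cores and memory per node, and generic resources (GPUs)
  sinfo -o "%15P %10l %6D %5c %8m %G"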

Scalar partition (or CPU partition)

Without specifying a CPU partition, or with the cpu_p1 partition, you will have access to the following resources:

  • 720 scalar compute nodes with:
    • 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), i.e. 40 cores per node
    • 192 GB of memory per node

Note: Following the decommissioning of 808 CPU nodes on 5 February 2024, this partition went from 1528 nodes to 720 nodes.
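
For illustration, a minimal sketch of a batch script targeting these CPU nodes might look as follows. The partition name cpu_p1 comes from the description above; the executable name, the job dimensions and any project accounting options (e.g. --account) are placeholders to adapt to your own allocation:

  #!/bin/bash
  #SBATCH --job-name=cpu_job          # job name (placeholder)
  #SBATCH --partition=cpu_p1          # scalar (CPU) partition
  #SBATCH --nodes=2                   # 2 nodes of 40 cores each
  #SBATCH --ntasks-per-node=40        # one MPI task per physical core
  #SBATCH --cpus-per-task=1
  #SBATCH --hint=nomultithread        # bind tasks to physical cores only
  #SBATCH --time=01:00:00             # wall-clock limit (hh:mm:ss)
  #SBATCH --output=cpu_job_%j.out     # %j is replaced by the job ID

  srun ./my_cpu_app                   # hypothetical MPI executable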

Accelerated partition (or GPU partition)

Without indicating a GPU partition, or with the v100-16g or v100-32g constraint, you will have access to the following resources:

  • 396 four-GPU accelerated compute nodes with:
    • 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), i.e. 40 cores per node
    • 192 GB of memory per node
    • 126 nodes with 4 Nvidia Tesla V100 SXM2 16 GB GPUs (with v100-16g)
    • 270 nodes with 4 Nvidia Tesla V100 SXM2 32 GB GPUs (with v100-32g)

Note: Following the decommissioning of 220 4-GPU V100 16 GB nodes (v100-16g) on 5 February 2024, this partition went from 616 nodes to 396 nodes.
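
A minimal sketch of a script reserving one of these four-GPU nodes is given below. The v100-32g constraint is taken from the list above, and with 40 cores and 4 GPUs per node, 10 cores per GPU is the even split; the executable name and job sizes are placeholders:

  #!/bin/bash
  #SBATCH --job-name=gpu_v100         # job name (placeholder)
  #SBATCH --constraint=v100-32g       # request the 32 GB V100 nodes (no partition named)
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=4         # one task per GPU
  #SBATCH --gres=gpu:4                # all 4 GPUs of the node
  #SBATCH --cpus-per-task=10          # 40 cores / 4 GPUs
  #SBATCH --hint=nomultithread
  #SBATCH --time=02:00:00

  srun ./my_gpu_app                   # hypothetical GPU executable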

With the gpu_p2, gpu_p2s or gpu_p2l partitions, you will have access to the following resources:

  • 31 eight-GPU accelerated compute nodes with:
    • 2 Intel Cascade Lake 6226 processors (12 cores at 2.7 GHz), i.e. 24 cores per node
    • 20 nodes with 384 GB of memory (with gpu_p2 or gpu_p2s)
    • 11 nodes with 768 GB of memory (with gpu_p2 or gpu_p2l)
    • 8 Nvidia Tesla V100 SXM2 32 GB GPUs
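
As a sketch, a job on these eight-GPU nodes could request the large-memory variant as follows; with 24 cores and 8 GPUs per node, 3 cores per GPU is the even split. The partition name gpu_p2l comes from the list above; everything else is a placeholder:

  #!/bin/bash
  #SBATCH --job-name=gpu_p2_job       # job name (placeholder)
  #SBATCH --partition=gpu_p2l         # 768 GB eight-GPU nodes
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=8         # one task per GPU
  #SBATCH --gres=gpu:8
  #SBATCH --cpus-per-task=3           # 24 cores / 8 GPUs
  #SBATCH --hint=nomultithread
  #SBATCH --time=02:00:00

  srun ./my_gpu_app                   # hypothetical GPU executable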

With the gpu_p5 partition (added by the June 2022 extension and accessible only with A100 GPU hours), you will have access to the following resources:

  • 52 eight-GPU accelerated compute nodes with:
    • 2 AMD Milan EPYC 7543 processors (32 cores at 2.80 GHz), i.e. 64 cores per node
    • 512 GB of memory per node
    • 8 Nvidia A100 SXM4 80 GB GPUs
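
A similar sketch for the A100 nodes follows; here 64 cores and 8 GPUs per node give 8 cores per GPU. As noted above, this partition can only be used with A100 GPU hours; the script itself uses only standard Slurm options with placeholder values:

  #!/bin/bash
  #SBATCH --job-name=gpu_a100         # job name (placeholder)
  #SBATCH --partition=gpu_p5          # A100 eight-GPU nodes
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=8         # one task per GPU
  #SBATCH --gres=gpu:8
  #SBATCH --cpus-per-task=8           # 64 cores / 8 GPUs
  #SBATCH --hint=nomultithread
  #SBATCH --time=02:00:00

  srun ./my_gpu_app                   # hypothetical GPU executable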

Pre- and post-processing

With the prepost partition, you will have access to the following resources:

  • 4 pre- and post-processing large memory nodes with:
    • 4 Intel Skylake 6132 processors (12 cores at 3.2 GHz), i.e. 48 cores per node
    • 3 TB of memory per node
    • 1 Nvidia Tesla V100 GPU
    • An internal 1.5 TB NVMe disk
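
These large-memory nodes lend themselves to interactive data preparation and analysis. As a hedged example using only standard Slurm options with illustrative resource values, an interactive shell on the prepost partition could be requested like this:

  # Open an interactive shell on a pre-/post-processing node
  srun --partition=prepost --ntasks=1 --cpus-per-task=8 \
       --time=01:00:00 --pty bash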

Visualization

With the visu partition, you will have access to the following resources:

  • 5 scalar-type visualization nodes with:
    • 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), i.e. 40 cores per node
    • 192 GB of memory per node
    • 1 Nvidia Quadro P6000 GPU
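
As a generic Slurm illustration only (the recommended remote-visualization workflow should be taken from the Jean Zay documentation referenced above), an interactive shell on a visualization node could be requested as follows; the resource values are placeholders:

  # Interactive shell on a visualization node
  srun --partition=visu --ntasks=1 --cpus-per-task=4 \
       --time=01:00:00 --pty bash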

Compilation

With the compil partition, you will have access to the following resources:

  • 4 pre- and post-processing large memory nodes (see above)
  • 3 compilation nodes with:
    • 1 Intel(R) Xeon(R) Silver 4114 processor (10 cores at 2.20 GHz)
    • 96 GB of memory per node
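
For example, a build can be submitted as a small batch job on the compil partition; the makefile and resource values below are placeholders, and only standard Slurm options are used:

  #!/bin/bash
  #SBATCH --job-name=build            # job name (placeholder)
  #SBATCH --partition=compil          # compilation nodes
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=10          # up to the 10 cores of a compilation node
  #SBATCH --time=00:30:00

  make -j "$SLURM_CPUS_PER_TASK"      # parallel build of a hypothetical project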

Archiving

With the archive partition, you will have access to the following resources:

  • 4 pre- and post-processing nodes (see above)
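
Since this partition is dedicated to archiving, a typical use is a long file-packing or transfer job submitted as a batch script rather than run on a front-end node. The sketch below is purely illustrative, uses only standard shell and Slurm commands, and both paths are placeholders:

  #!/bin/bash
  #SBATCH --job-name=archive_job      # job name (placeholder)
  #SBATCH --partition=archive         # archiving nodes
  #SBATCH --ntasks=1
  #SBATCH --time=04:00:00

  # Pack a results directory and copy it to archival storage (placeholder paths)
  tar -czf results.tar.gz /path/to/results
  cp results.tar.gz /path/to/archive/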

Additional characteristics

  • Cumulative peak performance of 36.85 Pflop/s (until 5 February 2024)
  • Omni-Path interconnection network at 100 Gb/s: 1 link per scalar node and 4 links per converged (accelerated) node
  • IBM Spectrum Scale parallel file system (formerly GPFS)
  • Parallel SSD storage with a capacity of 2.5 PB (GridScaler GS18K SSD), available since the summer 2020 extension
  • Parallel disk storage with a capacity of more than 30 PB
  • 5 front-end nodes with:
    • 2 Intel Cascade Lake 6248 processors (20 cores at 2.5 GHz), i.e. 40 cores per node
    • 192 GB of memory per node

Basic software description

Operating environment

  • Red Hat version 8.6 (since 22 November 2023)
  • Slurm version 23.02.6 (since 24 October 2023)
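
If needed, the installed versions can be checked directly from a shell on the machine; both commands below are standard (the first reads the Red Hat release file, the second asks Slurm for its version):

  cat /etc/redhat-release   # Red Hat Enterprise Linux release
  sinfo --version           # Slurm version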

Compilers

  • Intel compilers ifort and icc with the Intel(R) Math Kernel Library (MKL)
  • PGI compilers pgfortran and pgcc
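
For illustration, a Fortran source file could be compiled with either toolchain as sketched below; the source file name is a placeholder, and the exact MKL link flag depends on the Intel compiler version installed (-mkl on classic releases, -qmkl on recent oneAPI releases):

  # Intel compiler, optimizing for the host CPU and linking sequential MKL
  ifort -O2 -xHost -mkl=sequential my_prog.f90 -o my_prog_intel

  # PGI compiler with aggressive optimization
  pgfortran -fast my_prog.f90 -o my_prog_pgi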