Ada, IBM x3750: Hardware and software configurations

Ada

  • 332 x3750-M4 compute nodes, each with four 8-core Intel Sandy Bridge processors (sockets) at 2.7 GHz, used for computations.
  • Four IBM x3850 nodes, each with four 8-core Intel Westmere processors at 2.67 GHz, used for pre- and post-processing.
  • Two x3750-M4 nodes, each with four 8-core Intel Sandy Bridge processors (sockets), used as front-ends.
  • 49.25 TB of total memory.
  • Cumulative peak performance of 233 Tflop/s.
  • Mellanox InfiniBand FDR10 network.

For more details concerning the use of this machine's resources, see here.

Detailed description of the machine

The SMP supercomputer is based on IBM x3750-M4 technology: each compute node contains four 8-core Intel Sandy Bridge E5-4650 processors (32 cores per node) with a clock speed of 2.7 GHz and two connections to the Mellanox InfiniBand FDR10 network.

The machine has a total of 332 x3750-M4 compute nodes, two x3750-M4 access nodes, and four x3850 pre- and post-processing nodes, offering a cumulative theoretical peak performance of 233 Tflop/s (a worked check of this figure and of the total memory is given after the list below):

  • 304 x3750-M4 diskless nodes (Sandy Bridge), each with 32 cores and 128 GB of memory (4 GB/core).
  • 28 x3750-M4 nodes (Sandy Bridge), each with 32 cores and 256 GB of memory (8 GB/core), with the following characteristics:
    • 16 diskless nodes;
    • 12 nodes, each with eight 300 GB disks, for jobs requiring the writing of large temporary files.
  • Four IBM x3850 nodes (Westmere), each with 32 cores, 1 TB of memory (32 GB/core) and eight 600 GB internal disks, used for pre- and post-processing.
  • Two x3750-M4 nodes (Sandy Bridge), each with 32 cores, 128 GB of memory (4 GB/core) and four 300 GB disks.
  • A Mellanox InfiniBand FDR10 interconnection network with two switch levels:
    • 1st level: Each compute node is connected via two links to two 36-port switches (a theoretical 10 GB/s for the node's 32 cores). In this way, 20 nodes can be interconnected by a pair of switches, using 20 ports of each switch.
    • 2nd level: Each 36-port switch is connected via 16 links to one 648-port switch (using the 16 ports remaining after the 1st-level compute node connections). Note that this 648-port switch also provides the interconnection with the front-ends, the disk racks (GPFS), the Blue Gene/Q racks, and the Blue Gene/Q front-ends.
    • This interconnection network carries both the inter-node MPI communications and the GPFS I/O, offering a latency of 1.34 µs and a per-link throughput of 4.1 GB/s (point-to-point measurements of MPI communications made with the Intel MPI library; a minimal ping-pong sketch for reproducing this kind of measurement is given after this list).
    • Each compute node contains 4 sockets; the node's 2 InfiniBand links are attached to only 2 of the 4 sockets (one link per socket).
  • Disk space:
    • Be aware that most of the nodes do not have local disks.
    • Only the front-ends (access nodes) have system disks; for the other nodes, the system is loaded into memory when the node boots.
    • 980 TB shared with the Blue Gene/Q machine.
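
As a quick consistency check of the totals quoted above (a sketch assuming 8 double-precision flops per cycle per Sandy Bridge core with AVX, and 4 per Westmere core):

    Memory: 304 × 128 GB + 28 × 256 GB + 4 × 1024 GB + 2 × 128 GB = 50 432 GB ≈ 49.25 TB
    Peak:   332 × 32 cores × 2.7 GHz × 8 flops/cycle ≈ 229.5 Tflop/s (compute nodes)
            + 4 × 32 × 2.67 GHz × 4 + 2 × 32 × 2.7 GHz × 8 ≈ 2.8 Tflop/s (x3850 and access nodes)
            ≈ 232 Tflop/s, close to the quoted 233 Tflop/s (the exact value depends on which nodes and flops-per-cycle figures are counted).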
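
The latency and throughput figures above come from point-to-point MPI measurements. The following is a minimal ping-pong sketch for reproducing this kind of measurement between two nodes; it is not the benchmark used for the quoted figures, and the build command in the comment (the Intel MPI wrapper mpiicc) is an assumption about the local environment:

    /*
     * Minimal MPI ping-pong sketch between 2 tasks (place them on two
     * different compute nodes to measure the InfiniBand link, one task
     * per node).  Possible build command, assuming the Intel MPI
     * wrapper is available:  mpiicc -O2 pingpong.c -o pingpong
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int iters_lat = 1000;          /* small-message round trips  */
        const int iters_bw  = 100;           /* large-message round trips  */
        const int small = 8;                 /* 8 B message for latency    */
        const int large = 4 * 1024 * 1024;   /* 4 MB message for bandwidth */
        int rank, size, i;
        double t0, t1;
        char *buf = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
            if (rank == 0) fprintf(stderr, "Run with exactly 2 MPI tasks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        buf = malloc(large);

        /* Latency: half of the average round-trip time of a small message */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters_lat; i++) {
            if (rank == 0) {
                MPI_Send(buf, small, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, small, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, small, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, small, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("latency   ~ %.2f us\n", (t1 - t0) / (2.0 * iters_lat) * 1e6);

        /* Bandwidth: one-way throughput estimated from large round trips */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters_bw; i++) {
            if (rank == 0) {
                MPI_Send(buf, large, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
                MPI_Recv(buf, large, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, large, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, large, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
            printf("bandwidth ~ %.2f GB/s\n", 2.0 * iters_bw * large / (t1 - t0) / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }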

Sandy Bridge (E5-4650) processor characteristics:

  • 8 cores: 2-way SMT (Hyper-Threading) per core
  • Clock speed: 2.7 GHz
  • L1 data cache: 32 KB per core
  • L1 instruction cache: 32 KB per core
  • L2 cache: 256 KB per core
  • L3 cache: 20 MB shared per processor (one slice per core)
  • Load/store units: 2 per core
  • Integer execution units: 3 per core
  • Floating-point units: 2 per core (one adder, one multiplier)
  • Vector units: 256-bit AVX, for a peak of 8 double-precision flops per cycle per core
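
To verify the cache hierarchy actually seen on a compute node, the glibc-specific sysconf() cache parameters can be queried. This is only a sketch assuming a Linux/glibc environment (as with Red Hat 6.7); depending on the glibc version, some values may be reported as 0, in which case lscpu or /sys/devices/system/cpu/cpu0/cache/ provide the same information:

    /* Query the cache hierarchy seen by the OS (glibc-specific sysconf keys). */
    #include <stdio.h>
    #include <unistd.h>

    static void show(const char *name, int size_key, int line_key)
    {
        long size = sysconf(size_key);   /* bytes, or 0/-1 if not reported */
        long line = sysconf(line_key);
        printf("%-4s %8ld KB  (line size %ld B)\n", name, size / 1024, line);
    }

    int main(void)
    {
        printf("logical CPUs online: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
        show("L1d", _SC_LEVEL1_DCACHE_SIZE, _SC_LEVEL1_DCACHE_LINESIZE);
        show("L1i", _SC_LEVEL1_ICACHE_SIZE, _SC_LEVEL1_ICACHE_LINESIZE);
        show("L2",  _SC_LEVEL2_CACHE_SIZE,  _SC_LEVEL2_CACHE_LINESIZE);
        show("L3",  _SC_LEVEL3_CACHE_SIZE,  _SC_LEVEL3_CACHE_LINESIZE);
        return 0;
    }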

Software description:

  • RedHat version 6.7
  • LoadLeveler 5.1.0.14
  • GPFS version 3.5.0
  • POE version 1.3.0.9

Intel compilers:

  • ifort version 14.0.1.106
  • icc version 14.0.1.106

Scientific libraries:

  • Intel® Math Kernel Library version 11.1.1