Performance
Timings and Flops have been measured on three parallel machines whose
hardware and software characteristics are given below:
| Machine | IBM SP2 | SGI/CRAY T3E | Fujitsu VPP300 |
|---|---|---|---|
| #Processors | 127 | 256 | 8 |
| Processor | IBM P2SC thin | DEC Alpha EV5 | Fujitsu vector |
| Processor memory | 256 MB | 128 MB | 2048 MB |
| Data cache | 1 level: 128 KB | 2 levels: 8 KB + 96 KB | 64 KB scalar data cache |
| Processor peak performance | 500 MFlops/sec | 600 MFlops/sec | 2200 MFlops/sec |
| Interconnection network | Omega Multilevel Switch | Bidirectional 3D Torus | Bidirectional Crossbar |
| Network bandwidth | 90 MB/sec | 600 MB/sec | 615 MB/sec |
| Operating system | AIX 4.2 | Unicos/mk 2.0 | UXP/V V10L20 |
| Compiler | xlf (version 4.1) | f90 (version 3.0) | frt (version L97121) |
| Optimization options | -O3 | -O3,unroll2,pipeline3 | -Of -Wv,-Of |
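
As a worked check on the hardware table, the theoretical peak of each whole machine is the processor count times the per-processor peak. The sketch below does that arithmetic from the table values only; it gives the configured machine peak, not the peak of the runs reported next, since the timing table does not state how many processors each run used. The names `MACHINES` and `aggregate_peak_gflops` are illustrative.

```python
# Hardware figures copied from the table above.
# Tuple fields: (#processors, peak MFlops/sec per processor).
MACHINES = {
    "IBM SP2": (127, 500.0),
    "SGI/CRAY T3E": (256, 600.0),
    "Fujitsu VPP300": (8, 2200.0),
}


def aggregate_peak_gflops(processors: int, peak_mflops: float) -> float:
    """Theoretical whole-machine peak in GFlops/sec (processors x per-processor peak)."""
    return processors * peak_mflops / 1000.0


if __name__ == "__main__":
    for name, (procs, peak) in MACHINES.items():
        print(f"{name}: {aggregate_peak_gflops(procs, peak):.1f} GFlops/sec")
```

This prints 63.5, 153.6 and 17.6 GFlops/sec respectively: the VPP300 has the smallest aggregate peak even though its individual processors are by far the fastest.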

| Machine | IBM SP2 | SGI/CRAY T3E-600 | Fujitsu VPP300 |
|---|---|---|---|
| Build Schur matrix (sec.) | 2979.6 | 1340 | 63 |
| Factor Schur matrix (sec.) | 74.7 | 183.7 | 10 |
| Solve (sec.) | 3.6 | 1.8 | 0.1 |
| Total elapsed time (sec.) | 3125 | 1525.6 | 74 |
| Communication time (sec.) | 15.5 | 0.6 | 0.2 |
| Total MFlops/sec per processor | 22 | 42 | 859 |
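
The sustained rates are easier to compare when expressed as a fraction of each processor's peak, and the phase timings as shares of the total elapsed time. The sketch below is plain arithmetic on the two tables. It assumes the per-processor peaks from the hardware table apply to the measured runs. The names `RESULTS` and `percent` are illustrative.

```python
# Measured values copied from the timing table above.
# Tuple fields: build, factor, solve, total elapsed, communication (all seconds),
# sustained MFlops/sec per processor, peak MFlops/sec per processor (hardware table).
RESULTS = {
    "IBM SP2": (2979.6, 74.7, 3.6, 3125.0, 15.5, 22.0, 500.0),
    "SGI/CRAY T3E-600": (1340.0, 183.7, 1.8, 1525.6, 0.6, 42.0, 600.0),
    "Fujitsu VPP300": (63.0, 10.0, 0.1, 74.0, 0.2, 859.0, 2200.0),
}


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for name, (build, factor, solve, total, comm, sustained, peak) in RESULTS.items():
        print(
            f"{name}: {percent(sustained, peak):.1f}% of peak, "
            f"build = {percent(build, total):.1f}% of elapsed, "
            f"communication = {percent(comm, total):.2f}% of elapsed, "
            f"build + factor + solve = {build + factor + solve:.1f} s"
        )
```

On these numbers the SP2, T3E-600 and VPP300 sustain about 4.4%, 7.0% and 39.0% of peak. Building the Schur matrix takes 95.3%, 87.8% and 85.1% of the elapsed time, and communication stays below 1% on all three machines. The three listed phases do not add up exactly to the total elapsed time. The gap is largest on the SP2 (3057.9 s versus 3125 s).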