## Performance

Timings and Flops were measured on three parallel machines whose
hardware and software characteristics are given below:

| Machines | IBM SP2 | SGI/CRAY T3E | Fujitsu VPP300 |
|---|---|---|---|
| Number of processors | 127 | 256 | 8 |
| Processor | IBM P2SC thin | Dec Alpha EV5 | Fujitsu Vector |
| Processor memory | 256 MB | 128 MB | 2048 MB |
| Memory data cache | 1 level cache: 128 KB | 2 level caches: 8 KB + 96 KB | 64 KB scalar data cache |
| Processor peak performance | 500 MFlops/sec | 600 MFlops/sec | 2200 MFlops/sec |
| Interconnection network | Omega Multilevel Switch, 90 MB/sec | Bidirectional 3D Torus, 600 MB/sec | Bidirectional Crossbar, 615 MB/sec |
| Operating system | AIX 4.2 | Unicos/mk 2.0 | UXP/V V10L20 |
| Compiler | xlf (version 4.1) | f90 (version 3.0) | frt (version L97121) |
| Optimization options | -O3 | -O3,unroll2,pipeline3 | -Of -Wv,-Of |

Our test case is the 2D Laplace problem. We used the NAG FFT to solve the local problem. The table below shows how the elapsed execution time is distributed among the different PMD routines, together with the number of floating point operations per second per processor achieved on each parallel machine:

| Machines | IBM SP2 | SGI/CRAY T3E-600 | Fujitsu VPP300 |
|---|---|---|---|
| Build Schur matrix (sec.) | 2979.6 | 1340 | 63 |
| Factor Schur matrix (sec.) | 74.7 | 183.7 | 10 |
| Solve (sec.) | 3.6 | 1.8 | 0.1 |
| Total elapsed time (sec.) | 3125 | 1525.6 | 74 |
| Communication time (sec.) | 15.5 | 0.6 | 0.2 |
| Total MFlops/sec per processor | 22 | 42 | 859 |
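As a reading aid, the figures above can be converted into each phase's share of the total elapsed time and into the fraction of per-processor peak performance that was sustained. The sketch below only reuses numbers from the two tables; the dictionary layout and the function names `phase_shares` and `peak_fraction` are illustrative choices, not part of PMD.

```python
# Values copied from the two tables above.
# Times are in seconds; "mflops" and "peak" are MFlops/sec per processor.
results = {
    "IBM SP2": {
        "build": 2979.6, "factor": 74.7, "solve": 3.6,
        "total": 3125.0, "comm": 15.5, "mflops": 22.0, "peak": 500.0,
    },
    "SGI/CRAY T3E": {
        "build": 1340.0, "factor": 183.7, "solve": 1.8,
        "total": 1525.6, "comm": 0.6, "mflops": 42.0, "peak": 600.0,
    },
    "Fujitsu VPP300": {
        "build": 63.0, "factor": 10.0, "solve": 0.1,
        "total": 74.0, "comm": 0.2, "mflops": 859.0, "peak": 2200.0,
    },
}


def phase_shares(row):
    """Return each phase's share of the total elapsed time, in percent."""
    phases = ("build", "factor", "solve", "comm")
    return {name: 100.0 * row[name] / row["total"] for name in phases}


def peak_fraction(row):
    """Return the sustained rate as a percentage of processor peak."""
    return 100.0 * row["mflops"] / row["peak"]


for machine, row in results.items():
    shares = phase_shares(row)
    print(
        f"{machine}: build {shares['build']:.1f}%, "
        f"factor {shares['factor']:.1f}%, "
        f"comm {shares['comm']:.2f}%, "
        f"peak fraction {peak_fraction(row):.1f}%"
    )
```

Read this way, building the Schur matrix accounts for roughly 85% to 95% of the elapsed time on all three machines, while communication stays at about 0.5% or less.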

Scalability can be evaluated by keeping the global mesh size fixed for any number of subdomains (or processes). The curves below show the elapsed execution time versus the number of processors:

Especially noteworthy is that almost 64 RISC processors are needed to solve the problem as fast as 4 vector processors.
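For a fixed global mesh size, scalability is commonly summarised by the speedup and the parallel efficiency relative to the smallest run. The following is a minimal sketch of that calculation. The function name `strong_scaling` is hypothetical, and the timings in the example are made-up placeholders, not values read from the curves.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency for a fixed global mesh size.

    timings maps a processor count to an elapsed time in seconds.
    Returns {p: (speedup, efficiency)}, both relative to the run
    with the smallest processor count.
    """
    p0 = min(timings)
    t0 = timings[p0]
    summary = {}
    for p in sorted(timings):
        speedup = t0 / timings[p]
        # Ideal speedup from p0 to p processors is p / p0.
        efficiency = speedup * p0 / p
        summary[p] = (speedup, efficiency)
    return summary


# Placeholder timings for illustration only (NOT measured values).
example = {4: 100.0, 8: 50.0, 16: 30.0}

for p, (speedup, efficiency) in strong_scaling(example).items():
    print(f"{p:3d} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 means that adding processors reduces the elapsed time almost proportionally.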

The curves below show timings at fixed local mesh size.

In this situation, mono-domain solvers usually yield timings that grow as N x f(N), whereas the timings here grow as c x N, where the slope "c" remains constant for any global mesh size N. For further details, please read this paper.
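The difference between the two growth laws can be illustrated numerically. The text does not specify f(N), so the sketch below assumes f(N) = log2(N) purely as an example of a slowly growing factor. The constant c = 1 and both function names are likewise hypothetical.

```python
import math


def mono_domain_cost(n, f=math.log2):
    """Model cost N * f(N). f is an ASSUMED growth factor (log2 by default)."""
    return n * f(n)


def decomposed_cost(n, c=1.0):
    """Model cost c * N, with a slope c that does not depend on N."""
    return c * n


# The ratio of the two models equals f(N) / c, so it keeps growing with N
# whenever f grows with N.
for n in (2**10, 2**14, 2**18):
    ratio = mono_domain_cost(n) / decomposed_cost(n)
    print(f"N = {n:7d}: cost ratio = {ratio:.1f}")
```

Under these assumptions the ratio is simply log2(N): the larger the global mesh, the larger the relative advantage of the linear law.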