Overview

License Terms

PMD is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Installation & Tests

  1. Download the latest PMD version (release 1.2.6, December 2004)
  2. Before installing PMD, check that the MPI (vendor, LAM or MPICH implementation), MPIBLACS, BLAS, LAPACK and ScaLAPACK libraries are available on your system.
  3. zcat pmd-(version).tar.gz | tar -xvf -
  4. cd PMD/MakeDep
    Configure the "Make.inc" file
  5. cd .. ; make
    Compile the PMD modules
  6. make examples
    Compile and link all the examples
  7. Run the examples (see the note below)
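
The examples are parallel MPI programs, so they must be started with the launcher of your MPI installation. As an illustration only (the binary name "example_2d" is hypothetical, and the exact launcher syntax depends on whether you use a vendor MPI, LAM or MPICH), a run on 4 processors with an mpirun-style launcher looks like:

  # illustrative command; substitute the example binary actually built by "make examples"
  mpirun -np 4 ./example_2d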

Getting started

PMD is a parallel Fortran 90 module for solving systems arising from positive-definite, linear, second-order elliptic operators. To access the PMD subroutines, a "USE PMD" statement must be included in all scoping units. The pseudocode example hereafter shows the general calling sequence of the PMD subroutines to solve the whole problem, assuming that the local operator matrix and the physical boundary conditions are given:

PROGRAM myprog
  USE PMD

  !... Declare and initialize User, MPI and PMD type objects
  ....
    
  !... Initialize PMD
  CALL PMD_Init_1DD(...)
    
  !... Set Local Operator Matrix and Boundary Conditions
  ....
  !... Link User Local Operator Matrix to PMD
  CALL PMD_Operator_1DD(...)
    
  !... Build the Schur Matrix
  CALL PMD_Schur_1DD(...)
    
  !... Factor the Schur Matrix
  CALL PMD_Schur_Factor_1DD(...)
    
  !... Solve the interface problem
  CALL PMD_Solve_1DD(...)
  
  !... End PMD
  CALL PMD_End_1DD(...)
  ....
END PROGRAM myprog

PMD implements a non-overlapping domain decomposition based on the dual Schur complement. The Schur matrix is built using a parallel influence matrix technique. At this time, PMD implements a 1D domain decomposition for 1D and 2D problems. The interface problem can be solved with either a direct or an iterative solver. PMD is built on top of MPI to handle data exchange between the subdomains. Full source code examples using PMD are presented here.
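
As a rough illustration of the 1D decomposition strategy (this is a sketch, not PMD's internals; the program and variable names are hypothetical), the following self-contained Fortran 90 program splits the y-direction of an Nx x Ny mesh into non-overlapping strips, one per MPI process, and prints the index range owned by each process:

PROGRAM strip_decomposition
  ! Hypothetical sketch: split the y-direction of an nx x ny mesh into
  ! non-overlapping strips, one strip per MPI process (1D decomposition).
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: nx = 801, ny = 601   ! global mesh size (as in Table 2)
  INTEGER :: rank, nprocs, ierr
  INTEGER :: base, rest, jstart, jend, nyloc

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Distribute the ny mesh lines as evenly as possible:
  ! the first "rest" processes get one extra line.
  base = ny / nprocs
  rest = MOD(ny, nprocs)
  IF (rank < rest) THEN
     nyloc  = base + 1
     jstart = rank * nyloc + 1
  ELSE
     nyloc  = base
     jstart = rest * (base + 1) + (rank - rest) * base + 1
  END IF
  jend = jstart + nyloc - 1

  ! The interfaces between consecutive strips carry the unknowns of the
  ! Schur (interface) problem; PMD exchanges them with ranks rank-1 and
  ! rank+1 through MPI.
  PRINT '(A,I4,A,I5,A,I5,A,I5,A)', 'rank ', rank, ' owns j = ', jstart, &
        ' .. ', jend, '  (', nx, ' points per mesh line)'

  CALL MPI_Finalize(ierr)
END PROGRAM strip_decomposition

In PMD itself, each process solves its local subdomain problem and only the unknowns lying on the strip interfaces enter the Schur matrix, which keeps the interface problem much smaller than the global one.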

Performance

Timings and flop rates were measured on three parallel machines whose hardware and software characteristics are given below:

Machines                     IBM SP2                   SGI/CRAY T3E                  Fujitsu VPP300
#Processors                  127                       256                           8
Processor                    IBM P2SC thin             DEC Alpha EV5                 Fujitsu Vector
Processor memory             256 MB                    128 MB                        2048 MB
Memory data cache            1-level cache: 128 KB     2-level cache: 8 KB + 96 KB   64 KB scalar data cache
Processor peak performance   500 MFlops/sec            600 MFlops/sec                2200 MFlops/sec
Interconnection network      Omega multilevel switch   Bidirectional 3D torus        Bidirectional crossbar
Network bandwidth            90 MB/sec                 600 MB/sec                    615 MB/sec
Operating system             AIX 4.2                   Unicos/mk 2.0                 UXP/V V10L20
Compiler                     xlf (version 4.1)         f90 (version 3.0)             frt (version L97121)
Optimization options         -O3                       -O3,unroll2,pipeline3         -Of -Wv,-Of

Table 1: Hardware and software characteristics


Our test case was the 2D Laplace problem. We used the NAG FFT routines to solve the local problems. The table below shows how the elapsed execution time is distributed over the different PMD routines, together with the number of floating point operations per second per processor, on each parallel machine:

Machines                      IBM SP2   SGI/CRAY T3E-600   Fujitsu VPP300
Build Schur matrix (sec.)      2979.6             1340.0             63.0
Factor Schur matrix (sec.)       74.7              183.7             10.0
Solve (sec.)                      3.6                1.8              0.1
Total elapsed time (sec.)      3125.0             1525.6             74.0
Communication time (sec.)        15.5                0.6              0.2
Total MFlops/sec./processor      22                 42               859

Table 2: Performance of the 2D Laplace problem on 4 processors. Mesh size: Nx = 801, Ny = 601


Scalability can be evaluated by keeping the global mesh size fixed while varying the number of subdomains (or processes). The curves below show the elapsed execution time versus the number of processors:

[PMD Scalability at fixed global mesh]

Especially noteworthy is that about 64 RISC processors are needed to solve the problem as fast as 4 vector processors.

The curves below show the timings at a fixed local mesh size.

[PMD Scalability at fixed local mesh size]


In this situation, mono-domain solvers usually yield timings that evolve as N x f(N), whereas the timings here evolve as c x N, where the slope c remains constant for any global mesh size N. For further details, please read these papers.
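
To make the comparison explicit (a sketch in simplified notation: T denotes elapsed time, N the global mesh size, f the per-point cost factor of the mono-domain solver, and C, c machine-dependent constants):

\[
  T_{\mathrm{mono}}(N) \approx C \, N \, f(N),
  \qquad
  T_{\mathrm{PMD}}(N) \approx c \, N,
  \qquad\text{hence}\qquad
  \frac{T_{\mathrm{mono}}(N)}{T_{\mathrm{PMD}}(N)} \approx \frac{C}{c}\, f(N).
\]

For an FFT-based mono-domain solver, for instance, f(N) grows like log N, so under this model the gap widens steadily as the global mesh is refined.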

Restrictions