License Terms
PMD is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Installation & Tests
- Download the latest PMD version.
- Before installing PMD, check that the MPI (vendor, LAM or MPICH implementation), MPIBLACS, BLAS, LAPACK and ScaLAPACK libraries are available on your system.
- zcat pmd-(version).tar.gz | tar -xvf -
- cd PMD/MakeDep
  Configure the "Make.inc" file.
- cd .. ; make
  Compile the PMD modules.
- make examples
  Compile and link all the examples.
- Run the examples.
Getting started
PMD is a parallel Fortran 90 module for solving systems arising from positive definite, linear, second-order elliptic operators. To access the PMD subroutines, a "USE PMD" statement must be included in every scoping unit that calls them. The pseudocode example below shows the general sequence of calls to the PMD subroutines for solving the whole problem, given the local operator matrix and the physical boundary conditions:
PROGRAM myprog
USE PMD
IMPLICIT NONE
!... Declare and initialize User, MPI and PMD type objects
! ...
!... Initialize PMD
CALL PMD_Init_1DD(...)
!... Set Local Operator Matrix and Boundary Conditions
! ...
!... Link User Local Operator Matrix to PMD
CALL PMD_Operator_1DD(...)
!... Build the Schur Matrix
CALL PMD_Schur_1DD(...)
!... Factor the Schur Matrix
CALL PMD_Schur_Factor_1DD(...)
!... Solve the interface problem
CALL PMD_Solve_1DD(...)
!... End PMD
CALL PMD_End_1DD(...)
! ...
END PROGRAM myprog
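The calling sequence keeps the factorization of the Schur matrix (PMD_Schur_Factor_1DD) separate from the solve (PMD_Solve_1DD). Direct solvers commonly split the work this way so that the expensive factorization is computed once, and each later solve needs only forward and backward substitutions. The sketch below shows that split in serial, pure Python using LU with partial pivoting. The function names lu_factor and lu_solve are hypothetical. The source does not say which factorization PMD uses, and PMD's own factorization is parallel and based on ScaLAPACK.

```python
# Serial sketch of a "factor once, solve many" direct solver (LU with partial pivoting).
# Hypothetical helper names; not PMD's API and not its ScaLAPACK-based parallel code.

def lu_factor(A):
    """Factor P*A = L*U. Returns (LU, perm): L (unit diagonal, not stored) sits
    below the diagonal of LU, U on and above it; perm records the row order."""
    n = len(A)
    LU = [row[:] for row in A]            # work on a copy, keep A intact
    perm = list(range(n))
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(LU[i][k]))   # partial pivoting
        if LU[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[piv] = LU[piv], LU[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]          # multiplier stored in the L part
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A x = b reusing the factors: forward, then backward substitution."""
    n = len(b)
    y = [b[p] for p in perm]              # apply the row permutation P
    for i in range(n):                    # L y = P b
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in range(n - 1, -1, -1):        # U x = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

# Small stand-in for an interface (Schur) matrix: factored once, then reused.
S = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
LU, perm = lu_factor(S)
for rhs in ([1.0, 2.0, 3.0], [0.0, 1.0, 0.0]):
    print(lu_solve(LU, perm, rhs))
```

LU is used here because it does not require the matrix to be symmetric.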
PMD implements a non-overlapping domain decomposition method based on the dual Schur
complement. The Schur matrix is built using a parallel influence matrix technique.
Currently, PMD implements a 1D domain decomposition for 1D and 2D problems. The interface problem can be solved with either a direct or an iterative solver. PMD is built on top of MPI, which handles data exchange between the subdomains. Full source code examples using PMD are presented here.
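The sketch below applies the influence-matrix idea to the simplest possible case. The model problem is -u'' = f on (0, 1) with u(0) = u(1) = 0, discretized by second-order finite differences and split into four non-overlapping subdomains. To build each column of the interface (Schur complement) matrix, every subdomain is solved with a unit value on one interface, and the discrete equation is evaluated at all interfaces. The interface problem is then solved with plain conjugate gradients, and final local solves recover the full solution. This is a serial, pure Python illustration under these assumptions, not PMD's implementation. PMD handles 2D problems, runs the local solves in parallel over MPI, and its exact interface formulation is not reproduced here. All names are hypothetical.

```python
# Sketch: non-overlapping 1D domain decomposition for -u'' = f on (0, 1), u(0) = u(1) = 0.
# The interface matrix is built column by column (influence-matrix idea) and the
# interface problem is solved by unpreconditioned CG. Illustrative only; not PMD's code.

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def local_solve(f_loc, h, u_left, u_right):
    """Finite-difference Dirichlet solve on one subdomain's interior nodes."""
    n = len(f_loc)
    d = [h * h * fi for fi in f_loc]
    d[0] += u_left
    d[-1] += u_right
    return thomas([-1.0] * n, [2.0] * n, [-1.0] * n, d)

def assemble(lam, f, h, cuts):
    """Solve every subdomain for interface values lam; return the global grid u."""
    bvals = [0.0] + list(lam) + [0.0]     # physical Dirichlet values at x=0 and x=1
    u = [0.0] * len(f)
    for k in range(len(cuts) - 1):
        lo, hi = cuts[k], cuts[k + 1]
        u[lo], u[hi] = bvals[k], bvals[k + 1]
        u[lo + 1:hi] = local_solve(f[lo + 1:hi], h, bvals[k], bvals[k + 1])
    return u

def interface_residual(lam, f, h, cuts):
    """Discrete equation (scaled by h**2) evaluated at each interface node."""
    u = assemble(lam, f, h, cuts)
    return [(2 * u[p] - u[p - 1] - u[p + 1]) - h * h * f[p] for p in cuts[1:-1]]

def influence_matrix(f, h, cuts):
    """Column j = interface response to a unit value on interface j (with f = 0)."""
    m = len(cuts) - 2
    zero = [0.0] * len(f)
    cols = [interface_residual([1.0 if i == j else 0.0 for i in range(m)], zero, h, cuts)
            for j in range(m)]
    return [[cols[j][i] for j in range(m)] for i in range(m)]   # columns -> rows

def cg(A, b, rtol=1e-12):
    """Unpreconditioned conjugate gradient for a small dense SPD matrix."""
    n = len(b)
    x, r = [0.0] * n, b[:]
    p = r[:]
    rs = sum(v * v for v in r)
    stop = rtol * rs ** 0.5
    for _ in range(10 * n):
        if rs ** 0.5 <= stop:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

N = 99                                    # interior grid nodes
h = 1.0 / (N + 1)
x = [i * h for i in range(N + 2)]
f = [1.0] * (N + 2)                       # -u'' = 1, exact solution u = x(1 - x)/2
cuts = [0, 25, 50, 75, N + 1]             # 4 subdomains, interfaces at nodes 25, 50, 75

S = influence_matrix(f, h, cuts)          # build the interface matrix
r0 = interface_residual([0.0] * (len(cuts) - 2), f, h, cuts)
lam = cg(S, [-v for v in r0])             # solve S lam = -r0 on the interfaces
u = assemble(lam, f, h, cuts)             # final local solves
err = max(abs(u[i] - x[i] * (1 - x[i]) / 2) for i in range(N + 2))
print(f"max nodal error: {err:.1e}")
```

With f = 1, the finite-difference solution coincides with the exact solution x(1 - x)/2 at the grid nodes, so the printed error is only rounding error.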
Performance
Timings and flop rates were measured on three parallel machines whose
hardware and software characteristics are given below:
| Machines | IBM SP2 | SGI/CRAY T3E | Fujitsu VPP300 |
| Number of processors | 127 | 256 | 8 |
| Processor | IBM P2SC thin | DEC Alpha EV5 | Fujitsu vector |
| Processor memory | 256 MB | 128 MB | 2048 MB |
| Data cache | 1 level: 128 KB | 2 levels: 8 KB + 96 KB | 64 KB scalar data cache |
| Processor peak performance | 500 MFlops/sec | 600 MFlops/sec | 2200 MFlops/sec |
| Interconnection network | Omega Multilevel Switch, 90 MB/sec | Bidirectional 3D Torus, 600 MB/sec | Bidirectional Crossbar, 615 MB/sec |
| Operating system | AIX 4.2 | Unicos/mk 2.0 | UXP/V V10L20 |
| Compiler | xlf (version 4.1) | f90 (version 3.0) | frt (version L97121) |
| Optimization options | -O3 | -O3,unroll2,pipeline3 | -Of -Wv,-Of |
Table 1: Hardware and software characteristics
Our test case was the 2D Laplace problem, with the local problems solved using NAG FFT routines. The table below shows how the elapsed time is distributed among the PMD routines, together with the floating-point rate per processor on each parallel machine:
| Machines | IBM SP2 | SGI/CRAY T3E-600 | Fujitsu VPP300 |
| Build Schur matrix (sec.) | 2979.6 | 1340. | 63. |
| Factor Schur matrix (sec.) | 74.7 | 183.7 | 10. |
| Solve (sec.) | 3.6 | 1.8 | 0.1 |
| Total elapsed time (sec.) | 3125. | 1525.6 | 74. |
| Communication time (sec.) | 15.5 | 0.6 | 0.2 |
| Total MFlops/sec./processor | 22. | 42. | 859. |
Table 2: Performance on the 2D Laplace problem with 4 processors. Mesh size: Nx=801, Ny=601
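The figures in Table 2 can be expressed as fractions of the total elapsed time. Combined with the peak rates in Table 1, they also give the fraction of peak actually sustained. The short script below does this arithmetic. The numbers are copied from the two tables, and the helper name breakdown is hypothetical.

```python
# Figures copied from Table 1 (peak rate) and Table 2 (4-processor 2D Laplace run).
# Tuple layout: build, factor, solve, total, communication (seconds),
#               sustained MFlops/sec per processor, peak MFlops/sec per processor.
machines = {
    "IBM SP2":          (2979.6, 74.7, 3.6, 3125.0, 15.5, 22.0, 500.0),
    "SGI/CRAY T3E-600": (1340.0, 183.7, 1.8, 1525.6, 0.6, 42.0, 600.0),
    "Fujitsu VPP300":   (63.0, 10.0, 0.1, 74.0, 0.2, 859.0, 2200.0),
}

def breakdown(build, factor, solve, total, comm, mflops, peak):
    """Percent of total elapsed time spent in each phase, plus percent of peak sustained."""
    def pct(t):
        return 100.0 * t / total
    return {"build": pct(build), "factor": pct(factor), "solve": pct(solve),
            "comm": pct(comm), "peak": 100.0 * mflops / peak}

for name, row in machines.items():
    b = breakdown(*row)
    print(f"{name:17s} " + "  ".join(f"{k} {v:5.1f}%" for k, v in b.items()))
```

On the SP2, T3E and VPP300, building the Schur matrix takes roughly 95%, 88% and 85% of the elapsed time. The solve phase stays well under 1% on all three. The sustained rate is about 4.4%, 7% and 39% of peak.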
Scalability can be evaluated by keeping the global mesh size fixed for any
number of subdomains (or processes). The curves below show the elapsed
execution time versus the number of processors:
Notably, almost 64 RISC processors are needed to solve the problem
as fast as 4 vector processors.
The curves below show timings at a fixed local mesh size.
In this situation, mono-domain solvers usually exhibit timings that grow
as N x f(N), whereas the timings here grow as c x N,
where the observed slope c remains constant for any global mesh size
N. For further details, please read these papers.
Restrictions
- The computational domain must be simply connected.
- Along the domain decomposition axis, physical boundary
conditions are assumed to be non-periodic.
- The current version of PMD supports 1D domain
decomposition only, which means that subdomain interfaces cannot cross.
- The interface normal vectors must be parallel to the domain decomposition
axis.
- It is up to the user to compute the interface first normal
derivatives along the domain decomposition axis.
- When using the parallel direct solvers, which are based on the
ScaLAPACK library, the BLACS process grid must be
square: if Np denotes the process grid dimension,
the size of the PMD process group must equal Np**2.
This restriction does not apply to the parallel
iterative solvers (PCG and Bi-CGstab).
- The parallel PCG and Bi-CGstab methods have a drawback: the number of
iterations needed to converge increases with the number of
subdomains.
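The square-grid rule for the ScaLAPACK-based direct solvers can be checked before a run is launched. The sketch below uses hypothetical helpers that are not part of PMD. It relies only on Python's math.isqrt (Python 3.8 or later).

```python
import math

def blacs_grid_dim(nprocs):
    """Return Np with nprocs == Np**2; raise if no square Np x Np grid is possible."""
    if nprocs < 1:
        raise ValueError("need at least one process")
    np_ = math.isqrt(nprocs)              # integer square root, rounded down
    if np_ * np_ != nprocs:
        raise ValueError(f"{nprocs} processes cannot form a square BLACS grid")
    return np_

def largest_square_group(nprocs):
    """Largest process count <= nprocs that a square grid can use."""
    return math.isqrt(nprocs) ** 2

print(blacs_grid_dim(4))          # 2  (4 processes form a 2 x 2 grid)
print(largest_square_group(127))  # 121 = 11**2
```

For example, the 4-process runs of Table 2 form a 2 x 2 grid. On the 127-processor IBM SP2 listed in Table 1, the largest square process group would be 11**2 = 121 processes.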
© CNRS - IDRIS, 13/01/2012