Each table below compares timings of GFT 1.0.1 and FFTW 2.1.3 on a scalar
architecture, except on the NEC-SX5 vector machine, where GFT is compared
with the vendor's ASL FFT library. Timings cover only 1D complex-to-complex
FFTs whose sizes are powers of 2, 3 or 5.
IBM-SP4
- IBM Power 4 scalar processor.
- 3 cache levels (L2: 1.5 MB, L3: 32 MB)
- Processor clock frequency: 1.3 GHz.
- Processor peak performance: 5.2 Gflops.
- XL Fortran 8.1. Optimization flag: -O5
| Size of FFT | N=2**20 | N=3**13 | N=5**8 | N=5**10 |
| FFTW (sec.) | 0.390 | 0.450 | 0.090 | 3.920 |
| GFT (sec.) | 1.180 | 0.880 | 0.190 | 6.340 |
COMPAQ-ALPHA-ES40
- DEC ALPHA EV68 scalar processor.
- 2 cache levels (L2: 8 MB)
- Processor clock frequency: 833 MHz.
- Compaq Fortran Compiler X1.1.1. Optimization flag: -O6
| Size of FFT | N=2**20 | N=3**13 | N=5**8 | N=5**10 |
| FFTW (sec.) | 0.862 | 0.901 | 0.185 | 6.978 |
| GFT (sec.) | 3.272 | 1.618 | 0.322 | 10.625 |
SGI-O2100
- SGI R12000 scalar processor.
- 2 cache levels (L2: 8 MB)
- Processor clock frequency: 350 MHz.
- MIPSpro Fortran Compilers 7.3.1.1m. Optimization flag: -O3
| Size of FFT | N=2**20 | N=3**13 | N=5**8 |
| FFTW (sec.) | 3.633 | 1.788 | 0.405 |
| GFT (sec.) | 16.390 | 5.161 | 0.820 |
NEC-SX5
- NEC vector processor.
- Vector register length: 256 elements of 64 bits each.
- Processor peak performance: 8 Gflops.
- FORTRAN90/SX 2.0 Rev.253. Optimization flag: -Chopt
Here, instead of FFTW, GFT is compared with the complex-to-complex FFT
of the vendor's ASL library.
| Size of FFT | N=2**20 | N=3**13 | N=5**8 | N=5**10 |
| ASL (sec.) | 0.112 | 0.041 | 0.008 | 0.223 |
| GFT (sec.) | 0.069 | 0.095 | 0.030 | 0.863 |
Notes
- On scalar cache-based processors, FFTW is faster than GFT in every
case measured, by a factor of roughly 1.5 to 4.5.
- For every FFT size, GFT runs faster on the vector processor than on
any of the scalar ones.
- On the vector processor, GFT is faster than the NEC ASL FFT only when
the size of the FFT is a power of 2 (0.069 s versus 0.112 s); for
powers of 3 and 5, ASL is roughly 2 to 4 times faster.
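The ratios behind these notes can be recomputed directly from the tables above. The following is a minimal Python sketch, not part of GFT or FFTW: the names SIZES, TIMINGS and gft_ratios are illustrative only, and the numbers are simply copied from the tables. A ratio above 1 means GFT is slower than the reference library.

```python
# Timings in seconds, copied from the tables above.
# SIZES, TIMINGS and gft_ratios are illustrative names, not part of any library.
SIZES = ["2**20", "3**13", "5**8", "5**10"]

# machine -> (reference library, reference timings, GFT timings)
TIMINGS = {
    "IBM-SP4": ("FFTW", [0.390, 0.450, 0.090, 3.920],
                [1.180, 0.880, 0.190, 6.340]),
    "COMPAQ-ALPHA-ES40": ("FFTW", [0.862, 0.901, 0.185, 6.978],
                          [3.272, 1.618, 0.322, 10.625]),
    # No N=5**10 measurement is reported for this machine.
    "SGI-O2100": ("FFTW", [3.633, 1.788, 0.405],
                  [16.390, 5.161, 0.820]),
    "NEC-SX5": ("ASL", [0.112, 0.041, 0.008, 0.223],
                [0.069, 0.095, 0.030, 0.863]),
}


def gft_ratios(machine):
    """Return {size: GFT time / reference time} for one machine."""
    _, ref, gft = TIMINGS[machine]
    # zip stops at the shortest list, so a missing last column is skipped.
    return {size: g / r for size, r, g in zip(SIZES, ref, gft)}


for machine, (ref_name, _, _) in TIMINGS.items():
    cells = ", ".join(
        f"N={size}: {ratio:.2f}" for size, ratio in gft_ratios(machine).items()
    )
    print(f"{machine} (GFT / {ref_name}): {cells}")
```

Only one measured case gives a ratio below 1: N=2**20 on the NEC-SX5.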
© CNRS - IDRIS, 23/04/2012