Quick benchmarking AHA code

Here I describe a very quick (and dirty) comparison of AHA Model performance when built with Intel Fortran (ifort) and GNU Fortran (gfortran).

The AHA Model SVN rev. 6066 (HG: 4e24c30b253c)
Single (first) generation, one light cycle (28 time steps)
Compared systems: (a) an older Linux Dell desktop, (b) cloud based Linux server, (c) a fairly modern 4-core Windows 7 desktop.
All systems run on Intel CPUs
Maximum compiler optimisation (-O3)

Build options for ifort and gfortran:

ifort -sox -parallel -O3 -static -heap-arrays -fp-model fast=2 -xHost -finline-functions

gfortran -O3 -march=native -funroll-loops -fforce-addr -static-libgfortran -static -static-libgcc

Disclaimer: These data and results cannot be used for a general comparison of the GNU and Intel compilers: the analysis is very limited and crude, and applies solely to the AHA Model code.

GNU Fortran compiler:

Linux v.4.9.2
Windows v.5.2.0

Intel Fortran compiler:

Linux v.18.0.0
Windows v.17.0.1.143

The results were very consistent across hosts, with very small variability across repeated runs. Different Linux hosts showed virtually identical patterns and differed only in overall performance. The patterns of the results depend on the OS.
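A minimal sketch of how such repeated timings could be collected, assuming the model is started from the command line. The function name `time_runs` and the executable name mentioned in its docstring are hypothetical; they are not part of the AHA Model tooling.

```python
import statistics
import subprocess
import sys
import time


def time_runs(command, repeats=5):
    """Run `command` `repeats` times and return (mean, stdev) of wall-clock seconds.

    `command` is a list of arguments such as ["./model_gnu.exe"] (hypothetical name).
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the run fails.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    # stdev needs at least two samples; report 0.0 for a single run.
    spread = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return statistics.mean(timings), spread


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace with the model binary.
    mean, spread = time_runs([sys.executable, "-c", "pass"], repeats=3)
    print(f"mean {mean:.3f} s, stdev {spread:.3f} s")
```

Wall-clock time is used here because it is what a user of the model experiences; for an auto-parallelised build, CPU time summed over threads would tell a different story.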

Linux

Fig.1

Note: Computation time, smaller is better. P = parallel (Intel automatic parallelization), NP = not parallel.

GNU gfortran is significantly faster than Intel Fortran.
Intel automatic parallelization (Intel P versus Intel NP) reduces the performance of the AHA Model code.

Windows 7

Fig.2

Note: Computation time, smaller is better. P = parallel (Intel automatic parallelization), NP = not parallel.

Intel Fortran with automatic parallelization shows the best performance for the AHA Model on Windows.

A puzzle

The pattern found on Linux is puzzling: does the Intel Fortran compiler really offer higher performance in FPU-intensive computations?

It is often assumed that Intel compilers offer better performance, especially on Intel hardware.

NASA NAS kernel benchmark

Therefore, another benchmark was done on Linux using the NASA NAS kernel benchmark:

Bailey, David H. & Barton, John T. (1985). The NAS kernel benchmark program. Moffett Field, California, NASA Ames Research Center

Fig.3

Note: Performance, higher is better.

As expected, the Intel Fortran compiler shows better overall performance on the NAS benchmark ...

Fig.4

Note: Performance, higher is better.

However, when different computational tasks are compared individually, the pattern is not so obvious: GNU gfortran shows a (very) large advantage on certain computational tasks.

This is supported by the Polyhedron benchmark comparisons of various Fortran compilers (see here).
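The contrast between an aggregate score and the per-task picture can be made explicit by computing per-task ratios. The sketch below is only an illustration: the function name `per_task_ratios`, the task labels, the use of a simple sum as the overall score, and all numbers are placeholders, not values read from Fig.3 or Fig.4.

```python
def per_task_ratios(intel, gnu):
    """Return {task: gnu_score / intel_score} for tasks present in both dicts.

    Scores are "higher is better"; a ratio above 1 means gfortran wins that task.
    """
    return {task: gnu[task] / intel[task] for task in intel if task in gnu}


# Placeholder scores for illustration only; NOT taken from Fig.3 or Fig.4.
intel_scores = {"task_a": 200.0, "task_b": 100.0, "task_c": 50.0}
gnu_scores = {"task_a": 100.0, "task_b": 100.0, "task_c": 100.0}

ratios = per_task_ratios(intel_scores, gnu_scores)
# Assumption: the overall score is the plain sum of the per-task scores.
overall = sum(gnu_scores.values()) / sum(intel_scores.values())

for task, ratio in sorted(ratios.items()):
    print(f"{task}: gfortran/Intel = {ratio:.2f}")
print(f"overall: gfortran/Intel = {overall:.2f}")
```

With these placeholder numbers the overall ratio is below 1 (Intel ahead in aggregate) while one task still runs twice as fast with gfortran, which is the kind of pattern an aggregate figure hides.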

Conclusions

Different compilers (with different approaches to machine code optimisation) may each be best suited to some very specific computational algorithms.
The success and performance of automatic parallelization depend on the platform/operating system. Intel automatic parallelization is not a “silver bullet” and might reduce performance (there are limits and an overhead to parallelization), especially on normal desktop (i.e. non-massively parallel) hardware.
The Intel Fortran compiler does not seem to work very well with heavily object-oriented Fortran code. The Intel manual on parallelization states that a loop can be auto-parallelised only if it contains no function or subroutine calls, unless those calls are inlined (see e.g. Requirements for Vectorizable Loops). Such a requirement is hard to fulfil in object-oriented code.
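The remark about limits and overhead of parallelization can be illustrated with Amdahl's law extended by a crude overhead term. This is only a sketch: the function name `estimated_speedup`, the constant overhead term and the numbers in the usage lines are assumptions for illustration, not measurements from the AHA Model.

```python
def estimated_speedup(parallel_fraction, workers, overhead=0.0):
    """Amdahl's law with an extra overhead term (the overhead is an assumption).

    parallel_fraction: share of the serial run time that can be parallelised (0..1).
    workers: number of cores or threads used.
    overhead: cost of threading, as a fraction of the serial run time.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / workers
    return 1.0 / (serial_part + parallel_part + overhead)


# Illustrative values only: a mostly serial code on a 4-core desktop.
print(estimated_speedup(0.2, 4))                # no overhead: modest gain
print(estimated_speedup(0.2, 4, overhead=0.2))  # with overhead: below 1, a slowdown
```

When only a small share of the run time sits in loops the compiler can parallelise, even a moderate threading overhead is enough to push the estimated speedup below 1.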
