Turing: VASP

Introduction

VASP (Vienna Ab initio Simulation Package) is an ab initio quantum chemistry software package.

Documentation can be found on the VASP official site. You can also find information in the discussion forum.

Availability

  • 5.3.3
  • 5.3.5 (default version)
  • 5.4.1

This software is available on Turing in parallel (MPI).

Three executable files are supplied; choose the one suited to your needs:

  • vasp: standard version
  • vasp_gamma: version optimised for calculations at the gamma point; faster, but only allows gamma-point calculations
  • vasp_nc: version for non-collinear magnetic structure and/or spin-orbit coupling calculations (activated by the corresponding keywords in the INCAR file), but it uses more memory than the other versions

Starting from version 5.4.1, the names of the executables have officially changed to vasp_std, vasp_gam and vasp_ncl.

Important note: These three versions do not necessarily offer identical or acceptable performance on Turing. They are, however, complementary, depending on the type of calculation. Consult the section below concerning the usage and performance of VASP on Turing.

Attention: Access to this application is reserved for users/laboratories holding a licence from the VASP developers. We will verify your status before granting access to the executables. Contact IDRIS support for more information.

If you have a licence for version 4 but would like to access version 5, you must purchase a new licence. The VASP 5.2 licence also opens access to the 5.3 and 5.4 versions.

Launching script

Following is an example of a LoadLeveler submission script for Turing, calculating in the WORKDIR directory:

job.ll
# @ job_name         = job
# @ output           = $(job_name).$(jobid)
# @ error            = $(job_name).$(jobid)
# @ job_type         = bluegene
# @ bg_size          = 128
# @ wall_clock_limit = 1:00:00
# @ queue
 
### Module initialisation ###
module load vasp
 
### Command echoes ###
set -x
 
### Run the calculation ###
runjob --np 2048 --ranks-per-node=16 : $VASP_EXEDIR/vasp_gamma

Following is an example of a LoadLeveler submission script for Turing, calculating in the TMPDIR directory:

job.ll
# @ job_name         = VASP
# @ job_type         = bluegene
# @ output           = $(job_name).$(jobid)
# @ error            = $(job_name).$(jobid)
# @ wall_clock_limit = 01:00:00
# @ bg_size          = 128
# @ queue
 
### Module initialisation ###
module load vasp
 
### Command echoes ###
set -x
 
### Copy to the TMPDIR ###
cp ./* $TMPDIR
 
### Run the calculation ###
cd $TMPDIR
runjob --np 2048 --ranks-per-node=16 : $VASP_EXEDIR/vasp_gamma
 
### Copy to the submission folder ###
cd -
cp $TMPDIR/* .

The module load vasp command loads the VASP default version. If you wish to use a different version, refer to the documentation on the module command.
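
For example, assuming the installed modules follow the usual name/version convention (the exact names, such as vasp/5.4.1 below, should be checked with module avail), a specific version could be loaded as follows:

module avail vasp          # list the available VASP modules
module load vasp/5.4.1     # hypothetical example: load a specific version instead of the default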

Parameters specific to the software

Pseudopotentials

Some PAW pseudopotential data sets (version 5.2 for the calculations in LDA and PBE, including the pseudos for the GW calculations) are available in a directory accessible via the environment variable $PSEUDO_VASP. After the module command is executed, you may access this by simply typing:

cd $PSEUDO_VASP

You can access this directory interactively from the Turing front end, or use it directly within your submission script; for example:

module load vasp
cd $PSEUDO_VASP/PBE.52
cat H_h/POTCAR O_h/POTCAR > $WORKDIR/mondossier/POTCAR
cd -
runjob --np 2048 --ranks-per-node=16 : $VASP_EXEDIR/vasp_gamma

Utilisation on Turing

The performance of this software on Turing depends strongly on the size of the system studied, the parameters set in the input files and the version used. If your usage of VASP is not correctly adapted, the elapsed (turnaround) time can vary by up to a factor of 20!

Generally speaking, a suitable system size for Turing is between 1500 and 5000 electronic bands (~500-1600 atoms per periodic cell). Below this, the performance will be very disappointing; above this, there is no guarantee that the system will fit in memory, even with correct usage of Turing.

The best performance is, in fact, obtained by not overloading the compute cores (up to a factor of about 2 in elapsed time). This corresponds to choosing --ranks-per-node=16 when mapping the MPI tasks on the command line launching VASP. This mapping attributes 1 GB of memory per MPI task, which is sufficient in the large majority of cases. Nevertheless, for systems of extreme complexity or very large size, this amount of memory may not be sufficient. Increasing the memory per task will allow the calculation to be carried out, but some of the requested compute cores will then be left unused, which is a waste of resources. For more information, consult the page dedicated to process mapping on Turing.
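
As a sketch (illustrative values only, for a bg_size of 128), halving the number of ranks per node doubles the memory available to each MPI task, at the price of leaving half of the requested cores idle:

# 16 ranks per node: 1 GB per MPI task (recommended default)
runjob --np 2048 --ranks-per-node=16 : $VASP_EXEDIR/vasp_gamma

# 8 ranks per node on the same partition: 2 GB per MPI task,
# but only half of the requested cores are used
runjob --np 1024 --ranks-per-node=8 : $VASP_EXEDIR/vasp_gamma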

In addition, file reading and writing by VASP are very penalizing for the performance (by about a factor of 2) on this machine. Consequently, it is recommended to avoid file accesses as much as possible, especially the WAVECAR, CHG and CHGCAR files which are produced by default. During a restart, it is therefore recommended not to read these files, and to ask VASP not to write them (see the parameters of the INCAR file). If this type of file is necessary for a study (CHGCAR, LOCPOT, etc.), it is preferable to carry out a single-point calculation which produces the desired files on the optimised geometry of the system, rather than generating them during the geometry optimisation.
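
As an illustration of the last point, a single-point run on the optimised geometry (CONTCAR copied to POSCAR beforehand) could use the following INCAR additions, a sketch based on standard VASP keywords; the rest of your INCAR stays unchanged:

NSW    = 0          ! no ionic steps: single-point calculation
IBRION = -1         ! no geometry update
LCHARG = .TRUE.     ! write CHGCAR for this run only
LWAVE  = .FALSE.    ! WAVECAR still not needed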

Comment : Before any calculation is done, the initialisation of VASP can take a significant amount of time on Turing. It is therefore normal not to see output quickly.

VASP versions

There are three VASP versions which are differentiated by the compute algorithms implemented:

  • vasp_gamma : the fastest version (30 to 50% faster than the standard version), which only permits calculations at the gamma point.
  • vasp : standard version (slower than the gamma version) which permits parallelisation on k points.
  • vasp_nc : version permitting calculations of non-collinear magnetic structures (activated by some keywords in INCAR); the performance is equal or inferior to the standard version (it also uses more memory).

The use of the gamma version is strongly recommended, all the more so because the systems studied must be large (and for these, the gamma point is almost always sufficient).
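
For reference, a gamma-point-only calculation (as required by vasp_gamma) simply uses a 1×1×1 mesh centred on Γ in the KPOINTS file:

KPOINTS
Gamma-point only
0
Gamma
1 1 1
0 0 0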

Parameters of the INCAR file

The parallelisation parameters of the INCAR file are *very* sensitive on Turing. Therefore, it is necessary to add the following keywords/values in INCAR (caution, the manual indicates other values but these do NOT lead to better performance):

LREAL = Auto
LPLANE = .TRUE.
NPAR = optimal value
LSCALU = .FALSE.
NSIM = 4

The performance is strongly reduced when VASP reads and writes files. Therefore, it is also recommended to add:

!WAVECAR, CHG and CHGCAR not written (file sizes 0) :
LWAVE = .FALSE.
LCHARG = .FALSE.

!To cancel any file call during a restart :
ICHARG = 0
ISTART = 0

The NPAR value

The optimal value of NPAR depends on the system studied and the number of cores. An incorrect value may result in a doubled execution time or, in the worst case, a segmentation fault, an RSPHERE error (NPAR too small), or even a calculation which diverges. Nevertheless, to help you, here are the optimal values obtained for a test case:

MOR zeolite cell, doubled and exchanged 1 time (289 atoms), PBE, 400 eV, Γ-point, 2 geometry steps:

Test case                 Number of cores (bg_size)   NPAR, version 5.3.3 gamma   NPAR, version 5.3.3 standard

System doubled            1024  (64)                  32                          32
576 atoms - 1586 bands    2048  (128)                 64                          32
                          4096  (256)                 64                          64

System ×4                 1024  (64)                  32                          32
1156 atoms - 3612 bands   2048  (128)                 32                          32
                          4096  (256)                 64                          32
                          8192  (512)                 64                          -

System ×8                 1024* (128)                 64                          -
2312 atoms - 7392 bands   2048* (256)                 64                          -
                          4096* (512)                 64                          -

* : 2 GB per MPI process is necessary for these calculations with the gamma version. The standard version, requiring even more memory, is not recommended for a system of this size.
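
For example (simply reading off the table, with no new measurement), running the ×4 system with the gamma version on 2048 cores corresponds to bg_size = 128 with 16 ranks per node in the job script:

# @ bg_size = 128
runjob --np 2048 --ranks-per-node=16 : $VASP_EXEDIR/vasp_gamma

and, in the INCAR file:

NPAR = 32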

Note 1 : The nc version uses the same parameters as the standard version.

Note 2 : Depending on the versions and parameters chosen, a WARNING message may appear at the beginning of the OUTCAR file indicating that your NPAR is not optimal. In the majority of cases you do not need to heed this warning: as the test above shows, and contrary to what the WARNING announces, there is no simple mathematical rule for this parameter.

Note 3 : Beginning with version 5.3, VASP introduces the NCORE parameter, intended to replace the NPAR parameter. In the test case, using this parameter instead of NPAR slightly degrades the performance. The NCORE table is not provided here as it would be necessary to test each possible value (as was done for NPAR) and this is not planned for the time being.

Note 4 : You must choose NPAR = number of cores for calculations using a hybrid functional or for GW or RPA calculations, but the performance will be mediocre.

Parallelisation on the k points (KPAR)

Beginning with version 5.3, VASP permits parallelisation over the k points. Since the systems treated here are large, the gamma point is sufficient in the majority of cases and, for good performance, the vasp_gamma version should be used. Nevertheless, if some k points are necessary for a given system, use vasp.

A new keyword in the INCAR file, the KPAR parameter, manages this parallelisation. It defines the number of groups into which the compute cores are divided, each group handling one k point (or a group of k points, if there are more k points than groups). This parameter must be a divisor of the total number of cores. Because this parallelisation method is still being developed, you should do some tests on your own study systems to check your results before proceeding with any studies (as noted in an OUTCAR file message). Inappropriate parameters can, under certain conditions, cause the computation to stop abruptly without necessarily producing an error message.

In general, as the number of k points remains small considering the size of the systems, it is advised to specify the following in the INCAR file :

KPAR = the number of k points if it is a divisor of the number of cores; otherwise, a value close to but lower than the number of k points which does divide the number of cores

The best performance will be obtained if the number of irreducible k points of the system is a divisor of the total number of cores.
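
For example (purely illustrative numbers), for a calculation with 8 irreducible k points on 2048 cores, 8 divides 2048 and one would therefore set in the INCAR file:

KPAR = 8

With 12 irreducible k points on the same 2048 cores, 12 does not divide 2048, so a nearby smaller value that does, such as KPAR = 8, would be used instead.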

IBM hybrid version

The VASP 5.3.5 version was optimised by IBM in order to use the efficient vendor libraries and a supplementary level of parallelism. Using these binaries typically results in a performance gain of a factor of 2 to 3 according to the type of calculation and the nature of the system treated.

In the launching script, the runjob command line needs to be adapted as follows:

runjob --np 1024 --ranks-per-node=16 --envs "OMP_NUM_THREADS=2" --envs "FFT_ESSL_AGGRESSIVE=N" --exe $VASP_EXEDIR/IBM/vasp_IBM

  • The OMP_NUM_THREADS variable enables the OpenMP parallelisation of this version. Choose ranks-per-node and OMP_NUM_THREADS so that ranks-per-node * OMP_NUM_THREADS = 32. The optimal values of ranks-per-node and OMP_NUM_THREADS need to be tested for each type of system treated (in general, OMP_NUM_THREADS=2 or 4).
  • The FFT_ESSL_AGGRESSIVE variable, when set to N, gives the same numerical results as the non-optimised version; when set to Y, it can bring a slight additional performance gain, with a numerical difference < 10^-3.

The available binaries are as follows:

  • IBM/vasp_IBM : standard version
  • IBM/vasp_gamma_IBM : optimised version for calculations at the gamma point

Other binaries are available (VTST Tools, etc.) in the $VASP_EXEDIR/IBM folder.

These versions have also been installed with the ELPA library, thereby allowing a supplementary performance gain. Their use is recommended. The binaries are named as follows:

  • IBM/vasp_ELPA_IBM
  • IBM/vasp_gamma_ELPA_IBM

When using the ELPA library, if the number of MPI processes (--np) is too high for the size of your system, you risk obtaining the following error message:

 ERROR: Problem contains processor column with zero width

In this case, it is necessary to decrease the number of MPI processes and adapt the number of OpenMP threads.
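
As a sketch of such an adjustment (the values are illustrative and depend on your system), one could halve the number of MPI processes and double the number of OpenMP threads so that the nodes stay fully loaded:

# if the ELPA error appears with, e.g., 1024 MPI processes x 2 threads,
# try half as many MPI processes with twice as many threads per process:
runjob --np 512 --ranks-per-node=8 --envs "OMP_NUM_THREADS=4" --envs "FFT_ESSL_AGGRESSIVE=N" --exe $VASP_EXEDIR/IBM/vasp_ELPA_IBM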

Note : These optimisations were also ported to version 5.4.1. The binary names follow the new official naming, with the same suffixes. For example, IBM/vasp_ELPA_IBM becomes IBM/vasp_std_ELPA_IBM.

Documentation

Documentation can be found on the VASP official site. You can also find information in the discussion forum.