Gromacs on Jean Zay

Introduction

Gromacs is an atomistic simulation package used mainly for molecular dynamics of biological systems.

Available versions

Version    Variants
2022.3     tmpi-cuda-plumed, mpi-cuda-plumed, mpi-double-cp2k
2022.2     mpi-cuda
2021.5     mpi-cuda
2021.4     mpi-cuda
2021.2     tmpi-cuda
2020.4     tmpi-cuda, mpi-cuda-plumed
2020.3     tmpi
2020.2     mpi-cuda-plumed
2020.1     tmpi-cuda
2019.6     mpi-cuda
2019.4     mpi-cuda-plumed
2019.2     serial, mpi-cuda
2018.7     serial, mpi-cuda

For GPU usage, the best performance is obtained with the thread-MPI version, which is limited to a single node. It is available for versions 2020 and later.

OpenMPI error message

Recent versions of Gromacs are compiled with CUDA-aware OpenMPI to improve performance on GPU.

If you do not use the GPUs, a warning message appears:

--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------

This message does not affect your job. If you wish to prevent it from being displayed, you can add the following lines to your submission script:

  export PSM2_CUDA=0
  export OMPI_MCA_opal_warn_on_missing_libcuda=0

Tools

The Gromacs pre- and post-processing tools (e.g. grompp) can be used on the front-end nodes only in their sequential versions. To do so, load a Gromacs module built without MPI and CUDA.
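
For example, a minimal sketch of a pre-processing run on a front end, assuming the serial variant listed above is exposed as the module gromacs/2019.2 and that md.mdp, conf.gro and topol.top are your input files (adapt the names to your case):

  module purge
  module load gromacs/2019.2   # serial variant: no MPI, no CUDA
  gmx grompp -f md.mdp -c conf.gro -p topol.top -o md_test.tpr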

Important options of mdrun

If you notice a strong load imbalance, you will need to adjust the mdrun parameters. For example, the following options play an important role (see the sketch below the list):

  • -npme : number of MPI tasks dedicated to the PME calculation
  • -pme : where the PME is calculated (cpu, gpu)
  • -pmefft : where the FFT of the PME is calculated
  • -nb : where the non-bonded interactions are calculated

Do not hesitate to consult the Gromacs documentation for the other options (see http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html).
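
As an illustration, a hedged sketch of how these options could be combined for a CPU-only run where PME causes load imbalance; the value -npme 4 and the md_test file name are placeholders to adapt to your own system:

  # Dedicate 4 MPI ranks to PME and keep PME, FFT and non-bonded work on the CPU
  srun gmx_mpi mdrun -v -deffnm md_test \
                     -npme 4 \
                     -pme cpu -pmefft cpu -nb cpu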

Example of usage on the CPU partition

Submission script on the CPU partition

gromacs_cpu.slurm
#!/bin/bash
#SBATCH --nodes=1               # 1 node is used
#SBATCH --ntasks-per-node=4     # 4 MPI tasks
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per MPI task
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=gromacs      # Jobname
#SBATCH --output=GMX_GenMD.o%j  # Standard output file (%j is the job number)
#SBATCH --error=GMX_GenMD.o%j   # Standard error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 100h)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@cpu  # To specify cpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>  # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_cpu-dev        # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_cpu-t4         # Uncomment for job requiring more than 20h (up to 4 nodes)
 
# Cleans out the modules loaded in interactive and inherited by default
module purge
 
# Load needed modules
module load gromacs/2018.7-mpi-cuda
 
# Run : 4 MPI tasks (--ntasks-per-node=4) and 10 threads/task (--cpus-per-task=10)
# Be aware that Gromacs recommends 2 <= ntomp <= 6.
# Do your own tests
srun gmx_mpi mdrun -v -deffnm md_test -ntomp $SLURM_CPUS_PER_TASK
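
Once adapted to your case, the script is submitted with sbatch:

  sbatch gromacs_cpu.slurm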

Example of usage on the GPU partition

Detection of GPUs is done automatically by Gromacs; no modification of the command line is needed.

However, you must reserve the GPUs in your Slurm file with the directive #SBATCH --gres=gpu:4.

Submission script on the GPU partition

gromacs_gpu.slurm
#!/bin/bash
#SBATCH --nodes=1               # 1 node is used
#SBATCH --ntasks-per-node=4     # 4 MPI tasks
#SBATCH --cpus-per-task=10      # Number of OpenMP threads per MPI task
#SBATCH --gres=gpu:4            # Number of GPUs per node
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=gromacs      # Jobname
#SBATCH --output=GMX_GenMD.o%j  # Standard output file (%j is the job number)
#SBATCH --error=GMX_GenMD.o%j   # Standard error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@v100  # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>   # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev         # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4          # Uncomment for job requiring more than 20h (up to 16 GPU, V100 only)
 
# Cleans out the modules loaded in interactive and inherited by default
module purge
 
# Load needed modules
module load gromacs/2018.7-mpi-cuda
 
# Run : 4 MPI tasks (--ntasks-per-node=4), 10 threads/task (--cpus-per-task=10)
# and 4 GPUs (--gres=gpu:4) automatically detected.
# Be aware that Gromacs recommends 2 <= ntomp <= 6.
# Do your own tests
srun gmx_mpi mdrun -v -deffnm md_test -ntomp $SLURM_CPUS_PER_TASK

Submission script on the GPU partition with the thread-MPI version (latest developments)

gromacs_gpu.slurm
#!/bin/bash
#SBATCH --nodes=1               # 1 node is used
#SBATCH --ntasks-per-node=1     # 1 MPI tasks
#SBATCH --cpus-per-task=40      # Number of OpenMP threads per MPI task
#SBATCH --gres=gpu:4            # Number of GPUs per node
#SBATCH --hint=nomultithread    # Disable hyperthreading
#SBATCH --job-name=gromacs_tmpi # Jobname
#SBATCH --output=GMX_GenMD.o%j  # Standard output file (%j is the job number)
#SBATCH --error=GMX_GenMD.o%j   # Standard error file
#SBATCH --time=10:00:00         # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please, refer to comments below for
## more information about these 4 last options.
##SBATCH --account=<account>@v100 # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>  # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev        # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4         # Uncomment for job requiring more than 20h (up to 16 GPU, V100 only)
 
# Cleans out the modules loaded in interactive and inherited by default
module purge
 
# Load needed modules
module load gromacs/2020.1-cuda
 
# Activate latest GPU developments
export GMX_GPU_PME_PP_COMMS=true
export GMX_FORCE_UPDATE_DEFAULT_GPU=1
export GMX_GPU_DD_COMMS=true
 
# Run : 4 thread-MPI tasks (-ntmpi 4) and 10 threads/task (-ntomp 10);
# 4 GPUs (--gres=gpu:4) automatically detected.
# Please read the documentation about thread-MPI and latest GPU development
# http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html
gmx mdrun -ntmpi 4 -npme 1 -ntomp 10 \
          -update gpu -bonded gpu \
          -nb gpu -pme gpu -pmefft gpu \
          -deffnm 6vxx_nvt -v

Comments:

  • All jobs have resources defined in Slurm by default, per partition and per QoS (Quality of Service). You can modify the limits by specifying another partition and/or QoS as shown in our documentation detailing the partitions and QoS.
  • For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project accounting (hours allocation of the project) on which to charge the job's computing hours, as indicated in our documentation detailing the project hours management.