The PETSc library
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Description
PETSc is a library of functions written in C (but also offering a Fortran interface) for manipulating dense or sparse vectors and matrices and solving the corresponding linear systems (often related to partial differential equations) with direct or iterative solvers. PETSc also provides non-linear solvers, methods for solving differential equations, and many other tools. Finally, PETSc also allows the use of external libraries.
PETSc is available for CPU and GPU.
Installed Versions and Environments
Various versions are installed on Jean Zay and are accessible via the module command. They can be listed with the following command:
module avail petsc
Information on a particular version can be obtained via the command:
module show petsc-xxx
For example:
module show petsc/3.21.4-mpi-complex-int64--------------------------------------------------------------------------/lustre/fshomisc/sup/hpe/pub/modules-idris-env4/modulefiles/linux-rhel9-skylake_avx512/petsc/3.21.4-mpi-complex-int64:
module-whatis {PETSc is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. }prereq intel-compilers/19.1.3 intel-compilers/2021.9.0conflict petsc
Available software environment(s):- intel-compilers/19.1.3 intel-mpi/2019.9- intel-compilers/2021.9.0 intel-mpi/2021.9
If you want to use this module with another software environment, please contact the support team.--------------------------------------------------------------------------
This installation is available in two different environments, with the first (intel-compilers/19.1.3 intel-mpi/2019.9 here) being the default. If you want another environment, you need to load it first. For example:
module load intel-compilers/2021.9.0 intel-mpi/2019.9 petsc
The options used for compiling PETSc are accessible in the file $PETSC_DIR/$PETSC_ARCH/include/petscconfiginfo.h and the installed options in $PETSC_DIR/$PETSC_ARCH/include/petscconf.h. The variables $PETSC_DIR and $PETSC_ARCH are automatically set when loading PETSc via the module command.
It is possible to add other versions of PETSc. For this, please contact the User Support team.
Documentation
📝 The documentation on the reference site.
Usage Examples
In this example, we use an exercise based on the DMDA subclass of DM, which manages communication between PETSc's algebraic structures and meshed data structures. This exercise is part of our PETSc training. You can find more details in the course materials available at the following address: PETSc Training.
C source code and compilation file:dmda.c:
#include <petscdmda.h>
#include <petscksp.h>
#include <petsctime.h>
int main(int argc, char **argv)
{
const PetscInt nx = 10, ny = 10, stencil_size = 5;
PetscInt i, j, its;
Mat A;
Vec x, b;
KSP solver;
const PetscReal rtol = 1.e-8;
KSPConvergedReason reason;
PetscReal errorNorm, rnorm;
PetscLogDouble t1, t2;
DM dm;
DMDALocalInfo info;
const PetscInt stencilWidth = 1;
MatStencil row, col5[stencil_size];
PetscScalar hx2, hy2, coef, coef5[stencil_size];
PetscScalar **bgrid;
PetscFunctionBeginUser;
PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
// Create the 2-D DMDA object
PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_STAR,
nx, ny, PETSC_DECIDE, PETSC_DECIDE, 1, stencilWidth,
NULL, NULL, &dm));
PetscCall(DMSetFromOptions(dm));
PetscCall(DMSetUp(dm));
// View the DMDA object
PetscCall(DMView(dm, PETSC_VIEWER_STDOUT_WORLD));
// Create the A matrix from the DMDA object
PetscCall(DMCreateMatrix(dm, &A));
// Retrieve local information from the DMDA object
PetscCall(DMDAGetLocalInfo(dm, &info));
hx2 = 1.0 / ((info.mx - 1) * (info.mx - 1));
hy2 = 1.0 / ((info.my - 1) * (info.my - 1));
coef = 1.0;
coef5[0] = 2.0 / hx2 + 2.0 / hy2;
coef5[1] = -1.0 / hx2; coef5[2] = -1.0 / hx2;
coef5[3] = -1.0 / hy2; coef5[4] = -1.0 / hy2;
// Loop on the grid points
for (j = info.ys; j < info.ys + info.ym; j++) {
for (i = info.xs; i < info.xs + info.xm; i++) {
row.i = i; row.j = j; row.c = 0;
if (i == 0 || i == (info.mx - 1) || j == 0 || j == (info.my - 1)) {
// Set matrix values to enforce boundary conditions (homogeneous Dirichlet conditions)
PetscCall(MatSetValuesStencil(A, 1, &row, 1, &row, &coef, INSERT_VALUES));
} else {
// Set matrix values fo interior points
col5[0].i = i; col5[0].j = j; col5[0].c = 0;
col5[1].i = i - 1; col5[1].j = j; col5[1].c = 0;
col5[2].i = i + 1; col5[2].j = j; col5[2].c = 0;
col5[3].i = i; col5[3].j = j - 1; col5[3].c = 0;
col5[4].i = i; col5[4].j = j + 1; col5[4].c = 0;
PetscCall(MatSetValuesStencil(A, 1, &row, stencil_size, col5, coef5, INSERT_VALUES));
}
}
}
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
// PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
// Create global vectors b and x from the DMDA object
PetscCall(DMCreateGlobalVector(dm, &b));
PetscCall(DMCreateGlobalVector(dm, &x));
PetscCall(VecSet(b, 0.0));
PetscCall(VecSetRandom(x, NULL));
PetscCall(KSPCreate(PETSC_COMM_WORLD, &solver));
PetscCall(KSPSetOperators(solver, A, A));
PetscCall(KSPSetTolerances(solver, rtol, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT));
PetscCall(KSPSetInitialGuessNonzero(solver, PETSC_TRUE));
PetscCall(KSPSetFromOptions(solver));
PetscCall(KSPSetUp(solver));
PetscCall(KSPView(solver, PETSC_VIEWER_STDOUT_WORLD));
PetscCall(PetscTime(&t1));
PetscCall(KSPSolve(solver, b, x));
PetscCall(PetscTime(&t2));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Elapse time: %lf s\n", t2 - t1));
PetscCall(KSPGetConvergedReason(solver, &reason));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Converged reason: %s\n", KSPConvergedReasons[reason]));
PetscCall(KSPGetIterationNumber(solver, &its));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Iterations number: %d\n", its));
PetscCall(KSPGetResidualNorm(solver, &rnorm));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Residual norm: %g\n", rnorm));
PetscCall(VecNorm(x, NORM_2, &errorNorm));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Error norm (NORM_2): %g\n", errorNorm));
PetscCall(VecNorm(x, NORM_INFINITY, &errorNorm));
PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Error norm (NORM_INFINITY): %g\n", errorNorm));
PetscCall(KSPDestroy(&solver));
PetscCall(MatDestroy(&A));
PetscCall(VecDestroy(&x));
PetscCall(VecDestroy(&b));
PetscCall(DMDestroy(&dm));
PetscCall(PetscFinalize());
return 0;
}
}
Makefile:
ALL: dmda
SOURCES = dmda.c
OBJ = $(SOURCES:.c=.o)
EXE = dmda.exe
CLEANFILES = ${OBJ} ${EXE}
include ${PETSC_DIR}/lib/petsc/conf/variables
include ${PETSC_DIR}/lib/petsc/conf/rules
dmda: ${OBJ}
${CLINKER} -o ${EXE} ${OBJ} ${PETSC_LIB}
CPU Example
For the CPU case, the compilation is done after loading the module petsc/3.20.1-mpi, for example. The job execution is then done using the following submission script:
#!/bin/bash
#SBATCH --job-name=dmda_cpu # Nom du job
#SBATCH --nodes=1
#SBATCH --ntasks=4 # Nombre total de processus MPI
#SBATCH --hint=nomultithread # 1 processus MPI par coeur physique (pas d'hyperthreading)
#SBATCH --time=00:10:00 # Temps d’exécution maximum demande (HH:MM:SS)
#SBATCH --output=dmda_cpu_%j.out # Nom du travail
#SBATCH --error=dmda_cpu_%j.out # Nom du fichier d'erreur (ici commun avec la sortie)
#SBATCH --qos=qos_cpu-dev
cd ${SLURM_SUBMIT_DIR}
module purge
# Utilise la meme version que pour la compilation du code
module load petsc/3.20.1-mpi
module list
view_args="-log_view" # to avoid slowing down due to -log_view_gpu_time
cpu_args="-dm_mat_type mpiaij -dm_vec_type mpi"
set -x
srun ./dmda.exe -da_grid_x 1000 -da_grid_y 1000 -ksp_type cg $cpu_args $view_args
GPU Example
The GPU version of PETSc, using CUDA, is installed on all our GPU partitions: V100, A100, and H100. The corresponding modules are available under the name petsc/3.22.1-mpi-cuda (for the A100 and H100 partitions, first do a module load arch/a100 or module load arch/h100, respectively).
To generate an executable compatible with the H100 partition, remember to compile your PETSc code on the compil_H100 partition by doing a module load arch/h100.
On Jean Zay, OpenMPI libraries CUDA-aware support GPUDirect, which optimises communications between GPUs. For more information, please refer to the documentation MPI CUDA-aware and GPUDirect. In this example, we use GPUDirect for optimal performance in a multi-GPU case. This requires adapting the code dmda.c presented in the previous section by making the following necessary modifications:
Add the binding step, include init_cuda.c in the SOURCES of the Makefile:
init_cuda.c:
#include <cuda.h>
#include <cuda_runtime.h>
void initialisation_cuda()
{
char* local_rank_env;
int local_rank;
cudaError_t cudaRet;
/* Récupération du rang local du processus via la variable d'environnement
positionnée par Slurm, l'utilisation de MPI_Comm_rank n'étant pas encore
possible puisque cette routine est utilisée AVANT l'initialisation de MPI */
local_rank_env = getenv("SLURM_LOCALID");
if (local_rank_env) {
local_rank = atoi(local_rank_env);
/* Définition du GPU à utiliser pour chaque processus MPI */
cudaRet = cudaSetDevice(local_rank);
if(cudaRet != CUDA_SUCCESS) {
printf("Erreur: cudaSetDevice a échoué\n");
exit(1);
}
} else {
printf("Erreur : impossible de déterminer le rang local du processus\n");
exit(1);
}
}
Initialise CUDA before MPI:
dmda_gpu.c:
…
/* On initialise CUDA... */
initialisation_cuda();
PetscFunctionBeginUser;
PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
…
Here is a submission script adapted to run this exercise on the A100 GPUs:
#!/bin/bash
#SBATCH --job-name=dmda_gpu # nom du job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2 # nombre de tâche MPI par noeud
#SBATCH --gres=gpu:2 # nombre de GPU à réserver par nœud
#SBATCH --cpus-per-task=8 # nombre de coeurs à réserver par tâche
#SBATCH --hint=nomultithread # 1 processus MPI par coeur physique (pas d'hyperthreading)
#SBATCH --time=00:10:00 # Temps d’exécution maximum demande (HH:MM:SS)
#SBATCH --output=dmda_gpu_a100%j.out # Nom du travail
#SBATCH --error=dmda_gpu_a100%j.out # Nom du fichier d'erreur (ici commun avec la sortie)
#SBATCH --qos=qos_gpu_a100-dev
#SBATCH -C a100
cd ${SLURM_SUBMIT_DIR}
module purge
module load arch/a100
# Utilise la meme version que pour la compilation du code
module load petsc/3.22.1-mpi-cuda
module list
view_args="-log_view" # to avoid slowing down due to -log_view_gpu_time
if [ $SLURM_NTASKS == 1]
then
### mono-GPU :
gpu_args="-dm_mat_type aijcusparse -dm_vec_type cuda"
else
### multi-GPU :
gpu_args="-dm_mat_type mpiaijcusparse -dm_vec_type mpicuda"
fi
set -x
srun ./dmda.exe -da_grid_x 1000 -da_grid_y 1000 -ksp_type cg -pc_type hypre $gpu_args $view_args