This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Introduction to OpenACC and OpenMP GPU
Course leader: Thibaut Véry
Instructors: Thibaut Véry, Olga Abramkina, Xuezhou Lu
This 3-day training teaches GPU programming with OpenACC and OpenMP 5.0 directives in C and Fortran. It covers hybrid architectures, key concepts, performance trade-offs, and profiling tools, so that you can apply these approaches effectively to real applications.
The detailed objectives of the training are as follows:
- Understand hybrid accelerated architectures (GPU) and their associated programming constraints.
- Use OpenACC and OpenMP 5.0 directives to parallelize existing codes or write new ones in C/Fortran.
- Grasp the key concepts of these programming models, such as directives, clauses, parallel regions, and data management.
- Understand the advantages and disadvantages of using these approaches in terms of portability, performance, complexity, and ease of implementation.
- Be able to implement these concepts on concrete examples from real applications, such as numerical simulation.
- Evaluate the performance of these codes using profiling and analysis tools, such as NVIDIA Nsight.
- Understand the differences between OpenACC and OpenMP target, and choose the most suitable approach for each situation.
Target audience
Engineers, researchers, and developers wishing to master GPU programming with parallelization directives. This training is aimed at anyone involved in high-performance computing who wishes to acquire expertise in OpenACC and OpenMP 5.0 to optimize their applications on heterogeneous GPU/CPU architectures.
Prerequisites
To get the most out of this training, the following knowledge is expected:
- Mandatory: Working knowledge of programming in Fortran 90/95 and/or C, including compiling and running programs
- Optional: Basic knowledge of OpenMP and/or MPI, which helps in understanding parallelization concepts
Duration and modalities
This training lasts 3 days. Participants are welcomed at 9:00 AM on the first day, and sessions run from 9:30 AM to 5:30 PM. It takes place exclusively in person at the IDRIS premises in Orsay (91).
Attendance
Minimum: 8 people;
Maximum: 20 people.
Context
Since the early 2010s, the use of graphics cards (GPUs) as accelerators for certain types of computation has grown rapidly. Although GPUs were originally designed for video games, their use in scientific computing has increased steadily.
Several approaches have been developed to allow scientists to exploit the computing power of GPUs. Initially, programmers had access to architecture-specific languages (e.g. CUDA for NVIDIA) or generic ones (such as OpenCL). These languages are low-level and require a significant rewrite of existing computing codes.
OpenACC and OpenMP (target) instead use directives to annotate existing code so that it can run on GPUs. The resulting code changes are less intrusive and very often deliver performance close to that of low-level languages. This is the approach presented in this training.
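To give an idea of what annotating a code with directives looks like, here is a minimal sketch in C, not taken from the course materials. The function names `saxpy_openacc` and `saxpy_openmp` are hypothetical; the loop itself is ordinary C, and a compiler without OpenACC or OpenMP offloading enabled simply ignores the pragmas and runs it on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* y = a*x + y, annotated with an OpenACC directive.
 * copyin: x is sent to the GPU; copy: y is sent and brought back. */
void saxpy_openacc(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* The same loop annotated with an OpenMP target directive.
 * map(to:) and map(tofrom:) play the role of copyin and copy. */
void saxpy_openmp(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

In both versions the only change to the original loop is one added line, which is what makes the directive approach less intrusive than a rewrite in a low-level language. Actual GPU execution depends on the compiler and its offloading options.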
Course content
The course is divided into several modules that cover different concepts essential to GPU programming with directives:
- Getting started: Brief presentation of the main directives to be able to quickly execute code on graphics cards.
- Code analysis: Presentation of code profiling tools on GPU and CPU to identify the important parts for a progressive porting of a CPU code.
- Data management: Data transfers between CPU and GPU are one of the main factors limiting performance. This module presents the directives used to optimize them.
- Modular programming: Presentation of the directives necessary to port a modular code.
- Optimization of compute kernels: Presentation of advanced clauses for tuning compute kernels.
- Compute/transfer overlap: Asynchronous execution of kernels and transfers.
- Multi-GPU: Use of multiple GPUs in a computation by coupling GPU directives with MPI and/or CPU-side OpenMP.
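As an illustration of the data management topic listed above, the following C sketch (an assumption for illustration, not an excerpt from the course) wraps several kernels in a single OpenACC data region so that the arrays are transferred once instead of at every kernel launch. The function name `smooth` and the three-point averaging stencil are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Apply `steps` passes of a three-point average to the interior of u.
 * tmp is scratch space of the same length; boundary values are kept. */
void smooth(int n, int steps, double *u, double *tmp)
{
    assert(n >= 3 && u != NULL && tmp != NULL);

    /* One data region for the whole iteration: u is copied to the GPU
     * on entry and back on exit, tmp is only allocated on the GPU. */
    #pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* Both kernels reuse the data already present on the GPU. */
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                tmp[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
            }
            #pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                u[i] = tmp[i];
            }
        }
    }
}
```

Without the enclosing data region, each `parallel loop` would need its own data clauses and would move the arrays at every step. The OpenMP counterpart of this pattern is a `target data` region with `map` clauses. As with any directive-based code, the pragmas are ignored when the compiler is not asked to offload, and the function then runs on the CPU.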
💡 50% of the time will be devoted to practical work (in C or Fortran).
To ensure the practical sessions run efficiently, they take place on the Jean Zay supercomputer. Each learner is provided with a workstation with access to the IDRIS supercomputer. Neither experience in using a supercomputer nor prior access to one is required.
Course materials
All course materials, including slides, notes, and practical exercises, are provided under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). For more details on the license, please consult this page.
The course is written as Jupyter Notebooks to make it more interactive and to include simple exercises in addition to the practical work. The notebooks are available in C and Fortran, along with a PDF document that compiles them.
The source files of the exercises and practical work are also made available.
Course materials (in English)
To view the dates of the upcoming sessions for this training, please visit the following page:
Registration
| CNRS / French university staff | External participants |
| --- | --- |
| Are you a staff member of the CNRS or a French university? Your registration is free via our server. | Our training courses are aimed at all professionals from companies, public organizations, and individuals. |