This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
StarPU on Jean Zay
StarPU is a task programming library for hybrid, multicore and heterogeneous architectures.
The user provides the algorithms and constraints, and StarPU handles task dependencies, optimised heterogeneous scheduling, and optimised data transfers and replications between different memory locations.
StarPU can be used via its C/C++/Fortran/Python API or through OpenMP pragmas.
Useful links
Available versions
The library is made available via our modules. To view the available versions, you can run the command module avail:
$ module avail starpu
starpu/1.3.9-cuda starpu/1.4.2-mpi-cuda-debug
starpu/1.3.9-cuda-debug starpu/1.4.2-mpi-debug
starpu/1.3.9-cuda-debug-simgrid starpu/1.4.4-mpi
starpu/1.3.9-cuda-simgrid
# output generated on 25/02/2026
To use StarPU on the A100 partition, it is necessary to preload the module arch/a100 to access the modules compatible with this partition.
Several variants are available:
| Variants | Description |
|---|---|
| -cuda | CUDA support |
| -mpi | MPI support (parallel, multi-node version) |
| -simgrid | Architecture simulation possible |
| -debug | Debugging options enabled at compilation |
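As an illustration of how the module names combine these variants, here is a minimal sketch that splits a name such as starpu/1.4.2-mpi-cuda-debug into its version and its variant suffixes. The helper name parse_module is hypothetical and is not part of StarPU or of the module system; the descriptions are copied from the table above.

```python
# Variant suffixes and their descriptions, copied from the table above.
KNOWN_VARIANTS = {
    "cuda": "CUDA support",
    "mpi": "Parallel version",
    "simgrid": "Architecture simulation possible",
    "debug": "Debugging options enabled at compilation",
}


def parse_module(name):
    """Split 'starpu/<version>-<variant>-...' into (version, [variants]).

    Hypothetical helper for illustration only.
    """
    package, _, rest = name.partition("/")
    if package != "starpu" or not rest:
        raise ValueError(f"not a StarPU module name: {name!r}")
    version, *variants = rest.split("-")
    unknown = [v for v in variants if v not in KNOWN_VARIANTS]
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    return version, variants


version, variants = parse_module("starpu/1.4.2-mpi-cuda-debug")
print("version:", version)
for v in variants:
    print(f"  -{v}: {KNOWN_VARIANTS[v]}")
```

A name without any suffix-free ambiguity, such as starpu/1.4.4-mpi, yields the version 1.4.4 and the single variant mpi.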
Compilation of a C code using StarPU
We define a simple code that submits a task to StarPU.
#include <starpu.h>
#include <stdio.h>
/* The function submitted to StarPU must have exactly this signature */
void cpu_func(void *buffers[], void *cl_arg)
{
    printf("Hello world\n");
}
/* Wrap the function with the parameters specific to a StarPU task; this wrapper is called a "codelet" */
struct starpu_codelet cl =
{
    .cpu_funcs = { cpu_func },
    .nbuffers = 0
};
int main(int argc, char *argv[])
{
    int ret;
    /* Initialise StarPU */
    ret = starpu_init(NULL);
    STARPU_CHECK_RETURN_VALUE(ret, "starpu_init"); /* Check that initialisation succeeded */
    /* Define a StarPU task */
    struct starpu_task *task = starpu_task_create();
    task->cl = &cl; /* Point to the codelet defined above */
    /* Submit the task to StarPU */
    ret = starpu_task_submit(task);
    STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit");
    /* Wait for all submitted tasks to complete */
    ret = starpu_task_wait_for_all();
    STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_wait_for_all");
    /* Shut down StarPU */
    starpu_shutdown();
    return 0;
}
The code can be compiled using the Makefile below:
# Depends on the StarPU version in use (see module load starpu/...)
CFLAGS += $$(pkg-config --cflags starpu-1.3)
LDLIBS += $$(pkg-config --libs starpu-1.3)
all: hello_world
hello_world: hello_world.c
	gcc $(CFLAGS) hello_world.c -o hello_world $(LDLIBS)
clean:
	rm -f hello_world starpu.log
# compilation
module load starpu/1.3.9-cuda-debug
make
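The Makefile hard-codes the pkg-config package name starpu-1.3, which matches the 1.3.9 module loaded here. As a sketch of the rule that seems to be at work, the helper below derives the package name from a module name, assuming the package is always named starpu-<major>.<minor> (so a 1.4.x module would need starpu-1.4). The function name pkgconfig_name is hypothetical; running pkg-config --list-all after loading the module is the reliable way to confirm the actual name.

```python
def pkgconfig_name(module):
    """Guess the pkg-config package for a module such as 'starpu/1.3.9-cuda-debug'.

    Assumption: the package is named 'starpu-<major>.<minor>'.
    Hypothetical helper for illustration only.
    """
    # Keep what follows 'starpu/' and drop the variant suffixes.
    version = module.split("/", 1)[1].split("-", 1)[0]
    major, minor = version.split(".")[:2]
    return f"starpu-{major}.{minor}"


for module in ("starpu/1.3.9-cuda-debug", "starpu/1.4.2-mpi-debug"):
    print(module, "->", pkgconfig_name(module))
```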
Batch submission
#!/bin/bash
## JOB INFO
#SBATCH --job-name=starpu_code
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
## NODE CONFIGURATION
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=10
#SBATCH --partition=cpu_p1
#SBATCH --hint=nomultithread
## JOB ACCOUNTABILITY
#SBATCH --account=xxx@cpu
#SBATCH --time=01:00:00
## ENV ACTIVATION
## USE THE SAME STARPU VERSION AS FOR COMPILATION!
module purge
module load starpu/1.3.9-cuda-debug
## CODE EXECUTION
make
./my_app
Batch submission for multi-node code
Here, you need to load a parallel version of the library (variant -mpi).
#!/bin/bash
## JOB INFO
#SBATCH --job-name=starpu_multi-nodes_code
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
## NODE CONFIGURATION
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --partition=cpu_p1
#SBATCH --hint=nomultithread
## JOB ACCOUNTABILITY
#SBATCH --account=xxx@cpu
#SBATCH --time=01:00:00
## ENV ACTIVATION
## USE THE SAME STARPU VERSION AS FOR COMPILATION!
module purge
module load starpu/1.4.2-mpi-debug
## CODE EXECUTION
make
srun ./my_mpi_app
In this example:
- we run on 2 nodes;
- srun launches the executable ./my_mpi_app in parallel on 8 tasks (4 tasks per node);
- each process/task uses 10 CPU cores of a node and is therefore allocated 1/4 of the node's memory;
- we use the module starpu/1.4.2-mpi-debug, which includes MPI support.
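The arithmetic behind this example can be checked with a short sketch. It assumes 40 CPU cores per node, which is the value implied by "10 cores = 1/4 of the node", and it assumes that memory is allocated in proportion to the reserved cores. The function name job_geometry is hypothetical.

```python
from fractions import Fraction


def job_geometry(nodes, tasks_per_node, cpus_per_task, cores_per_node):
    """Return (total tasks, cores used per node, memory share per task).

    Assumption: memory is shared out in proportion to the reserved cores.
    """
    total_tasks = nodes * tasks_per_node
    cores_used = tasks_per_node * cpus_per_task
    if cores_used > cores_per_node:
        raise ValueError("more cores requested than a node provides")
    memory_share = Fraction(cpus_per_task, cores_per_node)
    return total_tasks, cores_used, memory_share


# Values from the batch script above; cores_per_node=40 is an assumption.
total, used, share = job_geometry(nodes=2, tasks_per_node=4,
                                  cpus_per_task=10, cores_per_node=40)
print(f"{total} tasks, {used} cores used per node, {share} of node memory per task")
```

With these values the script reserves every core of both nodes, so no cores are left idle.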
Case of a Python code
Here, you will need to preload a Python module in addition to the StarPU module. For example:
module load python/3.8.8
module load starpu
Example of a Python script using StarPU:
import time
import starpu
from starpu.joblib import Parallel, delayed
# Initialise StarPU
starpu.init()
# Define a function that runs for a few seconds
def huge_task(task_number):
    print("Beginning of task : ", task_number, "\n")
    c = 0
    for i in range(200000000):
        c += 1
    print("End of task : ", task_number, "\n")
    return task_number
start = time.time()
X = [1,2,3,4]
Parallel()(delayed(huge_task)(x) for x in X) # submits several tasks to StarPU
stop = time.time()
print(stop-start)
# Shut down StarPU
starpu.shutdown()
Tip
You can ignore the CPU binding defined by Slurm by setting:
export STARPU_WORKERS_GETBIND=0
In this case, StarPU instances will be distributed across all available cores on the node(s), even if you have not reserved them via Slurm. You may therefore disrupt the work of another user sharing the same node and your job will have degraded performance.
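To see what is at stake before changing this setting, the following sketch compares the cores the current process is bound to with the cores present on the node. It uses only the Python standard library and makes no StarPU call; the function name visible_cores is hypothetical, and os.sched_getaffinity is available on Linux only, hence the fallback.

```python
import os


def visible_cores():
    """Return (cores this process is bound to, cores present on the node)."""
    total = os.cpu_count() or 1
    try:
        bound = len(os.sched_getaffinity(0))  # Linux only
    except AttributeError:
        bound = total  # no affinity information on this platform
    return bound, total


bound, total = visible_cores()
print(f"bound to {bound} of {total} cores")
if os.environ.get("STARPU_WORKERS_GETBIND") == "0" and bound < total:
    # The binding is being ignored while only part of the node is reserved.
    print("warning: StarPU may spread over cores that were not reserved")
```

Run inside a job (for example with srun python3), a bound count lower than the total means the node is only partly reserved, which is the situation where ignoring the binding can disturb other users.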
Specific configurations
If you need a compilation with a very specific configuration (and not available from our modules), it is perfectly possible to compile StarPU locally. For any assistance, you can contact support at assist@idris.fr.