
Getting Started on the Supercomputer

⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Introduction

Are you new to the Jean Zay supercomputer?

On this page, you will find the main points essential for accessing Jean Zay, making your first connection and submitting your first job.

This page is primarily aimed at new users of IDRIS. It is designed to be concise to ensure a quick start with the supercomputer. For more information on the various points covered, please refer to the full documentation.

And for a quick overview of useful Linux, Module and SLURM commands for Jean Zay users, see the cheatsheet above.


Do you have a question or request?
IDRIS User Support is available Monday to Thursday from 9 am to 6 pm and Friday from 9 am to 5:30 pm:

📥 assist@idris.fr
☎️ +33 (0)1 69 35 85 55


Presentation of the Jean Zay machine

Jean Zay is a supercomputer composed of five partitions:

  • one scalar partition (nodes equipped only with CPUs)

  • and four accelerated partitions (hybrid nodes equipped with both CPUs and GPUs).

All nodes access a shared file system via a very high bandwidth interconnect network.

INFO

A complete hardware description is available on the page The Jean Zay supercomputer.

Here is an overview of the machine's architecture:

[Figure: hardware description of the Jean Zay supercomputer]

All DARI projects with CPU and/or GPU hours have compute partitions defined on Jean Zay. These allow users to choose the type of resource (CPU or GPU) they wish to use. The table below summarises the main characteristics of these partitions.

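| Partition | Name | CPUs per node | CPU RAM | GPUs per node | GPU RAM |
|---|---|---|---|---|---|
| CPU | cpu_p1 | 40 | 192 GB | - | - |
| quadri-GPU V100 | p13 | 40 | 192 GB | 4 | 16 GB / 32 GB |
| octo-GPU V100 | p2 | 24 | 384 GB / 768 GB | 8 | 32 GB |
| octo-GPU A100 | p5 | 64 | 512 GB | 8 | 80 GB |
| quadri-GPU H100 | p6 | 96 | 512 GB | 4 | 80 GB |
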
INFOS
  • For more information on the different partitions, see the pages on SLURM CPU partitions and/or SLURM GPU partitions.

  • All projects with CPU and/or GPU hours also have access to partitions dedicated to pre- and post-processing, visualisation, compilation or archiving. On these partitions, computing hours are not deducted from your allocation. For more information on these partitions, see the page SLURM CPU partitions.

Accessing the supercomputer


Any request to open an account on Jean Zay is made on the eDARI portal.

An account request includes a request to join a scientific project that has computing hours on the supercomputer. You can either join an existing project, with the agreement of its project leader, or create your own project. A scientific project is created by submitting a request for computing hours on the eDARI portal.

Beforehand, we recommend that you consult the GENCI note detailing the terms of access to national resources. You will find, among other things, the conditions and criteria for eligibility to obtain computing hours and an account on the supercomputers.

It is the public operator GENCI (National High-Performance Computing Facility) that manages the allocation of computing resources for all national centres (CINES, IDRIS and TGCC).

To compute on Jean Zay:

You will need to complete the following three steps:

  1. Create a user account on eDARI
  2. Request an hour allocation (unless joining an existing project)
  3. Request the creation of a Jean Zay user account (with project attachment)


Create a user account on eDARI

First, you must create a user account on https://www.edari.fr/user/login using your institutional email address.

warning

This account is solely intended to access your personal space on the eDARI site to carry out all administrative procedures (requests for computing hours, opening accounts on Jean Zay, etc.) which must be done from eDARI. It has no link with any potential Jean Zay user account.

You will find more information on hour requests and account opening requests on Jean Zay in the following video:


Connection


First connection via SSH

The first connection to Jean Zay must be made via SSH, from an institutional address registered in the IDRIS filters and associated with your computing account:

ssh login@jean-zay.idris.fr

For your first connection, you must use your initial password, which is the concatenation of:

  • the password randomly generated by IDRIS (sent by email)
  • and the password you entered when requesting the account on eDARI.

This password will be changed immediately upon first connection (automatic procedure) to set your current password. You will find an example of a first connection in Password Management.

ATTENTION
  • On the first connection, the initial password is requested twice (once for the connection and a second time to change the initial password).
  • Being immediately disconnected after the new password has been accepted (all authentication tokens updated successfully) is normal.

Once connected, you arrive on one of the 5 login nodes of Jean Zay. These nodes, shared by all users, are dedicated to setting up the computing environment and should not be used for calculations (they are not equipped with GPUs).
Unlike the compute nodes, the login nodes have an HTTP/HTTPS proxy that allows data to be downloaded from remote servers (via the HTTP/HTTPS protocol, with commands such as git or wget).

INFO

For more information on connecting to Jean Zay (targeting a specific login node, SSH connections by key or with a certificate, etc.), see the page SSH Access and Shells.

Connection via JupyterHub

The IDRIS teams have set up a JupyterHub solution that allows the use of Jupyter Notebooks and other applications such as VSCode, MLflow or Dask via a web interface without prior SSH connection to the machine.

IMPORTANT

However, an initial SSH connection to Jean Zay is essential before using JupyterHub.

Managing your data


Disk spaces

Each user has a personal HOME space (unique even for users attached to multiple projects).
In addition, for each project the user participates in, 4 disk spaces with various characteristics are accessible: WORK, SCRATCH, STORE and DSDIR.

IMPORTANT

To store your files, be sure to choose the disk space whose characteristics best suit them. This is essential to avoid saturating your quotas (which makes calculations fail) or losing data (automatically purged disk space).

The table below summarises their main characteristics. For more information on how to better use your disk spaces, see the page Disk Spaces.

| Disk Space | Default Capacity | Usage | Command |
|---|---|---|---|
| $HOME | Quotas: 3 GB / 150k inodes per user | home directory when connecting interactively; intended for small files (e.g. configuration files); unique in the case of multi-project login | cd $HOME |
| $WORK | Quotas: 5 TB / 500k inodes per project * | workspace and permanent storage; designed to accommodate large files (e.g. input/output data) | User-specific part: cd $WORK; common part accessible to all project users: cd $ALL_CCFRWORK |
| $SCRATCH | Very large security quotas: 4.6 PB shared by all users | workspace and semi-temporary storage; lifetime of files not read or modified: 30 days; optimal performance for read/write operations | User-specific part: cd $SCRATCH; common part accessible to all project users: cd $ALL_CCFRSCRATCH |
| $STORE | Quotas: 50 TB / 100k inodes per project * | archive space; accessible from the login nodes and the prepost, archive, compil and visu partitions | User-specific part: cd $STORE; common part accessible to all project users: cd $ALL_CCFRSTORE |
| $DSDIR | - | space visible to all Jean Zay users; contains models and large public databases; set up by the IDRIS teams; read-only | cd $DSDIR |

* the project quotas can be increased on request from the project leader or their deputy via the Extranet interface or on request to User Support.

💡 Disk quotas

You can check the usage of your disk spaces using one of these two commands:

  • idr_quota_user for a view of your personal usage as a user;
  • idr_quota_project for an overview of your project and the consumption of each of its members.
INFO

For more information, see the page on Disk Quotas and Viewing Usage Rates.
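The quotas above limit both volume and number of inodes (roughly one inode per file or directory). As a rough cross-check on a single directory, the sketch below counts entries and total size with the standard find and du utilities. It does not replace idr_quota_user; the function names and the example path are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers (not IDRIS tools): estimate what a directory tree
# costs in terms of inodes and volume.

# Print the number of entries (files, directories, links) under a directory,
# including the directory itself. File names containing newlines would be
# counted more than once.
count_inodes() {
    local dir="${1:-.}"
    find "$dir" -xdev | wc -l | tr -d ' '
}

# Print a one-line summary: path, inode count and human-readable size.
inode_usage() {
    local dir="${1:-.}"
    local size
    # du -sh prints "<size><TAB><path>"; cut keeps the size column only
    size=$(du -sh "$dir" | cut -f1)
    echo "$dir: $(count_inodes "$dir") inodes, $size"
}

# Example (assumes $WORK is defined, as it is on Jean Zay):
# inode_usage "$WORK/my_dataset"
```

A directory holding many small files can exhaust the inode quota long before the volume quota, which is one reason to keep such data as archives.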

💡 Best practices for database management

  • To avoid saturating your disk spaces, check if the model or database you need is already available on DSDIR.
  • If you have to download a database (non-public database) and its volume requires you to put it on your SCRATCH (very large quotas), keep a copy of it as archives in the STORE, because the SCRATCH is a semi-temporary space. You can then easily restore your database if files have been deleted.
  • WORK or SCRATCH?
    • WORK: Your files are not subject to any automatic deletion procedure, but read and write performance is lower than on the SCRATCH. The quotas are also more restrictive.
    • SCRATCH: Very large quotas and better read and write performance. However, files that have not been read or modified for 30 days are automatically deleted!
INFOS
  • If you are working on a public database, we can download it for you into the shared disk space DSDIR. The data will then be accessible to all users.
  • For more information on best practices for managing your data, see the page Databases.
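The SCRATCH/STORE practice described above can be sketched with standard tar and find. This is only an illustration: the function names and paths are hypothetical, and using access and modification times to spot purge candidates is an assumption about how the 30-day rule is applied.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the SCRATCH/STORE practice.

# Pack a dataset directory into one compressed archive, so that it uses a
# single inode in the destination. Prints the path of the archive created.
archive_dataset() {
    local src="$1" dest_dir="$2"
    local name
    name=$(basename "$src")
    mkdir -p "$dest_dir" || return 1
    # -C moves to the parent directory so the archive stores relative paths
    tar -czf "$dest_dir/$name.tar.gz" -C "$(dirname "$src")" "$name" || return 1
    echo "$dest_dir/$name.tar.gz"
}

# List regular files neither read nor modified for more than N days
# (default 25), i.e. files that may soon be purged from the SCRATCH.
list_stale_files() {
    local dir="$1" days="${2:-25}"
    find "$dir" -type f -atime +"$days" -mtime +"$days"
}

# Example (paths are illustrative):
# archive_dataset "$SCRATCH/my_dataset" "$STORE/archives"
# list_stale_files "$SCRATCH" 25
# Restore later with: tar -xzf "$STORE/archives/my_dataset.tar.gz" -C "$SCRATCH"
```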

Transferring data between your login machine and Jean Zay

If you need to transfer data between your machine and Jean Zay, you can use the commands related to SSH (sftp and scp).

# Sending a local file to Jean Zay
scp localSource login@jean-zay.idris.fr:JZDestination

# Retrieving a file from Jean Zay to the local machine
scp login@jean-zay.idris.fr:JZSource localDestination

or

# Connecting to the remote server via SFTP
sftp login@jean-zay.idris.fr

# Sending a local file to Jean Zay
sftp> put localSource JZDestination

# Retrieving a file from Jean Zay to the local machine
sftp> get JZSource localDestination
IMPORTANT

For this to work, your machine must be registered in the IDRIS filters or you must go through a registered machine!

INFO

For more information on how to transfer data in batch, see the page Transferring data between IDRIS and your login machine and this cheatsheet.
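After a transfer with scp or sftp, comparing checksums on both sides is one simple way to confirm that files arrived intact. The sketch below is not taken from the IDRIS documentation: it assumes the GNU sha256sum, sort and xargs utilities are available on both machines, and the function and file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: record checksums before a transfer, verify them after.
# The manifest file is expected to live outside the directory it describes.

# Write a checksum manifest for every regular file under a directory.
make_manifest() {
    local dir="$1" manifest="$2"
    # Run from inside the directory so the manifest stores relative paths;
    # the output redirection is resolved before the subshell changes directory.
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$manifest"
}

# Verify a directory against a manifest. The exit status is 0 only if every
# file listed in the manifest is present and unchanged.
check_manifest() {
    local dir="$1" manifest="$2"
    # With no file argument, sha256sum -c reads the manifest from stdin
    (cd "$dir" && sha256sum --quiet -c) < "$manifest"
}

# Example (names are illustrative):
# make_manifest ./my_dataset my_dataset.sha256      # on the source machine
# check_manifest ./my_dataset my_dataset.sha256     # on the destination
```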

Computing environment


IDRIS provides a catalogue of tools (virtual environments, compiled libraries, etc.) accessible via the command module.

The module command

To load the products installed on Jean Zay, you need to use the command module. The table below summarises the basic module commands.

| Action | Module command |
|---|---|
| display the modules containing the requested package | idr_module_search <package> |
| display the complete catalogue | module avail |
| search for a specific tool | module avail <package> |
| get info on a module | module show <package> |
| load a module | module load <package>/<version> |
| unload a module | module unload <package> |
| display the list of loaded modules | module list |
| start from a clean environment | module purge |
ATTENTION

To access the modules adapted to the A100 or H100 partition, you must first load one of the following modulefiles:

  • For the A100 partition: module load arch/a100
  • For the H100 partition: module load arch/h100
INFOS
  • The list of modules can be enriched on request by contacting support via assist@idris.fr.
  • For more information on using the module command, see the page Modules.
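A typical sequence combines the commands above: purge, select the A100 or H100 module tree if needed, load the modules, then list what is loaded. The wrapper below is a minimal sketch of that sequence; load_env is a hypothetical name, and treating every other partition as needing no arch module is an assumption drawn from the note above.

```shell
#!/bin/bash
# Hypothetical helper: start from a clean environment, select the module
# tree for the A100 or H100 partition if requested, then load modules.
# Usage: load_env <a100|h100|default> <module> [<module> ...]
load_env() {
    local arch="$1"
    shift
    # Validate the architecture before touching the environment
    case "$arch" in
        a100|h100|default) ;;
        *) echo "unknown architecture: $arch" >&2; return 2 ;;
    esac
    module purge || return 1
    if [ "$arch" != "default" ]; then
        # Must come first so that later loads see the adapted modules
        module load "arch/$arch" || return 1
    fi
    local m
    for m in "$@"; do
        module load "$m" || return 1
    done
    module list
}

# Example, reusing the module from the MPI script on this page:
# load_env default intel-all/19.0.4
```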

Modules and conda virtual environments

Conda virtual environments pre-installed by IDRIS are accessible via the module command.

  • The environment is activated automatically (conda activate) when the module is loaded (module load conda).
  • ATTENTION: it is not deactivated (conda deactivate) when the module is unloaded (module unload conda).

Once the environment is activated, you can view all the Python packages it contains using the commands pip list and conda list.

INFOS
  • It is strongly recommended to use the environments installed by us to obtain the best performance, pool resources and avoid saturating your quotas.
  • Any environment can be enriched on request by contacting support via assist@idris.fr.

Modules and compilation

Different compilers and libraries are available on Jean Zay ( module avail ) and can be activated using the command module load.

ATTENTION

We strongly recommend that you consult our web page on the use of the command module and the management of dependencies between the versions of the libraries and those of the compilers.

INFO

For more information on compilation and the different compilers available, see the dedicated pages.

Job submission


Two working modes are possible:

  • batch
  • interactive

Batch work allows you to close the interactive session after submitting a job, while interactive work requires you to keep the session open to avoid interrupting the execution.

ATTENTION

It is strongly discouraged to perform calculations on the login nodes as this can slow them down (or even crash them), which would impact all other users connected to the same node!
In addition, the limits set on these nodes (1 CPU per user and 30 minutes of CPU time per process) do not allow for good performance.

INFO

For more information on these two working modes, see the pages Batch Execution and Interactive Execution.

You will find examples below for a quick start.

Batch execution - Example of a SLURM script

Access to computing resources is managed by the Slurm manager for all users.

There are 2 essential steps to working in batch:

  1. Creating the Slurm script: a file containing the Slurm directives for resource reservation and the commands to be executed.
  2. Submitting the job: the Slurm script is submitted to the manager via the Slurm commands sbatch or srun for execution on the requested resources.
INFO

Access to the various hardware partitions of the machine depends on the type of job submitted (CPU or GPU) and the Slurm partition requested for its execution (See SLURM CPU partitions and SLURM GPU partitions for more information).

ATTENTION
  • The batch mode does not allow the user to intervene during the execution of the script commands (except to interrupt the job). Therefore, file transfers must be done without having to type a password.
  • The compute nodes have no Internet access, which prohibits any downloading (Git repositories, Python/Conda installation, …) from these nodes. If necessary, downloads must be done from the login nodes or from the pre/post-processing nodes before code execution: either interactively, or via the batch submission of cascading jobs.
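Cascading jobs, as mentioned above, can be chained with Slurm job dependencies: a first job downloads the data and a second one computes only if the first succeeded. The sketch below uses the sbatch options --parsable and --dependency=afterok; the function name and the two script names are hypothetical, and the first script is assumed to request a partition from which downloads are possible.

```shell
#!/bin/bash
# Hypothetical helper: submit two Slurm scripts so that the second starts
# only if the first one finishes successfully (exit code 0).
submit_chain() {
    local first="$1" second="$2"
    local jobid
    # --parsable makes sbatch print only the job ID, optionally followed
    # by ";<cluster name>"
    jobid=$(sbatch --parsable "$first") || return 1
    jobid="${jobid%%;*}"
    # afterok: the dependent job becomes eligible once job $jobid succeeds
    sbatch --dependency=afterok:"$jobid" "$second"
}

# Example (script names are illustrative):
# submit_chain download.slurm compute.slurm
```

If the first job fails, the second one is never started and stays in the queue until it is cancelled with scancel.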

Here is an example of a CPU submission script for a batch MPI job on Jean Zay:

1. Content of the intel_mpi.slurm file:

#!/bin/bash
#SBATCH --job-name=MPIJob # job name
#SBATCH --ntasks=80 # Total number of MPI processes
#SBATCH --ntasks-per-node=40 # Number of MPI processes per node
# /!\ Warning, the following line is misleading but in Slurm vocabulary
# "multithread" refers to hyperthreading.
#SBATCH --hint=nomultithread # 1 MPI process per physical core (no hyperthreading)
#SBATCH --time=00:10:00 # Maximum execution time requested (HH:MM:SS)
#SBATCH --output=MPIJob%j.out # Standard output file
#SBATCH --error=MPIJob%j.out # Standard error file (here merged with standard output)

# go to the submission directory
cd ${SLURM_SUBMIT_DIR}

# purge interactively loaded and default inherited modules
module purge

# load modules
module load intel-all/19.0.4

# echo launched commands
set -x

# code execution
srun ./exec_mpi

2. Submission of the script via the sbatch command:

sbatch intel_mpi.slurm
INFO

For more information on job execution, including different examples (MPI, OpenMP, MPMD, CUDA MPS), see the dedicated page.

Tracking the progress of a SLURM job

The table below summarises the main commands for submitting the SLURM script and tracking its progress.

| Command | Function |
|---|---|
| sbatch <script> | submit a batch Slurm script |
| squeue -u $USER | track the submission status of your jobs |
| scontrol show job <jobid> | display all parameters of a submitted job |
| scancel <jobid> | cancel the execution of a job |
INFO

You can connect via SSH to the compute nodes assigned to your jobs to monitor the execution of your calculations and check resource usage (top, htop, nvidia-smi, ...): ssh <node name>
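To find which node to connect to, the node list of a running job can be read from squeue and expanded with scontrol show hostnames. The helper below is a sketch built on these two standard Slurm commands; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the first compute node allocated to a running job.
job_first_node() {
    local jobid="$1" nodelist
    # -h drops the header line, -o "%N" prints only the allocated node list
    nodelist=$(squeue -j "$jobid" -h -o "%N") || return 1
    if [ -z "$nodelist" ]; then
        echo "job $jobid has no allocated node (pending or finished?)" >&2
        return 1
    fi
    # Expand a compact list such as r13i0n[1-2] into one host name per line
    scontrol show hostnames "$nodelist" | head -n 1
}

# Example:
# ssh "$(job_first_node 123456)"
```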

Interactive execution

Any execution in interactive mode requires reserving resources via the Slurm manager. The time it takes to allocate these resources varies depending on the machine load.

ATTENTION

It is impossible to predict the moment at which the requested resources will be allocated. If you are not in front of your machine at that moment, resources will be reserved for your use without you using them.

From machines declared in the IDRIS filters, you have SSH access to the login nodes. You then have 2 options:

  1. Open a terminal directly on a compute node on which you reserve resources via the command srun.
    • Example with reservation of a GPU for 1 hour on the default partition:
    login@jean-zay3:~$ srun --ntasks=1 --gres=gpu:1 --time=1:00:00 ... --pty bash
    srun: job 123456 queued and waiting for resources
    srun: job 123456 has been allocated resources
    login@r13i0n8:~$
    • You are then connected to the compute node and can execute your code/script:
    login@r13i0n8:~$ ./script.py
    • To disconnect:
    login@r13i0n8:~$ exit
    exit
    login@jean-zay3:~$
    ATTENTION
    • MPI is not supported in this configuration.
    • When the time limit (here 1 hour) is reached, the connection to the compute node is automatically cut. The execution is therefore interrupted prematurely.
  2. Make a resource allocation via the command salloc and chain executions on these resources via the command srun.
    • Example with reservation of a GPU for 1 hour on the default partition:
    login@jean-zay1:~$ salloc --ntasks=1 --gres=gpu:1 --time=1:00:00 <other-options>
    salloc: Pending job allocation 654321
    salloc: job 654321 queued and waiting for resources
    salloc: job 654321 has been allocated resources
    salloc: Granted job allocation 654321
    • When the allocation is effective, you can chain various executions:
    login@jean-zay1:~$ srun python script_0.py
    ...
    login@jean-zay1:~$ srun python script_1.py
    ...
    • To release the resources:
    login@jean-zay1:~$ exit
    exit
    login@jean-zay1:~$ salloc: Relinquishing job allocation 654321
    ATTENTION
    • When the time limit (here 1 hour) is reached, the allocation of the compute node is automatically terminated. Any execution in progress is therefore interrupted prematurely.
INFO

For more information on interactive execution, see the dedicated page.

Further information

Training

IDRIS provides various training courses for users of HPC and AI scientific computing.



Contact IDRIS

For any questions or requests, IDRIS User Support is available Monday to Thursday from 9 am to 6 pm and Friday from 9 am to 5:30 pm:


Workshops

IDRIS organizes workshops on getting started with the supercomputer and optimizing your computing codes.


