Jean Zay: Interactive execution of a CPU code

On Jean Zay, access to interactive resources can be done in two different ways.

Connection to the front end

Access to the front end is obtained via an ssh connection:

$ ssh login@jean-zay.idris.fr

The resources of this interactive node are shared between all the connected users. As a result, interactive work on the front end is reserved exclusively for compilation and script development.

Note: On the front end, the RAM is limited to 5 GB to be shared among all processes, and the CPU time is limited to 30 minutes (1800 s) per process to ensure better resource sharing.

All interactive executions of your codes must be done on the CPU compute nodes, using one of the two methods described below: opening a terminal on a compute node with srun --pty, or launching the execution directly with srun.

However, if the computations require a large amount of CPU resources (in number of cores, memory, or elapsed time), it is necessary to submit a batch job.
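
For reference, a batch job is described in a submission script passed to the sbatch command. The following is only a minimal sketch (the job name, requested resources, script file name and executable name are placeholders); refer to the documentation on batch submission scripts for complete examples:

#!/bin/bash
#SBATCH --job-name=my_job            # arbitrary job name (placeholder)
#SBATCH --ntasks=4                   # number of MPI tasks
#SBATCH --hint=nomultithread         # reserve physical cores (no hyperthreading)
#SBATCH --time=01:00:00              # maximum execution time (HH:MM:SS)
#SBATCH --output=my_job_%j.out       # standard output file (%j = job id)

srun ./my_executable_file            # placeholder executable

The script is then submitted with:

$ sbatch my_job.slurm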

Obtaining a terminal on a CPU compute node

It is possible to open a terminal directly on a compute node on which the resources have been reserved for you (here 4 cores) by using the following command:

$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread [--other-options] bash

Comments:

  • An interactive terminal is obtained with the --pty option.
  • The reservation of physical cores is ensured with the --hint=nomultithread option (no hyperthreading).
  • By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You can consult our documentation on this subject: Memory allocation on CPU partitions.
  • --other-options contains the usual Slurm options for job configuration (--time=, etc.): see the documentation on batch submission scripts in the index section Execution/Commands of a CPU code, as well as the example below.
  • By default, reservations are subject to the resource limits defined in Slurm per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
  • For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) on which to count the computing hours of the job, as explained in our documentation about computing hours management.
  • We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.
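
For example, to open a terminal on 4 cores for 2 hours while charging the hours to a given allocation, you could combine these options as follows (the account name my_project@cpu is only a placeholder):

$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread --time=02:00:00 --account=my_project@cpu bash    # placeholder account name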

The terminal is operational after the resources have been granted:

$ srun --pty --ntasks=1 --cpus-per-task=4 --hint=nomultithread bash
srun: job 1365358 queued and waiting for resources
srun: job 1365358 has been allocated resources
bash-4.2$ hostname
r4i3n7

You can verify if your interactive job has started by using the squeue command. Complete information about the status of the job can be obtained by using the scontrol show job <job identifier> command.
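
For example (the job identifier is the one shown above and is only illustrative):

$ squeue -u $USER                # list your pending and running jobs
$ scontrol show job 1365358      # detailed information on a specific job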

After the terminal is operational, you can launch your executable files in the usual way: ./your_executable_file. For an MPI execution, you should again use srun: srun ./your_mpi_executable_file. Important: hyperthreading is not usable via MPI in this configuration.
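
For example, from the terminal obtained on the compute node (the executable names are placeholders):

bash-4.2$ ./your_executable_file
bash-4.2$ srun ./your_mpi_executable_file

To leave the interactive mode: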

bash-4.2$ exit

Caution: If you do not leave the interactive mode yourself, the maximum allocation duration (the default, or the value specified with the --time option) is applied and the corresponding hours are then counted for the project you have specified.

Interactive execution on the CPU partition

If you don't need to open a terminal on a compute node, it is also possible to start the interactive execution of a code on the compute nodes directly from the front end by using the following command (here with 4 tasks):

$ srun --ntasks=4 --hint=nomultithread [--other-options] ./my_executable_file

Comments:

  • The --hint=nomultithread option reserves physical cores (no hyperthreading).
  • By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You may consult our documentation on this subject: Memory allocation on CPU partitions.
  • --other-options contains the usual Slurm options for configuring jobs (--output=, --time=, etc.): see the documentation on batch job submission scripts found in the Jean Zay index section Execution/Commands of a CPU code, as well as the example below.
  • By default, reservations are subject to the resource limits defined in Slurm per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
  • For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) on which to count the computing hours of the job, as explained in our documentation about computing hours management.
  • We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.
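
For example, to run 4 MPI tasks for at most 30 minutes and write the standard output to a file (the output file and account names are only placeholders):

$ srun --ntasks=4 --hint=nomultithread --time=00:30:00 --output=my_run.out --account=my_project@cpu ./my_executable_file    # placeholder file and account names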

Reserving reusable resources for more than one interactive execution

Each interactive execution started as described in the preceding section constitutes a separate job. As with all jobs, it may be placed in a wait queue for a certain length of time if the computing resources are not available.

If you wish to do more than one interactive execution in a row, it may be worthwhile to reserve all the resources in advance so that they can be reused for the consecutive executions. In this way, you only wait for the resources to become available once, at the time of the reservation, rather than for each execution.

Reserving resources (here for 4 tasks) is done via the following command:

$ salloc --ntasks=4 --hint=nomultithread [--other-options]

Comments:

  • The --hint=nomultithread option reserves physical cores (no hyperthreading).
  • By default, the allocated CPU memory is proportional to the number of reserved cores. For example, if you request 1/4 of the cores of a node, you will have access to 1/4 of its memory. You may consult our documentation on this subject: Memory allocation on CPU partitions.
  • --other-options contains the usual Slurm options for configuring jobs (--output=, --time=, etc.): see the documentation on batch job submission scripts found in the Jean Zay index section Execution/Commands of a CPU code.
  • By default, reservations are subject to the resource limits defined in Slurm per partition and per QoS (Quality of Service). You can modify these limits by specifying another partition and/or QoS, as detailed in our documentation about the partitions and QoS.
  • For multi-project users and those having both CPU and GPU hours, it is necessary to specify the project account (project hours allocation) on which to count the computing hours of the job, as explained in our documentation about computing hours management.
  • We strongly recommend that you consult our documentation detailing computing hours management on Jean Zay to ensure that the hours consumed by your jobs are deducted from the correct allocation.

The reservation becomes usable after the resources have been granted:

salloc: Pending job allocation 1367065
salloc: job 1367065 queued and waiting for resources
salloc: job 1367065 has been allocated resources
salloc: Granted job allocation 1367065

You can verify that your reservation is active by using the squeue command. Complete information about the status of the job can be obtained by using the scontrol show job <job identifier> command.

You can then start the interactive executions by using the srun command:

$ srun [--other-options] ./code

Comment: If you do not specify any option for the srun command, the options for salloc (for example, the number of tasks) will be used by default.
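
For example, within a reservation made with salloc --ntasks=4, both of the following are valid (the executable name ./code is the one used above):

$ srun ./code                  # uses the 4 tasks of the reservation
$ srun --ntasks=2 ./code       # overrides the reservation default and runs only 2 tasks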

Important:

  • After reserving resources with salloc, you are still connected to the front end (you can verify this with the hostname command, as illustrated at the end of this page). It is imperative to use the srun command for your executions so that they use the reserved resources.
  • If you forget to cancel the reservation, the maximum allocation duration (the default, or the value specified with the --time option) is applied and the corresponding hours are then counted for the project you have specified. Therefore, to cancel the reservation, you must manually enter:
$ exit
exit
salloc: Relinquishing job allocation 1367065
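
To illustrate the first point above, you can check where commands are executed from within an active reservation (the node names below are only illustrative):

$ hostname                     # executed on the front end (name below is illustrative)
jean-zay1
$ srun hostname                # executed on the reserved compute resources (here, 4 tasks)
r4i3n7
r4i3n7
r4i3n7
r4i3n7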