Ada: Class structure - interactive and batch limits

*****************************************************************************

Structure of batch classes on Ada                           December 16, 2016

*****************************************************************************

1) Interactive limits
   ==================

     - Uniprocess - sequential or multithread (OpenMP/Pthreads):
           Total memory for the entire job: < 3.5GB
           Duration: 30 minutes (CPU time)
           Maximum number of tasks: 

     - Multiprocess (MPI):
           Memory per MPI process: 3.5GB
           Duration: 30 minutes (Elapsed time)
           Max. number of processes: 32 (with the MP_PROCS variable)
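
As an illustration, an interactive MPI execution could be launched as shown
below, assuming the launcher is poe from the IBM Parallel Environment (as
suggested by the MP_PROCS variable above); the program name is hypothetical:

     export MP_PROCS=16    # number of MPI processes (32 maximum)
     poe ./my_mpi_prog     # poe launcher assumed; adapt to your environment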


2) Structure of the batch processing classes
   =========================================

   Access to the uniprocess (sequential) classes:
   ----------------------------------------------
   T(h) ^ ELAPSED
        |
   100h +---------------+-----------------|
        |       t4      |      t4L        |
    20h +---------------+-----------------|
        |       t3      |      t3L        |
    10h +---------------+-----------------|
        |       t2      |      t2L        |
     1h +---------------+-----------------| 
        |       t1      |      t1L        |
      0 +---------------+-----------------+--> Memory
                       3.5GB            20.0GB

You must specify the following keyword to submit to these classes:
                         # @ job_type = serial
The default memory is 3.5 GB. It is possible to request up to 20 GB via the
keyword, here requesting 5 GB (which selects a "Large" t.L class):
                         # @ as_limit = 5.0gb
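
As an illustration, a minimal LoadLeveler script for a sequential job in
these classes could look as follows; the job name, time limit and executable
are hypothetical:

     # @ job_name = my_seq_job
     # @ job_type = serial
     # Request 5 GB of memory ("Large" t.L class):
     # @ as_limit = 5.0gb
     # @ wall_clock_limit = 05:00:00
     # @ output = $(job_name).$(jobid)
     # @ error = $(job_name).$(jobid)
     # @ queue
     ./my_seq_prog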


   Access to the OpenMP or Pthreads (multithread) classes:
   -------------------------------------------------------

   T(h) ^ ELAPSED
        |
   100h +-------------+--------------+--------------|
        |   mt8t4/L   |   mt16t4/L   |   mt32t4/L   |
    20h +-------------+--------------+--------------|
        |   mt8t3/L   |   mt16t3/L   |   mt32t3/L   |
    10h +-------------+--------------+--------------|
        |   mt8t2/L   |   mt16t2/L   |   mt32t2/L   |
     1h +-------------+--------------+--------------|
        |   mt8t1/L   |   mt16t1/L   |   mt32t1/L   |
      0 +-------------+--------------+--------------+--> Number of
                      8             16             32    cores

The number of cores reserved in a job (5 in this example) is specified by
the keywords:            # @ job_type = serial
                         # @ parallel_threads = 5
The OMP_NUM_THREADS variable is automatically set to the same value as
parallel_threads.

By default, the memory reserved per core is 3.5 GB. The maximum you can
request is 7.0 GB per core ("Large" memory classes).
You can reduce the memory request (which can decrease the queuing time of
your jobs) or increase it if necessary (without exceeding 7.0 GB per core)
via the keyword:
                         # @ as_limit
However, be aware that as_limit specifies a value PER PROCESS. For a
multithread job, there is only one process, which generates all the OpenMP or
POSIX threads. If your executable generates 5 threads and you need 7.0 GB per
thread, you must set as_limit to the number of threads multiplied by 7.0 GB:
                         # @ as_limit = 35.0gb
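
Putting this together, a sketch of a multithread job requesting 5 threads
and 7.0 GB per thread (time limit and executable hypothetical):

     # Request 5 threads and 35.0 GB for the single process (5 * 7.0 GB):
     # @ job_type = serial
     # @ parallel_threads = 5
     # @ as_limit = 35.0gb
     # @ wall_clock_limit = 02:00:00
     # @ queue
     # OMP_NUM_THREADS is set to 5 automatically:
     ./my_openmp_prog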


   Access to the MPI or hybrid classes (MPI+threads):
   --------------------------------------------------

   T(h) ^ ELAPSED
        |
   100h +--------+---------+---------+
        | c8t4/L | c16t4/L | c32t4/L | 
    20h +--------+---------+---------+---------+-- ......... --+---------+
        | c8t3/L | c16t3/L | c32t3/L | c64t3/L |   c.......t3  | c2048t3 |
    10h +--------+---------+---------+---------+-- ......... --+---------|
        | c8t2/L | c16t2/L | c32t2/L | c64t2/L |   c.......t2  | c2048t2 |
     1h +--------+---------+---------+---------+-- ......... --+---------|
        | c8t1/L | c16t1/L | c32t1/L | c64t1/L |   c.......t1  | c2048t1 |
      0 +--------+---------+---------+---------+-- ......... --+---------+-->
                 8        16        32        64,128,256,512,1024      2048
                                                       Number of cores

The number of MPI processes requested (8, in this example) is specified by
the keywords:            # @ job_type = parallel
                         # @ total_tasks = 8

For a hybrid job, the number of OpenMP/POSIX threads requested per MPI
process cannot exceed 32. It is specified by the keyword:
                         # @ parallel_threads = 4
The number of reserved cores is total_tasks * parallel_threads; with the
values above, the job would therefore reserve 8 * 4 = 32 cores.
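
For illustration, a minimal pure MPI job with 8 processes might look as
follows; the launcher (poe, from the IBM Parallel Environment) and the
executable name are assumptions:

     # @ job_type = parallel
     # @ total_tasks = 8
     # @ wall_clock_limit = 01:00:00
     # @ queue
     poe ./my_mpi_prog    # launcher assumed; adapt to your environment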

CAUTION: jobs requesting more than 32 cores (total_tasks * parallel_threads
> 32) run on DEDICATED nodes. The number of cores reserved and ACCOUNTED FOR
is therefore rounded up to a multiple of 32. For example, if you ask for
65 cores, your job will reserve, and be accounted for, 96 cores.

By default, the memory is 3.5 GB per core. This is the maximum you can
request if your job reserves more than 64 cores. If you reserve 64 cores or
fewer, the maximum is 7.0 GB per core ("Large" memory classes).
You can reduce the memory request (which can decrease the queuing time of
your jobs) or increase it if necessary (without exceeding 3.5 GB or 7.0 GB
per core, depending on the case) via the keyword:
                         # @ as_limit
However, be aware that as_limit specifies a value PER MPI PROCESS.
Accordingly, for "pure" MPI jobs, the memory per MPI process is equal to the
memory per core. For example, to request 3.0 GB per MPI process, you must
specify:                 # @ as_limit = 3.0gb
But for hybrid jobs (MPI+threads), the memory per MPI process is equal to
the memory per thread multiplied by the requested number of threads.
For example, if your executable generates 5 threads per MPI process and you
want 7.0 GB per thread, you must specify as_limit as follows:
                         # @ as_limit = 35.0gb
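
A sketch of a hybrid job matching this example (5 threads per MPI process,
7.0 GB per thread); the process count, time limit and names are hypothetical:

     # 4 MPI processes * 5 threads = 20 reserved cores:
     # @ job_type = parallel
     # @ total_tasks = 4
     # @ parallel_threads = 5
     # Memory PER MPI PROCESS: 5 threads * 7.0 GB per thread:
     # @ as_limit = 35.0gb
     # @ wall_clock_limit = 05:00:00
     # @ queue
     poe ./my_hybrid_prog    # launcher assumed; adapt to your environment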

There are only 28 nodes on which "Large" jobs (more than 3.5 GB of memory
per core) can run, compared to 304 nodes available for standard-sized jobs.
Expect longer waiting times, therefore, in the "Large" batch classes.


3) General Comments
   ================

All time limits are expressed in Elapsed time (or clock time); the
keyword is               # @ wall_clock_limit = hh:mm:ss
The time counted towards your hours allocation is the Elapsed time consumed
multiplied by the number of reserved cores; for example, a job running for
10 hours on 64 reserved cores is accounted as 640 hours. The Elapsed time
can vary with the load of the nodes (I/O or message passing). To avoid
reaching the Elapsed time limit, request extra time as a safety margin.

INTERACTIVE parallel executions compete with batch parallel jobs for
resources. When the requested resources are not available, the request is
rejected with an error message.

For certain codes, compilation can take a very long time and exceed the
interactive Elapsed time limit (see above). For this reason, a class
(compil) is dedicated to these long compilations, with a maximum Elapsed
time of 20 hours. To use the compilation class, you must specify the
keyword:                 # @ class = compil
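
For example, a sketch of a long compilation job (the build command is
hypothetical):

     # @ job_type = serial
     # @ class = compil
     # Class maximum is 20 hours of Elapsed time:
     # @ wall_clock_limit = 20:00:00
     # @ queue
     make    # hypothetical build command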

To debug your code, you can reduce the waiting time of your jobs by adding
the keyword:             # @ class = debug
With this keyword, the Elapsed time is limited to 900 s and the job to a
maximum of 64 cores.
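
A minimal debug-class sketch (task count and executable hypothetical):

     # @ job_type = parallel
     # @ class = debug
     # At most 64 cores in this class:
     # @ total_tasks = 16
     # Class maximum is 900 s of Elapsed time:
     # @ wall_clock_limit = 00:15:00
     # @ queue
     poe ./my_mpi_prog    # launcher assumed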

To run on the pre/post-processing nodes, the keyword is:
                         # @ requirements = (Feature == "prepost")
The maximum Elapsed time is 20h. You cannot request more than 32 cores (at
most 1 compute node). You can request up to 100 GB of memory in the
uniprocess classes, or 30 GB per reserved core in the parallel classes (MPI,
OpenMP or hybrid runs); an example is sketched after the caution below.
CAUTION: the pre/post-processing nodes must be used only for
pre/post-processing, meaning for:
- I/O-intensive applications (recombination of files, ...)
- applications using large memory (mesh generation, mesh partitioning, ...)
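
As an illustration, a sketch of a uniprocess pre/post-processing job with a
large memory request (values and executable hypothetical):

     # @ job_type = serial
     # @ requirements = (Feature == "prepost")
     # Up to 100 GB may be requested in the uniprocess classes:
     # @ as_limit = 80.0gb
     # @ wall_clock_limit = 20:00:00
     # @ queue
     ./my_postprocessing_tool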

To transfer data between the pre/post-processing front-end node and a
machine outside of IDRIS (and only for this purpose), the keywords are:
                         # @ job_type = serial
                         # @ class = archive
                         # @ requirements = (Feature == "prepost-front")
The maximum Elapsed time is 20h.
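
A sketch of such a transfer job; the transfer command and remote host are
hypothetical:

     # @ job_type = serial
     # @ class = archive
     # @ requirements = (Feature == "prepost-front")
     # @ wall_clock_limit = 20:00:00
     # @ queue
     scp results.tar remote_host:/data/    # hypothetical transfer command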

Jobs requesting between 20 and 100 hours (t4 classes) are not reimbursed in
the case of problems on the compute nodes: we cannot guarantee the
stability of the hardware over a duration of nearly 5 days. It is
therefore advised to implement periodic checkpoints for such jobs.


4) Bonus jobs
   ==========

It is possible to run so-called "bonus jobs" by adding the following keyword
(before # @ queue):
                         # @ account_no = bonus

Bonus jobs are not included in the DARI hours allocation and accounting. 
They are run during machine low-load periods and there are specific classes
for parallel bonus jobs. Bonus jobs are limited to a maximum of 512 cores and
a maximum elapsed time of 20 hours.
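
For example, the directive block of a parallel bonus job might end as
follows (task count hypothetical); note that account_no must appear before
# @ queue:

     # @ job_type = parallel
     # Bonus jobs: at most 512 cores and 20h of Elapsed time:
     # @ total_tasks = 256
     # @ wall_clock_limit = 20:00:00
     # @ account_no = bonus
     # @ queue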
For more information about bonus jobs, please see our website.


To see the French version, type: news class
*****************************************************************************