Adapp: Execution of a sequential job in batch

Jobs are managed on all the nodes by the LoadLeveler software. They are distributed into classes mainly according to the Elapsed time, the number of cores, and the memory requested. You can consult the structure of batch classes on Ada and Adapp here.

To submit a sequential job in batch from Adapp, it is necessary to do the following:

  • Create a submission script:

    pg_seq.ll
    # @ job_type = serial
    # Specific to Adapp
    # @ requirements = (Feature == "prepost")
    # Max. Elapsed time of the job, hh:mm:ss (1h30mn here)
    # @ wall_clock_limit=1:30:00
    # Name of job for LoadLeveler
    # @ job_name = Mono
    # Job standard output and error files merged in the same file
    # @ output   = $(job_name).$(jobid)
    # @ error    = $(job_name).$(jobid)
    # @ queue
     
    # To print an echo of the commands
    set -x
     
    # Go to the temporary directory TMPDIR
    cd $TMPDIR
     
    # The LOADL_STEP_INITDIR variable is automatically positioned by 
    # LoadLeveler to the directory where the command llsubmit was typed
    cp $LOADL_STEP_INITDIR/a.out .
     
    # Execution of the program
    ./a.out
  • Submit this script with the llsubmit command:

    $ llsubmit pg_seq.ll
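
As a precaution before submitting, a short check can confirm that the script requests the Adapp nodes. The following is only a minimal sketch and not part of LoadLeveler: the function names has_prepost and submit_on_adapp are hypothetical, and the pattern assumes that the requirements keyword is written on a single line, as in pg_seq.ll.

```shell
# Hypothetical helper (not a LoadLeveler command): succeeds only if the
# script given as first argument contains the Adapp requirements keyword.
has_prepost() {
    grep -Eq '^#[[:space:]]*@[[:space:]]*requirements[[:space:]]*=.*Feature[[:space:]]*==[[:space:]]*"prepost"' "$1"
}

# Hypothetical wrapper: calls llsubmit only if the check above passes.
# llsubmit itself is available only where LoadLeveler is installed.
submit_on_adapp() {
    if has_prepost "$1"; then
        llsubmit "$1"
    else
        echo "$1: keyword requirements = (Feature == \"prepost\") not found" >&2
        return 1
    fi
}
```

Once these functions are defined in the current shell, submit_on_adapp pg_seq.ll would replace the direct call to llsubmit.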

Comments

  • Do not forget the keyword # @ requirements = (Feature == "prepost"); otherwise your computation will run on the standard compute nodes, not on Adapp.
  • In this example, we assume that the executable file a.out is located in the submission directory, that is, the directory from which the llsubmit command is entered (the LOADL_STEP_INITDIR variable is set automatically by LoadLeveler).
  • The output file of the computation, Mono.job_number, is also written to the submission directory. It is created at the beginning of the job execution; editing or modifying this file while the job is running can disrupt the job.
  • Memory: The default value is 3.5 GB. The maximum value that you can request is 100 GB via the keyword as_limit. For example, to request 20 GB: # @ as_limit = 20.0gb.
  • The Elapsed time limit associated with the keyword wall_clock_limit applies to the entire job. The limit is 20 hours maximum for the classes specific to Adapp.
  • It is also possible to limit the CPU time of each command executed in the job with the keyword cpu_limit. Using the two keywords wall_clock_limit and cpu_limit together ensures that the last instructions of a job (those following the execution of your binary file) can still be executed:

    job.ll
    # @ job_type = serial
    # @ requirements = (Feature == "prepost")
    # @ wall_clock_limit=1:00:00
    # @ cpu_limit=45:00
    # @ job_name = myjob
    # @ output = $(job_name).$(jobid)
    # @ error  = $(job_name).$(jobid)
    # @ queue
     
    set -x
     
    # Copy to the TMPDIR the files you need for your computation:
    cp -p ... $TMPDIR
    cd $TMPDIR
     
    # Run:
    ./my_prog
     
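    # Save to the WORKDIR the result files you are interested in.
    # $(jobid) is only expanded in the "# @" keywords; in the shell part,
    # the LoadLeveler variable LOADL_STEP_ID identifies the job step instead.
    ls -alrt
    mkdir $WORKDIR/results.$LOADL_STEP_ID
    cp ... $WORKDIR/results.$LOADL_STEP_ID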

The execution of your binary file consumes most of the job's Elapsed and CPU time. Therefore, it is your binary file that will reach the CPU time limit (set by the keyword cpu_limit). However, you do not know in advance how long this computing phase will last. Moreover, the TMPDIR directory is automatically erased at the end of the job, so you must save your files before this happens. To be sure that enough time remains to save them, choose a cpu_limit value for the program that is sufficiently lower than the job's wall_clock_limit value.

Disk writes by an executable file can be significantly slowed down by the overall load of the machine: the Elapsed time of the job's my_prog executable file will fluctuate, but not its CPU time. It is wise, therefore, to leave a comfortable margin between the wall_clock_limit and the cpu_limit. It is not possible, however, to give a more precise guideline because the CPU/Elapsed time ratio varies for each executable file: you must proceed by trial and error.
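
The margin between the two limits is simple arithmetic. As an illustration, the sketch below converts the two limits of job.ll into seconds and prints their difference. The helper name to_seconds is hypothetical, and it assumes that the limits are written as ss, mm:ss or hh:mm:ss, as in the examples on this page.

```shell
# Hypothetical helper: convert a limit written as ss, mm:ss or hh:mm:ss
# into a number of seconds (assumed format, as used in job.ll).
to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

wall=$(to_seconds 1:00:00)   # wall_clock_limit of job.ll
cpu=$(to_seconds 45:00)      # cpu_limit of job.ll
margin=$((wall - cpu))

echo "margin: $margin s"     # prints "margin: 900 s"
```

These 900 seconds (15 minutes) are available for the final copies only if the Elapsed time of my_prog stays close to its CPU time; on a heavily loaded machine the real margin is smaller, which is why the value has to be adjusted over several runs.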