Ada, Adapp: Control commands for batch jobs

Batch jobs on all the nodes are managed by the LoadLeveler software. The principal commands for controlling your jobs are therefore the following:

  • llsubmit submits a batch job:
  $ llsubmit mon_job.ll
  llsubmit: Processed command file through Submit Filter: "/local/loadl/Fidris/llsubmit_exit".
  llsubmit: The job "ada338.idris.fr.1942" has been submitted.
  

Any other message indicates a job error; in some cases, an error message set up by IDRIS indicates which submission parameter was omitted or is causing the problem. Note: At IDRIS, the llsubmit command does not accept any options.
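When a script needs the identifier of the job it has just submitted, the identifier can be taken from the confirmation line shown above. The following is a minimal sketch, not an IDRIS-provided tool: the function name extract_job_id is hypothetical, and it assumes that the confirmation message keeps the format of the example.

```shell
# extract_job_id (hypothetical helper): read llsubmit output on stdin and
# print the job identifier found in the "has been submitted" line.
# Assumption: the confirmation line has the format shown in the example above.
# Quote characters around the identifier, whatever their style, are skipped.
extract_job_id() {
    sed -n 's/^llsubmit: The job [^A-Za-z0-9]*\([A-Za-z0-9._-]*\)[^A-Za-z0-9]* has been submitted\.$/\1/p'
}

# Possible usage on Ada (assumes llsubmit writes its messages to standard output):
#   job_id=$(llsubmit mon_job.ll | extract_job_id)
```

If no confirmation line is found, the function prints nothing, so an empty result can be treated as a failed submission.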

  • llq [-u login] displays information about the progress and resource consumption of all the batch jobs on the machine.

The -u option restricts the display to the jobs belonging to the specified login (for example, your own jobs).

  $ llq -u rlab432
  Id                       Owner      Submitted   ST PRI Class        Running On
  ------------------------ ---------- ----------- -- --- ------------ -----------
  ada338.2630.0            rlab432     1/4  11:44 R  100 mt8t1        ada072
  1 job step(s) in query, 0 waiting, 0 pending, 1 running, 0 held, 0 preempted

The status (ST) column indicates whether your job is running (R) or idle/waiting (I). Other common states are NQ (Not Queued, or outside of the queues); H (Hold); and CS (Changing State), which indicates that a job or a step has finished but is still in the process of leaving the queues.
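The status codes described above can be decoded automatically from the llq listing. The following is a minimal sketch under stated assumptions: the function name describe_llq_states is hypothetical, and the column layout is assumed to match the example above (the Submitted column contains a date and a time, so ST is the fifth whitespace-separated field).

```shell
# describe_llq_states (hypothetical helper): read llq output on stdin and
# print one line per job step: "<job step id> <state code> <meaning>".
# Assumption: same column layout as the llq example above, so that the
# ST column is the fifth whitespace-separated field.
describe_llq_states() {
    awk '
        # Job step lines start with an Id of the form name.number.number;
        # header, separator and summary lines do not match and are skipped.
        $1 ~ /^[A-Za-z0-9_-]+\.[0-9]+\.[0-9]+$/ {
            code = $5
            if      (code == "R")  meaning = "running"
            else if (code == "I")  meaning = "idle (waiting)"
            else if (code == "NQ") meaning = "not queued"
            else if (code == "H")  meaning = "held"
            else if (code == "CS") meaning = "changing state"
            else                   meaning = "state not described on this page"
            print $1, code, meaning
        }
    '
}

# Possible usage on Ada:
#   llq -u rlab432 | describe_llq_states
```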

Attention: If you do not find the standard output of a completed job, it is probably because you are not in the submission directory. It is also possible that you have either gone beyond your quota on Ada, or that you have not specified the output and error files in your submission script (lines # @ output and # @ error). In the last case, the standard output of your job is lost.
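The # @ output and # @ error lines mentioned above might look as follows in a job command file. This is a minimal sketch, not a complete IDRIS submission script: the job name and the file names are illustrative assumptions, and other keywords that IDRIS may require (resource limits, for example) are left out.

```shell
# Illustrative LoadLeveler job command file (names are assumptions).
# @ job_name = my_job
# Files receiving the standard output and standard error of the job.
# The LoadLeveler variables $(job_name) and $(jobid) give each job its own files.
# @ output   = $(job_name).$(jobid).out
# @ error    = $(job_name).$(jobid).err
# @ queue

# The commands executed by the job start here.
echo "Job running in directory: $(pwd)"
```

Because the file names are relative, the files would be looked for in the directory from which the job was submitted.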

  • llcancel allows you to delete a job. For example, to delete job 'ada338.1942.0', which is running on one of the machine nodes:
  $ llq -u rlab432
  Id                       Owner      Submitted   ST PRI Class        Running On
  ------------------------ ---------- ----------- -- --- ------------ -----------
  ada338.1942.0            rlab432     1/4  11:42 R  100 c8t1         ada012
  
  $ llcancel ada338.1942
  llcancel: Cancel command has been sent to the central manager.
  • idrjar displays information about the resource consumption of your batch jobs. It includes the following:
    • Elapsed time (and CPU time)
    • Memory requested and used
    • Dates of job submission, beginning, and end of execution
    • Efficiency (useful for parallel jobs)

This information cannot be obtained until the day after the job execution. To learn more about this command, run idrjar -h on Ada. Examples:

  $ idrjar -d02
  |-----------------------------------------------|
  |--- IDRIS/CNRS. Version of 17 December 2013 ---|
  |-----------------------------------------------|
  Outputs for login rlab432 for the period
          ==> 01 February 2013 to 28 February 2013
   Owner      Job Name           JobId       Queue tEse tCpu  #T    (%)   S
  ------- ---------------- ----------------- ----- ---- ----- --- ------- -
  rlab432 TEST_200_ssmodul ada338.74526.0    batch  371    32  16    0.54 R
  rlab432 TEST_200_ssmodul ada338.74535.0    batch 1397  2370  16   10.60 C
  rlab432 TEST_200_ssmodul ada338.74589.0    batch 3736 20125  16   33.67 R
  -------------------------------------------------------------------------
          TOTAL CONSUMPTION OF THE ABOVE JOBS ==> 88064s, or 24.46h (*)
  ---------------------------------- LEGEND ----------------------------------
  tEse  : Elapsed time consumed in seconds
  tCpu  : CPU time consumed in seconds
  #T    : Number of tasks or processes used
  (%)   : Job efficiency rate ==> tCpu*100/(tEse*#T).
  S     : C (completed) ==> job completed normally
          R (removed)   ==> job destroyed during execution 
                            (by using the command llcancel, for example)
  $ idrjar -d12 -l
  |-----------------------------------------------|
  |--- IDRIS/CNRS. Version of 17 December 2013 ---|
  |-----------------------------------------------|
  Outputs for the login rlab432 for the period
          ==> 01 February 2013 to 28 February 2013
        JobId       Queue Qdate Bdate Edate   tEse       tCpu     Data+Stack MAXRSS #T    (%)   S
  ----------------- ----- ----- ----- ----- -------- ------------ ---------- ------ --- ------- -
  ada338.74526.0    batch 13/02 13/02 13/02      371           32       9663     80  16    0.54 R
                          15:38 15:38 15:44 00:06:11 000+00:00:32
  ada338.74535.0    batch 13/02 13/02 13/02     1397         2370       9663     91  16   10.60 C
                          15:45 15:45 16:08 00:23:17 000+00:39:30
  ada338.74589.0    batch 13/02 13/02 13/02     3736        20125       9663     23  16   33.67 R
                          16:09 16:09 17:11 01:02:16 000+05:35:25
  -----------------------------------------------------------------------------------------------
          TOTAL CONSUMPTION OF THE ABOVE JOBS ==> 88064s, or 24.46h
  ---------------------------------- LEGEND ----------------------------------
  Qdate      : date and hour of job entry in the LoadLeveler queue
  Bdate      : date and hour of job execution start
  Edate      : date and hour of job execution end
  tEse       : Elapsed time consumed in seconds and in "hours, minutes,
               seconds" (format ==> hh:mm:ss).
  tCpu       : CPU time consumed in seconds and in "days, hours, minutes,
               seconds" (format ==> ddd+hh:mm:ss).
  Data+Stack : Data and Stack memory requested (in megabytes).
  MAXRSS     : maximum memory used by the job (in megabytes).
  #T         : number of tasks or processors used
  (%)        : job efficiency rate ==> tCpu*100/(tEse*#T).
  S          : C (completed) ==> job completed normally
               R (removed)   ==> job destroyed during execution
                                 (by using the command llcancel, for example)
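
The efficiency formula given in the legends, tCpu*100/(tEse*#T), can be checked by hand against the three jobs listed above. The following is a minimal sketch; the function name job_efficiency is hypothetical and is not part of idrjar.

```shell
# job_efficiency (hypothetical helper): efficiency rate in percent,
# computed as tCpu*100/(tEse*#T), the formula given in the idrjar legend.
# Arguments: tCpu (seconds), tEse (seconds), number of tasks.
job_efficiency() {
    awk -v cpu="$1" -v ese="$2" -v tasks="$3" \
        'BEGIN { printf "%.2f\n", cpu * 100 / (ese * tasks) }'
}

job_efficiency 32 371 16       # job ada338.74526.0, prints 0.54
job_efficiency 2370 1397 16    # job ada338.74535.0, prints 10.60
job_efficiency 20125 3736 16   # job ada338.74589.0, prints 33.67
```

The results match the (%) column of the listings. In the same listings, the total of 88064 s equals the sum of tEse*#T over the three jobs, (371 + 1397 + 3736) * 16, which suggests that the total is counted as elapsed time multiplied by the number of tasks.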