Jean Zay: Slurm commands

Jobs are managed on all nodes by the Slurm workload manager.

  • To submit a submission script:

     $ sbatch script.slurm
  • To monitor your jobs that are pending or running:

     $ squeue -u $USER

    This command displays information in the following form:

    JOBID  PARTITION  NAME  USER  ST   TIME  NODES  NODELIST(REASON)   
      235  part_name  test   abc   R  00:02      1  r6i3n1 

    Where
    JOBID: Job identifier
    PARTITION: Partition used
    NAME: Job name
    USER: User name of job owner
    ST: Job state (R = running, PD = pending, CG = completing)
    TIME: Elapsed time
    NODES: Number of nodes used
    NODELIST(REASON): List of nodes used or, for a pending job, the reason it is waiting

  • To obtain complete information about a job (allocated resources and execution status):

     $ scontrol show job $JOBID 
  • To cancel a job:

     $ scancel $JOBID 
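
The commands above can be chained in a small shell sketch. The function names (parse_job_id, submit_job, job_state) are hypothetical helpers, not part of Slurm, and the sketch assumes that sbatch prints its usual "Submitted batch job <id>" confirmation line.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Slurm commands.

# Extract the job ID from the sbatch confirmation line,
# e.g. "Submitted batch job 235" gives "235".
parse_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Submit a script with sbatch and print the resulting job ID.
submit_job() {
    submit_output=$(sbatch "$1") || return 1
    parse_job_id "$submit_output"
}

# Print the compact state code of a job (R, PD, CG, ...).
# Prints nothing once the job is no longer known to squeue.
job_state() {
    squeue -h -j "$1" -o "%t" 2>/dev/null
}

# Possible usage on a machine where Slurm is available:
#   JOBID=$(submit_job script.slurm)
#   job_state "$JOBID"
#   scontrol show job "$JOBID"
#   scancel "$JOBID"
```

Capturing the job ID at submission avoids having to look it up in the squeue output before calling scontrol or scancel.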

Comments

  • A complete reference table of Slurm commands is available here.
  • If a problem occurs on the machine, the default Slurm configuration automatically restarts running jobs from scratch. To avoid this behavior, use the --no-requeue option at submission, that is, submit your job with

     $ sbatch --no-requeue script.slurm 

    or add the line

     #SBATCH --no-requeue

    in your submission script.
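
As an illustration, a minimal submission script with this directive might look like the sketch below. The job name, node count, time limit and output file name are illustrative assumptions, not values prescribed for Jean Zay; the job_label variable is a hypothetical name used only for the log message.

```shell
#!/bin/bash
# Illustrative values: adapt the name, resources and time limit to your job.
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --time=00:10:00
# Output file named after the job name (%x) and the job ID (%j).
#SBATCH --output=%x_%j.out
# Do not restart this job automatically after a problem on the machine.
#SBATCH --no-requeue

# SLURM_JOB_ID is set by Slurm inside a job; the fallback covers a run outside Slurm.
job_label="${SLURM_JOB_ID:-not-in-a-job}"
echo "Job ${job_label} started on $(uname -n)"
```

The #SBATCH lines are ordinary comments for the shell, so they are read only by sbatch and must appear before the first executable command of the script.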