Jean Zay: Slurm commands
Jobs on all the nodes are managed by the Slurm software.
- To submit a submission script:
$ sbatch script.slurm
- To monitor jobs which are waiting or in execution:
$ squeue -u $USER
This command displays information in the following form:
JOBID PARTITION NAME USER ST  TIME NODES NODELIST(REASON)
  235 part_name test  abc  R 00:02     1 r6i3n1
Where
JOBID: Job identifier
PARTITION: Partition used
NAME: Job name
USER: User name of job owner
ST: Status of job execution (R=running, PD=pending, CG=completing)
TIME: Elapsed time
NODES: Number of nodes used
NODELIST(REASON): List of nodes used or, for a pending job, the reason why it is waiting
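As an illustration of how these columns can be used, the sketch below counts jobs per state by reading the ST column. It assumes the default column layout shown above and job names without spaces; `count_states` is a hypothetical helper name, not a Slurm command. The sample text is the example output from this page, so the sketch runs without access to the cluster.

```shell
#!/bin/bash
# count_states: hypothetical helper that tallies jobs per state (5th column, ST)
# from squeue-style text read on standard input.
count_states() {
    # NR > 1 skips the header line; NF >= 5 ignores blank or truncated lines.
    awk 'NR > 1 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input copied from the example output on this page.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
235 part_name test abc R 00:02 1 r6i3n1'

printf '%s\n' "$sample" | count_states
```

On the cluster, the same helper would be fed with `squeue -u $USER | count_states`.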
- To obtain complete information about a job (allocated resources and execution status):
$ scontrol show job $JOBID
- To cancel an execution:
$ scancel $JOBID
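The commands above take the job identifier in $JOBID. One possible way to obtain it is to extract it from the message printed by sbatch at submission. The sketch below assumes the usual message format "Submitted batch job <id>"; `job_id_from_sbatch` is a hypothetical helper name. The message is simulated with echo so that the sketch runs without Slurm.

```shell
#!/bin/bash
# job_id_from_sbatch: hypothetical helper that extracts the job identifier
# from the sbatch submission message, assumed to be "Submitted batch job <id>".
job_id_from_sbatch() {
    # The identifier is the 4th whitespace-separated field of the message.
    awk '{ print $4 }'
}

# On the cluster (not executed here):
#   JOBID=$(sbatch script.slurm | job_id_from_sbatch)
#   scontrol show job "$JOBID"
#   scancel "$JOBID"

# Simulated submission message, for illustration only.
JOBID=$(echo "Submitted batch job 235" | job_id_from_sbatch)
echo "$JOBID"
```

Alternatively, `sbatch --parsable` prints the job identifier directly (followed by `;` and the cluster name on multi-cluster installations), which avoids parsing the message.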
Comments
- A complete reference table of Slurm commands is available here.
- In case of a problem on the machine, the default Slurm configuration automatically restarts the running jobs from scratch. To avoid this behavior, use the --no-requeue option at submission time, that is, submit your job with
$ sbatch --no-requeue script.slurm
or add the line
#SBATCH --no-requeue
in your submission script.
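For illustration, a minimal submission script containing this directive might look like the sketch below. The job name, task count and time limit are assumed values, not Jean Zay recommendations, and the final commands are a placeholder workload. The #SBATCH lines are ordinary comments for bash, so the script also runs outside Slurm.

```shell
#!/bin/bash
# All #SBATCH directives must appear before the first executable command.
# Job name shown in the NAME column of squeue (illustrative value).
#SBATCH --job-name=test
# Number of tasks (assumed value).
#SBATCH --ntasks=1
# Wall-time limit in HH:MM:SS (assumed value).
#SBATCH --time=00:10:00
# Do not restart this job automatically after a problem on the machine.
#SBATCH --no-requeue

# SLURM_JOB_ID is set by Slurm inside a job; the fallback covers runs outside Slurm.
job_label="job ${SLURM_JOB_ID:-unknown}"

# Placeholder workload: replace with your own commands.
echo "Starting $job_label"
```

Saved as script.slurm, it would be submitted with `sbatch script.slurm`; the option no longer needs to be repeated on the command line.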