Job submission on Dalia
⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Slurm Job Manager
Jobs on Dalia are managed by Slurm, as on Jean Zay.
The usual Slurm commands are used to control jobs:
sbatch: Submission of a batch file
srun: Execution of a task
squeue: Checking jobs in the queue
scancel: Cancellation of a job
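As an illustration of how these commands fit together, here is a minimal sketch of a submit, check, cancel sequence. The file name job.sh and the helper job_id_from_sbatch are hypothetical and not part of the Dalia documentation; the helper relies on sbatch printing a confirmation line of the form "Submitted batch job <jobid>".

```shell
# Extract the job ID from sbatch's confirmation message,
# which has the form "Submitted batch job <jobid>".
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Only talk to Slurm when its commands are available and the
# (hypothetical) batch file job.sh exists.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    submit_msg=$(sbatch job.sh)                 # submit the batch file
    job_id=$(job_id_from_sbatch "$submit_msg")
    squeue -j "$job_id"                         # check the job in the queue
    # scancel "$job_id"                         # uncomment to cancel the job
fi
```

Keeping the job ID in a variable avoids copying it by hand between squeue and scancel.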
Important
From the login node, load the slurm module before using Slurm commands (if it is not already loaded):
module load slurm/slurm/24.11
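To load the module only when it is needed, the check can be wrapped in a small helper. This is a sketch under assumptions: the function name ensure_command is hypothetical, and it assumes the module command is provided by the login shell on Dalia.

```shell
# Load a module only if the command it provides is not yet available.
# $1: command to look for, $2: module to load when it is missing.
# Returns non-zero if the command is still missing afterwards.
ensure_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        # `module` is assumed to come from the cluster environment.
        if command -v module >/dev/null 2>&1; then
            module load "$2" || return 1
        fi
    fi
    command -v "$1" >/dev/null 2>&1
}

# On a Dalia login node:
# ensure_command sbatch slurm/slurm/24.11 || echo "Slurm commands not available" >&2
```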
Submission via an Apptainer container
It is recommended to work in Apptainer containers on Dalia.
Example of a submission script:
#!/usr/bin/env bash
#SBATCH --job-name=test_dalia
#SBATCH --output=slurm_log/%x_%j.out
#SBATCH --error=slurm_log/%x_%j.out
## Reserve all the resources of one node: 144 CPUs and 4 GPUs
#SBATCH --nodes=1 # Number of nodes
#SBATCH --gpus-per-node=4 # Max 4 GPUs per node
#SBATCH --ntasks-per-node=4 # Number of tasks per node
#SBATCH --cpus-per-task=36 # Number of CPUs per task: 4 * 36 = 144 CPUs
## Job time limit (HH:MM:SS)
#SBATCH --time=0:40:00
cd "$PROJECT_DIR"
export APPTAINER_CACHEDIR=/lustre/work/<project_group>/<login>/<cache_directory>
srun apptainer exec --nv --pwd /my_project_dir --bind "$PROJECT_DIR":/my_project_dir mon_container.sif <command> # Use --nv to enable NVIDIA support
--nv allows the use of NVIDIA GPUs in the container;
--pwd defines the working directory in the container;
--bind allows mounting the directory $PROJECT_DIR to /my_project_dir in the container.
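Before submitting the example script, two quick checks can be worth running from the submission directory. The sketch below is not part of the original page. It assumes that Slurm does not create the directory named in --output and --error (so slurm_log must already exist), and it reuses the node size of 144 CPUs stated in the script's comment. The variable names are illustrative.

```shell
# Values copied from the #SBATCH directives in the example script.
ntasks_per_node=4
cpus_per_task=36
node_cpus=144   # CPUs per node, as stated in the script's comment

# Make sure the log directory used by --output/--error exists.
# With --job-name=test_dalia, %x_%j expands to test_dalia_<jobid>.
mkdir -p slurm_log

# Check that the request fits on one node: 4 * 36 = 144.
requested_cpus=$((ntasks_per_node * cpus_per_task))
if [ "$requested_cpus" -gt "$node_cpus" ]; then
    echo "Requesting $requested_cpus CPUs, but a node has $node_cpus" >&2
else
    echo "Requesting $requested_cpus of $node_cpus CPUs per node"
fi
```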