Hugging Face Accelerate
Accelerate is a Hugging Face library that simplifies the use of accelerators (GPUs on Jean Zay).
Mono-node
On a single node (mono-node), you can generally use Accelerate directly by following the standard Hugging Face documentation.
Multi-node
For multi-node usage, you must generate one or more YAML configuration files and pass them to the Accelerate code with the --config_file parameter (or --config for older versions). Furthermore, one execution per node is required, each using the appropriate configuration file.
However, you can easily use Accelerate in multi-node on Jean Zay with the idr_accelerate launcher. This tool, developed by IDRIS, is available in most of the Jean Zay modules. It handles the constraints mentioned above by creating the configuration files and executing the Accelerate code for you. It replaces the Accelerate launcher as follows: accelerate launch train.py becomes idr_accelerate train.py.
To use idr_accelerate in multi-node, you simply need to configure 1 task per node and use srun to launch one execution per node. Here is a simple example:
```bash
#!/bin/bash
#SBATCH --job-name=accelerate_multi-node
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=2
#SBATCH --cpus-per-task=40
#SBATCH --hint=nomultithread
#SBATCH --time=02:00:00
#SBATCH --qos=qos_gpu-dev
#SBATCH --account=account@v100

## load module
module purge
module load llm

srun idr_accelerate train.py --lr 0.5 --epochs 10
```
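Note that idr_accelerate forwards the remaining command-line arguments (here --lr and --epochs) to the training script, which must parse them itself. A minimal sketch of that parsing with argparse (the parameter names simply mirror the example above and are otherwise hypothetical):

```python
import argparse

def parse_args(argv=None):
    # Parse the hyperparameters forwarded by idr_accelerate / srun.
    parser = argparse.ArgumentParser(description="Training script arguments")
    parser.add_argument("--lr", type=float, default=0.1, help="learning rate")
    parser.add_argument("--epochs", type=int, default=1, help="number of epochs")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(f"lr={args.lr} epochs={args.epochs}")
```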
A complete example of using Accelerate with DDP is available in our Git repository of LLM examples.
DeepSpeed via Accelerate
Accelerate enables a "simplified" use of DeepSpeed, which makes it possible to apply optimizations such as ZeRO. The Accelerate DeepSpeed documentation describes its basic operation and provides some examples.
You can use DeepSpeed in three different ways with idr_accelerate:
```bash
# Give only command-line flags (simple, but few options available)
srun idr_accelerate --use_deepspeed --zero_stage 2 --mixed_precision fp16 train.py --lr 0.5

# Give an Accelerate configuration file containing the DeepSpeed parameters to use
srun idr_accelerate --config_file configs/accelerate_deepspeed.yaml train.py --lr 0.5

# Give an Accelerate configuration file which points to a DeepSpeed configuration file
# (the most complex method, but also the one with the most options)
srun idr_accelerate --config_file=configs/accelerate_deepspeed-config.yaml train.py --lr 0.5
```
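For illustration, the configuration files referenced by the last two commands could look like the sketches below. The file names and all values are assumptions to adapt to your setup; the key names follow the Accelerate DeepSpeed documentation. For the second method, the DeepSpeed parameters live directly inside the Accelerate file:

```yaml
# configs/accelerate_deepspeed.yaml — hypothetical sketch (second method)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2                  # ZeRO optimization stage
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: fp16
num_machines: 2
num_processes: 8                 # total GPUs across all nodes
```

For the third method, the Accelerate file instead points to a native DeepSpeed JSON file, which then carries the detailed settings (batch size, precision, ZeRO options):

```yaml
# configs/accelerate_deepspeed-config.yaml — hypothetical sketch (third method)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: configs/ds_config.json
num_machines: 2
num_processes: 8
```

```json
{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```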