Python personal environment

IDRIS offers many modules dedicated to AI. These modules are designed to make the most efficient use of the Jean Zay infrastructure and of the installed libraries (such as cuDNN, Intel MKL and NCCL).

If a module corresponds to your needs but lacks a library which is indispensable for your code, you have two solutions:

  • Make a request to the user support team (assist@idris.fr) to install the library, specifying the library and module concerned. This could also benefit other Jean Zay users.
  • Add the missing libraries locally with the pip command after loading one of the proposed modules, thereby extending its environment. The advantages and disadvantages of this method are described below.

If what you need is very different from what IDRIS offers, you can create your own conda environment containing exactly what is necessary for you. This gives you total control, but there are some disadvantages:

  • An environment takes up space (particularly in terms of inodes) and could saturate your allocated quota. See the following section for ways to limit this problem.
  • The installed codes are not optimised for Jean Zay and will not necessarily find core libraries such as cuDNN (even after loading the cudnn/7.6.5.32-cuda-10.2 module).
  • The user support team will have more difficulty assisting you if your code behaves abnormally.

Some useful commands:

  • To see which modules are available: module avail; for example, module avail python
  • To load a module: module load <module_name>
  • To list all the modules loaded in your session (including dependencies): module list
  • To unload all the modules: module purge (in case of an anomaly, it may be preferable to start a new session)

General advice before installing libraries locally

By default, pip installs PyPI packages in $HOME/.local. If you keep this default behaviour, you could rapidly saturate the disk quota of your $HOME. To avoid this, we recommend that you create a target directory in your $WORK and a symbolic link in your $HOME which points to it:

  • If the $HOME/.local directory already exists, you must first move it to the $WORK:
$ mv $HOME/.local $WORK
$ ln -s $WORK/.local $HOME
  • If the $HOME/.local directory does not yet exist, you must first create the target directory in the $WORK:
$ mkdir $WORK/.local
$ ln -s $WORK/.local $HOME

Likewise, conda installs libraries in $HOME/.conda by default; we strongly encourage you, therefore, to proceed in the same way:

  • If the $HOME/.conda directory already exists, you must first move it to the $WORK:
$ mv $HOME/.conda $WORK
$ ln -s $WORK/.conda $HOME
  • If the $HOME/.conda directory does not yet exist, you must first create the target directory in the $WORK:
$ mkdir $WORK/.conda
$ ln -s $WORK/.conda $HOME
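The cases above follow the same pattern for both $HOME/.local and $HOME/.conda: move or create the directory in $WORK, then link it from $HOME. As a minimal sketch, this logic can be wrapped in a small shell function (the name relocate_to_work is hypothetical):

```shell
# Hypothetical helper combining the steps above: move (or create)
# a dot-directory under $WORK, then leave a symbolic link in $HOME.
relocate_to_work() {
    dir="$1"                         # e.g. .local or .conda
    if [ -L "$HOME/$dir" ]; then
        return 0                     # already a symlink: nothing to do
    elif [ -d "$HOME/$dir" ]; then
        mv "$HOME/$dir" "$WORK/"     # directory exists: move it to $WORK
    else
        mkdir -p "$WORK/$dir"        # directory absent: create the target
    fi
    ln -s "$WORK/$dir" "$HOME/$dir"  # link $HOME/<dir> -> $WORK/<dir>
}
```

Calling relocate_to_work .local and relocate_to_work .conda then performs the appropriate sequence whether or not the directories already exist.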

To check the usage of your $WORK:

  • idrquota -w gives the usage percentage in GB and inodes (updated every 30 minutes).
  • For more detail, enter du --inodes --max-depth=1 $WORK or du -h --max-depth=1 $WORK.

Adding libraries locally via pip

Advantages and disadvantages of this method:

  • This method is suited to adding libraries to an existing module and avoids having to install everything yourself.
  • However, libraries installed via pip will be visible to every module which uses the same version of Python (the installation is done in .local/lib/pythonx.y).
  • It is necessary, therefore, to be vigilant about possible incompatibilities if you launch computations with multiple environments (for example, pytorch-gpu/py3/1.5.0 and pytorch-gpu/py3/1.7.1 are based on the same version of Python (3.7) but not on the same version of numpy).
  • A locally installed library takes precedence over the version provided by a module, but you must remain vigilant about its dependencies.
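To see which copy of a package a session will actually use (and thus whether a local installation is shadowing the module's version), you can ask Python where it imports the package from. A minimal sketch, where the helper name where_is is hypothetical:

```shell
# Hypothetical helper: print the file a Python package is imported from.
# A path under .local/lib/pythonx.y means a "pip install --user" copy
# takes precedence over the module's version.
where_is() {
    python3 -c "import $1; print($1.__file__)"
}

where_is json    # standard-library example; on Jean Zay, try: where_is numpy
```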

To install packages locally from the PyPI repository, use the --user and --no-cache-dir options:

$ pip install --user --no-cache-dir <package>
$ pip uninstall <package>

Other useful commands (for the options, see the official documentation):

  • pip cache purge (useful if you have forgotten the --no-cache-dir option)
  • pip list to see all the accessible packages (from the module or from a local installation)

Creation of a conda environment

You do not need to install Anaconda or Miniconda to create your own environment! Simply load one of the proposed Python environments listed by the module avail python command.

The useful commands are:

  • conda deactivate to exit from the environment loaded with module load
  • conda create -y -n <name> python=x.y to create an environment (specify a Python version; otherwise, the module's default version is used)
  • conda activate <name> to enter the environment
  • conda env remove --name <name> to correctly delete the environment
  • conda clean -a to remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system (i.e. non-local) environment.

Packages are installed via conda install. It can be wise to add -c conda-forge as this is the channel which contains the latest versions of many scientific packages.
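Putting the commands above together, the full life cycle of a personal environment might look like the following session (the module, environment and package names are illustrative, and these commands only work on Jean Zay once a Python module is available):

```shell
# Illustrative end-to-end session; environment and package names are examples
module load python                    # any module listed by "module avail python"
conda create -y -n myenv python=3.9   # create the environment
conda activate myenv                  # enter it
conda install -y -c conda-forge numpy # install packages (conda-forge channel)
# ... run your computations ...
conda deactivate                      # leave the environment
conda env remove --name myenv         # delete it when no longer needed
conda clean -a                        # remove unused packages and the cache
```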

Final advice

$PATH, $PYTHONPATH and $CONDA_PREFIX are configured correctly; it should not be necessary to modify them (doing so could change the directory precedence order, or add paths to incompatible packages). However, if a package installs an executable (as is the case for Horovod and DeepSpeed), it will be placed in .local/bin. To access this executable on a compute node, you will need to run export PATH=$WORK/.local/bin:$PATH (in the Slurm file or, after connecting, on the node).

Alternatively, if you use multiple Jean Zay modules (for example, TensorFlow and PyTorch) which you complete via pip installations, you can redefine the location of the locally installed packages by setting the PYTHONUSERBASE variable: for example, export PYTHONUSERBASE=$WORK/.local_tf240. This must be done before installing the packages and again before launching the computations (and remember to redefine the PATH as well if necessary).
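As a sketch, such an isolated installation could proceed as follows (the directory name .local_tf240 is only an example; site.USER_BASE is printed to show where pip --user would install, since Python reads the PYTHONUSERBASE variable):

```shell
# Redirect pip --user installations for this session; the directory
# name .local_tf240 is only an example.
export PYTHONUSERBASE="$WORK/.local_tf240"

# pip install --user --no-cache-dir <package>   # would now install there

# Confirm where pip --user will write (Python reads PYTHONUSERBASE):
python3 -c 'import site; print(site.USER_BASE)'

# Expose any executables installed in that location:
export PATH="$PYTHONUSERBASE/bin:$PATH"
```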

If you ever need to compile AI codes using the GPUs (CUDA, cuDNN), Intel MKL, etc., do not hesitate to contact the user support team (assist@idris.fr).