This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.
Python and Python libraries
Description
The Python ecosystem has many tools optimised for scientific computing and data processing. The most well-known ones are available on Jean Zay via modules.
Installed versions
The Python modules encapsulate Conda environments and contain libraries installed via conda or pip.
- to list the modules:
module avail python
- to load a module (and therefore, indirectly, a Conda environment):
module load <selected_module>
Loading a module activates the environment.
If these modules do not cover all your needs, you have several options:
- request the addition of a library via a ticket to IDRIS support (assist@idris.fr)
- add the missing package locally yourself with
pip install --user --no-cache-dir <package_name>
- create a new Conda environment with
conda create
The first option is strongly preferred, for performance reasons and for compatibility with the Jean Zay environment. It also lets other users benefit from these tools and avoids filling your disk quotas.
Support for Python 2.7 ended on 1 January 2020; since that date, new software versions are installed for Python 3 only.
To use the mpi4py package, you must also load one of the available MPI modules (list them with module avail openmpi) via the module command: although the mpi4py library is available, no MPI module is loaded by default in the Python environments.
The commands module purge, module switch and module unload do not work with Conda: they do not deactivate the Conda environment. To deactivate a Conda environment and activate another one by loading a different module, run the following series of commands:
conda deactivate
module purge
module load <new_environment>
or start a new session.
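The three-step switch above can be wrapped in a small shell function so the order is never forgotten. This is a minimal sketch: the function name switch_python_module is hypothetical, and it only chains the documented commands (conda deactivate, module purge, module load).

```shell
# Hypothetical helper: switch from the current Python/Conda module to another one.
# It chains the documented sequence: conda deactivate, module purge, module load.
switch_python_module() {
    if [ "$#" -ne 1 ]; then
        echo "usage: switch_python_module <new_module>" >&2
        return 2
    fi
    conda deactivate || return 1   # leave the Conda environment activated by the old module
    module purge || return 1       # unload every module (this alone does not deactivate Conda)
    module load "$1"               # load the new module, which activates its own environment
}
```

For example, switch_python_module pytorch-gpu/py3/1.7.1 (a module name taken from the examples on this page; check module avail python for the versions actually installed).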
Personal Python environment
IDRIS offers many modules dedicated to AI. The modules are designed to make the most of Jean Zay's infrastructure and the installed libraries (such as cuDNN, Intel MKL, NCCL).
If a module meets your needs but lacks some libraries essential to your code, you have two options:
- send a request to assist@idris.fr specifying the library and the module concerned; the addition could then benefit other Jean Zay users. Note that a library can only be added if it does not change the environment too much, otherwise the module might stop working for other users.
- add the missing libraries locally with the pip command, on top of one of the proposed modules. The advantages and disadvantages of this method are described below.
If your needs are really very different from what IDRIS offers, you can create your own Conda environment containing exactly what you need. This gives you full control but has some drawbacks:
- an environment takes up space (especially inodes) and may fill the space allocated to you. The following section gives tactics to limit this.
- the installed code is not optimised for Jean Zay and may fail to find low-level libraries such as cuDNN (even after loading the cudnn/7.6.5.32-cuda-10.2 module).
- User Support will have more difficulty troubleshooting your code if it behaves abnormally.
- a personal installation generally performs worse than an IDRIS installation.
Some useful commands:
- module avail to find out which modules are available, for example module avail python
- module load <full_module_name> to load a module (and activate the Python or Conda environment at the same time)
- module list (once a module is loaded) to get the complete list of associated modules
- module purge to "unload" all modules. Be careful: this does not deactivate the Python and Conda environments (if anything behaves abnormally, it may be better to start a new session).
General advice before installing libraries locally
By default, packages installed from PyPI are stored in $HOME/.local, so you risk quickly filling the disk quota of your $HOME. To avoid this, we recommend creating a target directory in your $WORK and a symbolic link in your $HOME pointing to it:
- if the $HOME/.local folder already exists, first move it to your $WORK:
mv $HOME/.local $WORK
ln -s $WORK/.local $HOME
- if the $HOME/.local folder does not yet exist, first create a directory in your $WORK:
mkdir $WORK/.local
ln -s $WORK/.local $HOME
Similarly, conda installs libraries in $HOME/.conda, so we strongly recommend proceeding in the same way:
- if the $HOME/.conda folder already exists, first move it to your $WORK:
mv $HOME/.conda $WORK
ln -s $WORK/.conda $HOME
- if the $HOME/.conda folder does not yet exist, first create a directory in your $WORK:
mkdir $WORK/.conda
ln -s $WORK/.conda $HOME
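Both cases (folder already present or not), for both .local and .conda, can be handled by one shell function. This is a sketch assuming $WORK is set, as it is on Jean Zay; the function name relocate_to_work is hypothetical. Unlike the plain commands, it refuses to act if the destination already exists in $WORK, so nothing is overwritten or nested by mistake.

```shell
# Hypothetical helper: move $HOME/<name> to $WORK/<name> (or create it there)
# and leave a symbolic link $HOME/<name> -> $WORK/<name>.
relocate_to_work() {
    local name="$1"
    if [ -z "$name" ] || [ -z "$WORK" ]; then
        echo "usage: relocate_to_work <.local|.conda> (with WORK set)" >&2
        return 2
    fi
    local src="$HOME/$name" dst="$WORK/$name"
    if [ -L "$src" ]; then
        echo "$src is already a symbolic link, nothing to do" >&2
        return 0
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists, refusing to overwrite it" >&2
        return 1
    fi
    if [ -e "$src" ]; then
        mv "$src" "$dst" || return 1   # folder already exists: move it to $WORK
    else
        mkdir "$dst" || return 1       # folder does not exist yet: create it in $WORK
    fi
    ln -s "$dst" "$src"                # $HOME/<name> now points to $WORK/<name>
}
```

Typical use is relocate_to_work .local followed by relocate_to_work .conda.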
To find out how much of $WORK you are using:
- idr_quota_user or idr_quota_project give the usage as a percentage, in GB and in inodes (updated once a day, during the night);
- for more detail, run du --inodes --max-depth=1 $WORK or du -h --max-depth=1 $WORK
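To see which subdirectories consume the most inodes (often the limiting factor with Conda environments), the du output can simply be sorted. A minimal sketch: the function name top_inode_dirs is hypothetical, and it relies on GNU du for the --inodes option used above.

```shell
# Hypothetical helper: list the largest inode consumers directly under a directory.
# The first line is the total for the directory itself, followed by its N largest subdirectories.
top_inode_dirs() {
    local dir="${1:-$WORK}"
    local n="${2:-10}"
    # du prints "<inodes><TAB><path>"; sort numerically, largest first.
    # Errors (e.g. unreadable directories) are discarded.
    du --inodes --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$((n + 1))"
}
```

For example, top_inode_dirs "$WORK" 5 shows the total followed by the five heaviest subdirectories of $WORK.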
Local additions of libraries via pip
Advantages and disadvantages of this method:
- this method is suited to adding libraries to an existing module: it avoids reinstalling everything, which limits disk usage.
- on the other hand, libraries installed with pip will be visible to any module that uses the same version of Python (the installation is done in .local/lib/pythonX.Y).
- you will need to watch out for incompatibilities if you run calculations with several environments (for example, pytorch-gpu/py3/1.5.0 and pytorch-gpu/py3/1.7.1 are based on the same version of Python (3.7) but not on the same version of numpy).
- a locally installed library takes precedence over the version provided by a module, but you still need to keep an eye on its dependencies.
To install packages from the PyPI repository locally, use the --user and --no-cache-dir options:
pip install --user --no-cache-dir <package>
If you forget the --user option, pip will try to install the package in the system environment and fail with an error message, because the system environment is write-protected: you do not have permission to modify it.
To uninstall a package installed in this way:
pip uninstall <package>
Other useful commands (see the official pip documentation for their options):
- pip cache purge: useful to reduce disk usage if you forgot the --no-cache-dir option;
- pip list: lists all accessible packages (coming from the module or from a local installation).
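Because a local installation takes precedence over the module's version, it can help to check where an import actually comes from. The sketch below is a hypothetical helper (pkg_origin) that asks python3 where a module would be imported from and whether that location lies in the user site-packages directory used by pip install --user. It assumes python3 is the interpreter of the loaded module.

```shell
# Hypothetical helper: show where Python would import a module from, and whether
# that is the user site-packages (pip install --user) or the module/system environment.
pkg_origin() {
    python3 - "$1" <<'EOF'
import importlib.util
import site
import sys

name = sys.argv[1]
spec = importlib.util.find_spec(name)  # find the module's spec; top-level modules are not executed
if spec is None or spec.origin is None:
    sys.exit(f"{name}: not found (or namespace package without a single origin)")
# e.g. ~/.local/lib/pythonX.Y/site-packages, or under $PYTHONUSERBASE if it is set
user_site = site.getusersitepackages()
if spec.origin.startswith(user_site):
    where = "local (--user)"
else:
    where = "module/system"
print(f"{name}: {spec.origin} [{where}]")
EOF
}
```

For example, pkg_origin numpy tells you whether the numpy in use is the module's or one you installed with pip install --user.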
Creation of a Conda environment
No need to install Anaconda or Miniconda to create your own environment! Just load one of the proposed Python environments, which can be found with the module avail python command.
The useful commands are:
- conda deactivate to exit the environment loaded by module load
- creation of an environment:
conda create -y -n <name> python=x.y
(specifying the Python version is optional; otherwise the module's default version is used)
- conda activate <name> to activate it
- conda env remove --name <name> to delete the environment cleanly
- conda clean -a to remove unused packages and the cache. Do not be alarmed if it attempts to delete packages of the system (i.e. non-local) environments.
Packages are installed with conda install. It may be wise to add -c conda-forge, as this channel carries the latest versions of many scientific packages.
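The creation steps above can be chained in one function. A minimal sketch: create_conda_env is a hypothetical name, the environment name, Python version and package list are whatever you pass in, and it assumes a Python module has already been loaded so that the conda command is available. Packages are taken from conda-forge, as suggested above.

```shell
# Hypothetical helper: create and activate a personal Conda environment,
# then install the requested packages from the conda-forge channel.
# Usage: create_conda_env <name> <python_version> [package ...]
create_conda_env() {
    if [ "$#" -lt 2 ]; then
        echo "usage: create_conda_env <name> <python_version> [package ...]" >&2
        return 2
    fi
    local env_name="$1" py_version="$2"
    shift 2
    conda deactivate || return 1                                 # leave the module's environment
    conda create -y -n "$env_name" "python=$py_version" || return 1
    conda activate "$env_name" || return 1
    if [ "$#" -gt 0 ]; then
        conda install -y -c conda-forge "$@"                     # remaining arguments are packages
    fi
}
```

When the environment is no longer needed, conda env remove --name <name> deletes it cleanly.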
Final advice
The $PATH, $PYTHONPATH and $CONDA_PREFIX variables are correctly configured, so there should be no need to modify them (doing so risks changing the order of precedence of the directories or pointing to incompatible packages).
However, if a package installs an executable (as Horovod and DeepSpeed do), it is placed in .local/bin. To access this executable on a compute node, you must run export PATH=$WORK/.local/bin:$PATH (in the SLURM script, or once connected to the node).
If you alternate between several Jean Zay modules (for example tensorflow and pytorch) that you extend with pip installations, you can give each module its own location for locally installed packages by setting the PYTHONUSERBASE variable according to the module used: for example, with tensorflow-gpu/py3/2.4.0, export PYTHONUSERBASE=$WORK/.local_tf240. This export must be done both before installing the packages and before launching the calculations (remember to update the PATH if necessary).
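The per-module PYTHONUSERBASE setting, together with the matching PATH update, can be sketched as follows. The function name use_userbase and its tag argument are hypothetical; the $WORK/.local_<tag> naming follows the .local_tf240 example above. It would be called before pip install --user and again in the SLURM script before launching the calculation.

```shell
# Hypothetical helper: give each module its own user base for pip --user installs.
# Example: use_userbase tf240  ->  PYTHONUSERBASE=$WORK/.local_tf240
use_userbase() {
    if [ -z "$1" ] || [ -z "$WORK" ]; then
        echo "usage: use_userbase <tag> (with WORK set)" >&2
        return 2
    fi
    export PYTHONUSERBASE="$WORK/.local_$1"
    # Executables installed by pip --user (such as those of Horovod or DeepSpeed)
    # go to $PYTHONUSERBASE/bin; prepend it to PATH only once.
    case ":$PATH:" in
        *":$PYTHONUSERBASE/bin:"*) ;;
        *) export PATH="$PYTHONUSERBASE/bin:$PATH" ;;
    esac
}
```

For example, after module load tensorflow-gpu/py3/2.4.0, run use_userbase tf240 before pip install --user --no-cache-dir <package>.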
If you ever need to compile AI codes using GPUs (CUDA, cuDNN), Intel MKL, etc., do not hesitate to contact support (assist@idris.fr).