
⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Python and Python libraries

Description

The Python ecosystem has many tools optimised for scientific computing and data processing. The best-known ones are available on Jean Zay as modules.

Installed versions

The Python modules encapsulate Conda environments and contain libraries installed via conda or pip.

  • to list the modules: module avail python

  • to load a module (and therefore, indirectly, a Conda environment): module load <selected_module>

    info

    Loading a module activates the environment.
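To check which interpreter and environment are actually active after a module load, a small helper can print them. This is a sketch: the function name show_python_env is hypothetical, and it only relies on python3 being on the PATH and on CONDA_PREFIX, the variable Conda sets when an environment is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an IDRIS tool): show which Python environment is active,
# for example right after a "module load".
show_python_env() {
  # Interpreter found first on the PATH
  echo "python3 : $(command -v python3)"
  # Set by Conda when an environment is activated; empty otherwise
  echo "CONDA_PREFIX : ${CONDA_PREFIX:-<none>}"
  # Prefix and version reported by the interpreter itself
  python3 -c 'import sys; print("sys.prefix :", sys.prefix); print("version :", sys.version.split()[0])'
}
```

The reported sys.prefix tells you which environment python3 comes from.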

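If these modules are incomplete for your use, you have several options:

  • ask the IDRIS User Support team (assist@idris.fr) to add the missing libraries to a module;
  • add the missing libraries yourself, locally, with pip, on top of an existing module;
  • create your own Conda environment.

The first option is by far preferable, for performance reasons and for compatibility with the Jean Zay environment. It also allows other users to benefit from these tools and avoids saturating your disk quotas.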

Python 2.7

Support for Python 2.7 ended on 1 January 2020; since that date, new software versions are only installed for Python 3.

mpi4py and MPI modules

To use the mpi4py package, you must also load one of the available MPI modules (module avail openmpi) with the module command: the Python environments provide the mpi4py library, but no MPI module is loaded by default.
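As a sketch, the job-side steps could look like the following. The file name hello_mpi.py, the function run_hello_mpi and its two arguments are hypothetical: pass real module names taken from module avail python and module avail openmpi, and call the function inside a SLURM allocation with several tasks.

```shell
#!/usr/bin/env bash
# Minimal mpi4py test program (the file name is only an example)
cat > hello_mpi.py <<'EOF'
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"rank {comm.Get_rank()} of {comm.Get_size()}")
EOF

# Hypothetical launcher: $1 = a Python module, $2 = an MPI module (see "module avail openmpi")
run_hello_mpi() {
  local python_module="$1" mpi_module="$2"
  module load "$python_module"  # provides mpi4py, but loads no MPI library
  module load "$mpi_module"     # MPI library that mpi4py needs at run time
  srun python hello_mpi.py      # one line printed per MPI rank
}
```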

Deactivation of a Conda environment

The commands module purge, module switch and module unload do not work with Conda: they do not deactivate the Conda environment. To deactivate a Conda environment and activate another one by loading another module, run the following series of commands:

conda deactivate
module purge
module load <new_environment>

or start a new session.
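The same sequence can be wrapped in a shell function. This is a sketch: the name switch_python_module is hypothetical, and it must be defined in your current shell (for example in a file you source), because conda deactivate only affects the shell it runs in. It uses CONDA_SHLVL, the variable in which Conda counts nested activations, to call conda deactivate as many times as needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: leave every active Conda environment, then load another module.
# Define it in your current shell (e.g. by sourcing a file): a script run as a
# separate process cannot deactivate the environment of your session.
switch_python_module() {
  local new_module="$1" tries=0
  # CONDA_SHLVL counts stacked activations; each "conda deactivate" removes one level.
  # The counter caps the loop in case the level does not decrease.
  while [ "${CONDA_SHLVL:-0}" -gt 0 ] && [ "$tries" -lt 10 ]; do
    conda deactivate || return 1
    tries=$((tries + 1))
  done
  module purge
  module load "$new_module"
}
```

For example: switch_python_module pytorch-gpu/py3/1.7.1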

Personal Python environment

IDRIS offers many modules dedicated to AI. The modules are designed to make the most of Jean Zay's infrastructure and the installed libraries (such as cuDNN, Intel MKL, NCCL).

If a module meets your needs but lacks some libraries that your code requires, there are two solutions:

  • send a request to assist@idris.fr, specifying the library and the module concerned. This may also benefit other Jean Zay users. Note that the library can only be added if it does not change the environment too much, since otherwise the module might stop working for other users.
  • add the missing libraries yourself, locally, with pip, on top of one of the provided modules. Read on for the advantages and disadvantages of this method.

Personal Conda environment

If your needs are very different from what IDRIS offers, you can create your own Conda environment containing exactly what you need. This gives you full control but has some drawbacks:

  • an environment takes up space (especially inodes) and may saturate your quota. See the section "General advice before installing libraries locally" for ways to limit this.
  • the installed software is not optimised for Jean Zay and may fail to find low-level libraries such as cuDNN (even after loading the cudnn/7.6.5.32-cuda-10.2 module).
  • the User Support team will find it harder to troubleshoot if your code behaves abnormally.
  • a personal installation generally performs worse than an IDRIS installation.

Some useful commands:

  • module avail to find out which modules are available, for example module avail python
  • module load <full_module_name> to load a module (and activate the Python or Conda environment at the same time)
  • module list (once a module is loaded) to get the complete list of associated modules
  • module purge to "unload" all modules. Be careful: this does not deactivate the Python and Conda environments (if something behaves abnormally, it may be better to start a new session).

General advice before installing libraries locally

By default, packages installed from PyPI with pip --user are stored in $HOME/.local, so you risk quickly saturating the disk quota of your $HOME. To avoid this, we recommend creating a target directory in your $WORK and a symbolic link in your $HOME pointing to it:

  • if the $HOME/.local folder already exists, first move it to your $WORK, then create the link:
mv $HOME/.local $WORK
ln -s $WORK/.local $HOME
  • if the $HOME/.local folder does not exist yet, first create the directory in your $WORK, then create the link:
mkdir $WORK/.local
ln -s $WORK/.local $HOME

Similarly, conda installs libraries in $HOME/.conda, so it is strongly recommended to proceed in the same way:

  • if the $HOME/.conda folder already exists, first move it to your $WORK, then create the link:
mv $HOME/.conda $WORK
ln -s $WORK/.conda $HOME
  • if the $HOME/.conda folder does not exist yet, first create the directory in your $WORK, then create the link:
mkdir $WORK/.conda
ln -s $WORK/.conda $HOME
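The two cases above can be combined in one function that is safe to run several times. This is a sketch, not an IDRIS tool: the name relocate_to_work is hypothetical; it assumes $WORK is set, refuses to act when both the $HOME and the $WORK copies already exist, and does nothing if the link is already in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an IDRIS tool): keep $HOME/<dir> in $WORK and leave a
# symbolic link in $HOME. Running it again once the link exists does nothing.
relocate_to_work() {
  local name="$1"
  local home_path="$HOME/$name"
  local work_path="$WORK/$name"
  if [ -z "$WORK" ]; then
    echo "WORK is not set" >&2
    return 1
  fi
  if [ -L "$home_path" ]; then
    echo "$home_path is already a symbolic link"
    return 0
  fi
  if [ -e "$home_path" ]; then
    if [ -e "$work_path" ]; then
      echo "both $home_path and $work_path exist: merge them by hand" >&2
      return 1
    fi
    mv "$home_path" "$work_path"   # existing folder: move it to $WORK
  else
    mkdir -p "$work_path"          # no folder yet: create it in $WORK
  fi
  ln -s "$work_path" "$home_path"  # $HOME/<dir> -> $WORK/<dir>
}
```

Run it once per directory: relocate_to_work .local, then relocate_to_work .conda.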

To check the occupation of your $WORK:

  • idr_quota_user or idr_quota_project gives the occupation, in GB and in inodes, as a percentage (updated once a day, during the night)
  • for more detail, run du --inodes --max-depth=1 $WORK or du -h --max-depth=1 $WORK
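To spot the directories that use the most inodes or space, the du output can be sorted. A sketch with hypothetical function names; it assumes the GNU versions of du and sort (for --inodes and sort -h).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list the largest first-level entries of a directory
# (default $WORK), largest last; the last line is the directory's own total.
# Second argument: number of lines to show (default 10).
top_inode_users() {
  local dir="${1:-$WORK}" count="${2:-10}"
  du --inodes --max-depth=1 "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

top_size_users() {
  local dir="${1:-$WORK}" count="${2:-10}"
  du -h --max-depth=1 "$dir" 2>/dev/null | sort -h | tail -n "$count"
}
```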

Local additions of libraries via pip

Advantages and disadvantages of this method:

  • this method is suitable for adding libraries to an existing module and avoids having to install everything, which limits disk occupation.
  • on the other hand, libraries installed this way are visible to every module that uses the same Python version (they are installed in .local/lib/pythonX.Y).
  • you must therefore watch out for incompatibilities if you run calculations in several environments: for example, pytorch-gpu/py3/1.5.0 and pytorch-gpu/py3/1.7.1 use the same Python version (3.7) but different numpy versions.
  • a locally installed library takes precedence over the version provided by the module, but you must still keep an eye on its dependencies.
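To check whether a library is imported from your local installation or from the module, you can ask Python for the file it imports and compare it with the user site-packages directory returned by site.getusersitepackages(). This is a sketch; the function name where_from is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report where python3 imports a given library from.
where_from() {
  python3 - "$1" <<'EOF'
import importlib
import os
import site
import sys

name = sys.argv[1]
module = importlib.import_module(name)
path = getattr(module, "__file__", None)
if path is None:
    print(f"{name}: built-in module")
    sys.exit(0)
# pip install --user puts packages under this directory (.local/lib/pythonX.Y/site-packages);
# realpath resolves the $HOME/.local -> $WORK/.local link, if any
user_site = os.path.realpath(site.getusersitepackages())
origin = "local (--user)" if os.path.realpath(path).startswith(user_site) else "module / environment"
print(f"{name}: {path} [{origin}]")
EOF
}
```

For example, where_from numpy shows which numpy the current module will actually use.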

To install packages from the PyPI repository locally, use the --user and --no-cache-dir options:

pip install --user --no-cache-dir <package>
info

If you forget the --user option, pip will try to install the package in the system environment and fail with an error message, because that environment is write-protected: you do not have permission to modify it.

To uninstall a package installed in this way:

pip uninstall <package>

Other useful commands (see the official documentation for their options):

  • pip cache purge, useful to reduce disk occupation if you forgot the --no-cache-dir option
  • pip list to list all accessible packages (from the module or from a local installation)

Creation of a Conda environment

There is no need to install Anaconda or Miniconda to create your own environment: just load one of the provided Python environments, listed by the module avail python command.

The useful commands are:

  • conda deactivate to exit the environment activated by module load
  • to create an environment: conda create -y -n <name> python=x.y (python=x.y selects a specific Python version; without it, the module's default version is used)
  • conda activate <name> to activate it
  • conda env remove --name <name> to delete the environment properly
  • conda clean -a removes unused packages and the cache. Do not be alarmed if it attempts to delete packages from the system (i.e. non-local) environments.

Packages are installed with conda install. It may be wise to add -c conda-forge, the channel that provides the latest updates of many scientific packages.
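Put together, the creation steps might look like the function below. It is a sketch: make_conda_env is a hypothetical name, the Python version and package names in the usage line are placeholders, and it assumes the conda command is available in your shell (for instance after loading one of the Python modules). Like conda activate itself, it must run in your current shell, not as a separate script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create and fill a personal Conda environment.
# Usage: make_conda_env <name> <python-version> [package ...]
make_conda_env() {
  local name="$1" pyver="$2"
  shift 2
  conda deactivate                                 # leave the module's environment
  conda create -y -n "$name" "python=$pyver" || return 1
  conda activate "$name" || return 1
  # Optional packages, taken from conda-forge for recent versions
  if [ "$#" -gt 0 ]; then
    conda install -y -c conda-forge "$@" || return 1
  fi
}
```

For example: make_conda_env myenv 3.10 numpy scipy. With $HOME/.conda linked to your $WORK, the environment does not count against your $HOME quota.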

Final advice

The $PATH, $PYTHONPATH and $CONDA_PREFIX variables are correctly configured, so in principle you do not need to modify them (doing so may change the order of precedence of the directories or point to incompatible packages).

However, if a package installs an executable (as Horovod and DeepSpeed do), the executable goes into .local/bin. To use it on a compute node, run export PATH=$WORK/.local/bin:$PATH (in the SLURM file or once connected to the node).
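In a SLURM file that may be edited or sourced several times, it can help to prepend the directory only once. A minimal sketch; add_to_path_once is a hypothetical name, and its default directory is the $WORK/.local/bin mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PATH unless it is already there.
# Default: $WORK/.local/bin, where pip --user puts executables (e.g. Horovod's).
add_to_path_once() {
  local dir="${1:-$WORK/.local/bin}"
  case ":$PATH:" in
    *":$dir:"*) ;;                        # already in PATH: nothing to do
    *) export PATH="$dir:$PATH" ;;
  esac
}
```

In the SLURM file, call add_to_path_once after the module load lines and before the srun line that starts the executable.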

Variable $PYTHONUSERBASE

If you alternate between several Jean Zay modules (for example tensorflow and pytorch) that you complete with pip installations, you can redefine the location of the locally installed packages by setting the PYTHONUSERBASE variable differently for each module: for example, with tensorflow-gpu/py3/2.4.0, export PYTHONUSERBASE=$WORK/.local_tf240. This export must be done both before installing the packages and before launching the calculations (do not forget to update the PATH if necessary).
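For the tensorflow-gpu/py3/2.4.0 example above, the session or SLURM file could contain the lines below. This is a sketch: the directory name comes from the example in the text, and the last line only checks, through Python's standard site module, where pip --user will now install packages.

```shell
#!/usr/bin/env bash
# Per-module location for "pip install --user" (example name from the text above)
export PYTHONUSERBASE="$WORK/.local_tf240"
# Executables installed with --user now go to $PYTHONUSERBASE/bin
export PATH="$PYTHONUSERBASE/bin:$PATH"
# Check: user base and user site-packages as seen by Python
python3 -c 'import site; print(site.getuserbase()); print(site.getusersitepackages())'
```

Run these lines after module load tensorflow-gpu/py3/2.4.0 and before any pip install --user --no-cache-dir or calculation; with another module, use another directory.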

Compilation on GPU

If you need to compile AI code that uses GPUs (CUDA, cuDNN), Intel MKL, etc., do not hesitate to contact the support team (assist@idris.fr).
