Ada, Adapp, Turing: Disk spaces

Three distinct disk spaces (HOME, WORKDIR and TMPDIR) are accessible to users on the Ada, Adapp and Turing computers. A fourth disk space, the ARCHIVE space, is only accessible on Adapp. Each space has specific characteristics adapted to its usage, which are described on this page. The paths to these spaces are stored in four shell environment variables: $HOME, $WORKDIR, $TMPDIR and $ARCHIVE.
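As a quick check of which spaces are available in the current session, the four variables can be listed in a loop. This is a minimal sketch: show_disk_spaces is a hypothetical helper, not an IDRIS command, and it only reports whether each variable is set.

```shell
# Print the path of each disk space, or note that its variable is not set here.
# show_disk_spaces is a hypothetical helper, not an IDRIS command.
# The body runs in a subshell so that "name" and "value" do not leak out.
show_disk_spaces() (
    for name in HOME WORKDIR TMPDIR ARCHIVE; do
        eval "value=\${$name:-}"    # indirect lookup of the variable named in $name
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not defined"
        fi
    done
)

show_disk_spaces
```

According to the summary table at the end of this page, $TMPDIR is expected to be undefined in an interactive session, and $ARCHIVE only concerns Adapp.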

The HOME

$HOME: This is the home directory during an interactive connection. This space is intended for small, frequently used files such as the shell environment files, tools, and possibly sources and libraries of limited size (both in space and in number of files). The characteristics of the HOME are:

  • A permanent space.
  • Backed up daily by the TiNa software application.
  • Shared by the Adapp and Ada machines.
  • Accessible in interactive or batch jobs.
  • It is the user's home directory when beginning an interactive connection. It can also be accessed through the $HOME variable:

      $ cd $HOME
  • Subject to group quotas which are intentionally rather low: 4 GB by default. The IDRIS command quota_u shows your actual disk usage and that of each member of your group.
  • The total HOME size is 4 TiB (4.4 TB) on Ada and Adapp and 2 TiB (2.2 TB) on Turing.
  • Because the HOME is intended for small files, its block size is 256 KiB (262 KB) (displayed by the command stat -f $HOME).
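The block size can also be extracted in a script rather than read from the full stat -f output. The following is a sketch under the assumption that GNU coreutils stat is available (as on most Linux systems); block_size_kib is a hypothetical helper name.

```shell
# Report the file system block size of a directory in KiB.
# Assumes GNU coreutils stat (Linux); block_size_kib is a hypothetical helper.
block_size_kib() {
    bytes=$(stat -f -c %s "$1") || return 1   # %s: file system block size in bytes
    echo "$(( bytes / 1024 )) KiB"
}

# Example call (the value printed depends on the file system):
#   block_size_kib "$HOME"
```

The value printed for $HOME can then be compared with the 256 KiB stated above.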

The WORKDIR

$WORKDIR: This is a permanent work and storage space which is usable in batch. This space generally holds the large files used to run batch jobs: data files, executable files, result or restart files, submission scripts and very large source files. The characteristics of the WORKDIR are:

  • A permanent space.
  • Not backed up.
  • Common to the 3 machines: Adapp, Ada and Turing.
  • Accessible in interactive or in batch jobs.
  • Composed of 2 sections:
    • A section in which each user has an individual part; it is accessed with the command:

      $ cd $WORKDIR
    • A section common to the UNIX group to which the user belongs. The files to be shared by all the group members can be placed here; it is accessed with the command:

      $ cd $COMMONDIR
  • Subject to group quotas: 1 TiB (1.1 TB) by default. The IDRIS command quota_u -w shows your actual disk usage and that of each member of your group.
  • The total WORKDIR size is 932 TiB (1024 TB).
  • Because the WORKDIR is intended for large files, its block size is 4 MiB (4.2 MB) (displayed by the command stat -f $WORKDIR).
  • The WORKDIR is a GPFS disk space for which the bandwidth (about 50 GiB/s in read/write) is shared between the Adapp, Ada and Turing machines. It can occasionally be saturated because of exceptionally intensive usage.

Usage recommendations:

  • Because the WORKDIR is not backed up, its files are not protected against accidental manual deletion (rm) or a disk failure. It is therefore necessary to regularly save sensitive or important files on the Ergon archive server.

Caution:

  • Since batch jobs can run in the WORKDIR, the files are directly accessible in read/write (permanent space) and do not need to be explicitly copied. However, because several of your jobs can run at the same time, you must create a unique execution directory for each of your jobs. In addition, the disk space is subject to group quotas and your job can stop abruptly if the quota is reached. You must therefore be aware of both your own activity in this disk space and that of your colleagues. For these reasons, you may prefer running your batch jobs in the TMPDIR.
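One way to obtain a unique execution directory per job is to let mktemp pick the name. This is only a sketch: make_run_dir is a hypothetical helper and the "run." prefix is an arbitrary choice, not an IDRIS convention.

```shell
# Create a unique execution directory under a base directory and print its path.
# make_run_dir is a hypothetical helper; the "run." prefix is an arbitrary choice.
make_run_dir() {
    mktemp -d "$1/run.XXXXXX"   # mktemp replaces the X's to form an unused name
}

# In a batch job one might write:
#   rundir=$(make_run_dir "$WORKDIR") && cd "$rundir"
```

Because mktemp creates the directory itself and fails if the name is already taken, two jobs starting at the same moment do not end up sharing a directory.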

The TMPDIR

$TMPDIR: An execution directory for batch jobs. The characteristics of the TMPDIR are:

  • TMPDIR is a temporary directory.
  • It is only accessible in batch jobs by using the $TMPDIR variable.
  • It is automatically created when a batch job begins and is, therefore, unique to each batch job.
  • It is automatically destroyed at the end of the job. You must therefore copy the important files to a permanent disk space (the WORKDIR, for example) before the end of the job.
  • Unlike the HOME and the WORKDIR, the TMPDIR is not subject to group quotas. However, some safety quotas are in place to prevent a user from accidentally filling up the entire disk space.
  • The total TMPDIR size is 466 TiB (512.3 TB).
  • Because the TMPDIR is intended for large files, its block size is identical to that of the WORKDIR: 4 MiB (4.2 MB) (the command stat -f $TMPDIR only works in batch).

Usage recommendations:

Examples of batch jobs using the TMPDIR can be found in the “Code execution/control” documentation (for Ada and for Turing). General advice for using the TMPDIR:

  • For each execution, we assume that the input files necessary for the execution (restart or executable) have previously been stored on a permanent file system (HOME or WORKDIR). If this is not the case, the first step is to use the archive class and multi-step jobs (examples are available for Ada and for Turing) to copy the necessary files into the WORKDIR with the command mfget.
  • At the beginning of each batch job execution, we advise you to move into the TMPDIR (cd $TMPDIR).
  • Copy the necessary files from WORKDIR into TMPDIR using the command cp.
  • Launch the execution in the TMPDIR.
  • Before the batch job finishes, as a last step, you must:
    • Save the significant files (if they are used regularly or might be post-processed) on a permanent file system (HOME or WORKDIR) by using the command cp.
    • Archive files for a longer time period by saving them on the Ergon archive server, using the command mfput in the archive class of your batch job (examples are available for Ada and for Turing).
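The copy-in, run, copy-out sequence above can be sketched as a small shell function. This is an illustration under stated assumptions, not an IDRIS-provided script: stage_and_run is a hypothetical helper, the file and program names are placeholders, and the archiving step with mfput is left out.

```shell
# Sketch of the copy-in / run / copy-out pattern described above.
# stage_and_run is a hypothetical helper, not an IDRIS tool.
# Usage: stage_and_run <permanent_dir> <scratch_dir> <result_file> <input_file> <command> [args...]
# The body runs in a subshell so the caller's working directory is unchanged.
stage_and_run() (
    permanent=$1 scratch=$2 result=$3 input=$4
    shift 4
    cd "$scratch" || exit 1                # work from the temporary directory
    cp "$permanent/$input" . || exit 1     # copy the input from the permanent space
    "$@" || exit 1                         # launch the execution in the scratch directory
    cp "$result" "$permanent/"             # save the result before the job ends
)

# In a batch job (file and program names are placeholders):
#   stage_and_run "$WORKDIR" "$TMPDIR" result.dat input.dat ./my_code
```

If the program fails, the function stops before the final copy, so an incomplete result file does not overwrite an earlier one in the permanent space.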

Comments:

  • As the read/write performance is identical for the WORKDIR and the TMPDIR, you can avoid making copies between these two directories if the code is able to read or write the files directly in the HOME or the WORKDIR.
  • The TMPDIR, like the WORKDIR, is a GPFS disk space whose bandwidth (about 50 GiB/s in read/write) is shared by the Adapp, Ada and Turing machines. Consequently, input/output performance can vary and may be slowed down during exceptionally high-volume usage.

Specific to Ada and Adapp: Visibility in interactive

The created TMPDIR can be either local or shared, depending on the characteristics of the batch job:

  • If it is local to the execution node of Ada or Adapp, it is not visible interactively from the connection nodes. This is generally the case for sequential or OpenMP jobs.
  • If it is shared by all the nodes of Ada and Adapp, it is visible interactively from the connection nodes. This is the case for multi-node and multi-step jobs. To find the name of this directory:
    1. Put the command set -x (in bash) at the beginning of your job to echo the commands as they are executed, and then use the $TMPDIR variable.
    2. Consult the job output during the execution (by using the commands cat, less, tail, etc., on the file named in the LoadLeveler directive #@ output, which is generally written in your submission directory). In the job output, you will find the value of the $TMPDIR variable and its real location.

Here is an example of a batch job which follows the above procedure:

example
  # LoadLeveler directives
  #@ output = <stdout>
  ...
  #@ queue
 
  # Bash script
  set -x      # echo each command as it is executed
  cd $TMPDIR
  pwd -P      # real (physical) path of the TMPDIR
  ...

The ARCHIVE space (from Adapp only)

The ARCHIVE environment variable on Adapp points to the HOME of the Ergon archive server. This space is directly accessible in read/write from Adapp, the pre- and post-processing machine, through a GPFS mount.

$ ls -l $ARCHIVE

It is only accessible from the Adapp nodes (interactive or batch). It is not accessible from the Ada or Turing nodes.
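Since the same script may end up running on a machine where the ARCHIVE space is not mounted, it can be useful to test for it before copying. A minimal sketch, in which copy_to_archive is a hypothetical helper and not an IDRIS command:

```shell
# Copy a file to the archive space only when $ARCHIVE is usable on this machine.
# copy_to_archive is a hypothetical helper, not an IDRIS command.
copy_to_archive() {
    if [ -z "${ARCHIVE:-}" ] || [ ! -d "$ARCHIVE" ]; then
        echo "ARCHIVE is not available here (expected on Adapp only)" >&2
        return 1
    fi
    cp "$1" "$ARCHIVE/"
}

# Example call (the file name is a placeholder):
#   copy_to_archive results.tar
```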

HOME and WORKDIR: Data security

To improve the protection of your data stored in your HOME and WORKDIR spaces, we recommend that you respect the security policy put in place at IDRIS.

Summary of disk space characteristics

Note: The last column, entitled ARCHIVE, only concerns Adapp and Ergon.

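|                       | HOME                                              | WORKDIR                         | TMPDIR                          | ARCHIVE                           |
|-----------------------|---------------------------------------------------|---------------------------------|---------------------------------|-----------------------------------|
| Life span             | permanent                                         | permanent                       | duration of batch job           | permanent                         |
| Shared spaces         | common to Adapp and Ada (HOME is local on Turing) | common to Adapp, Ada and Turing | common to Adapp, Ada and Turing | $ARCHIVE on Adapp, $HOME on Ergon |
| Backed up             | yes                                               | no                              | no                              | no                                |
| Automatic deletion    | no                                                | no                              | yes, at end of batch job        | no                                |
| Access in interactive | yes                                               | yes                             | no ($TMPDIR is not defined)     | yes, from Adapp                   |
| Access in batch       | yes                                               | yes                             | yes                             | yes, from Adapp                   |
| Block size            | 256 KiB (262 KB)                                  | 4 MiB (4.2 MB)                  | 4 MiB (4.2 MB)                  | 16 MiB (16.8 MB)                  |
| Group quotas          | quota_u: 5 GiB (5.4 GB) by default                | quota_u -w: 1 TiB (1.1 TB) by default | none                      | quota_u: 1 TiB (1.1 TB) on Ergon  |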