Flash Info No 2024-21

IDRIS
Computing center
⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Migration of WORK and HOME

[English version below]

Hello,

As previously announced, the arrival of the H100 extension is accompanied by a renewal of the Jean Zay storage spaces: a new Lustre storage system is being installed, offering increased storage capacity and improved bandwidth.

The migration of the HOME spaces was completed during this morning's maintenance (July 30th, 2024). We invite you to check your scripts to correct any hard-coded paths. Any path of the form "/gpfs7kw/linkhome/..." should become "/linkhome/..." or, if possible, be replaced by the use of the $HOME environment variable.
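
As an illustration of the path correction described above, here is a minimal sketch, assuming your scripts are plain text files and that GNU grep and sed are available. The helper names list_old_home_paths and fix_home_paths are hypothetical and are not IDRIS tools.

```shell
#!/bin/bash
# Hypothetical helpers (not provided by IDRIS): locate and rewrite the old
# HOME prefix "/gpfs7kw/linkhome/" in your own scripts.

# Print the files under the given paths that still mention the old prefix.
list_old_home_paths() {
    grep -rl -- '/gpfs7kw/linkhome/' "$@"
}

# Rewrite the old prefix to "/linkhome/" in each given file.
# A copy of the original is kept next to it with a .bak suffix.
fix_home_paths() {
    local file
    for file in "$@"; do
        sed -i.bak 's#/gpfs7kw/linkhome/#/linkhome/#g' "$file"
    done
}
```

A possible usage would be to run list_old_home_paths on the directory holding your submission scripts, review the list, then call fix_home_paths on the files you want to update. Where the path points to your own HOME, using $HOME directly remains the preferable fix.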

The migration of the WORK spaces started today. This operation is also handled by the IDRIS teams, so you do not have any specific actions to perform. This operation will be done in batches to avoid a long downtime of the machine. However, it will require suspending the "qos_cpu-t4" and "qos_gpu-t4" QoS, which allow running jobs longer than 20 hours.

For a specific project, the migration process will be as follows:

  • 20 hours before the migration begins, jobs using the project's computing hours will no longer be able to start, so that no job tries to access the WORK space during the operation (these jobs will remain pending with the status "AssocGrpJobsLimit")
  • just before the migration starts, the project's WORK space will become completely unavailable, including from the login nodes
  • once the migration is completed, the environment variables will be modified to point to the new WORK spaces on the Lustre storage system, and the pending jobs will be able to run again.
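
To see which of your pending jobs are being held by this mechanism, the following sketch may help. It assumes that "AssocGrpJobsLimit" is shown in the reason column of the Slurm squeue command; the helper name filter_blocked_jobs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads lines of the form "JOBID REASON" on standard
# input and prints the IDs of the jobs whose reason is AssocGrpJobsLimit.
filter_blocked_jobs() {
    awk '$2 == "AssocGrpJobsLimit" { print $1 }'
}

# Possible usage on the cluster (assumption: squeue is in your PATH):
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | filter_blocked_jobs
# -h drops the header line, %i is the job ID and %r the pending reason.
```

Keeping the filter separate from the squeue call makes it easy to check on saved output before relying on it.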

Warning: If you have jobs that use the computing hours of one project but access the WORK space of another project, they may fail because we will not be able to block their start appropriately.

The "idr_migstatus" command allows you to monitor the migration of your projects by indicating the current status of each:

  • "pending": the migration has not yet been performed, your jobs can still run, and you have access to your WORK
  • "planned": the migration will start in the next 20 hours, new jobs can no longer start, but you still have access to your WORK
  • "in progress": the migration is in progress, you no longer have access to your WORK
  • "migrated": the migration is completed, you have access to your WORK again, and your jobs can run again.

Note: The absolute path of the WORK spaces will change with the migration, but to simplify the transition, links will be set up so that the old absolute paths remain functional, at least initially. Once the migration is completed, we still invite you to modify any paths of the form "/gpfswork/..." or "/gpfsdswork/projects/..." that may appear in your scripts (if possible by replacing them with the use of the environment variable) or in your symbolic links.
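
To locate the old WORK paths mentioned in this note, here is a minimal sketch, assuming GNU grep and GNU find (the -lname test is a GNU extension). The helper name find_old_work_refs is hypothetical and is not an IDRIS tool.

```shell
#!/bin/bash
# Hypothetical helper: list files and symbolic links under a directory
# (default: current directory) that still refer to the old WORK prefixes.
find_old_work_refs() {
    local dir=${1:-.}
    # Text files whose content mentions an old WORK prefix
    # (-I skips binary files, -l prints only the file names).
    grep -rIlE -- '/gpfswork/|/gpfsdswork/projects/' "$dir" || true
    # Symbolic links whose target starts with an old WORK prefix.
    find "$dir" -type l \( -lname '/gpfswork/*' -o -lname '/gpfsdswork/projects/*' \)
}
```

Running find_old_work_refs on a scripts directory prints one path per line. The matching files can then be edited by hand, and the matching links recreated so that they point to the new location.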

We apologise for any inconvenience these operations may cause.

Best regards, The IDRIS support team


Dear Jean Zay user,

As previously announced, the installation of the new H100 extension comes with a renewal of the Jean Zay storage spaces: a new Lustre storage system is being installed, offering increased storage capacity and improved bandwidth.

The migration of the HOME spaces was completed during today's maintenance operation (July 30th, 2024). We invite you to check your scripts in order to correct any hard-coded paths. Any path starting with "/gpfs7kw/linkhome/..." should become "/linkhome/..." or, if possible, the $HOME environment variable should be used instead.

The migration of the WORK spaces started today. This operation is also handled by the IDRIS teams, so you do not have any specific actions to perform. The migration will be done in batches of projects to avoid a long downtime of the machine. It will, however, require suspending the "qos_cpu-t4" and "qos_gpu-t4" QoS, which allow running jobs longer than 20 hours.

For a specific project, the migration process will be as follows:

  • 20 hours before the migration begins, the jobs using computing hours allocated to that project will be held in the queue (with the "AssocGrpJobsLimit" status) so that no job accesses the WORK space during the migration operation
  • just before the migration starts, the WORK space will become completely unavailable, including from the login nodes
  • once the migration is done, the environment variables will be modified to point to the new Lustre WORK space and your jobs will be able to run again.

Warning: If you have jobs that use the computing hours from a project but access the WORK disk spaces of another project, they might fail because we have no way to prevent them from starting when they should not.

The "idr_migstatus" command allows you to monitor the migration of your projects by indicating the current status of each of them:

  • "pending": the migration has not started yet, there is no impact on your jobs and you can access your WORK
  • "planned": the migration is going to start in the next 20 hours, jobs that are not yet running will stay pending but you can still access your WORK
  • "in progress": the migration is in progress, you will not have access to your WORK at this point
  • "migrated": the migration is done, you can access your WORK again and your jobs can run.

Note: The absolute paths of the WORK spaces will be modified by the migration. However, to ease the transition, symbolic links will be created in order to keep the old absolute paths working, at least for some time. Once the migration is completed, we still invite you to modify any absolute paths starting with "/gpfswork/..." or "/gpfsdswork/projects/..." that could appear in your scripts (use the environment variables whenever possible) or in your symbolic links.

We are sorry for any inconvenience these operations might cause.

Best regards, The IDRIS support team