Flash Info No 2024-22

IDRIS
Computing center
⚠ INFORMATION
This page was translated by an AI (LLM) with a cursory human check and is awaiting full review.

Dear Jean Zay user,

As part of the renewal of the storage spaces accompanying the H100 extension of Jean Zay, the current SCRATCH disk space will be permanently shut down on Tuesday, September 3rd, 2024.

As previously announced, there will be no automatic copy of the current SCRATCH space to the new one.

To anticipate this shutdown, we invite you to migrate the data you wish to keep as soon as possible to the new space, which is already accessible through the $NEWSCRATCH environment variable (or $ALL_CCFRNEWSCRATCH for the common project spaces), and to use that disk space in your jobs by modifying your submission scripts accordingly.

The data transfers can be done using one or more jobs on the "archive" partition, or interactively from the login nodes if the volume to be transferred is limited: http://www.idris.fr/eng/jean-zay/modifications-extension-jean-zay-h100-eng.html#scratch_et_all_scratch_copies.
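
For a limited volume copied interactively from a login node, the transfer could look like the minimal Python sketch below. It assumes that the old space is exposed through a SCRATCH environment variable (an assumption: this message only names $NEWSCRATCH) and that "my_results" is a hypothetical directory name. For larger volumes, a job on the "archive" partition remains the route described above.

```python
import os
import shutil
from pathlib import Path


def migrate_directory(src_root: str, dst_root: str, relative_dir: str) -> Path:
    """Copy src_root/relative_dir to dst_root/relative_dir and return the new path."""
    src = Path(src_root) / relative_dir
    dst = Path(dst_root) / relative_dir
    # dirs_exist_ok=True (Python 3.8+) allows re-running after an interrupted copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


if __name__ == "__main__":
    # SCRATCH is an assumed variable name; NEWSCRATCH comes from the announcement.
    old_root = os.environ.get("SCRATCH")
    new_root = os.environ.get("NEWSCRATCH")
    # "my_results" is a hypothetical directory used for illustration only.
    if old_root and new_root and (Path(old_root) / "my_results").is_dir():
        print(migrate_directory(old_root, new_root, "my_results"))
```

The copy leaves the source untouched, so the old data can be removed separately once the result has been checked.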

To avoid confusion between the different disk spaces and to make the migration easier to monitor, we suggest that you remove any data that you no longer need or that you have already successfully migrated.

Please be aware that the new SCRATCH space is also subject to the automatic deletion of data that has not been used for 30 days.

Best regards, The IDRIS support team

Flash Info No 2024-21

Migration of WORK and HOME

[English version below]

Hello,

As previously announced, the arrival of the H100 extension comes with a renewal of the Jean Zay storage spaces with the installation of a new Lustre storage system that will offer increased storage capacity and improved bandwidth.

The migration of the HOME spaces was completed during this morning's maintenance (July 30th, 2024). We invite you to check your scripts to correct any hard-coded paths. Any path of the form "/gpfs7kw/linkhome/..." should become "/linkhome/..." or, if possible, be replaced by the use of the $HOME environment variable.

The migration of the WORK spaces started today. This operation is also handled by the IDRIS teams, so you do not have any specific actions to perform. This operation will be done in batches to avoid a long downtime of the machine. However, it will require suspending the "qos_cpu-t4" and "qos_gpu-t4" QoS, which allow running jobs longer than 20 hours.

For a specific project, the migration process will be as follows:

  • 20 hours before the migration begins, jobs using the project's computing hours will no longer be able to start, so that no job tries to access the WORK space during the operation (these jobs will appear with the status "AssocGrpJobsLimit")
  • just before the migration starts, the project's WORK space will become completely unavailable, including from the login nodes
  • once the migration is completed, the environment variables will be modified to point to the new WORK spaces on the Lustre storage system, and the pending jobs will be able to run again.

Warning: If you have jobs that use the computing hours of one project but access the WORK space of another project, they may fail because we will not be able to block their start appropriately.

The "idr_migstatus" command lets you monitor the migration of your projects by indicating the current status of each:

  • "pending": the migration has not yet been performed, your jobs can still run, and you have access to your WORK
  • "planned": the migration will start in the next 20 hours, new jobs can no longer start, but you still have access to your WORK
  • "in progress": the migration is in progress, you no longer have access to your WORK
  • "migrated": the migration is completed, you have access to your WORK again, and your jobs can run again.

Note: The absolute path of the WORK spaces will change with the migration, but to simplify the transition, links will be set up so that the old absolute paths remain functional, at least initially. Once the migration is completed, we still invite you to modify any paths of the form "/gpfswork/..." or "/gpfsdswork/projects/..." that may appear in your scripts (if possible by replacing them with the use of the environment variable) or in your symbolic links.

We apologise for any inconvenience these operations may cause.

Best regards, The IDRIS support team


Dear Jean Zay user,

As previously announced, the installation of the new H100 extension comes with a renewal of the Jean Zay storage spaces with the installation of a new Lustre storage system offering an increased storage capacity and an improved bandwidth.

The migration of the HOME spaces was completed during today's maintenance operation (July 30th, 2024). We invite you to check your scripts and correct any hard-coded paths. Any path starting with "/gpfs7kw/linkhome/..." should become "/linkhome/..." or, if possible, be replaced by the $HOME environment variable.
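
The prefix change for HOME paths can be applied mechanically. The following Python sketch is only an illustration: the function name is hypothetical, and it performs a plain textual substitution, so the result should be reviewed before overwriting any script.

```python
OLD_HOME_PREFIX = "/gpfs7kw/linkhome/"
NEW_HOME_PREFIX = "/linkhome/"


def fix_home_paths(script_text: str) -> str:
    """Replace every occurrence of the old hard-coded HOME prefix with the new one."""
    return script_text.replace(OLD_HOME_PREFIX, NEW_HOME_PREFIX)


# The rest of the path is left as a placeholder on purpose.
example = "cd /gpfs7kw/linkhome/<rest of path>/run\n"
print(fix_home_paths(example), end="")
```

Text that already uses "/linkhome/..." or $HOME is left unchanged by this substitution.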

The migration of the WORK spaces started today. This operation is also handled by the IDRIS teams, so you do not have any specific action to perform. The migration will be done in batches of projects to avoid a long downtime of the machine. It will, however, require suspending the "qos_cpu-t4" and "qos_gpu-t4" QoS, which allow running jobs longer than 20 hours.

For a specific project, the migration process will be as follows:

  • 20h before the migration begins, the jobs using computing hours allocated to that project will be held in queue (with the "AssocGrpJobsLimit" status) in order to avoid having jobs that use the WORK during the migration operation
  • just before the migration starts, the WORK space will become completely unavailable, including from the login nodes
  • once the migration is done, the environment variables will be modified to point to the new Lustre WORK space and your jobs will be able to run again.

Warning: If you have jobs that use the computing hours from a project but access the WORK disk spaces of another project, they might fail because we have no way to prevent them from starting when they should not.

The "idr_migstatus" command lets you monitor the migration of your projects by indicating the current status of each of them:

  • "pending": the migration has not started yet, there is no impact on your jobs and you can access your WORK
  • "planned": the migration is going to start in the next 20 hours, jobs that are not yet running will stay pending but you can still access your WORK
  • "in progress": the migration is in progress, you do not have access to your WORK at this point
  • "migrated": the migration is done, you can access your WORK again and your jobs can run.
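
The four statuses can be summarised as a small lookup table. The Python sketch below is a hedged illustration that does not call "idr_migstatus" itself, since the exact output format of the command is not described here. The class and function names are hypothetical, and treating "in progress" as blocking new jobs is an inference from the process described above.

```python
from typing import NamedTuple


class MigrationState(NamedTuple):
    jobs_can_start: bool
    work_accessible: bool


# Status strings as listed in the announcement.
STATES = {
    "pending": MigrationState(jobs_can_start=True, work_accessible=True),
    "planned": MigrationState(jobs_can_start=False, work_accessible=True),
    # Inferred: jobs held since "planned" stay held while WORK is unavailable.
    "in progress": MigrationState(jobs_can_start=False, work_accessible=False),
    "migrated": MigrationState(jobs_can_start=True, work_accessible=True),
}


def describe(status: str) -> MigrationState:
    """Look up a status string; unknown strings raise ValueError."""
    try:
        return STATES[status.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown migration status: {status!r}") from None


print(describe("planned"))
```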

Note: The absolute paths of the WORK spaces will be modified by the migration. However, to ease the transition, symbolic links will be created to keep the old absolute paths working, at least for some time. Once the migration is completed, we nevertheless invite you to modify any absolute paths starting with "/gpfswork/..." or "/gpfsdswork/projects/..." that appear in your scripts (use the environment variables whenever possible) or in your symbolic links.
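
To locate the lines that need attention, a script can be scanned for the two old prefixes. This Python sketch only reports matches and rewrites nothing, because the appropriate replacement depends on the project. The function name is hypothetical.

```python
from typing import List, Tuple

OLD_WORK_PREFIXES = ("/gpfswork/", "/gpfsdswork/projects/")


def find_hardcoded_work_paths(script_text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that mention an old WORK prefix."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if any(prefix in line for prefix in OLD_WORK_PREFIXES):
            hits.append((number, line))
    return hits


# Paths are left as placeholders on purpose.
sample = "#!/bin/bash\ncd /gpfswork/<rest of path>\necho done\n"
for number, line in find_hardcoded_work_paths(sample):
    print(f"line {number}: {line}")
```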

We are sorry for the inconvenience those operations might cause.

Best regards, The IDRIS support team

Flash Info No 2024-20

SCRATCH Migration Procedure

[English version below]

Hello,

As previously announced, the arrival of the H100 extension comes with a renewal of the Jean Zay storage spaces with the installation of a new Lustre storage system offering increased storage capacity and improved bandwidth.

To anticipate the shutdown of the current SCRATCH space, we invite you to start using the new space accessible via the environment variable $NEWSCRATCH (or $ALL_CCFRNEWSCRATCH for the common project space) by modifying your submission scripts accordingly. A subsequent communication will specify the date of this shutdown with at least 15 days' notice.

There will be no automatic copying of the current SCRATCH content to the new space. Therefore, if you wish to keep certain data currently stored on the SCRATCH, make sure to copy it to your new space $NEWSCRATCH (for example, via one or more jobs on the "archive" partition or interactively from the login nodes if the volume to be transferred is limited) and delete data you no longer need.

Please note that this new SCRATCH space is also subject to the automatic deletion of data not used for 30 days.

For more information on the ongoing operations on Jean Zay as part of the H100 extension installation: http://www.idris.fr/jean-zay/modifications-extension-jean-zay-h100.html.

Best regards, The IDRIS support team


Dear Jean Zay user,

As previously announced, the installation of the new H100 extension comes with a renewal of the Jean Zay storage spaces with the installation of a new Lustre storage system offering an increased storage capacity and an improved bandwidth.

To anticipate the shutdown of the current SCRATCH space, we invite you to start using the new SCRATCH space now. It is already accessible through the $NEWSCRATCH environment variable (or $ALL_CCFRNEWSCRATCH for the common project spaces); please modify your submission scripts accordingly. The actual shutdown date of the current SCRATCH will be announced later with at least 15 days' notice.

There will be no automatic copy of the current SCRATCH space to the new one. Therefore, if you wish to keep some data currently stored on the SCRATCH, be sure to copy it to your new space $NEWSCRATCH (for instance through one or more jobs on the "archive" partition, or interactively from the login nodes if the volume to be transferred is limited) and to remove data you do not need anymore.

Please be aware that the new SCRATCH space is also subject to the automatic deletion of data that has not been used for 30 days.
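
To get an idea of which files might be affected by the 30-day policy, a directory tree can be scanned for files that have not been touched recently. The Python sketch below relies on the last access time (st_atime), which is an assumption: this message does not say how "unused" is measured. The function name is hypothetical.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def files_unused_since(root: str, days: int = 30,
                       now: Optional[float] = None) -> List[Path]:
    """List regular files under root whose last access is older than `days` days."""
    reference = time.time() if now is None else now
    cutoff = reference - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_atime is the last access time; using it is an assumption.
            if path.lstat().st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

A typical call would pass the value of $NEWSCRATCH as the root, for example files_unused_since(os.environ["NEWSCRATCH"]).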

For more information on the work in progress on Jean Zay for the H100 extension: http://www.idris.fr/eng/jean-zay/modifications-extension-jean-zay-h100-eng.html.

Best regards, The IDRIS support team

Flash Info No 2024-19

Extension of STORE Unavailability

[English version below]

Hello,

Due to technical difficulties during the migration of the STORE space (see our previous communication for more details on the migration: http://www.idris.fr/flash-info-idris/flash-info-de-l-idris-167.html), the unavailability of this disk space must unfortunately be extended. Access to the STORE should be restored by the end of Thursday, 25 July. We invite you to consult the "Machine availability" page of our website (http://www.idris.fr/status.html) and the message of the day displayed when logging in to Jean Zay for the most up-to-date information.

As a reminder, it is still possible to access the old STORE space in read-only mode using the "$OLDSTORE" environment variable.

We apologise for any inconvenience caused.

Best regards, The IDRIS support team


Dear Jean Zay users,

Due to technical difficulties encountered during the migration of the STORE space (see our last email for more details regarding the migration: http://www.idris.fr/flash-info-idris/flash-info-de-l-idris-167.html), this disk space will unfortunately be unavailable for longer than expected. Access to the STORE should be possible again by the end of Thursday, July 25th. You can check the "Machine availability" webpage (http://www.idris.fr/status.html) and the message of the day displayed when logging in to Jean Zay for the latest updates.

Please note that it remains possible to have read-only access to the old STORE using the "$OLDSTORE" environment variable.

We are sorry for the inconvenience this may cause.

Best regards, The IDRIS support team

Flash Info No 2024-18

Migration of the STORE on 22 and 23 July

[English version below]

Hello,

The STORE disk space will be completely unavailable on Monday 22 and Tuesday 23 July to migrate it to the new Lustre storage system installed as part of the Jean Zay H100 extension and the expansion of the WORK space. Therefore, please ensure that you do not submit any jobs using the STORE during this period (other jobs will continue to run normally).

As announced on 18 June, there will be a change in the STORE access policy on this occasion. It will no longer be possible to access this space from the compute nodes. Access to the STORE will be restored on Tuesday 23 July in the evening ONLY on the login nodes and on the "prepost", "visu", "compil" and "archive" partitions.

Until the end of August, it will still be possible to access the old STORE in read-only mode using the "$OLDSTORE" environment variable. This access method should be preferred in the weeks following the migration operation. Indeed, the data of the new STORE will then be available only on magnetic tapes, which could significantly slow down data access (up to several hours), while the rotating disk cache is repopulated with the most recently used data. Note that this new environment variable is already set so that you can anticipate the modification of your scripts.

As a reminder, the STORE is a space dedicated to the secure and long-term storage of archived data. Currently, there is redundancy of all data, stored both on rotating disks and magnetic tapes. The presence of data on rotating disks allows for relatively fast read/write access. In the future, only the most recently used data will be available on the rotating disk cache (still with a security copy on magnetic tapes). The rest of the data will be stored only on magnetic tapes (with two copies on different tapes to ensure data security) with much longer access times, incompatible with direct use from computations.

We invite you to modify your submission scripts if you access the STORE space directly from the compute nodes. To guide you, examples have been added at the end of our documentation on multi-step jobs: http://www.idris.fr/jean-zay/cpu/jean-zay-cpu-exec_cascade.html

Best regards, The IDRIS support team


Dear Jean Zay users,

The STORE disk space will be totally unavailable on Monday July 22nd and Tuesday July 23rd in order to migrate its data onto the new Lustre storage system installed in the framework of the Jean Zay H100 extension and the enlargement of the WORK disk space. Please make sure you do not submit any jobs using the STORE during this time (other jobs will continue to run normally).

As announced on June 18th, the STORE access policy will change after the migration: it will no longer be possible to access this disk space from the compute nodes. Access to the STORE will be possible again from the evening of Tuesday, July 23rd, but only from the login nodes and from the "prepost", "visu", "compil" and "archive" partitions.

Until the end of August, it will remain possible to access the old STORE in read-only mode using the "$OLDSTORE" environment variable. This is the recommended way to access your archived data in the weeks following the migration. Indeed, the data on the new STORE will at first be available only on magnetic tape (with long access times, possibly up to several hours), while the rotating disk cache is repopulated with the most recently used data. Note that this new environment variable is already defined so that you can modify your scripts in advance.

As a reminder, the STORE is a disk space dedicated to the secure, long-term storage of archived data. In the current system, all the data is stored redundantly on rotating disks ("cache") and magnetic tapes, and its availability on rotating disks enables relatively fast read/write access. In the future, only the most recently used data will be available on the rotating disk cache, with a security copy on magnetic tape. The remainder of the data will be stored only on magnetic tapes (with two copies on different tapes to guarantee its security), with a much longer access time that is incompatible with direct use from the compute nodes.

We invite you to change your submission scripts if you currently access the STORE space directly from the compute nodes. In order to help you, several examples have been added at the end of the multi-step jobs documentation: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-exec_cascade-eng.html.
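
The general idea of a multi-step job is to separate the computation from the step that reads or writes the STORE, and to run the latter on a partition that still has access. The Python sketch below only builds the corresponding sbatch command lines and is not taken from the IDRIS documentation. The script names "compute.slurm" and "archive_to_store.slurm" and the job ID "123456" are hypothetical.

```python
from typing import List, Optional


def sbatch_command(script: str, partition: Optional[str] = None,
                   after_ok: Optional[str] = None) -> List[str]:
    """Build an sbatch argument list, optionally chained to a previous job."""
    # --parsable makes sbatch print the job ID in a machine-readable form.
    command = ["sbatch", "--parsable"]
    if partition:
        command.append(f"--partition={partition}")
    if after_ok:
        # Start only if the job with this ID finished successfully.
        command.append(f"--dependency=afterok:{after_ok}")
    command.append(script)
    return command


# Step 1: the computation, which no longer touches the STORE.
compute = sbatch_command("compute.slurm")
# Step 2: archiving on the "archive" partition once step 1 has succeeded.
# "123456" stands for the job ID printed by the first submission.
archive = sbatch_command("archive_to_store.slurm", partition="archive",
                         after_ok="123456")
print(" ".join(compute))
print(" ".join(archive))
```

In practice the job ID would be read from the output of the first submission instead of being written by hand.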

Best regards, The IDRIS support team

Flash Info No 2024-17

Summary:

  • Panoram'IA: Join us on Friday 5 July at 10 am
  • Reminder: Changes to STORE access
  • IDRIS is recruiting for the 2024 external competitions
  • IDRIS Training

  • Panoram'IA: Join us on Friday 5 July at 10 am

IDRIS support invites you to "Panoram'IA" on Friday morning, 5 July at 10 am: the monthly live video magazine covering scientific and technical AI news. This session will include: AI news, Jean Zay H100 extension, feedback on CVPR2024 and our selection of papers with Papers Storm. The live stream and replays are available on our YouTube channel "Un oeil sur l'IDRIS": https://www.youtube.com/@idriscnrs.

  • Reminder: Changes to STORE access

As announced in the Flash Info of 18 June 2024, access to the STORE from compute nodes will no longer be possible from Tuesday 9 July 2024: http://www.idris.fr/flash-info-idris/flash-info-de-l-idris-165.html.

  • IDRIS is recruiting for the 2024 external competitions

As part of the 2024 CNRS external competitions, IDRIS is recruiting for the following positions:

  • Competition No. 65: 1 Scientific Computing Expert, shared position with Maison de la simulation (IR)
  • Competition No. 67: 1 Scientific Computing Expert (IR)
  • Competition No. 221: 1 Infrastructure Manager (AI)
  • Competition No. 235: 1 Administrative Assistant (AI).

For more information: http://www.idris.fr/annonces/idris-recrute.html.

  • IDRIS Training

Register now for the IDRIS training sessions scheduled for the new academic year:

  • Hybrid MPI/OpenMP Programming, 9 and 10 September
  • MPI, 24 to 27 September
  • Jean Zay Workshop, 3 and 4 October
  • OpenMP, 16 to 18 October
  • Optimised Deep Learning on Jean Zay, 22 to 25 October.

For more information on the 2024 IDRIS training catalogue and registration procedures: http://www.idris.fr/formations/catalogue.html.


Changes and impacts related to the Jean Zay H100 extension

This post provides an overview of the ongoing operations for the commissioning of the Jean Zay H100 extension. The information provided here will evolve over time, and we invite you to check back regularly.

Flash Info No 2024-16

Modification of STORE access terms

Hello,

With the arrival of the H100 extension, a renewal of the storage spaces is also planned for the summer. The storage volume (in bytes) of WORK spaces will be increased on this occasion. In return, access to the STORE will have to be adapted. In particular, read and write access to the STORE will no longer be possible from the compute nodes. It will be maintained on the login nodes and on the "prepost", "visu", "compil" and "archive" partitions.

This change will take effect after the maintenance scheduled for Tuesday, July 9, 2024. We invite you to modify your jobs now if you access the STORE space directly from the compute nodes. To guide you, examples have been added at the end of our documentation on multi-step jobs: http://www.idris.fr/eng/jean-zay/cpu/jean-zay-cpu-exec_cascade-eng.html

As a reminder, the STORE is a space dedicated to the long-term storage of archived data. For now, the STORE provides redundant data storage on both rotating disks and magnetic tapes. The presence of data on rotating disks means that it can be read and written relatively quickly. In the future, data may only be stored on magnetic tape (with two copies on different tapes to maintain data redundancy), which would severely degrade the performance of your calculations when accessing the STORE directly.

Please do not hesitate to contact IDRIS support (assist@idris.fr) if you need any assistance.

Best regards,

The IDRIS support team


Flash Info No 2024-15

Correction: Shutdown from 19 June 8 am to 21 June noon and rollback to Red Hat 8.6

Hello,

Contrary to what we announced on Friday, additional technical constraints ultimately prevent us from maintaining access to the A100 partition during the machine shutdown scheduled from Wednesday, June 19 at 8 am to Friday, June 21 at noon.

Login nodes, pre/post-processing and compilation nodes, as well as storage spaces, will remain accessible from Wednesday, June 19 around noon to Friday, June 21 at 8 am.

In addition, the rollback of the Jean Zay operating system to Red Hat 8.6 might not be necessary and will, in any case, not be performed during this shutdown.

We apologize for the inconvenience.

Best regards, The IDRIS user support team

Flash Info No 2024-14

Jean Zay will be unavailable from Wednesday, June 19th at 8am to Friday, June 21st at noon in order to perform some operations on the infrastructure of IDRIS required to finalize the installation of the H100 extension of Jean Zay.

The login nodes, A100 nodes, pre/post-processing nodes and storage spaces will nevertheless be usable from Wednesday, June 19th around 2pm to Friday, June 21st at 8am. The jobs targeting those nodes will be executed as long as their maximum elapsed time is short enough.

Additionally, during that downtime, the operating system of the Jean Zay supercomputer will be downgraded to Red Hat 8.6 because of stability issues that have affected filesystem access since the upgrade to Red Hat 9.2. This downgrade will take effect on Wednesday afternoon.

This operating system version change should have a limited impact on your use of the machine. Nevertheless, we recommend that you recompile your codes once after the update, even if the majority of them should continue to work without any action on your part.

We are sorry for the inconvenience caused by those operations. In case of problems, contact the IDRIS Support.

Best regards, The IDRIS user support team
