Protein folding on Jean Zay

IDRIS provides several protein folding packages.

Advice

Using Alphafold and Colabfold involves two steps:

  • Multiple sequence alignment
  • Folding of the protein

The sequence alignment step is fairly long and is not ported to GPU. It is best to run it outside of the GPU reservation so as not to waste GPU compute hours.

One possibility is to run it on the prepost partition, then use its results for the folding phase, for example by chaining the two jobs as sketched below.
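
For instance, assuming the alignment and folding commands are placed in two separate submission scripts (the names align.slurm and fold.slurm below are placeholders), the folding job can be chained to the alignment job with a Slurm dependency:

# Submit the alignment job (CPU only, e.g. on the prepost partition)
# and capture its job id.
ALIGN_JOBID=$(sbatch --parsable align.slurm)
# Submit the GPU folding job so that it starts only once the
# alignment job has completed successfully.
sbatch --dependency=afterok:${ALIGN_JOBID} fold.slurm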

Alphafold

Useful links

Available versions

Version           On V100   On A100   On H100
alphafold/2.1.2
alphafold/2.2.4
alphafold/2.3.1
alphafold/2.3.2
alphafold/3.0.0
alphafold/3.0.1
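
The installed versions can be listed and loaded with the usual module commands. A minimal sketch, based on the module loads used in the scripts below (the A100 and H100 installations require the matching architecture module first):

module avail alphafold                      # list the installed versions
module load alphafold/2.2.4                 # V100 environment
# module load cpuarch/amd alphafold/2.3.1   # A100: load cpuarch/amd first
# module load arch/h100 alphafold/3.0.1     # H100: load arch/h100 first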

Example submission scripts

Alphafold 3

alphafold-3-H100.slurm
#!/usr/bin/env bash
#SBATCH --account=<account>@h100
 
#SBATCH --job-name=Alphafold3
#SBATCH --output=%x.%j
#SBATCH --error=%x.%j
 
## For inference
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --gpus-per-node=1
#SBATCH --constraint=h100
 
# Load the modules
module purge
module load arch/h100
module load alphafold/3.0.1
 
# Define the paths needed for the run
## directory where the Alphafold3 weights are stored (it must contain af3.bin)
weight_dir=${WORK}/Alphafold3/weights
# Print the environment (mostly for debugging purposes)
env
 
python $(which run_alphafold.py) \
    --json_path=data.json \
    --model_dir=${weight_dir} \
    --output_dir=output_dir \
    --db_dir=${DB_DIR} \
    --run_data_pipeline=false \
    --run_inference=true
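
The script above runs only the inference step (--run_data_pipeline=false), so the JSON file given to --json_path must already contain the MSAs produced by a previous run of the Alphafold 3 data pipeline (--run_data_pipeline=true --run_inference=false), which can be done in a separate CPU job as recommended above; the data pipeline writes an augmented JSON into its output directory, which can then be passed to the inference job. As a starting point, a minimal input file can be written as follows (a sketch following the Alphafold 3 input format; the name and sequence are placeholders):

# Write a minimal Alphafold 3 input (name and sequence are placeholders).
cat > data.json <<'EOF'
{
  "name": "my_protein",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF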

Alphafold 2.3.1

Monomer A100
alphafold-2.3.1-A100.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1            # Number of nodes
#SBATCH --ntasks-per-node=1  # Number of tasks per node
#SBATCH --cpus-per-task=8    # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1    # Number of GPUs per node
#SBATCH -C a100              # Use A100 partition
#SBATCH --hint=nomultithread # Disable hyperthreading
#SBATCH --job-name=alphafold # Jobname
#SBATCH --output=%x.o%j      # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j       # Error file
#SBATCH --time=10:00:00      # Expected runtime HH:MM:SS (max 20h)
##
## Please refer to the comments below for
## more information about the last 3 options.
##SBATCH --account=<account>@a100      # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>       # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev             # Uncomment for job requiring less than 2 hours
 
module purge
module load cpuarch/amd
module load alphafold/2.3.1
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH
 
fafile=test.fa
 
python3 $(which run_alphafold.py) \
    --output_dir=outputs_${fafile} \
    --uniref90_database_path=${ALPHAFOLDDB}/uniref90/uniref90.fasta \
    --mgnify_database_path=${ALPHAFOLDDB}/mgnify/mgy_clusters_2022_05.fa \
    --template_mmcif_dir=${ALPHAFOLDDB}/pdb_mmcif \
    --obsolete_pdbs_path=${ALPHAFOLDDB}/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${ALPHAFOLDDB}/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb70_database_path=${ALPHAFOLDDB}/pdb70/pdb70 \
    --uniref30_database_path=${ALPHAFOLDDB}/uniref30/UniRef30_2021_03 \
    --use_gpu_relax \
    --model_preset=monomer \
    --fasta_paths=${fafile} \
    --max_template_date=2022-01-01 \
    --data_dir=${ALPHAFOLDDB}/model_parameters/2.3.1

For a monomer (Alphafold 2.2.4)

alphafold.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1            # Number of nodes
#SBATCH --ntasks-per-node=1  # Number of tasks per node
#SBATCH --cpus-per-task=10   # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1    # Number of GPUs per node
#SBATCH --hint=nomultithread # Disable hyperthreading
#SBATCH --job-name=alphafold # Jobname
#SBATCH --output=%x.o%j      # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j       # Error file
#SBATCH --time=10:00:00      # Expected runtime HH:MM:SS (max 100h)
##
## Please refer to the comments below for
## more information about the last 4 options.
##SBATCH --account=<account>@v100       # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>        # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev              # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4               # Uncomment for job requiring more than 20h (max 16 GPUs)
 
module purge
module load alphafold/2.2.4
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH
 
## In this example we do not let the structures relax with OpenMM
 
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --pdb70_database_path=${DSDIR}/AlphaFold/pdb70/pdb70 \
    --fasta_paths=test.fa \
    --max_template_date=2021-07-28 \
    --use_gpu_relax=False \
    --norun_relax \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4

For a multimer (Alphafold 2.2.4)

Warning: the fasta file must contain all the different monomers (one record per chain), as in the example below.
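
For example, for a dimer the file contains one FASTA record per chain. A minimal sketch that writes such a file (the sequences are placeholders, replace them with your own chains):

# Hypothetical test.fasta for a dimer: one record per chain.
cat > test.fasta <<'EOF'
>chain_A
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
>chain_B
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFG
EOF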

alphafold_multimer.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1            # Number of nodes
#SBATCH --ntasks-per-node=1  # Number of tasks per node
#SBATCH --cpus-per-task=10   # Number of OpenMP threads per task
#SBATCH --gpus-per-node=1    # Number of GPUs per node
#SBATCH --hint=nomultithread # Disable hyperthreading
#SBATCH --job-name=alphafold # Jobname
#SBATCH --output=%x.o%j      # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j       # Error file
#SBATCH --time=10:00:00      # Expected runtime HH:MM:SS (max 100h for V100, 20h for A100)
##
## Please refer to the comments below for
## more information about the last 4 options.
##SBATCH --account=<account>@v100       # To specify gpu accounting: <account> = echo $IDRPROJ
##SBATCH --partition=<partition>        # To specify partition (see IDRIS web site for more info)
##SBATCH --qos=qos_gpu-dev              # Uncomment for job requiring less than 2 hours
##SBATCH --qos=qos_gpu-t4               # Uncomment for job requiring more than 20h (max 16 GPUs, V100 only)
 
module purge
module load alphafold/2.2.4
export TMP=$JOBSCRATCH
export TMPDIR=$JOBSCRATCH
 
## In this example we let the structures relax with OpenMM
 
python3 $(which run_alphafold.py) \
    --output_dir=outputs \
    --uniref90_database_path=${DSDIR}/AlphaFold/uniref90/uniref90.fasta \
    --mgnify_database_path=${DSDIR}/AlphaFold/mgnify/mgy_clusters_2018_12.fa \
    --template_mmcif_dir=${DSDIR}/AlphaFold/pdb_mmcif/mmcif_files \
    --obsolete_pdbs_path=${DSDIR}/AlphaFold/pdb_mmcif/obsolete.dat \
    --bfd_database_path=${DSDIR}/AlphaFold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --pdb_seqres_database_path=${DSDIR}/AlphaFold/pdb_seqres/pdb_seqres.txt \
    --uniclust30_database_path=${DSDIR}/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --uniprot_database_path=${DSDIR}/AlphaFold/uniprot/uniprot.fasta \
    --use_gpu_relax \
    --model_preset=multimer \
    --fasta_paths=test.fasta \
    --max_template_date=2022-01-01 \
    --data_dir=${DSDIR}/AlphaFold/model_parameters/2.2.4

Colabfold

Useful links

Advice for the alignment phase

The software used for the alignment phase is MMseqs2.

colab_align.slurm
#!/usr/bin/env bash
#SBATCH --nodes=1                   # Number of nodes
#SBATCH --ntasks-per-node=1         # Number of tasks per node
#SBATCH --cpus-per-task=10          # Number of OpenMP threads per task
#SBATCH --hint=nomultithread        # Disable hyperthreading
#SBATCH --job-name=align_colabfold  # Jobname
#SBATCH --output=%x.o%j             # Output file %x is the jobname, %j the jobid
#SBATCH --error=%x.o%j              # Error file
#SBATCH --time=10:00:00             # Expected runtime HH:MM:SS (max 20h)
#SBATCH --partition=prepost  
 
DB=$DSDIR/ColabFold
 
input=test.fa
 
module purge
module load colabfold/1.5.1
colabfold_search ${input} ${DB} results
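
With default options, the alignments produced by colabfold_search end up as .a3m files in the results directory (the exact file names may vary with the version and options used). A quick check after the job:

# Sanity check: the alignment phase should have produced a3m files
# that colabfold_batch will consume during the folding phase.
ls -lh results/*.a3m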

Example submission script for the folding phase

colab_fold.slurm
#!/usr/bin/env bash 
#SBATCH --nodes=1 # Number of nodes 
#SBATCH --ntasks-per-node=1 # Number of tasks per node 
#SBATCH --cpus-per-task=10 # Number of OpenMP threads per task 
#SBATCH --hint=nomultithread # Deactivate hyperthreading
#SBATCH --gpus-per-node=1 # Number of GPUs per node
#SBATCH --time=20:00:00
#SBATCH --constraint=v100-32g # Use a GPU with 32G of memory. Remove this line if a 16G GPU fits your needs
#SBATCH --job-name=colabfold_folding
#SBATCH --output=%x.%j
#SBATCH --error=%x.%j
 
 
module purge 
module load colabfold/1.5.1
 
export TMP=$JOBSCRATCH 
export TMPDIR=$JOBSCRATCH
 
## This script assumes that the results directory was generated with colabfold_search, as in the alignment script above.
## We do not advise performing the alignment in the same job as the folding.
## The results of the folding will be stored in results_batch.
 
colabfold_batch --data=$DSDIR/ColabFold results results_batch