Public Datasets Available on Savio¶
We provide several large public datasets used by certain workflows and software packages. These datasets are available on a read-only basis in subdirectories of /global/scratch/collections.
The datasets include:
Dataset | Directory | Version | Update/Download | Details |
---|---|---|---|---|
Blast | blastdb | 5.0 | monthly | various datasets including 'nr' and 'nt' |
ColabFold databases | colabdb | 1.5.2 | every 3 months | UniRef30, BFD/MGnify, ColabFold DB |
genome/RefSeq | genomesdb | Release 218 | yearly | fungi, invertebrate, plant, vertebrate_mammalian, vertebrate_other |
alphafold | alphafolddb | 2.3.0 | every 3 months | BFD, MGnify, PDB70, PDB, PDB seqres, UniRef30, UniProt, UniRef90 |
AlphaFold 3 | Alphafold3/public-db | 3.0.1 | every 3 months | BFD, MGnify, PDB70, PDB, PDB seqres, UniRef30, UniProt, UniRef90 |
For more details, please see the README files in the database-specific subdirectories of /global/scratch/collections.
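As a quick way to find those README files, the sketch below walks each dataset subdirectory and prints anything whose name starts with README. The function name `list_collection_readmes` and the `COLLECTIONS_DIR` variable are illustrative names, not something Savio defines, and the actual README file names may differ from one dataset to another.

```shell
# Root of the shared collections, as given in the text above.
# COLLECTIONS_DIR is an illustrative variable name, not a Savio-provided one.
COLLECTIONS_DIR="${COLLECTIONS_DIR:-/global/scratch/collections}"

# Print each dataset subdirectory followed by any README-like files in it.
list_collection_readmes() {
    root="$1"
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root is missing or empty.
        [ -d "$dir" ] || continue
        echo "== ${dir%/}"
        # README file names vary, so match anything starting with README.
        find "$dir" -maxdepth 1 -type f -iname 'README*' 2>/dev/null
    done
}

list_collection_readmes "$COLLECTIONS_DIR"
```

Once a README is located, it can be read with a pager such as `less`.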
You can request additional datasets (or an update to existing datasets) through our Software/Data Request Form.
AlphaFold 3 on Savio¶
AlphaFold 3 is a new AI model developed by Google DeepMind and Isomorphic Labs for predicting the structure and interactions of molecules, including proteins, DNA, RNA, and small molecules. The software package and the public datasets are now available on the Savio cluster.
Genetic Datasets¶
The genetic datasets required for AlphaFold 3 are saved under the shared directory /global/scratch/collections/Alphafold3/public-db on the Savio cluster.
Model Parameters¶
The model parameters are the result of training the AlphaFold model and are required for the AlphaFold 3 inference pipeline. The model parameters are distributed separately from the source code by Google DeepMind and are subject to the Model Parameters Terms of Use.
Savio users interested in using AlphaFold 3 are required to abide by the above terms of use. You can request a personal copy of the trained model parameters for non-commercial use, free of charge, directly from Google DeepMind by filling out this Google form. If you have questions about the required fields of the form, send an inquiry to brc-hpc-help@berkeley.edu. Once Google DeepMind responds with directions for obtaining the model parameters, save the parameters file in a sub-directory named model_param inside your home directory, a shared project group directory (if you are sharing with your project group members), or a personal scratch directory. The parameters file is a single file approximately 1 GB in size.
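A minimal sketch of preparing that sub-directory is shown below. It assumes the home-directory option; point `MODEL_PARAMETERS_DIR` elsewhere if you use a group or scratch location. The `chmod` step is a suggestion for a personal copy, not a documented Savio requirement, and should be skipped if the directory is shared with your project group.

```shell
# Assumed location: the home-directory option described above.
# Override MODEL_PARAMETERS_DIR first to use a group or scratch directory.
MODEL_PARAMETERS_DIR="${MODEL_PARAMETERS_DIR:-$HOME/model_param}"

# Create the directory if it does not exist yet.
mkdir -p "$MODEL_PARAMETERS_DIR"

# Optional: restrict access to yourself for a personal copy.
# Skip this line if the directory is shared with your project group.
chmod 700 "$MODEL_PARAMETERS_DIR"

# After copying the parameters file in, list the directory to confirm
# the file is present and roughly the expected size.
ls -lh "$MODEL_PARAMETERS_DIR"
```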
Loading AlphaFold 3 module¶
module load bio/alphafold3/3.0.1
The bio/alphafold3 module defines environment variables such as ALPHAFOLD_DIR and DB_DIR that can be used to run a job as shown below. You must set the MODEL_PARAMETERS_DIR environment variable yourself, either before running the script or directly in the SLURM job submission script as shown below.
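Because a job fails late if one of these variables is missing, it can help to check them before submitting. The function below is a hypothetical helper (`check_af3_env` is not part of the module) that assumes all three variables should point to existing directories, which matches how the sample job script bind-mounts them.

```shell
# Hypothetical pre-flight check, not provided by the bio/alphafold3 module.
# Returns non-zero if any required variable is unset or not a directory.
check_af3_env() {
    status=0
    for name in ALPHAFOLD_DIR DB_DIR MODEL_PARAMETERS_DIR; do
        # Indirectly read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}

# Typical use on Savio, after loading the module and exporting
# MODEL_PARAMETERS_DIR:
#   module load bio/alphafold3/3.0.1
#   export MODEL_PARAMETERS_DIR=/global/home/users/$USER/model_param
#   check_af3_env && echo "environment looks ready"
```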
Running¶
Below is a sample SLURM script for running alphafold3 on the savio4_gpu partition after loading the bio/alphafold3 module. It assumes that the fold_input.json file is present in the $HOME/af_input directory and saves the output to $HOME/af_output. Take careful note of the options, paths, and variables, and change them as necessary for your specific use case.
#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --account=account_name
#SBATCH --partition=savio4_gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:A5000:1
#SBATCH --time=1:30:00
module load bio/alphafold3/3.0.1
# Set MODEL_PARAMETERS_DIR according to where the model parameters are saved.
# Keep exactly one of the two export lines below active.
# If the model parameters are saved in your home directory:
export MODEL_PARAMETERS_DIR=/global/home/users/$USER/model_param
# If the model parameters are saved in your shared project group directory
# (replace <project_name> with your project name, then uncomment):
# export MODEL_PARAMETERS_DIR=/global/home/groups/<project_name>/model_param
apptainer exec --nv \
    --bind "$HOME/af_input:/root/af_input" \
    --bind "$HOME/af_output:/root/af_output" \
    --bind "$MODEL_PARAMETERS_DIR:/root/models" \
    --bind "$DB_DIR:/root/public_databases" \
    "$ALPHAFOLD_DIR/alphafold3.sif" \
    python /app/alphafold/run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output
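The script above expects the input and output directories to exist before the job starts, since bind mounts generally need an existing source directory. The sketch below creates both and writes a minimal single-chain fold_input.json. The job name and the short protein sequence are placeholders, and the field names follow the AlphaFold 3 input format as documented upstream; check the upstream input documentation before relying on them.

```shell
# Directories used by the sample SLURM script.
# AF_INPUT_DIR and AF_OUTPUT_DIR are illustrative variable names.
AF_INPUT_DIR="${AF_INPUT_DIR:-$HOME/af_input}"
AF_OUTPUT_DIR="${AF_OUTPUT_DIR:-$HOME/af_output}"
mkdir -p "$AF_INPUT_DIR" "$AF_OUTPUT_DIR"

# Write a minimal input file. The name and sequence are placeholders;
# replace them with your own target.
cat > "$AF_INPUT_DIR/fold_input.json" <<'EOF'
{
  "name": "example_job",
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
```

Run this once on a login node before submitting the job, then edit the JSON file to describe the molecules you want to predict.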
Note
- Make sure to use python /app/alphafold/run_alphafold.py when using the alphafold3.sif image from the bio/alphafold3 module, as in the sample SLURM script above. This is different from the official instructions on the alphafold3 GitHub page.