Public Datasets Available on Savio

We host several large public datasets that certain workflows and software packages require. These datasets are available read-only in subdirectories of /global/scratch/collections.

The datasets include:

Dataset | Directory | Version | Update/Download | Details
BLAST | blastdb | 5.0 | monthly | various datasets, including 'nr' and 'nt'
ColabFold databases | colabdb | 1.5.2 | every 3 months | UniRef30, BFD/MGnify, ColabFold DB
genome/RefSeq | genomesdb | Release 218 | yearly | fungi, invertebrate, plant, vertebrate_mammalian, vertebrate_other
AlphaFold | alphafolddb | 2.3.0 | every 3 months | BFD, MGnify, PDB70, PDB, PDB seqres, UniProt, UniRef30, UniRef90
AlphaFold 3 | Alphafold3/public-db | 3.0.1 | every 3 months | BFD, MGnify, PDB70, PDB, PDB seqres, UniProt, UniRef30, UniRef90

For more details, please see the README files in the database-specific subdirectories of /global/scratch/collections.
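To see which collections are present and where their README files live, something like the following sketch can help. The function name list_collections is purely illustrative and not part of any Savio tooling; it assumes only the directory layout described above (one subdirectory per dataset, with a README at its top level).

```shell
# Illustrative helper (not a Savio-provided command): print each dataset
# directory under a collections root together with its top-level README, if any.
list_collections() {
    local root="${1:-/global/scratch/collections}"
    local dir readme
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # skip if the glob matched nothing
        dir="${dir%/}"                 # drop the trailing slash
        readme=$(find "$dir" -maxdepth 1 -type f -iname 'README*' | head -n 1)
        if [ -n "$readme" ]; then
            printf '%s\t%s\n' "$(basename "$dir")" "$readme"
        else
            printf '%s\t%s\n' "$(basename "$dir")" "(no README found)"
        fi
    done
}
```

Running list_collections with no argument uses /global/scratch/collections; pass a different root as the first argument to inspect another location.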

You can request additional datasets (or an update to existing datasets) through our Software/Data Request Form.

AlphaFold 3 on Savio

AlphaFold 3 is a new AI model developed by Google DeepMind and Isomorphic Labs for predicting the structure and interactions of molecules, including proteins, DNA, RNA, and small molecules. The software package and the public datasets are now available on the Savio cluster.

Genetic Datasets

The genetic datasets required for AlphaFold 3 are saved under the shared directory /global/scratch/collections/Alphafold3/public-db on the Savio cluster.

Model Parameters

The model parameters are the result of training the AlphaFold model and are required by the AlphaFold 3 inference pipeline. Google DeepMind distributes them separately from the source code, subject to the Model Parameters Terms of Use, and Savio users who use AlphaFold 3 must abide by those terms.

Savio users can request a personal copy of the trained model parameters for non-commercial use, free of charge, directly from Google DeepMind by filling out this Google form. If you have questions about the required fields of the form, contact us at brc-hpc-help@berkeley.edu.

Once Google DeepMind responds with directions for obtaining the model parameters, save the parameters file in a sub-directory named model_param inside your home directory, a shared project group directory (if you are sharing with your project group members), or a personal scratch directory. The parameters file is a single file of approximately 1 GB.
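As a minimal sketch of the storage step, the helpers below create the model_param sub-directory under a chosen base directory and check that a parameters file has been placed there. Both function names are hypothetical, and the check deliberately does not assume a particular file name for the parameters file, since that depends on what Google DeepMind provides.

```shell
# Hypothetical helper: create <base>/model_param and print its path.
# The base defaults to $HOME; pass a group or scratch directory instead if needed.
prepare_model_param_dir() {
    local base="${1:-$HOME}"
    local dir="$base/model_param"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Hypothetical helper: succeed only if the directory exists and contains
# at least one regular file (the downloaded parameters file).
has_model_params() {
    local dir="$1"
    [ -d "$dir" ] && [ -n "$(find "$dir" -maxdepth 1 -type f | head -n 1)" ]
}
```

For example, prepare_model_param_dir "$HOME" prints the directory to copy the parameters file into, and has_model_params "$HOME/model_param" can be used afterwards to confirm the copy before submitting a job.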

Loading the AlphaFold 3 module

module load bio/alphafold3/3.0.1

The bio/alphafold3 module defines several environment variables, including ALPHAFOLD_DIR and DB_DIR, that are used to run a job as shown below. The module does not set MODEL_PARAMETERS_DIR: you must set it yourself, either in your shell before submitting the job or directly in the SLURM job submission script, as shown below.
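Because a job fails late if one of these variables is missing, it can be worth checking them up front. The following is a small sketch; require_env is a hypothetical helper name, and the only variable names it is meant to be called with here are the three mentioned above.

```shell
# Hypothetical helper: return non-zero and name every listed environment
# variable that is unset or empty. Variable names are passed as arguments.
require_env() {
    local name value missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"       # look up the variable by name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In a job script, a line such as require_env ALPHAFOLD_DIR DB_DIR MODEL_PARAMETERS_DIR || exit 1, placed after the module load and export lines, stops the job before apptainer is invoked with an empty path.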

Running

Below is a sample SLURM script for running alphafold3 on the savio4_gpu partition after loading the bio/alphafold3 module. It assumes that the fold_input.json file is present in the $HOME/af_input directory and saves the output to $HOME/af_output. Please take careful note of the different options, paths, and variables, and change them as necessary for your specific use case.

#!/bin/bash
#SBATCH --job-name=job_name
#SBATCH --account=account_name
#SBATCH --partition=savio4_gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:A5000:1
#SBATCH --time=1:30:00

module load bio/alphafold3/3.0.1

# Set MODEL_PARAMETERS_DIR to the directory where the model parameters are saved.
# Keep exactly one of the export lines below active.

# If the model parameters are saved in your home directory:
export MODEL_PARAMETERS_DIR=/global/home/users/$USER/model_param

# If the model parameters are saved in your shared project group directory,
# uncomment the line below and replace <project_name> with your project's name:
# export MODEL_PARAMETERS_DIR=/global/home/groups/<project_name>/model_param

# The output directory must exist before it can be bind-mounted
mkdir -p "$HOME/af_output"

apptainer exec --nv --bind "$HOME/af_input":/root/af_input \
                    --bind "$HOME/af_output":/root/af_output \
                    --bind "$MODEL_PARAMETERS_DIR":/root/models \
                    --bind "$DB_DIR":/root/public_databases \
                    "$ALPHAFOLD_DIR/alphafold3.sif" \
                    python /app/alphafold/run_alphafold.py \
                    --json_path=/root/af_input/fold_input.json \
                    --model_dir=/root/models \
                    --db_dir=/root/public_databases \
                    --output_dir=/root/af_output

Note

  • Make sure to use python /app/alphafold/run_alphafold.py when using the alphafold3.sif image from the bio/alphafold3 module, as in the sample SLURM script above. This is different from the official instructions on the alphafold3 GitHub page.
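The sample script expects a fold_input.json file in $HOME/af_input. The sketch below writes a minimal single-protein input file. The field names reflect our understanding of the input format documented in the AlphaFold 3 repository and should be checked against the official input documentation for your version; the job name, the sequence, and the function name write_fold_input are placeholders chosen for illustration.

```shell
# Illustrative helper: write a minimal single-chain fold_input.json into the
# given directory (default: $HOME/af_input) and print the file's path.
# The job name and protein sequence are placeholders; replace them with your own.
write_fold_input() {
    local dir="${1:-$HOME/af_input}"
    mkdir -p "$dir"
    cat > "$dir/fold_input.json" <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {"id": "A", "sequence": "MKTAYIAKQRQISFVKSHFSRQ"}}
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
    printf '%s\n' "$dir/fold_input.json"
}
```

After creating the input file, save the sample SLURM script to a file (for example, a hypothetical af3_job.sh) and submit it with sbatch af3_job.sh.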