
Storing Data

Available storage

Summary

By default, each user on Savio is entitled to a 30 GB home directory that receives regular backups. In addition, each research group using a Faculty Computing Allowance can request 30 GB of project space, and each Condo-using research group can request 200 GB of project space, to hold research-specific application software shared among the group's users. All users also have access to the large Savio high-performance scratch filesystem for working with non-persistent data.

| Name | Location | Quota | Backup | Allocation | Description |
| --- | --- | --- | --- | --- | --- |
| HOME | /global/home/users/ | 30 GB | Yes | Per User | Home directory for permanent code, executables, small data |
| GROUP | /global/home/groups/ | 30/200 GB | No | Per Group | Group directory, must be requested, for shared code and executables (30 GB for FCA, 200 GB for Condo) |
| SCRATCH | /global/scratch/users/ | none | No | Per User | Scratch directory on a Lustre filesystem, for data for active compute. See below for details of the purge policy. |

Group Directory Access

Group directories are not created by default; they must be requested by contacting brc-hpc-help@berkeley.edu. See the documentation on group directories here.

For information on making your files accessible to other users (in particular members of your group), see these instructions.

Checking your disk usage

To see your overall usage in your home directory in relation to your quota, you can run one of these two commands:

quota -s
checkquota

To see your overall scratch usage (in bytes and number of files):

grep ${USER} /global/scratch/scratch-usage.txt

You can also use this invocation to see your scratch (or condo storage) usage:

lfs quota -u ${USER} -h /global/scratch

If you have access to condo storage, you can see its available and used space as follows:

cd /global/scratch/projects/
./quota.sh

To determine what is taking up a lot of disk space, you can use the du command. Here are some examples:

du -h path_to_subdirectory   # usage within a subdirectory, reported per directory
du -h -d 1                   # summary of usage within each top-level subdirectory
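
If you are not sure which subdirectories are largest, you can sort the per-directory summary by size (this assumes GNU sort, which provides the -h flag and is standard on Linux systems such as Savio):

du -h -d 1 path_to_subdirectory | sort -h   # largest subdirectories appear last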

Use du sparingly

The du command places a heavy load on the filesystem. Please use it sparingly: run it on specific subdirectories rather than on your entire home or (most importantly) scratch directory. Also consider running it once for a given subdirectory and saving the result, rather than running it repeatedly.
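
For example, one simple pattern (the paths here are purely illustrative) is to capture the summary once and consult the saved file afterwards:

du -h -d 1 /global/scratch/users/${USER}/my_project > ~/my_project_usage.txt   # run once and save
cat ~/my_project_usage.txt                                                     # review later without re-running du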

Scratch storage

Every Savio user has scratch space located at /global/scratch/users/<username>. Scratch is an 8.7 PB resource shared among all users; to preserve this space for effective use by all users, we ask users to keep their usage to reasonable levels (see below) and to delete data no longer needed for active computation.

Large input/output files and other files used intensively (i.e., repeatedly) while running jobs should be copied ('staged') from a home or group directory to your scratch directory, with results copied back afterwards, to avoid slowing down access to home directories for other users.
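
As a minimal sketch of that staging pattern within a Slurm job script (the directory names, file names, and program below are only placeholders, and scheduler options such as account and partition are omitted):

#!/bin/bash
#SBATCH --job-name=stage_example
# Stage input from home to scratch before computing (illustrative paths).
STAGE_DIR=/global/scratch/users/${USER}/myjob
mkdir -p ${STAGE_DIR}
cp ~/myjob/input.dat ${STAGE_DIR}/
cd ${STAGE_DIR}
./my_program input.dat > results.out   # do the I/O-intensive work in scratch
# Copy only the results you need to keep back to home (or group) storage.
cp ${STAGE_DIR}/results.out ~/myjob/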

Limited space in scratch

Please remember that global scratch storage is a shared resource. We strongly urge users to regularly clean up their data in global scratch to decrease scratch storage usage. Furthermore, as general guidance, we expect individual users not to use more than 12 TB of space in scratch. Users with many terabytes of data may be asked to reduce their usage and may be locked out of their accounts if they do not.

Group scratch directories are not provided. Users who would like to share materials in their scratch directory with other users can set UNIX permissions to allow access to their directories/files.
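
As one illustrative approach using standard UNIX permissions (the subdirectory name below is just a placeholder; finer-grained control such as per-user ACLs may or may not be available depending on filesystem configuration):

chmod o+x /global/scratch/users/${USER}                  # allow others to traverse your top-level scratch directory
chmod -R o+rX /global/scratch/users/${USER}/shared_data  # make one subdirectory and its contents readable by others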

Savio condo storage

Condo storage is additional persistent storage available for purchase at a competitive price.

As of summer 2021, condo storage directories are on the same filesystem as Savio's scratch storage. If you are accessing a condo storage directory under /global/scratch/projects, you do not need to (and should not) stage your data to your individual scratch directory. If you are accessing a condo storage directory under /clusterfs, you should still stage your data to your individual scratch directory when appropriate.
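
For example (the group directory name below is just a placeholder), a job using condo storage under /global/scratch/projects can read and write its data in place:

cd /global/scratch/projects/my_condo_group/dataset
# run your computation here; no copy to /global/scratch/users/${USER} is needed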

You can see the available and used space in the condo as follows:

cd /global/scratch/projects/
./quota.sh

More storage options

If you need additional storage during the active phase of your research, such as longer-term storage to augment Savio's temporary Scratch storage, or off-premises storage for backing up data, the Active Research Data Storage Guidance Grid can help you identify suitable options.

Assistance with research data management

The campus's Research Data Management (RDM) service offers consulting on managing your research data, which includes the design or improvement of data transfer workflows, selection of storage solutions, and a great deal more. This service is available at no cost to campus researchers. To get started with a consult, please contact RDM Consulting.

In addition, you can find both high level guidance on research data management topics and deeper dives into specific subjects on RDM's website. (Visit the links in the left-hand sidebar to further explore each of the site's main topics.)

CGRL storage

The following storage systems are available to CGRL users. For running jobs, compute nodes within a cluster can only directly access the storage listed below. The DTN can be used to transfer data between locations accessible to only one cluster or the other, as detailed in the previous section (see the example after the table below).

| Name | Cluster | Location | Quota | Backup | Allocation | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Home | Both | /global/home/users/$USER | 30 GB | Yes | Per User | Home directory ($HOME) for permanent data |
| Scratch | Vector | /clusterfs/vector/scratch/$USER | none | No | Per User | Short-term, large-scale storage for computing |
| Group | Vector | /clusterfs/vector/instrumentData/ | 300 GB | No | Per Group | Group-shared storage for computing |
| Scratch | Rosalind (Savio) | /global/scratch/users/$USER | none | No | Per User | Short-term, large-scale Lustre storage for very high-performance computing |
| Condo User | Rosalind (Savio) | /clusterfs/rosalind/users/$USER | none | No | Per User | Long-term, large-scale user storage |
| Condo Group | Rosalind (Savio) | /clusterfs/rosalind/groups/ | none | No | Per Group | Long-term, large-scale group-shared storage |
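
For example, to copy data from Vector scratch to Rosalind (Savio) scratch, you could log in to a DTN, where both filesystems are mounted, and run something along these lines (the data directory name is illustrative):

rsync -av /clusterfs/vector/scratch/${USER}/mydata/ /global/scratch/users/${USER}/mydata/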