
Storing Data

Available storage

Summary

By default, each user on Savio is entitled to a 30 GB home directory that receives regular backups. In addition, each Faculty Computing Allowance-using research group can request 30 GB of project space, and each Condo-using research group can request 200 GB of project space, to hold research-specific application software shared among the group's users. All users also have access to the large Savio high-performance scratch filesystem for working with non-persistent data.

| Name | Location | Quota | Backup | Allocation | Description |
| --- | --- | --- | --- | --- | --- |
| HOME | /global/home/users/ | 30 GB | Yes | Per User | HOME directory for permanent code, executables, small data |
| GROUP | /global/home/groups/ | 30/200 GB | No | Per Group | GROUP directory; must be requested; for shared code and executables (30 GB for FCA, 200 GB for Condo) |
| SCRATCH | /global/scratch/users/ | none | No | Per User | SCRATCH directory with Lustre FS for data for active compute. See below for details of the purge policy. |

Group Directory Access

Group directories are not created by default; they must be requested by contacting brc-hpc-help@berkeley.edu. See the documentation on group directories here.

For information on making your files accessible to other users (in particular members of your group), see these instructions.
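As a quick illustration only (the linked instructions are the authoritative reference), standard UNIX group ownership and permissions can be used to share material in a group directory; the group name mygroup and the directory shared_code below are placeholders:

chgrp -R mygroup /global/home/groups/mygroup/shared_code   # assign group ownership (mygroup is a placeholder)
chmod -R g+rX /global/home/groups/mygroup/shared_code      # give group members read access and directory traversal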

Checking your disk usage

To see your overall usage in your home directory in relation to your quota, you can run one of these two commands:

quota -s
checkquota

To see your overall scratch usage (in bytes and number of files):

grep ${USER} /global/scratch/scratch-usage.txt

You can also use this invocation to see your scratch (or condo storage) usage:

lfs quota -u ${USER} -h /global/scratch

To determine what is taking up a lot of disk space you can use the du command. Here are some examples:

du -h path_to_subdirectory   # usage by file in a subdirectory
du -h -d 1                   # summary of usage within each top-level subdirectory

Use du sparingly

The du command places a heavy load on the filesystem. Please use it sparingly: run it on specific subdirectories rather than on your entire home or (most importantly) scratch directory. Also consider running it once for a given subdirectory and saving the result, rather than running it repeatedly, as shown below.
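For example (the project directory and output file names here are arbitrary placeholders), you can record a one-time summary and consult the saved file later instead of re-running du:

du -h -d 1 /global/scratch/users/$USER/myproject > ~/du_myproject.txt   # save the summary for later reference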

Scratch storage

Every Savio user has scratch space located at /global/scratch/users/<username>. Scratch is a 3.5 PB resource shared among all users; to preserve this space for effective use by all users, it is subject to the purge policy described below.

Large input/output files and other files used intensively (i.e., repeatedly) while running jobs should be copied ('staged') from your home or group directory to your scratch directory before a job runs, with results copied back afterward, to avoid slowing down access to home directories for other users. A sketch of this workflow appears below.
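As a minimal sketch (the paths and directory names are placeholders), staging at the start and end of a job might look like this:

rsync -a ~/projects/inputs/ /global/scratch/users/$USER/myjob/inputs/    # stage inputs to scratch before computing
# ... run the computation against the copies in scratch ...
rsync -a /global/scratch/users/$USER/myjob/results/ ~/projects/results/  # copy results back to persistent storage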

Purge Policy

Please remember that global scratch storage is a shared resource. We strongly urge users to regularly clean up their data in global scratch to keep its usage down. If global scratch becomes full, users with many terabytes of data may be asked to reduce their usage.

We are in the process of implementing the following purge policy. However, the policy is NOT YET in effect.

  • Research IT will purge files not accessed in 120 days
  • Research IT will run a check every week for files that have not been accessed in the past 120 days to determine which files to delete
  • Research IT will notify users which files will be purged via a file in their user directory
  • As the system fills a more aggressive purge policy may be required to maintain system functionality

If you need to retain access to data on the cluster for more than 120 days between uses, you can consider purchasing storage space through the condo storage program.

Group scratch directories are not provided. Users who would like to share materials in their scratch directory with other users can set UNIX permissions to allow access to their directories/files.
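As an illustration (the shared_data directory name is a placeholder), standard UNIX permissions can open read access to a specific subdirectory while keeping the rest of your scratch space private:

chmod o+x /global/scratch/users/$USER                   # allow others to traverse (but not list) your scratch directory
chmod -R o+rX /global/scratch/users/$USER/shared_data   # grant read access to the shared subdirectory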

Savio condo storage

Condo storage is additional persistent storage available for purchase at a competitive price.

As of summer 2021, condo storage directories are on the scratch filesystem (but, of course, are not subject to purging). If you are accessing condo storage directories under /global/scratch/projects, you do not need to (and should not) stage your data to your individual scratch directory. If you are accessing condo storage directories under /clusterfs, you should still stage your data to your individual scratch directory when appropriate.

More storage options

If you need additional storage during the active phase of your research, such as longer-term storage to augment Savio's temporary Scratch storage, or off-premises storage for backing up data, the Active Research Data Storage Guidance Grid can help you identify suitable options.

Assistance with research data management

The campus's Research Data Management (RDM) service offers consulting on managing your research data, which includes the design or improvement of data transfer workflows, selection of storage solutions, and a great deal more. This service is available at no cost to campus researchers. To get started with a consult, please contact RDM Consulting.

In addition, you can find both high level guidance on research data management topics and deeper dives into specific subjects on RDM's website. (Visit the links in the left-hand sidebar to further explore each of the site's main topics.)

CGRL storage

The following storage systems are available to CGRL users. For running jobs, compute nodes within a cluster can directly access only the storage listed for that cluster below. The DTN can be used to transfer data between locations accessible to only one cluster or the other, as detailed in the previous section; a brief sketch follows the table.

| Name | Cluster | Location | Quota | Backup | Allocation | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Home | Both | /global/home/users/$USER | 30 GB | Yes | Per User | Home directory ($HOME) for permanent data |
| Scratch | Vector | /clusterfs/vector/scratch/$USER | none | No | Per User | Short-term, large-scale storage for computing |
| Group | Vector | /clusterfs/vector/instrumentData/ | 300 GB | No | Per Group | Group-shared storage for computing |
| Scratch | Rosalind (Savio) | /global/scratch/users/$USER | none | No | Per User | Short-term, large-scale Lustre storage for very high-performance computing |
| Condo User | Rosalind (Savio) | /clusterfs/rosalind/users/$USER | none | No | Per User | Long-term, large-scale user storage |
| Condo Group | Rosalind (Savio) | /clusterfs/rosalind/groups/ | none | No | Per Group | Long-term, large-scale group-shared storage |
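As a rough sketch (this assumes, per the note above, that both locations are mounted on the DTN; the DTN hostname shown is the one commonly documented for Savio, and the dataset directory name is a placeholder):

ssh $USER@dtn.brc.berkeley.edu                                                            # log in to the data transfer node
rsync -a /clusterfs/vector/scratch/$USER/dataset/ /global/scratch/users/$USER/dataset/   # copy between filesystems from the DTN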