Staging Data for Computation

Summary

Savio's scratch filesystem was designed for very fast reading and writing of files (i.e., I/O). Using scratch benefits your jobs and reduces the I/O load on the filesystem supporting home and group directories, thereby helping other users get their work done. In many cases you should therefore copy ('stage') your data to scratch before running your job(s) and write your job output files to scratch.

When to stage data to scratch

Copying a file to scratch costs the same, in terms of filesystem usage, as reading the file once in a job, so staging pays off whenever data will be read more than once. Situations in which you should use scratch include:

  • files that are used repeatedly within a job or across multiple jobs,
  • files you wish to read/write in parallel,
  • large input/output files, and
  • large numbers of files.

As a rule of thumb, if you are working with more than 100 MB of data within your job, you should consider staging it to scratch.
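
As a minimal sketch of the staging pattern, the following Python snippet copies input data from a home directory to scratch before the compute phase and writes outputs under scratch. The scratch path /global/scratch/users/<user> and the directory names are illustrative assumptions; substitute your own layout.

```python
import os
import shutil
from pathlib import Path

# Illustrative scratch location; substitute your cluster's actual path.
scratch = Path("/global/scratch/users") / os.environ["USER"]

# Hypothetical input data living in the home directory.
src = Path.home() / "project" / "inputs"
staged = scratch / "project" / "inputs"

# Stage the inputs to scratch once, before the I/O-heavy work begins.
if not staged.exists():
    staged.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, staged)

# Write job outputs to scratch as well, rather than to the home directory.
outdir = scratch / "project" / "outputs"
outdir.mkdir(parents=True, exist_ok=True)
(outdir / "result.txt").write_text("job output goes here\n")
```

Once staged, subsequent jobs can read the scratch copy directly instead of re-reading the files from the home directory.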

Using data in condo storage

For condo storage set up in summer 2021 or later, the condo storage is on the same filesystem as scratch, so data stored there does not need to be (and should not be) staged to scratch before use. For condo storage set up before summer 2021, please consider staging your data to scratch based on the guidance above.

Staging data to /tmp

As an alternative to using your scratch directory for a job's I/O, you can use the /tmp directory on the node(s) your Slurm job is running on (for example, if the specifics of your software favor node-local storage). While in general we suggest using scratch, one situation where /tmp can be advantageous is on savio3_htc, whose fast NVMe solid-state drives may achieve several-fold faster I/O than scratch for jobs doing non-parallel I/O.

Note that you will need to copy data to or from /tmp within every job, as /tmp is local to a given machine and not directly accessible from the login nodes.
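
As a sketch of that pattern, assuming Slurm's standard SLURM_JOB_ID environment variable and the same illustrative scratch path as above, a job might copy its inputs to /tmp, work locally, and copy results back before exiting:

```python
import os
import shutil
from pathlib import Path

# SLURM_JOB_ID is set by Slurm inside a job; it keeps /tmp paths unique.
job_id = os.environ.get("SLURM_JOB_ID", "interactive")
local = Path("/tmp") / f"stage-{job_id}"
local.mkdir(parents=True, exist_ok=True)

# Illustrative scratch layout; adjust to your own directories.
scratch = Path("/global/scratch/users") / os.environ["USER"]
(scratch / "outputs").mkdir(parents=True, exist_ok=True)

try:
    # Copy inputs from scratch to node-local /tmp at the start of the job.
    shutil.copy2(scratch / "inputs" / "data.bin", local / "data.bin")

    # ... run the I/O-heavy computation against the local copy here ...

    # Copy results back to scratch before the job ends, since /tmp
    # cannot be reached from the login nodes afterward.
    shutil.copy2(local / "results.bin", scratch / "outputs" / "results.bin")
finally:
    # Clean up so the node's local disk is not left full for later jobs.
    shutil.rmtree(local, ignore_errors=True)
```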

Python packages and Conda environments

Python packages (and Conda environments) comprise many small files, often thousands or more. Python packages installed via pip are by default stored in ~/.local in your home directory, and Conda environments and their constituent packages are by default stored in ~/.conda.

Using these packages or environments (particularly when running Python in parallel) can place a heavy load on the filesystem supporting users' home and group directories. In contrast, Python packages and Conda environments installed on scratch can be accessed quickly and without burdening that filesystem.
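
One way to keep pip's many small files on scratch (a sketch under the same illustrative scratch path as above) is to point the standard PYTHONUSERBASE variable at a scratch directory before installing, since pip's --user installs honor it; the package name here is just an example:

```python
import os
import subprocess
import sys

# Illustrative scratch location for Python user installs.
scratch = f"/global/scratch/users/{os.environ['USER']}"
os.environ["PYTHONUSERBASE"] = f"{scratch}/python"

# 'pip install --user' honors PYTHONUSERBASE, so package files land
# on scratch instead of ~/.local in the home directory.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--user", "numpy"],
    check=True,
)

# Note: PYTHONUSERBASE must also be set when you later run Python
# (e.g., exported in your job script), so that the interpreter adds
# the scratch site-packages directory to sys.path.
```

Similarly, Conda environments can be created under scratch with conda's --prefix (-p) option instead of the default ~/.conda location.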