Backing Up Data¶
With the exception of home directories, the Savio filesystem is not backed up. Because data is generally stored in non-backed-up scratch and condo storage, we provide guidance here on approaches for backing up or archiving your data to the cloud.
Note that we expect non-data files, such as code and configuration files, to generally be protected by having copies elsewhere, such as on GitHub or your own machine(s).
Note that "backing up" up one's data can take several forms. Here we mean making a protection copy in case the data on Savio becomes lost, corrupted, or otherwise unavailable. The question of how often to make these copies is a matter of the comfort (and budget) of the research project. A similar variant of this would be making an archive copy of data that needs to be retained but only accessed infrequently, if at all. Cloud storage is a good solution for these needs. More traditionally, backup means regularly saving changes made to files by using backup software, such as Time Machine, that records the differences and stores these incrementally in a manner that allows retrieval at any point in time. At this time, campus offers no officially sanctioned vendor services for this type of backup. Researchers are on their own to arrange for traditional backup services of this type, though information security and procurement policies apply.
Note that, given the availability of scratch and condo storage, we don’t anticipate that users will need to regularly move data back and forth between Savio and cloud or other storage during the course of their computations.
Also note that cloud storage can impose minimum storage-duration charges, so frequent backup of files that change regularly can end up incurring multiple charges for the same file [link to more details in new RDM doc under development].
Archive backup¶
For archive-level backup, we recommend Glacier Deep Archive, a storage tier in Amazon’s S3 storage system. Archive-level backup is intended for material that you never expect to have to retrieve, except in cases such as accidental deletion by a user or data loss on the Savio filesystem.
One can use Globus to copy data to AWS as needed and the Globus timer service to automate backups at a regular interval.
Our Research Data Management documentation provides details of setting up AWS and Globus for backups.
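As a concrete illustration, below is a minimal sketch using the Globus CLI to make a one-time archive copy of a project directory. It assumes you have installed the Globus CLI and, following the Research Data Management documentation, created a Globus collection backed by your S3 bucket with the Deep Archive storage class; the endpoint UUIDs and paths shown are placeholders, not real values.

```bash
# Placeholders -- substitute the UUIDs of your own Savio and S3 collections,
# which you can find with `globus endpoint search` or in the Globus web app.
SAVIO_ENDPOINT="11111111-2222-3333-4444-555555555555"    # Savio DTN / scratch collection
S3_COLLECTION="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"     # collection backed by your S3 bucket

# One-time authentication (opens a browser login).
globus login

# Recursively copy a project directory from scratch to the archive bucket.
globus transfer --recursive \
  --label "myproject archive copy" \
  "${SAVIO_ENDPOINT}:/global/scratch/users/myuser/myproject" \
  "${S3_COLLECTION}:/myproject"

# Check on the status of the transfer.
globus task list
```

Because the transfer is managed by Globus, it runs asynchronously; you can log out of Savio and check on the task later.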
Cold or warm backup¶
For data that you expect to retrieve occasionally, that you may need to save for only a short period of time, or whose files may change regularly (but note that Wasabi has minimum storage-duration charges), we recommend Wasabi. This is suitable for data that you need to move off of scratch because you are not actively computing with it, or for backing up files in more active use than those suited to archive-level backup.
One can use Globus to copy data to Wasabi as needed and the Globus timer service to automate backups at a regular interval.
Our Research Data Management documentation provides details of setting up Wasabi and Globus for backups.
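As with the archive case, below is a minimal sketch of automating a recurring backup with the Globus timer service via the Globus CLI. It assumes you have created a Globus collection backed by your Wasabi bucket per the Research Data Management documentation; the UUIDs and paths are placeholders, and the exact timer subcommands and flags vary across globus-cli versions, so consult `globus timer create transfer --help` (or use the Timers interface in the Globus web app).

```bash
# Placeholders -- substitute the UUIDs of your own Savio and Wasabi collections.
SAVIO_ENDPOINT="11111111-2222-3333-4444-555555555555"      # Savio DTN / scratch collection
WASABI_COLLECTION="ffffffff-0000-1111-2222-333333333333"   # collection backed by your Wasabi bucket

# Schedule a daily recursive copy of the project directory. Consider also
# setting a sync level (see --help) so that repeat runs only transfer files
# that have changed since the last run.
globus timer create transfer \
  --name "myproject daily backup" \
  --interval 1d \
  --recursive \
  "${SAVIO_ENDPOINT}:/global/scratch/users/myuser/myproject" \
  "${WASABI_COLLECTION}:/myproject-backup"

# List your scheduled timers and their recent runs.
globus timer list
```

Keep in mind the minimum storage-duration charges noted above: a timer that re-copies rapidly changing files will incur repeated charges for the same data.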