Data Storage Overview
The number and variety of (cloud) storage options can seem overwhelming.
This documentation gives an overview of some options for storage of research data in the cloud (and related campus storage), with guidance and recommendations for different use cases, in particular for backup. Additional details on the steps involved in setting up storage and backup transfers can be found here.
Note that “backing up” one’s data can take several forms. Here we mean making a protection copy in case the data is lost, corrupted, or otherwise unavailable. How often to make these copies depends on the comfort level (and budget) of the research project. A closely related case is making an archive copy of data that must be retained but is accessed infrequently, if at all. Cloud storage is a good solution for both of these needs.
More traditionally, backup means regularly saving changes made to files using backup software, such as Time Machine, that records the differences and stores them incrementally so that files can be retrieved as of any point in time. At this time, campus offers no officially sanctioned vendor services for this type of backup. Researchers must arrange traditional backup services of this type themselves, though information security and procurement policies still apply.
Note that non-data files, such as code, configuration files, and documents, would generally be stored in other locations such as GitHub, backed-up home directories, bDrive, Box, etc.
Use cases and recommendations
It’s helpful to (at least loosely) define several types of storage as follows.
- Active/hot: frequently accessed (e.g., weekly/monthly), needs to be readily accessible, fast upload/download.
- Warm: accessed on a quarterly basis (a small number of times a year), needs to be readily accessible.
- Cold: accessed once or twice a year; if accessed less often than that, there should be a plan to transition to Archive.
- Archive: accessed every 1-3 years at most, conceivably never; persistent; slow upload ok.
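The tier definitions above are deliberately loose, but they can be approximated by a simple rule on expected access frequency. The following is a minimal sketch rather than campus guidance: the function name `storage_tier` and the numeric cutoffs (12, 3, and 1 accesses per year) are assumptions chosen to roughly match the descriptions above.

```python
def storage_tier(accesses_per_year: float) -> str:
    """Map an expected access frequency to one of the loosely defined tiers.

    The cutoffs are illustrative assumptions, not official thresholds.
    """
    if accesses_per_year < 0:
        raise ValueError("accesses_per_year must be non-negative")
    if accesses_per_year >= 12:   # roughly monthly or more often
        return "active/hot"
    if accesses_per_year >= 3:    # roughly quarterly
        return "warm"
    if accesses_per_year >= 1:    # once or twice a year
        return "cold"
    return "archive"              # every 1-3 years at most, possibly never


print(storage_tier(52))    # weekly access
print(storage_tier(4))     # quarterly access
print(storage_tier(2))     # twice a year
print(storage_tier(0.5))   # once every two years
```

Data that sits near a cutoff could reasonably be placed in either neighboring tier; other requirements in the definitions, such as how quickly the data must be retrievable, may matter more than the exact access count.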
Next we’ll give a few recommendations. The following sections give more details about these recommended options.
- For archive storage, Amazon Glacier Deep Archive is a good option.
- For cold, warm, or active storage, Wasabi cloud storage is a good option.
- For large amounts of active/hot, warm or cold storage, UC Berkeley’s Active Archive Object Storage is a good cloud-like option.
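The recommendations above amount to a small lookup from storage type to suggested service. The sketch below is only an illustration of that mapping: the function name `recommended_storage` and the `large` flag are hypothetical, and because this overview does not define what counts as a "large" amount of data, that judgment is left to the caller.

```python
def recommended_storage(tier: str, large: bool = False) -> str:
    """Return the option recommended in this overview for a storage tier.

    `tier` is one of "active/hot", "warm", "cold", or "archive".
    `large` is a caller-supplied judgment; no size threshold is assumed here.
    """
    if tier == "archive":
        return "Amazon Glacier Deep Archive"
    if tier in {"active/hot", "warm", "cold"}:
        if large:
            return "UC Berkeley Active Archive Object Storage"
        return "Wasabi cloud storage"
    raise ValueError(f"unknown tier: {tier!r}")


print(recommended_storage("archive"))
print(recommended_storage("warm"))
print(recommended_storage("active/hot", large=True))
```

Each result is a starting point rather than a rule; the sections on the individual options give the details needed to make a final choice.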
With the imposition of storage limits in bDrive (i.e., Google Drive) and Box, these storage options are no longer as appealing for large amounts of research data, but they may be useful in some situations. Berkeley IT has information about storage limits in Google Drive and Box.