Cloud storage access options

Once one has set up cloud storage (discussed here), there are various ways to transfer data to the storage, including automating backups. We give an overview here.

Globus

We generally recommend using Globus for transferring data between various resources including personal computers, cloud storage, and Savio. The source and destination locations between which files are transferred are called “collections”. Globus will transfer files quickly (often in parallel) and (as needed) in multiple parts, resuming transfers after any interruptions, verifying transfer completion, and notifying the user when the transfer is complete.

Globus has a web app, which is generally the best place to start, as well as a CLI (command line interface).
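As a rough illustration of the CLI route, the Python sketch below assembles a `globus transfer` command. The collection IDs and paths are placeholders, the helper name `globus_transfer_cmd` is hypothetical, and the sketch assumes the Globus CLI is installed and that one has already run `globus login`.

```python
import shlex


def globus_transfer_cmd(src_collection, src_path, dst_collection, dst_path,
                        label, sync_level=None):
    """Build (but do not run) a `globus transfer` command as an argument list."""
    cmd = [
        "globus", "transfer",
        f"{src_collection}:{src_path}",
        f"{dst_collection}:{dst_path}",
        "--recursive",       # transfer a whole directory
        "--label", label,    # name shown in the Globus task list
    ]
    if sync_level is not None:
        # e.g. "mtime" or "checksum": only transfer new or changed files
        cmd += ["--sync-level", sync_level]
    return cmd


# Placeholder collection IDs and paths (assumptions); substitute real values.
cmd = globus_transfer_cmd(
    "SOURCE-COLLECTION-UUID", "/home/user/data/",
    "DEST-COLLECTION-UUID", "/backups/data/",
    label="data backup", sync_level="checksum",
)
print(shlex.join(cmd))
# To actually submit the transfer, one could call: subprocess.run(cmd, check=True)
```

Building the command as a list and printing it first makes it easy to check the collections and paths before anything is submitted.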

On a personal or lab machine, one can use Globus Connect Personal; on a lab machine, one can alternatively set up Globus Connect Server.

One can request that Globus sync files, so that only new or changed files are transferred, which is very useful for backups. One danger arises when syncing files that change regularly: because of the minimum storage durations of 30 days (Wasabi) or 180 days (Amazon Glacier Deep Archive), each replaced version of a file can still be billed for the full minimum period, so one can be charged multiple times for the same file over a given time period.
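To make the minimum-duration danger concrete, here is a small arithmetic sketch. It uses a deliberately simplified billing model (an assumption, not a provider's actual pricing rules): every stored version of a file is billed for at least the minimum number of days, and the last version is deleted at the end of the period. The function name `billed_file_days` is hypothetical.

```python
def billed_file_days(replace_every_days, horizon_days, min_days):
    """Total days of storage billed for one file that is overwritten regularly.

    Simplified model (assumption): each version is billed for
    max(days actually stored, min_days).
    """
    if replace_every_days <= 0:
        raise ValueError("replace_every_days must be positive")
    total = 0
    for start in range(0, horizon_days, replace_every_days):
        stored = min(replace_every_days, horizon_days - start)
        total += max(stored, min_days)
    return total


# A file re-synced weekly for four weeks:
print(billed_file_days(7, 28, min_days=0))    # no minimum: 28
print(billed_file_days(7, 28, min_days=30))   # 30-day minimum: 120
print(billed_file_days(7, 28, min_days=180))  # 180-day minimum: 720
```

Under this model, a file that changes weekly is billed for 120 days of storage over a 28-day period with a 30-day minimum, and for 720 days with a 180-day minimum, which is why frequently changing files are a poor fit for syncing to these storage classes.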

Transfers can be automated via the Globus timer service, described next.

Note: when transferring files from Google Drive, Globus may not be the preferred solution due to challenges it encounters with Google Drive shortcuts and “same name” files. We recommend using rclone for transferring Google Drive files if you have content with these characteristics.

Globus timer service

One can set up recurring transfers using the Globus timer service, either via the Globus web app or via the Globus CLI. This is a good option for use on Savio.

Other transfer tools

Other tools for making transfers to cloud storage include rclone [link to our rclone info] and the AWS CLI (command line interface), as well as various other software for data transfer and backup. Note that, despite its name, the AWS CLI can also be used with Wasabi. Unlike Globus, the AWS CLI allows one to transfer directly into Glacier Deep Archive, but since Lifecycle policies can move data immediately from standard S3 storage to Glacier Deep Archive, this is not an important point of comparison.
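The sketch below illustrates these AWS CLI points in Python: pointing the AWS CLI at Wasabi, uploading directly into Glacier Deep Archive, and the shape of a Lifecycle rule that transitions objects to Deep Archive. The bucket names, paths, and helper name `aws_s3_cp_cmd` are hypothetical, and the Wasabi endpoint URL shown is an assumption that may differ by region; check your account's settings.

```python
import json
import shlex


def aws_s3_cp_cmd(local_path, bucket, key, endpoint_url=None, storage_class=None):
    """Build (but do not run) an `aws s3 cp` upload command."""
    cmd = ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]
    if endpoint_url:
        # A non-AWS, S3-compatible service such as Wasabi
        cmd += ["--endpoint-url", endpoint_url]
    if storage_class:
        # e.g. DEEP_ARCHIVE to upload directly into Glacier Deep Archive
        cmd += ["--storage-class", storage_class]
    return cmd


# Upload to Wasabi (endpoint is an assumption; regions have their own endpoints).
wasabi_cmd = aws_s3_cp_cmd("results.tar", "my-wasabi-bucket", "backups/results.tar",
                           endpoint_url="https://s3.wasabisys.com")

# Upload directly into Glacier Deep Archive on AWS.
glacier_cmd = aws_s3_cp_cmd("results.tar", "my-aws-bucket", "backups/results.tar",
                            storage_class="DEEP_ARCHIVE")

# Alternative: a Lifecycle rule moving all objects to Deep Archive after 0 days.
lifecycle = {
    "Rules": [{
        "ID": "to-deep-archive",      # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},     # empty prefix: apply to every object
        "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}],
    }]
}

print(shlex.join(wasabi_cmd))
print(shlex.join(glacier_cmd))
print(json.dumps(lifecycle, indent=2))
```

The printed JSON could be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration`, after which uploads made with any tool, including Globus, would be transitioned by the rule.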

If one has administrative access to the local machine, one could use cron to set up automated transfers with rclone or the AWS CLI.
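As a sketch of what such cron entries might look like, the Python snippet below generates crontab lines for a nightly `rclone sync` and a nightly `aws s3 sync`. The remote name `myremote`, the bucket, the local paths, and the helper name `cron_line` are all hypothetical.

```python
import shlex


def cron_line(minute, hour, command_args, log_path):
    """Return a crontab line that runs a command daily and appends output to a log."""
    command = shlex.join(command_args)
    return f"{minute} {hour} * * * {command} >> {shlex.quote(log_path)} 2>&1"


# Nightly at 02:00 with rclone (note: `rclone sync` deletes destination files
# that no longer exist in the source; `rclone copy` does not).
rclone_job = cron_line(
    0, 2,
    ["rclone", "sync", "/data/project", "myremote:my-bucket/project"],
    "/home/user/rclone-backup.log",
)

# Nightly at 02:30 with the AWS CLI.
aws_job = cron_line(
    30, 2,
    ["aws", "s3", "sync", "/data/project", "s3://my-bucket/project"],
    "/home/user/aws-backup.log",
)

print(rclone_job)
print(aws_job)
```

The resulting lines could be pasted into the editor opened by `crontab -e`. Sending output to a log file gives a record to check when a scheduled transfer fails.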

Mounting cloud storage

One can "mount" cloud storage so that it appears as a location in the filesystem of the machine you are using. Options include rclone mount and other tools for mounting S3-compatible cloud storage such as Amazon Glacier and Wasabi. Mounting makes it easy to move files into and out of the storage, though setting up the mount can sometimes be a bit tricky. In general, for I/O speed during active computation, one would want to move the data onto local disk storage before reading or writing it with an application.
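The advice to compute on local copies rather than directly on the mount can be sketched as follows. The helper name `stage_locally` is hypothetical, and the demonstration uses temporary directories as stand-ins for a real mount point and scratch disk.

```python
import shutil
import tempfile
from pathlib import Path


def stage_locally(mounted_file, scratch_dir):
    """Copy a file from mounted cloud storage to local disk; return the local path."""
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    local_copy = scratch / Path(mounted_file).name
    shutil.copy2(mounted_file, local_copy)  # one sequential read from the mount
    return local_copy


# Demonstration with temporary directories standing in for the mount and scratch space.
with tempfile.TemporaryDirectory() as fake_mount, tempfile.TemporaryDirectory() as scratch:
    remote_file = Path(fake_mount) / "input.csv"
    remote_file.write_text("x,y\n1,2\n")

    local_file = stage_locally(remote_file, scratch)
    # Heavy reading and writing now happens on local disk, not over the network.
    staged_text = local_file.read_text()

print(staged_text)
```

Results written locally can then be copied back to the mounted location in one step once the computation finishes.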

Zipping files for transfer

Zipping files for transfer to cloud storage can have advantages and disadvantages.

Advantages include:

  • Faster transfers
  • On AWS, fewer "PUT" requests, reducing costs.
  • Less cloud storage used (based on compression), reducing costs.

Disadvantages include:

  • More logistical details to manage.
  • Perhaps less transparent in terms of what files are in the cloud storage, but in AWS and Wasabi storage, one should be able to see the files that are part of each zip file.
  • Less ability to sync only what has changed, e.g., if one file in a zip changes.
  • You must retrieve the entire zip file, so thought should be given to its contents.
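One way to soften the transparency disadvantage is to keep a plain-text manifest next to each zip file, so its contents can be checked without retrieving it. The sketch below uses only Python's standard library; the function name `zip_with_manifest` and the file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_with_manifest(src_dir, zip_path, manifest_path):
    """Zip every file under src_dir and write a manifest listing the archive's contents.

    zip_path should be outside src_dir so the archive does not include itself.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir inside the archive
                zf.write(path, arcname=path.relative_to(src).as_posix())
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    Path(manifest_path).write_text("\n".join(names) + "\n")
    return names


# Demonstration on a small temporary directory tree.
with tempfile.TemporaryDirectory() as work:
    data = Path(work) / "data"
    (data / "run1").mkdir(parents=True)
    (data / "run1" / "out.txt").write_text("result\n")
    (data / "notes.txt").write_text("notes\n")

    names = zip_with_manifest(data, Path(work) / "data.zip",
                              Path(work) / "data.manifest.txt")

print(names)
```

Uploading the small manifest alongside the zip file is one extra object per archive. Grouping files that are likely to be retrieved together into the same zip file also limits the cost of having to download an entire archive.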