Transferring Data Between Savio and Box or bDrive

Berkeley Research Computing recommends using rclone to transfer files between Savio and either Box or bDrive.
We'll first describe how to set up rclone so it can access your Box or bDrive account, then how to transfer files via rclone, and then various tips and tricks for customizing rclone.

One-time setup

Here are the steps for setting up Box for access via rclone from Savio.

Here are the steps for setting up bDrive for access via rclone from Savio.

Basic rclone usage

Here we'll illustrate some basic usage of rclone to access the remote and copy files back and forth. We'll illustrate with a remote set up to access Box. Remember to ssh to the DTN and to load the rclone module before using rclone.

Getting basic information about the remote

We can list the available remotes (here we see four remotes):

[paciorek@dtn ~]$ rclone listremotes
Box:
Box-SPA:
bDrive:
bDrive-encrypt:

We can list directories:

[paciorek@dtn ~]$ rclone lsd Box:
-1 2019-02-14 14:21:10 -1 Fall 2019 PhD Admits
-1 2017-07-20 10:11:33 -1 VMs
-1 2018-01-09 09:09:10 -1 Week 6
-1 2016-02-06 16:19:58 -1 admin
-1 2018-05-29 10:44:23 -1 nimble-talks
-1 2019-02-22 18:00:24 -1 research
-1 2016-08-02 14:46:38 -1 workshops

[paciorek@dtn ~]$ rclone lsd Box:research
-1 2016-02-06 15:01:32 -1 bigGP
-1 2019-02-22 18:00:24 -1 code
-1 2016-02-06 15:47:25 -1 paleon
-1 2016-02-06 13:53:23 -1 stepps1

We can query the size of directories:

[paciorek@dtn ~]$ rclone size Box:
Total objects: 1147
Total size: 122.552 GBytes (131588922324 Bytes)

We can list the files (including nested files) within a directory:

[paciorek@dtn ~]$ rclone ls Box:research/code/matlab
30166 #bmars.m#
480 README
2376 run.wood2.m
2387 run.wood2stdz.m
80 sub.m
11908 fbmtools/fbmmlpread.m
1588 fbmtools/thin.m
909 fbmtools/private/genop.m
582 fbmtools/private/gminus.m
572 fbmtools/private/gplus.m

Copying files

We can copy files to and from the remote with rclone copy, as described in the rclone documentation for that command. Note that a copy skips files already at the destination that have not changed at the source. Here we copy a directory from Savio to Box:

[paciorek@dtn ~]$ rclone copy /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy

We can copy files from Box to Savio:

[paciorek@dtn ~]$ rclone copy Box:admin /global/home/users/paciorek/admin

We can also synchronize the files on Savio and our remote, which in addition to copying files will delete files from the remote that don't exist locally on Savio:

[paciorek@dtn ~]$ rclone sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy
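Because sync deletes files on the remote, it can be worth previewing what it would do before running it for real. Here's a minimal sketch using rclone's --dry-run flag, which reports the planned copies and deletions without making changes. The helper name preview_sync and the RCLONE variable are our own illustrative names, not part of rclone.

```shell
# RCLONE is a hypothetical override point (for example, RCLONE=echo prints
# the command instead of running it); by default the real rclone is used.
RCLONE="${RCLONE:-rclone}"

# Hypothetical helper: show what a sync from $1 to $2 would copy or delete,
# without changing anything on either side.
preview_sync() {
    "$RCLONE" sync --dry-run "$1" "$2"
}
```

For example, preview_sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy would list the changes the sync command above would make.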

Tips and tricks

Limiting transfer speed to avoid Google Drive's daily transfer limit

Google Drive, and therefore bDrive, limits you to transferring at most 750 GB in a day. To avoid exceeding this limit and having your transfer stop, you can tell rclone to cap the transfer speed so that transferring continuously for 24 hours stays below 750 GB.

[paciorek@dtn ~]$ rclone sync --bwlimit 8.2M /global/scratch/paciorek/big_proj_data bDrive:big_proj_data
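The value 8.2M comes from dividing the daily cap by the number of seconds in a day. Here's a sketch of that arithmetic, under two assumptions: that the cap is 750 decimal gigabytes, and that rclone interprets the M suffix as MiB per second. The variable names are our own.

```shell
# Assumptions: the daily cap is 750 decimal gigabytes, and the M suffix of
# --bwlimit means MiB/s (1024*1024 bytes per second).
daily_cap_bytes=750000000000
seconds_per_day=86400

# Round down to one decimal place so a full day of transfer stays under the cap.
bwlimit=$(awk -v cap="$daily_cap_bytes" -v secs="$seconds_per_day" \
    'BEGIN { rate = cap / secs / (1024 * 1024); printf "%.1fM", int(rate * 10) / 10 }')

echo "$bwlimit"   # prints 8.2M
```

Rounding down rather than to the nearest value keeps a 24-hour transfer just under the cap instead of just over it.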

Using rclone chunker to break up large files (particularly for Box)

Box limits files to a maximum size of 15 GB. To get around this, you can use rclone's "chunker" feature, which requires a recent version of rclone (as of April 2020, chunker is a beta feature not available in the general rclone release). That means using this module:

module load rclone/2019-12-16-beta

Once you've loaded the module, you should be able to run rclone config, select the "chunker" type (number 29), and follow the directions in the rclone chunker documentation to set up a chunker remote that 'wraps' a remote such as Box. Then, when you want to transfer to Box with chunking, make sure to select the 'wrapper' remote and not the original Box remote.
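Before setting up a chunker remote, it can help to check whether any of your files actually exceed the limit. Here's a minimal sketch, assuming a find that accepts the G size suffix (as GNU find does) and treating the limit as 15 GiB. The helper name list_large_files is our own.

```shell
# Hypothetical helper: list regular files under directory $1 that are larger
# than a threshold given in find's -size syntax. The 15G default is an
# assumption standing in for Box's per-file limit.
list_large_files() {
    find "$1" -type f -size "+${2:-15G}"
}
```

For example, list_large_files /global/scratch/paciorek/big_proj_data would print the files in that directory that need chunking. No output means every file fits under the threshold.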

Automatic encryption

You can set up a remote such that files copied to the remote are encrypted as they are sent and decrypted as they are retrieved. As with the chunker discussion above, you'll need to 'wrap' an existing remote. To set up the encrypted remote, load the rclone module, run rclone config, and select the "crypt" type (number 10). Then, when you want to transfer files such that they are encrypted on the remote, make sure to select the 'wrapper' remote and not the original remote.

Configuring rclone to email upon completion or failure

Here's a snippet of bash shell code that will email you when your rclone transfer either succeeds or fails.

[paciorek@dtn ~]$ rclone sync /global/scratch/paciorek/big_project_data Box:big_project_data --log-file logfile --log-level "INFO" && \
mail -s "Rclone result" username@berkeley.edu < logfile || \
mail -s "Rclone failed" username@berkeley.edu <<< "rclone transfer failed"
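One caveat with chaining && and ||: if the transfer succeeds but the first mail command itself fails, the failure email is sent as well. An explicit if/else on rclone's exit status avoids that. Here's a sketch of that variant. The helper name sync_and_notify and the RCLONE and MAILER variables are our own illustrative names.

```shell
# RCLONE and MAILER are hypothetical override points for testing; by default
# they run the real rclone and mail commands.
RCLONE="${RCLONE:-rclone}"
MAILER="${MAILER:-mail}"

# Hypothetical helper. Arguments: source, destination, email address, log file.
# Emails the log if the sync succeeds, or a short failure notice if it does not.
sync_and_notify() {
    if "$RCLONE" sync "$1" "$2" --log-file "$4" --log-level INFO; then
        "$MAILER" -s "Rclone result" "$3" < "$4"
    else
        echo "rclone transfer failed" | "$MAILER" -s "Rclone failed" "$3"
    fi
}
```

For example: sync_and_notify /global/scratch/paciorek/big_project_data Box:big_project_data username@berkeley.edu logfile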