Transferring Data Between Savio and Box or bDrive
Berkeley Research Computing recommends the use of rclone to
transfer files between Savio and either Box or bDrive.
We'll first describe how to set up rclone so it can access your Box or bDrive account, then how to transfer files via rclone, and then various tips and tricks for customizing rclone.
Here are the steps for setting up Box for access via rclone from Savio.
Here are the steps for setting up bDrive for access via rclone from Savio.
Basic rclone usage
Here we'll illustrate some basic usage of rclone to
access the remote and copy files back and forth. We'll illustrate with
a remote set up to access Box. Remember to ssh to the DTN and to load
the rclone module before using rclone.
Getting basic information about the remote
We can list the available remotes (here we see four remotes):
[paciorek@dtn ~]$ rclone listremotes
We can list directories:
[paciorek@dtn ~]$ rclone lsd Box:
-1 2019-02-14 14:21:10 -1 Fall 2019 PhD Admits
-1 2017-07-20 10:11:33 -1 VMs
-1 2018-01-09 09:09:10 -1 Week 6
-1 2016-02-06 16:19:58 -1 admin
-1 2018-05-29 10:44:23 -1 nimble-talks
-1 2019-02-22 18:00:24 -1 research
-1 2016-08-02 14:46:38 -1 workshops
[paciorek@dtn ~]$ rclone lsd Box:research
-1 2016-02-06 15:01:32 -1 bigGP
-1 2019-02-22 18:00:24 -1 code
-1 2016-02-06 15:47:25 -1 paleon
-1 2016-02-06 13:53:23 -1 stepps1
We can query the size of directories:
[paciorek@dtn ~]$ rclone size Box:
Total objects: 1147
Total size: 122.552 GBytes (131588922324 Bytes)
We can list the files (including nested files) within a directory:
[paciorek@dtn ~]$ rclone ls Box:research/code/matlab
We can copy files to and from the remote with rclone copy, as described in the rclone documentation. Note that copy skips files that already exist at the destination and have not changed at the source. Here we copy a directory from Savio to Box:
[paciorek@dtn ~]$ rclone copy /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy
We can copy files from Box to Savio:
[paciorek@dtn ~]$ rclone copy Box:admin /global/home/users/paciorek/admin
We can also synchronize the files on Savio and our remote, which in addition to copying files will delete files from the remote that don't exist locally on Savio:
[paciorek@dtn ~]$ rclone sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy
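Because sync deletes files on the destination, it can be worth previewing what it would do before running it for real. The following is a minimal sketch that uses rclone's --dry-run flag; the function name preview_sync and the confirmation prompt are hypothetical and not part of rclone or of Savio's setup.

```shell
# Hypothetical helper: show what sync would change, then ask before doing it.
# Usage: preview_sync SOURCE REMOTE:DEST
preview_sync() {
  src=$1
  dest=$2
  # --dry-run reports the copies and deletions without performing them.
  rclone sync --dry-run "$src" "$dest" || return 1
  printf 'Proceed with sync to %s? [y/N] ' "$dest"
  read -r answer
  # Only an explicit "y" triggers the real sync.
  [ "$answer" = "y" ] && rclone sync "$src" "$dest"
}
```

For example, preview_sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy would first print the planned changes and only sync after you answer y.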
Tips and tricks
Limiting transfer speed to avoid Google Drive's daily transfer limit
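Google Drive, and therefore bDrive, limits you to transferring at most 750 GB in a day. To avoid exceeding this limit and having your transfer stop, you can use the --bwlimit flag to cap the transfer speed so that 24 hours of continuous transfer stays below 750 GB. rclone interprets the M suffix as MiB/s, so a rate of 8.2M moves roughly 743 GB in 24 hours. The example below uses a remote named Box; substitute the name of your bDrive remote.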
[paciorek@dtn ~]$ rclone sync --bwlimit 8.2M /global/scratch/paciorek/big_proj_data Box:big_proj_data
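The value 8.2M can be derived from the daily cap. This sketch assumes the cap is 750 GB in decimal units (750 x 10^9 bytes) and that rclone reads the M suffix as MiB/s; the variable names are illustrative only.

```shell
# Assumptions: the cap is 750 GB (decimal) per day; "M" means MiB/s (1048576 bytes/s).
daily_cap_bytes=750000000000
seconds_per_day=86400
# Round down to one decimal place so a full day of transfer stays under the cap.
bwlimit=$(awk -v cap="$daily_cap_bytes" -v secs="$seconds_per_day" \
  'BEGIN { printf "%.1fM", int(cap / secs / 1048576 * 10) / 10 }')
echo "$bwlimit"
```

The result can then be passed to rclone as --bwlimit "$bwlimit". Rounding down rather than to the nearest value keeps a small safety margin.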
rclone chunker to break up large files (particularly for Box)
Box limits files to a maximum size of 15 GB. To get around this you can use rclone's new "chunker" feature. You'll need to use a recent version of rclone (currently this feature is a beta feature not available in the general rclone release). As of April 2020, that means using this module:
module load rclone/2019-12-16-beta
Once you've loaded the module you should be able to run
rclone config, select the "chunker" type (number 29) and follow the
directions in the rclone chunker documentation to set up a
chunker remote that 'wraps' a remote such as Box. Then when you want
to transfer to Box with chunking set up, make sure to select the
'wrapper' remote and not the original Box remote.
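Before deciding whether you need the chunker remote, it can help to check whether any files actually exceed the limit. The sketch below lists such files with find; the function name files_over_limit is hypothetical, and treating the 15 GB limit as 15 x 10^9 bytes is an assumption.

```shell
# Hypothetical helper: list files larger than a size limit.
# Usage: files_over_limit DIR [LIMIT_BYTES]   (default limit: 15 GB, decimal)
files_over_limit() {
  dir=$1
  limit_bytes=${2:-15000000000}
  # The "c" unit means bytes; +N matches files strictly larger than N bytes.
  find "$dir" -type f -size +"${limit_bytes}c"
}
```

If files_over_limit /global/scratch/paciorek/big_proj_data prints nothing, every file fits under the assumed limit and the plain Box remote should suffice.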
You can set up a remote such that files copied to the remote are
encrypted as they are sent to the remote and decrypted as they are
retrieved from the remote. As with the chunker discussion above,
you'll need to 'wrap' an existing remote. To set up
the encrypted remote that wraps an existing remote, load the
rclone module, run rclone config, and select
the "crypt" type (number 10). Then when you want to
transfer files such that they are encrypted on the remote, make sure
to select the 'wrapper' remote and not the original remote.
Configuring rclone to email upon completion or failure
Here's a snippet of bash shell code that will email you when your rclone transfer either succeeds or fails. Replace SOURCE and REMOTE:DEST with your own source and destination, and the email address with your own.
[paciorek@dtn ~]$ rclone sync SOURCE REMOTE:DEST \
    --log-file logfile --log-level "INFO" && \
    mail -s "Rclone result" email@example.com < logfile || \
    mail -s "Rclone failed" email@example.com <<< "rclone transfer failed"
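One caveat with the && ... || one-liner above is that the failure email is also sent if the transfer succeeds but the success email itself fails. A minimal sketch of an alternative uses an explicit if/else; the function name transfer_and_notify and its argument order are hypothetical, and it assumes a mail command is available as in the snippet above.

```shell
# Hypothetical helper: run a transfer command, then email the outcome.
# Usage: transfer_and_notify ADDRESS LOGFILE COMMAND [ARGS...]
transfer_and_notify() {
  address=$1
  logfile=$2
  shift 2
  # Run the remaining arguments as the transfer command.
  if "$@"; then
    # Success: send the log file contents as the message body.
    mail -s "Rclone result" "$address" < "$logfile"
  else
    # Failure: send a short fixed message instead.
    printf 'rclone transfer failed\n' | mail -s "Rclone failed" "$address"
  fi
}
```

It could be called as transfer_and_notify email@example.com logfile rclone sync SOURCE REMOTE:DEST --log-file logfile --log-level INFO, with the placeholders replaced by your own values.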