Transferring Data Between Savio and Box or bDrive

Overview: why rclone?

Berkeley Research Computing recommends using rclone to transfer files between Savio and either Box or bDrive. The bDrive storage and collaboration solution offers everyone at UC Berkeley unlimited storage, strong search capabilities, and mobile access. This storage is an important data management resource for research teams and is often used for offsite backups of valuable data.

rclone performs integrity checks on transfers and supports encryption of transferred files, as well as of file names if required. Because it is a command-line tool, it is easy to use in a script. Scripted use of rclone could, for example, be launched after-hours to copy data generated during the day from multiple directories as an automated backup mechanism (the computer running rclone should be configured not to sleep if transfers will run for an extended period). Versions of rclone are available for macOS, Windows, and Linux. On Savio, rclone is already installed and requires minimal configuration to connect to your Box or bDrive account.

We'll first describe how to set up rclone so it can access your Box or bDrive account, then how to transfer files via rclone, and then various tips and tricks for customizing rclone.

One-time setup

Here are the steps for setting up Box for access via `rclone` from Savio.

Here are the steps for setting up bDrive for access via `rclone` from Savio.

Basic rclone usage

Here we'll illustrate some basic usage of rclone to access the remote and copy files back and forth. We'll illustrate with a remote set up to access Box. Remember to ssh to the DTN and to load the `rclone` module before using `rclone`.

Getting basic information about the remote

We can list the available remotes (here we see four remotes):

[paciorek@dtn ~]$ rclone listremotes
Box:
Box-SPA:
bDrive:
bDrive-encrypt:

We can list directories:

[paciorek@dtn ~]$ rclone lsd Box:
-1 2019-02-14 14:21:10 -1 Fall 2019 PhD Admits
-1 2017-07-20 10:11:33 -1 VMs
-1 2018-01-09 09:09:10 -1 Week 6
-1 2016-02-06 16:19:58 -1 admin
-1 2018-05-29 10:44:23 -1 nimble-talks
-1 2019-02-22 18:00:24 -1 research
-1 2016-08-02 14:46:38 -1 workshops

[paciorek@dtn ~]$ rclone lsd Box:research
-1 2016-02-06 15:01:32 -1 bigGP
-1 2019-02-22 18:00:24 -1 code
-1 2016-02-06 15:47:25 -1 paleon
-1 2016-02-06 13:53:23 -1 stepps1

We can query the size of directories:

[paciorek@dtn ~]$ rclone size Box:
Total objects: 1147
Total size: 122.552 GBytes (131588922324 Bytes)

We can list the files (including nested files) within a directory:

[paciorek@dtn ~]$ rclone ls Box:research/code/matlab
30166 #bmars.m#
480 README
2376 run.wood2.m
2387 run.wood2stdz.m
80 sub.m
11908 fbmtools/fbmmlpread.m
1588 fbmtools/thin.m
909 fbmtools/private/genop.m
582 fbmtools/private/gminus.m
572 fbmtools/private/gplus.m

Copying files

We can copy files to and from the remote, as documented here. Note that this skips files that already exist on the remote and are unchanged in the local (i.e., Savio) directory.

[paciorek@dtn ~]$ rclone copy /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy

We can copy files from Box to Savio:

[paciorek@dtn ~]$ rclone copy Box:admin /global/home/users/paciorek/admin

We can also synchronize the files on Savio and our remote, which in addition to copying files will delete files from the remote that don't exist locally on Savio:

[paciorek@dtn ~]$ rclone sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy

Tips and tricks

Limiting transfer speed to avoid Google Drive's daily transfer limit

Google Drive, and therefore bDrive, limits transfers to at most 750 GB per day. To avoid exceeding this limit and having your transfer stop, you can tell `rclone` to cap the transfer speed so that 24 hours of continuous transfer stays under 750 GB.

[paciorek@dtn ~]$ rclone sync --bwlimit 8.2M /global/scratch/users/paciorek/big_proj_data bDrive:big_proj_data
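The 8.2M figure can be sanity-checked: 750 GB (decimal) spread over the 86,400 seconds in a day is about 8.68 MB/s, and since `rclone` interprets the `M` suffix of `--bwlimit` in binary units (MiB/s), that works out to roughly 8.28 MiB/s, so 8.2M leaves a small safety margin. A quick check of the arithmetic:

```shell
# Divide Google's 750 GB (decimal) daily cap over 86,400 seconds,
# then convert to MiB/s, the binary units rclone uses for --bwlimit.
awk 'BEGIN { printf "%.2f MiB/s\n", 750e9 / 86400 / 1048576 }'
```

If your transfers run for only part of the day, you can raise the limit proportionally.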

Using nohup to keep rclone running in the background

In most cases (e.g., when not using a terminal multiplexer), if your terminal closes or times out while an rclone command is running, all progress will be lost. The easiest way to avoid this is to use the nohup command to run rclone in the background. To use nohup, add nohup before your rclone command and append & at the end. By default, any output of the command is written to the file nohup.out in the working directory; to send it to a different file, add > followed by the file name to your command.

[paciorek@dtn ~]$ nohup rclone copy Box:admin /global/home/users/paciorek/admin > out.txt &
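The same pattern works for any long-running command; here is a minimal, self-contained demonstration that uses `sleep` purely as a stand-in for an rclone transfer:

```shell
# nohup detaches the command from the terminal so it survives a hangup;
# the trailing & runs it in the background. 'sleep 1' stands in for a
# long rclone transfer; output goes to out.txt instead of nohup.out.
nohup sleep 1 > out.txt 2>&1 &
echo "running in background as PID $!"
wait   # in a script, wait blocks until the background job finishes
```

After launching a real transfer this way, you can log out and later inspect out.txt (e.g., with `tail -f out.txt`) to check on progress.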

Using rclone chunker to break up large files (particularly for Box)

Box limits files to a maximum size of 15 GB. To get around this, you can use `rclone`'s "chunker" feature, which splits large files into smaller chunks on upload and reassembles them on download. You'll need a recent version of `rclone` (currently, chunker is a beta feature not available in the general `rclone` release). As of April 2020, that means using this module:

module load rclone/2019-12-16-beta

Once you've loaded the module, you should be able to run rclone config, select the "chunker" type (number 29), and follow the directions here to set up a chunker remote that 'wraps' an existing remote such as Box. When you then transfer to Box with chunking, make sure to select the 'wrapper' remote, not the original Box remote.
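After the interactive configuration, the resulting entry in your rclone config file looks roughly like the sketch below. The remote name `Box-chunked`, the target path, and the 10G chunk size are illustrative choices (10G keeps each chunk under Box's 15 GB cap), not values prescribed by the setup above:

```ini
[Box-chunked]
type = chunker
remote = Box:chunked-data
chunk_size = 10G
```

Transfers to `Box-chunked:` are then split transparently, while the underlying `Box:` remote stores the individual chunk files.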

Automatic encryption

You can set up a remote such that files copied to it are encrypted as they are sent and decrypted as they are retrieved. As with the chunker discussion above, you'll need to 'wrap' an existing remote. To set up the encrypted remote, after loading the `rclone` module, run rclone config and select the "crypt" type (number 10). When you then transfer files that should be encrypted on the remote, make sure to select the 'wrapper' remote, not the original remote.
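As with chunker, the interactive setup produces a config-file entry along these lines. The name `Box-encrypt` and the target path are illustrative, and the password is generated and obscured for you by `rclone config` (shown here as a placeholder):

```ini
[Box-encrypt]
type = crypt
remote = Box:encrypted-data
filename_encryption = standard
password = *** value obscured by rclone config ***
```

With `filename_encryption = standard`, file names as well as contents are encrypted on the remote; anyone browsing Box directly sees only scrambled names.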

Configuring rclone to email upon completion or failure

Here's a snippet of bash shell code that will email you when your rclone transfer either succeeds or fails.

[paciorek@dtn ~]$ rclone sync /global/scratch/users/paciorek/big_project_data Box:big_project_data --log-file logfile --log-level "INFO" && \
mail -s "Rclone result" username@berkeley.edu <<< $(cat logfile) || \
mail -s "Rclone failed" username@berkeley.edu <<< "rclone transfer failed"