Transferring Data Between Savio and Box or bDrive
Overview: why `rclone`?¶
While `rclone` is a good option for transferring files between Savio and either Box or bDrive, Globus may be an even better option.
The bConnected storage and collaboration solution offers everyone at UC Berkeley easily accessible storage, strong search capabilities, and mobile access. This storage is an important data management resource for research teams, and is often used for offsite backups of valuable data. In past years, the amount of storage has been unlimited, but this is no longer the case.
`rclone` performs integrity checks on transfers and, if required, supports encryption of transferred files as well as file names. Because it is a command-line tool, it is easy to use in a script. A script that calls `rclone` could, for example, be launched after hours to copy data generated during the day from multiple directories as an automated backup mechanism (the computer running `rclone` should be configured not to sleep if transfers will run for an extended period). Versions of `rclone` are available for macOS, Windows, and Linux. On Savio, `rclone` is already installed and requires minimal configuration to connect to your Box or bDrive account.
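As a rough illustration of the scripted backup idea described above, here is a minimal sketch of a shell function that copies several directories to a remote in one pass. The function name `backup_dirs`, the remote path `Box:backups`, and the directory names in the usage comment are hypothetical, and the sketch assumes a remote has already been configured.

```shell
# Copy each given directory into a same-named folder under one remote path.
# Returns non-zero if any of the copies failed, so a scheduler can detect it.
backup_dirs() {
    local remote="$1"; shift
    local dir failed=0
    for dir in "$@"; do
        if ! rclone copy "$dir" "$remote/$(basename "$dir")"; then
            echo "backup of $dir failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Hypothetical usage (placeholder remote and directories):
#   backup_dirs Box:backups /global/scratch/users/$USER/run1 /global/scratch/users/$USER/run2
```

Because `rclone copy` skips files that are already present and unchanged on the remote, rerunning a script like this each night only transfers what is new.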
We'll first describe how to set up `rclone` so it can access your Box or bDrive account, then how to transfer files via `rclone`, and then various tips and tricks for customizing `rclone`.
One-time setup¶
Here are the steps for setting up Box for access via `rclone` from Savio.
Here are the steps for setting up bDrive for access via `rclone` from Savio.
Basic `rclone` usage¶
Here we'll illustrate some basic usage of `rclone` to access the remote and copy files back and forth. We'll illustrate with a remote set up to access Box. Remember to ssh to the DTN before using `rclone`.
Getting basic information about the remote¶
We can list the available remotes (here we see four remotes):
[paciorek@dtn ~]$ rclone listremotes
Box:
Box-SPA:
bDrive:
bDrive-encrypt:
We can list directories:
[paciorek@dtn ~]$ rclone lsd Box:
-1 2019-02-14 14:21:10 -1 Fall 2019 PhD Admits
-1 2017-07-20 10:11:33 -1 VMs
-1 2018-01-09 09:09:10 -1 Week 6
-1 2016-02-06 16:19:58 -1 admin
-1 2018-05-29 10:44:23 -1 nimble-talks
-1 2019-02-22 18:00:24 -1 research
-1 2016-08-02 14:46:38 -1 workshops
[paciorek@dtn ~]$ rclone lsd Box:research
-1 2016-02-06 15:01:32 -1 bigGP
-1 2019-02-22 18:00:24 -1 code
-1 2016-02-06 15:47:25 -1 paleon
-1 2016-02-06 13:53:23 -1 stepps1
We can query the size of directories:
[paciorek@dtn ~]$ rclone size Box:
Total objects: 1147
Total size: 122.552 GBytes (131588922324 Bytes)
We can list the files (including nested files) within a directory:
[paciorek@dtn ~]$ rclone ls Box:research/code/matlab
30166 #bmars.m#
480 README
2376 run.wood2.m
2387 run.wood2stdz.m
80 sub.m
11908 fbmtools/fbmmlpread.m
1588 fbmtools/thin.m
909 fbmtools/private/genop.m
582 fbmtools/private/gminus.m
572 fbmtools/private/gplus.m
Copying files¶
We can copy files from Savio to the remote, as described in the `rclone` documentation for the `copy` command. Note that this skips files already on the remote that have not changed in the local (i.e., Savio) directory.
[paciorek@dtn ~]$ rclone copy /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy
We can copy files from Box to Savio:
[paciorek@dtn ~]$ rclone copy Box:admin /global/home/users/paciorek/admin
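If you want an explicit confirmation that a copy arrived intact, one option is to follow it with `rclone check`. The sketch below is only an illustration: the function name `copy_and_verify` is hypothetical, and it assumes the `--one-way` flag of `rclone check` behaves as described in the rclone documentation (only files from the source must be present and matching at the destination).

```shell
# Copy src to dest, then compare the two locations without modifying either.
copy_and_verify() {
    local src="$1" dest="$2"
    rclone copy "$src" "$dest" || return 1
    # --one-way: extra files that already exist at the destination are not
    # reported as errors; only source files must be present and matching.
    rclone check --one-way "$src" "$dest"
}

# Hypothetical usage:
#   copy_and_verify Box:admin /global/home/users/$USER/admin
```

The function's exit status is that of the check, so it can be used directly in an `if` statement or a batch script.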
We can also synchronize the files on Savio and our remote, which in addition to copying files will delete files from the remote that don't exist locally on Savio:
[paciorek@dtn ~]$ rclone sync /global/home/users/paciorek/tutorial-parallel-basics Box:test_copy
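Because `sync` deletes files on the destination, it can be worth previewing the changes first. The following is a minimal sketch using `rclone`'s `--dry-run` flag; the function name `preview_then_sync` and the confirmation prompt are illustrative choices, not part of `rclone` itself.

```shell
# Show what a sync would copy or delete, then run it only after confirmation.
preview_then_sync() {
    local src="$1" dest="$2" answer
    # --dry-run reports the planned changes without touching the destination.
    rclone sync --dry-run "$src" "$dest" || return 1
    read -r -p "Apply these changes to $dest? [y/N] " answer
    if [ "$answer" = "y" ]; then
        rclone sync "$src" "$dest"
    else
        echo "Sync cancelled."
    fi
}

# Hypothetical usage:
#   preview_then_sync /global/home/users/$USER/tutorial-parallel-basics Box:test_copy
```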
Tips and tricks¶
Limiting transfer speed to avoid Google Drive's daily transfer limit¶
Google Drive, and therefore bDrive, limits you to transferring at most 750 GB per day. To avoid exceeding this limit and having your transfer stop, you can tell `rclone` to cap the transfer speed so that even 24 hours of continuous transfer stays below 750 GB.
[paciorek@dtn ~]$ rclone sync --bwlimit 8.2M /global/scratch/users/paciorek/big_proj_data bDrive:big_proj_data
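To see where a value like 8.2M comes from, the arithmetic can be sketched as follows. This assumes that the `M` suffix of `--bwlimit` means MiB per second (1,048,576 bytes) and that the daily cap is stated in decimal gigabytes; the function name `daily_cap_to_bwlimit` is hypothetical.

```shell
# Largest --bwlimit value (MiB/s, rounded down to one decimal place) that keeps
# 24 hours of continuous transfer under a daily cap given in decimal GB.
daily_cap_to_bwlimit() {
    awk -v gb="$1" 'BEGIN {
        mib_per_s = gb * 1e9 / 86400 / 1048576   # bytes/day -> bytes/s -> MiB/s
        printf "%.1fM\n", int(mib_per_s * 10) / 10
    }'
}

daily_cap_to_bwlimit 750   # prints 8.2M
```

Rounding down rather than to the nearest value keeps the 24-hour total safely under the cap.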
Using `nohup` to keep `rclone` running in the background¶
In most cases (e.g., when not using a terminal multiplexer), if your terminal closes or times out while an `rclone` command is running, the command is killed and the transfer is interrupted. The easiest way to avoid this is to use the `nohup` command to run `rclone` in the background. To use `nohup`, simply add `nohup` before your `rclone` command and `&` after it. By default, any output of the command is written to the file `nohup.out` in the working directory. To send the output to a different file, append `>` followed by the file name to your command, before the `&`.
[paciorek@dtn ~]$ nohup rclone copy Box:admin /global/home/users/paciorek/admin > out.txt &
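If you launch background transfers often, a small wrapper can capture both output streams in one log and report the process ID so you can check on the job later. This is only a sketch; the function name `run_detached` and the log file name are hypothetical.

```shell
# Start a command under nohup with stdout and stderr captured in one log file,
# and print the background process ID so the job can be checked later.
run_detached() {
    local log="$1"; shift
    nohup "$@" > "$log" 2>&1 &
    echo "$!"
}

# Hypothetical usage (placeholder log name and paths):
#   pid=$(run_detached transfer.log rclone copy Box:admin "$HOME/admin")
#   kill -0 "$pid" 2>/dev/null && echo "still running"
#   tail transfer.log
```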
Using `rclone chunker` to break up large files (particularly for Box)¶
Box limits files to a maximum size of 15 GB. To get around this you can use `rclone`'s new "chunker" feature. You should be able to run `rclone config`, select the "chunker" type (number 38), and follow the directions in the `rclone` chunker documentation to set up a chunker remote that 'wraps' a remote such as Box. Then, when you want to transfer to Box with chunking, make sure to select the 'wrapper' remote and not the original Box remote.
Automatic encryption¶
You can set up a remote such that files copied to the remote are encrypted as they are sent to the remote and decrypted as they are retrieved from the remote. As with the chunker discussion above, you'll need to 'wrap' an existing remote. To set up the encrypted remote that wraps an existing remote, after loading the `rclone` module, run `rclone config` and select the "crypt" type (number 13). Then, when you want to transfer files such that they are encrypted on the remote, make sure to select the 'wrapper' remote and not the original remote.
Configuring `rclone` to email upon completion or failure¶
Here's a snippet of bash shell code that emails you the log file when your `rclone` transfer succeeds and a short failure notice when it fails.
[paciorek@dtn ~]$ rclone sync \
    /global/scratch/users/paciorek/big_project_data Box:big_project_data \
    --log-file logfile --log-level "INFO" && \
    mail -s "Rclone result" username@berkeley.edu < logfile || \
    mail -s "Rclone failed" username@berkeley.edu <<< "rclone transfer failed"
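The same idea can be written with an explicit `if`/`else`, which avoids one quirk of chaining `&&` and `||`: in the chained form, the failure email is also sent if the transfer succeeds but the first `mail` command itself fails. The sketch below is illustrative only; the function name `sync_and_notify` and its argument order are hypothetical, and it assumes a working `mail` command as in the snippet above.

```shell
# Run a sync, then email either the log (success) or a failure notice.
sync_and_notify() {
    local src="$1" dest="$2" addr="$3" log="$4"
    if rclone sync "$src" "$dest" --log-file "$log" --log-level INFO; then
        mail -s "Rclone result" "$addr" < "$log"
    else
        mail -s "Rclone failed" "$addr" <<< "rclone transfer failed; see $log"
    fi
}

# Hypothetical usage:
#   sync_and_notify /global/scratch/users/$USER/big_project_data \
#       Box:big_project_data username@berkeley.edu logfile
```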