Data Transfer Overview¶
Moving data, whether within UC Berkeley or to and from external systems, can be a time-consuming and challenging part of research. Campus supports data transfer tools such as Globus for medium- to large-sized data transfer needs. Consultants in Research IT also have knowledge of other tools that help with moving data between specific systems, such as Rclone, Secure File Transfer Protocol (SFTP) via FileZilla, FTP Secure (FTPS), Secure Copy Protocol (SCP), Rsync, Cyberduck, and Google Drive Sync/Google Drive File Stream.
Globus¶
Globus is a user-friendly, web-based tool that is especially well suited for large file transfers and sharing data with colleagues. It makes GridFTP transfers (which provide better transfer rates for large files) trivial, so users do not have to learn command-line options for manual performance tuning. Globus also tunes performance automatically and has been shown to perform comparably to, or in some cases better than, expert-tuned GridFTP. Using Globus allows you to make unattended transfers that are fast and reliable. It is recommended for transferring a large number of files and/or large files, and it can also transfer files to, from, and between cloud platforms and providers such as Google Drive, Box, Wasabi, and AWS S3 storage. UC Berkeley has a Globus subscription (i.e., "High Assurance and HIPAA/BAA subscription") that provides a wide variety of functionality to all campus affiliates. For additional details, guidelines, instructions, and examples on using Globus to transfer and share data, see:
- Using Globus at UC Berkeley
- Using Globus with the Savio HPC cluster
- Sharing files on the Savio HPC Cluster using Globus
- LBNL Information Technology: Globus
- UC Berkeley Department of Statistics: Using Globus for file transfers
Rclone¶
Rclone is a command-line program that can be installed, configured, and used on Linux, macOS, and Windows systems to manage files on cloud storage. It is well suited for transferring and syncing files to, from, and between cloud platforms (including S3-based services) and other systems. Rclone also offers options to optimize a transfer and reach higher transfer speeds than other common transfer tools such as scp and rsync (see below). UCB researchers can find helpful instructions, guidelines, and examples on configuring and using Rclone to transfer data to, from, and between the following systems, resources, and platforms:
- The Savio HPC cluster and bDrive (Google Drive)/UC Berkeley Box here. Though this documentation is focused more towards users of the Savio HPC cluster, the examples and instructions there should be useful to general UCB researchers as well.
- You can find instructions on configuring and using Rclone to transfer files to and from Google Drive here.
- You can find instructions on configuring and using Rclone to transfer files to and from Amazon Web Services (AWS) S3 storage here and here.
- You can find instructions on configuring and using Rclone to transfer files to and from Wasabi here and here.
- You can find instructions on configuring and using Rclone to transfer files to and from Google Cloud Storage here.
- You can find instructions on configuring and using Rclone to transfer files to and from Box here.
- Detailed instructions on How to Migrate Data from Google Drive to Wasabi using Rclone.
For additional information, guidelines, instructions, and examples on configuring and using Rclone to transfer data to and from Google Drive, Box, and Dropbox, see Using rclone - UC Berkeley Statistics.
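As a rough illustration of the kind of command the guides above walk through, the sketch below assembles an `rclone copy` invocation in Python. The remote name `mydrive`, the paths, and the helper `build_rclone_copy` are hypothetical. `--transfers`, `--checkers`, `--progress`, and `--dry-run` are standard Rclone flags, but the values shown are only starting points, not tuned recommendations.

```python
import shlex
import subprocess


def build_rclone_copy(source, dest, transfers=8, checkers=16, dry_run=True):
    """Return an argument list for `rclone copy` (hypothetical helper)."""
    cmd = [
        "rclone", "copy", source, dest,
        "--transfers", str(transfers),  # number of files transferred in parallel
        "--checkers", str(checkers),    # number of files compared in parallel
        "--progress",                   # show live transfer statistics
    ]
    if dry_run:
        cmd.append("--dry-run")         # report what would be copied without copying
    return cmd


# "mydrive" is an assumed remote name created beforehand with `rclone config`.
cmd = build_rclone_copy("/path/to/local/data", "mydrive:project/data")
print(shlex.join(cmd))

# To run the real transfer (requires rclone to be installed and configured):
# subprocess.run(build_rclone_copy("/path/to/local/data", "mydrive:project/data", dry_run=False), check=True)
```

Starting with a dry run is a cautious default: it lets you confirm the source and destination before any data moves.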
SFTP, FTPS, SCP, and Rsync¶
When transferring a modest number of smaller-sized files, you can use command-line file transfer tools such as Secure File Transfer Protocol (SFTP), FTP Secure (FTPS), Secure Copy Protocol (SCP), and Rsync on Linux and macOS systems to move data between specific systems, e.g., between a local host and a remote host or between two remote hosts. Rsync can also be used as a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. These programs are commonly found on Unix-like operating systems (e.g., Linux and macOS systems). FileZilla is a free and open-source, cross-platform FTP/SFTP/FTPS GUI application for use with Windows, Linux, and macOS systems.
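To make the comparison of modification times and sizes concrete, here is a minimal Python sketch of the idea: a file is skipped when the destination copy already has the same size and modification time. This is an illustration only, not Rsync's actual implementation, and the function name `needs_transfer` is hypothetical.

```python
import os
import shutil
import tempfile


def needs_transfer(src_path, dst_path):
    """Return True if dst is missing or differs from src in size or mtime."""
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    # Compare whole seconds, since some filesystems store coarser timestamps.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.txt")
    dst = os.path.join(tmp, "copy.txt")
    with open(src, "w") as f:
        f.write("hello")
    print(needs_transfer(src, dst))  # destination does not exist yet
    shutil.copy2(src, dst)           # copy2 preserves the modification time
    print(needs_transfer(src, dst))  # size and modification time now match
```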
Note that on the BRC Supercluster, the SCP, SFTP, FTPS, and Rsync protocols and tools can be used to transfer data to and from the cluster by connecting to the BRC Supercluster's dedicated Data Transfer Node, dtn.brc.berkeley.edu.
For more details, guidelines, instructions, and examples of using these tools, see below. Though much of the documentation linked here is directed towards users of the Savio HPC cluster or other computing systems on the UC Berkeley campus, the examples and instructions there should also be useful to general UCB researchers:
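As a hedged illustration, the Python sketch below builds `rsync` and `scp` command lines that target the Data Transfer Node named above. The username `myusername`, the local and remote paths, and the helper names are placeholders; check the cluster documentation for the destination paths that apply to your account.

```python
import shlex

DTN_HOST = "dtn.brc.berkeley.edu"  # Data Transfer Node named in the text above


def rsync_to_dtn(local_dir, remote_dir, user):
    """Build an rsync command that copies local_dir to the cluster (hypothetical helper)."""
    # -a: archive mode (recursive, preserves times and permissions); -v: verbose output
    return ["rsync", "-av", local_dir, f"{user}@{DTN_HOST}:{remote_dir}"]


def scp_from_dtn(remote_file, local_dir, user):
    """Build an scp command that copies one remote file into local_dir (hypothetical helper)."""
    return ["scp", f"{user}@{DTN_HOST}:{remote_file}", local_dir]


# Placeholder username and paths; replace them with your own.
print(shlex.join(rsync_to_dtn("results/", "/path/on/cluster/results/", "myusername")))
print(shlex.join(scp_from_dtn("/path/on/cluster/output.csv", ".", "myusername")))
```

The commands are only printed here. To execute one, pass the list to `subprocess.run(cmd, check=True)` on a machine where the tool is installed.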
- Data Transfer Overview for the Berkeley Savio and Vector clusters
- Using SCP with the Savio HPC Cluster
- Using Rsync on Savio for file migration
- How to Use Rsync to Exclude Files and Directories in Data Transfer
- Using SFTP via Filezilla -- Example: Transferring data between your computer and the Berkeley Savio and Vector clusters
- Using FileZilla to transfer files from the UCB Genomics Sequencing Lab FTP Server
- UC Berkeley Department of Statistics: How do I copy files to or from my account?
- UC Berkeley Instructional & Research Information Systems (IRIS): SSH and FTP
Cyberduck, duck, and Mountain Duck¶
Cyberduck is a popular stand-alone FTP, SFTP, FTPS, and WebDAV client and cloud storage browser for macOS and Windows. It lets you connect to and transfer files between different servers and cloud storage services, including Amazon S3, Microsoft Azure, Google Drive, and Dropbox. Cyberduck provides a user-friendly interface and supports various file transfer protocols, making it a versatile tool for managing files across these systems. It also allows direct file transfers between cloud services, for example between Google Drive and Box. You can download the appropriate version of Cyberduck for your operating system from the official Cyberduck website.
Also, if you want to synchronize files in near real-time between, for example, your laptop and the Savio HPC Cluster, your lab server, or another system, you can use Cyberduck to synchronize a directory on your computer with a directory on the remote system. When connecting to Savio with Cyberduck, use the address of the Savio Data Transfer Node, dtn.brc.berkeley.edu. Research IT also provides some support for the use of Cyberduck on the Analytics Environments on Demand (AEoD) system. As an example, here are some guidelines for how to use Cyberduck to transfer files between two remote servers or cloud providers:
- Download and install Cyberduck: Visit the official Cyberduck website and download the application for your operating system. Install Cyberduck by following the provided instructions.
- Launch Cyberduck: Once installed, launch the Cyberduck application on your computer.
- Add connections to the remote servers/cloud providers: Click on the "Open Connection" button in Cyberduck to add connections to the remote servers or cloud providers you want to transfer data between. Select the appropriate protocol (e.g., FTP, SFTP, Amazon S3, Azure Blob Storage, etc.) for each connection and enter the required connection details such as server address, port, username, and password.
- Save the connections: After entering the connection details, save the connections by clicking the "Connect" button. Cyberduck will establish connections to the remote servers/cloud providers and remember the configuration for future use.
- Browse the remote servers/cloud providers: Cyberduck will display a list of connections or bookmarks representing the remote servers or cloud providers you added. Double-click on a connection or bookmark to establish a connection and browse the remote file system.
- Transfer data between remote servers/cloud providers: To transfer data between the connected servers/cloud providers, you can use drag-and-drop operations or right-click on files/folders and select options like "Upload" or "Download". For example, to transfer data from one server/cloud provider to another, navigate to the source server/cloud provider, select the files or folders you want to transfer, and drag them to the destination server/cloud provider. Cyberduck will initiate the transfer and show progress indicators.
- Manage files and folders: Cyberduck provides features to manage files and folders on the remote servers/cloud providers. You can create new folders, rename files/folders, delete files/folders, change permissions, and perform other file operations as needed.
- Disconnect from the remote servers/cloud providers: When you have finished transferring data and managing files, you can disconnect from the remote servers/cloud providers by closing the Cyberduck application or using the "Disconnect" option in the application menu.
duck is the command-line interface (CLI) for Cyberduck and is available for macOS, Windows, and Linux. It is a universal file transfer tool that runs in your shell on Linux and macOS or in the Windows command prompt. With duck, you can edit files on remote servers and download, upload, and copy files between servers with FTP, SFTP, or WebDAV. It also supports cloud storage such as Amazon S3 and OpenStack Swift deployments.
Mountain Duck is another file transfer application from the developers of Cyberduck. It is built upon the same foundation as Cyberduck but takes a different approach: rather than functioning as a standalone client, Mountain Duck mounts remote servers as local drives on your computer. It integrates with your operating system's File Explorer (Windows) or Finder (macOS), allowing you to access and interact with remote files as if they were stored locally. Mountain Duck supports a wide range of server protocols, similar to Cyberduck. Because remote systems appear as local drives, you can use Mountain Duck to transfer files between your computer and other systems and cloud storage providers, as well as between cloud storage providers.
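For a sense of what duck commands look like, the Python sketch below assembles upload and download invocations. It assumes the `--upload <url> <file>`, `--download <url> <file>`, and `--username` options described in the Cyberduck CLI documentation; confirm them with `duck --help` for your installed version. The URL, paths, username, and helper names are placeholders.

```python
import shlex


def duck_upload(url, local_path, username):
    """Build a duck command that uploads local_path to url (hypothetical helper)."""
    return ["duck", "--username", username, "--upload", url, local_path]


def duck_download(url, local_path, username):
    """Build a duck command that downloads url to local_path (hypothetical helper)."""
    return ["duck", "--username", username, "--download", url, local_path]


# Placeholder server path and username; replace them with your own.
remote = "sftp://dtn.brc.berkeley.edu/path/on/cluster/data.csv"
print(shlex.join(duck_upload(remote, "data.csv", "myusername")))
print(shlex.join(duck_download(remote, "data_copy.csv", "myusername")))
```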
For more details, guidelines, instructions, and examples on installing, configuring, and using these tools, see:
- Cyberduck & Mountain Duck Help
- Command Line Interface (CLI)
- Cyberduck Quick Reference Cheat Sheet
- Cyberduck Frequently Asked Questions/Troubleshooting
- Mountain Duck Documentation
Google Drive Sync/Google Drive File Stream/Google Drive Desktop¶
Google Drive Sync, also known as Google Drive for Desktop, is a desktop application provided by Google that allows you to access and synchronize your Google Drive files directly from your local computer (e.g., laptop). It provides a convenient way to work with your files stored in Google Drive without taking up local storage space. For example, Google Drive Sync/Google Drive File Stream can be used to transfer and/or synchronize files between Google Drive/bDrive and Box, AWS S3 buckets, or other cloud platforms.
Google Drive Sync allows you to access your Google Drive files offline, make edits, and have them automatically synced back to the cloud when you're connected to the internet. It provides a seamless integration between your Google Drive cloud storage and your local machine, allowing you to work with files as if they were stored locally.
To install and use Google Drive Sync, visit the Google Drive web page here and download Google Drive for Desktop for your operating system (Windows or macOS). As an example, the following guidelines may be helpful when using Google Drive Sync/Google Drive for Desktop to transfer and synchronize files between Google Drive and your Box account:
- Install and set up Google Drive Sync/Google Drive Desktop: Download and install Google Drive Desktop on your computer. Sign in to the application using your Google account credentials.
- Sync Google Drive to your computer: After setting up Google Drive Sync/Desktop, the application will create a folder on your computer that represents your Google Drive. By default, this folder is named "Google Drive" and is located in your user directory.
- Sync Box to your computer: Similarly, if you have a Box account, install and set up the Box Sync application on your computer. Sign in to Box Sync using your Box account credentials. This will create a folder on your computer that represents your Box storage.
- Copy files from Google Drive to Box: With both Google Drive Sync and Box Sync set up, you can now manually copy or move files from your Google Drive folder to your Box folder. Simply locate the files you want to transfer within your Google Drive folder, and either drag and drop them into your Box folder or use copy and paste commands.
- Sync files to Box: Google Drive Sync and Box Sync applications will automatically sync the transferred files from your local Box folder to your Box cloud storage. The synchronization process may take some time, depending on the file sizes and your internet connection speed.
Please note that this method requires you to have sufficient local storage on your computer to accommodate the files being transferred from Google Drive to Box. Additionally, ensure that you have enough storage space available in your Box account to accommodate the transferred files.
It's important to keep backups of your important files before performing any major file transfers to avoid potential data loss.
Remember to consult the documentation and support resources provided by both Google Drive Sync and Box Sync for more detailed instructions and troubleshooting guidance specific to your operating system and software versions.
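The manual copy between the two synced folders can also be scripted. The sketch below is a minimal Python example that copies one subfolder from the local Google Drive folder into the local Box folder, assuming both sync clients are already set up. The folder locations and the helper name `copy_between_sync_folders` are assumptions, since the actual paths depend on your operating system and client settings; the demonstration uses temporary stand-in folders.

```python
import shutil
import tempfile
from pathlib import Path


def copy_between_sync_folders(src_root, dst_root, subfolder):
    """Copy src_root/subfolder into dst_root and return the file count under the destination."""
    src = Path(src_root) / subfolder
    dst = Path(dst_root) / subfolder
    # dirs_exist_ok=True (Python 3.8+) merges into an existing destination folder.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())


# Demonstration with temporary stand-ins for the two synced folders.
with tempfile.TemporaryDirectory() as tmp:
    drive = Path(tmp) / "Google Drive"
    box = Path(tmp) / "Box"
    (drive / "project_data").mkdir(parents=True)
    (drive / "project_data" / "notes.txt").write_text("example")
    box.mkdir()
    print(copy_between_sync_folders(drive, box, "project_data"))
```

Copying rather than moving leaves the Google Drive originals in place until you have confirmed that the files reached Box.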
For additional guidelines, instructions, and examples on using Google Drive Desktop to transfer and synchronize files between Google Drive and other systems, see:
- Use Google Drive for desktop
- Stream & mirror files with Drive for desktop
- UC Berkeley Department of Statistics -- Using Google Drive (aka bDrive) including automated access
Transferring Data from Google Drive to Google Cloud Storage Using Google Colab Tools¶
Google Colab tools can also be used to transfer data from Google Drive to Google Cloud Storage. The example Colab notebook here, which is a Python Jupyter Notebook, shows how to mount Google Drive using the Google Colab tools, which require authenticating in a separate browser window. The notebook then shows how to authenticate and connect to a Google Cloud Storage project, list all buckets, and start copying files from Google Drive to a Google Cloud Storage bucket.
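The following is a minimal sketch of that workflow, not a copy of the linked notebook. It assumes it is run inside Colab with the `google-cloud-storage` package available; the project ID, bucket name, folder path, and both function names are hypothetical placeholders.

```python
import os


def plan_uploads(local_root, prefix=""):
    """Map every file under local_root to a (local_path, object_name) pair."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in sorted(filenames):
            local_path = os.path.join(dirpath, name)
            rel = os.path.relpath(local_path, local_root).replace(os.sep, "/")
            plan.append((local_path, prefix + rel))
    return plan


def copy_drive_folder_to_gcs(drive_folder, project_id, bucket_name, prefix=""):
    """Run inside Colab: mount Drive, authenticate, and upload a folder to a bucket."""
    # Imported here because these modules are only available in a Colab runtime.
    from google.colab import auth, drive
    from google.cloud import storage

    drive.mount("/content/drive")  # prompts for Google Drive authorization
    auth.authenticate_user()       # prompts for Google Cloud authorization
    client = storage.Client(project=project_id)
    print([b.name for b in client.list_buckets()])  # list the project's buckets
    bucket = client.bucket(bucket_name)
    for local_path, object_name in plan_uploads(drive_folder, prefix):
        bucket.blob(object_name).upload_from_filename(local_path)


# Example call inside Colab (all names are placeholders):
# copy_drive_folder_to_gcs("/content/drive/MyDrive/my_folder", "my-project-id", "my-bucket", prefix="from_drive/")
```

Separating the planning step from the upload makes it possible to review which files would be copied, and under which object names, before anything is written to the bucket.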
S3 Browser (for Windows)¶
S3 Browser is a free Windows software application specifically designed for managing and transferring data to and from Amazon S3 (Simple Storage Service), which is a popular cloud storage service provided by Amazon Web Services (AWS), along with Amazon CloudFront. S3 Browser provides a user-friendly interface to browse, upload, download, and manage files stored in your Amazon S3 buckets. The S3 Browser Windows client can be downloaded here.
S3 Browser simplifies the process of working with Amazon S3 by providing a dedicated interface for managing your S3 buckets and performing data transfers. It offers additional features like multi-threaded file transfers, support for large file uploads, and advanced file search capabilities.
Note that S3 Browser is specifically designed for Amazon S3 and may not support other cloud storage providers. To use S3 Browser for transferring data between your computer (e.g., your laptop) and AWS S3 storage, follow these steps:
- Download and install S3 Browser: Visit the official S3 Browser website and download the application for your operating system. Install S3 Browser by following the provided instructions.
- Launch S3 Browser: Once installed, launch the S3 Browser application on your computer.
- Configure AWS credentials: In order to connect to your AWS S3 account, you need to provide your AWS access credentials to S3 Browser. This includes your Access Key ID and Secret Access Key. You can obtain these credentials from the AWS Management Console by following AWS's instructions for creating IAM (Identity and Access Management) users and generating access keys.
- Connect to your AWS S3 account: Launch S3 Browser and click on the "Add Account" button. Enter a name for your account, specify the Access Key ID and Secret Access Key you obtained from AWS, and select your desired AWS region. Click "OK" to add the account.
- Browse and manage your S3 buckets: After connecting to your AWS S3 account, S3 Browser will display a list of your S3 buckets. You can browse the buckets and navigate through the folder structure within them.
- Transfer data to AWS S3: To upload data from your computer to AWS S3, select the bucket where you want to upload the files. Click on the "Upload Files" button and select the files or folders you want to upload from your local computer. S3 Browser will transfer the selected files to the specified location in your AWS S3 bucket.
- Download data from AWS S3: To download files from AWS S3 to your computer, navigate to the desired file in S3 Browser. Right-click on the file and select "Download" or simply double-click on the file. S3 Browser will initiate the download and save the file to the specified location on your computer.
- Manage files and folders: S3 Browser provides various features to manage files and folders in AWS S3. You can create new folders, rename files or folders, delete files or folders, copy or move files between folders or buckets, set access permissions, and perform other management tasks.
- Disconnect from AWS S3: When you have finished transferring data and managing files, you can disconnect from your AWS S3 account by closing the S3 Browser application.
For additional guidelines, instructions, and examples on using S3 Browser to transfer files between AWS S3 storage and other systems, see:
- Uploading and Downloading your files to and from Amazon S3
- Amazon S3 Buckets Overview
- How to create an Amazon S3 Bucket
- How to delete an Amazon S3 Bucket
- How to edit Amazon S3 Bucket Policies
- Share your Buckets with other Amazon S3 users
WinSCP (for Windows)¶
WinSCP is an SFTP and FTP client for Microsoft Windows that Windows users can use to transfer files securely. It can be used to copy files between a local computer and remote servers using FTP, FTPS, SCP, SFTP, WebDAV or S3 file transfer protocols.
Getting Help and Additional Resources¶
Whether you are looking to complete a one-time data transfer, implement an automated transfer schedule, or move data to/from/between UC Berkeley systems, please contact research-it-consulting@berkeley.edu for assistance. Also, for further information, guidelines, instructions, and examples on data transfer software commonly used by UC Berkeley researchers, see: