This document describes how to use Globus to transfer data between your computer (or any other computer on the Internet that is acting as a Globus "endpoint" or "collection" and to which you have access) and the Berkeley Savio and Vector clusters. An "endpoint" is simply a location (e.g., laptop, desktop, server, cluster, storage system) that you want to transfer files to or from. A Globus "collection" is a discoverable access point that allows data to be transferred through GridFTP or HTTPS, and contains policies that govern sharing data with other users.
Some locations that you can use to transfer data to/from include your own personal computer, bDrive/Google Drive, Wasabi, Cloudian, AWS, the Secure Research Data and Compute (SRDC) platform at UCB, and some campus departmental facilities such as the Econometrics Laboratory and the Statistical Computing Facility.
Globus is ideal for transferring large files and/or large numbers of files at high speed. It allows you to start an unattended transfer and have confidence that it will complete reliably, and it can even send you an email notification when the transfer is done.
The Globus user interface (UI) can be accessed through a browser, such as Google Chrome. It is generally recommended that an “incognito” browser window (e.g., a Google Chrome “incognito” or “private” browser window) be used when accessing the Globus UI so other credentials are ignored. This can also help avoid sending (old) login cookies. Incognito will also automatically forget your passwords when you close out at the end of a session. You can access an Incognito Google Chrome browser by clicking on the three vertical dots in the upper right corner of the browser and clicking “New Incognito Window” in the drop-down menu. You’ll know you are incognito when the address bar turns black at the top of the window and you’ll see the word “Incognito” in the top right of the browser window. For more instructions on this, please see here and here.
For information on how to use Globus, see Globus's Step-by-Step Getting Started Guide.
For more general information on the use of Globus at UC Berkeley (beyond the Savio/BRC Cluster), see our documentation here.
Using Globus with Savio¶
This video walks through the process of using Globus with Savio:
A quick summary for users:
- Visit the Globus website.
- Click the Log In button at upper right and log in using your institutional login (e.g., UC Berkeley or LBNL) or your Globus ID. If this is your first time, you can create a free Globus ID here.
- You should land on the File Manager page.
- Turn on two-panel mode by toggling the switch near the top right.
- Use the File Manager page to transfer files:
- Click in the Collection field at left to search for or select an endpoint.
- Search for ucb#brc in the Collection field. (This is the Globus endpoint for the BRC supercluster, which connects via the supercluster's Data Transfer Node, dtn.brc.berkeley.edu.)
- Select the ucb#brc entry when it is displayed below. Doing so will return you to the File Manager page. Note that there may currently be multiple Savio Globus endpoints/collections with the same or a similar name (ucb#brc or similar). To find the right one, you can also search on "8dfcaab6-fe4e-4376-9430-785631512d4a" in the collections search box of the Globus UI. This is the Globus UUID of the ucb#brc collection, which is labeled "Managed Mapped Collection on UCB/BRC DTN01 Endpoint" in the Globus UI. This is the new Globus Connect Server version 5 (GCSv5) connector, and it should work for all users.
- If prompted to authenticate, click the Continue button and follow the on-screen instructions. In brief, you'll be taken to an LBNL page where you enter your BRC cluster username, along with your token PIN followed by a one-time password that you generate via the Google Authenticator application. (See Logging In for details on generating that password.) Note that in some instances you may first be prompted to re-authenticate and be given a choice of identities to log in with. In that case it is usually best to log in with an identity that includes your username as part of the ID, e.g. email@example.com. You might then be taken to the username and PIN/one-time password page described above.
- Enter the path for the directory on the BRC cluster to/from which you'd like to copy files into the Path field at left. (E.g., /global/home/users/myusername for your home directory, where myusername is your BRC cluster username. For more details on storage locations, see the relevant User Guide page for the cluster on which you are working.)
- Click in the Collection field at right and repeat the process for your second collection. This second endpoint/collection would often be your own computer, which you would need to set up as an endpoint as described below. (Globus also provides test endpoints, such as go#ep2, each of which provides sample files that you can practice transferring to the cluster.)
- In the list of files in either of the two collections/endpoints you have chosen (i.e., in either the left or right panel), select one or more files or folders.
- Click the appropriate "Start" button at the bottom of each collection/endpoint to transfer these files to the other collection/endpoint.
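The File Manager steps above also have a command-line counterpart in the Globus CLI. The following is a minimal sketch rather than an official recipe: it only builds and prints the argument list for a `globus transfer` command, so nothing is actually submitted. The source UUID is the ucb#brc collection UUID quoted above; the destination UUID, both paths, the label, and the helper name `build_transfer_command` are hypothetical placeholders.

```python
import shlex

# UUID of the ucb#brc collection, as quoted in the steps above.
SAVIO_COLLECTION = "8dfcaab6-fe4e-4376-9430-785631512d4a"
# Hypothetical placeholder: replace with the UUID of your own collection.
MY_COLLECTION = "00000000-0000-0000-0000-000000000000"


def build_transfer_command(src_id, src_path, dst_id, dst_path,
                           recursive=False, label=None):
    """Return the argument list for a `globus transfer` invocation."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")  # used when the source is a directory
    if label:
        cmd += ["--label", label]  # human-readable name for the task
    # Each side is written as COLLECTION_UUID:PATH
    cmd += [f"{src_id}:{src_path}", f"{dst_id}:{dst_path}"]
    return cmd


cmd = build_transfer_command(
    SAVIO_COLLECTION, "/global/home/users/myusername/results/",  # hypothetical path
    MY_COLLECTION, "/~/results/",                                # hypothetical path
    recursive=True, label="savio-to-laptop",
)
print(shlex.join(cmd))
```

To actually submit the transfer, you would need the Globus CLI installed and an active `globus login` session, and could then pass `cmd` to `subprocess.run(cmd, check=True)`.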
Users can also set up guest collections (also termed shared endpoints in some cases) on a resource to which they have access, in order to share data with collaborators of their choosing. A guest collection gives the collaborator access to the specific directories that you choose, and the collaborator does not need to have an account on the resource. In this way, you can use Globus to share directories in your Savio home, scratch, or condo storage with collaborators.
Transferring files between Savio and your personal computer¶
To transfer files between your computer and the BRC supercluster, you'll need to set up Globus Connect Personal. Here's a quick overview of how to install and use it:
- Log in to the Globus website, if you're not already logged in.
- Visit the Collections page.
- Click on "Get Globus Connect Personal".
- Follow the onscreen instructions on how to set up and install this software. You can also find detailed, official instructions for installing Globus Connect Personal for Mac OS X, Microsoft Windows, and Linux via links on this Globus Support "How To" page.
- Once you've installed Globus Connect Personal, and that software is actively running on your computer, go back to the "File Manager" page and select your new endpoint as above.
Keep in mind that, by default, Globus Connect Personal only has access to certain folders on your personal or local computer (e.g., your home directory on a Mac). Any other area, such as a network drive (including network-attached storage (NAS)) or another external storage device attached or mounted to your computer, must be enabled within the Globus Connect Personal application. To configure your Globus Connect Personal instance to access other directories on your computer, or files on an external storage device attached to it, see the Globus Connect Personal instructions for your operating system.
For example, on a Mac laptop, click on the Globus Connect Personal icon, select "Preferences", then select the "Access" menu to adjust which directories Globus can access.
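On Linux, where Globus Connect Personal is often run from the command line, the accessible paths are, to the best of our knowledge, listed in a plain-text file, ~/.globusonline/lta/config-paths, with one `path,sharing flag,read/write flag` entry per line. The sketch below only formats such lines and prints them; it does not touch any file. The file location and line format are assumptions to verify against the Globus Connect Personal documentation for your version, and the NAS mount path and helper name are hypothetical.

```python
def config_path_line(path, sharing=False, writable=True):
    """Format one entry as <path>,<sharing flag>,<read/write flag>."""
    # Flags are written as 1 (enabled) or 0 (disabled).
    return f"{path},{int(sharing)},{int(writable)}"


lines = [
    config_path_line("~/"),                                 # home directory, read/write
    config_path_line("/mnt/nas/project", writable=False),   # hypothetical NAS mount, read-only
]
print("\n".join(lines))
```

Keeping a mounted NAS read-only, as in the second entry, is one cautious choice when you only need to copy data from it to the cluster.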
In some circumstances, for example when setting up Globus Connect Personal on a Windows PC, you may need administrator access on the PC to configure the endpoint. For instance, if you need to map a network drive so that the endpoint can access the file system of an attached or mounted NAS, administrative access may be required. If you are installing Globus Connect Personal on a Windows system as a non-administrator, note that by default the installer prompts to install into C:\Program Files, a folder to which regular users cannot write. Instead, browse to a location to which you have write access (e.g., your Desktop folder).
Note that if you are only using Globus to move files between the Savio cluster and other external computing systems that use Globus (such as another University cluster or national lab cluster), then there is no need to install and use Globus Connect Personal on your personal or local computer.
Transferring files between Savio and bDrive/Google Drive¶
To transfer files between Savio and bDrive/Google Drive, follow these steps:
- In your browser, navigate to app.globus.org.
- Search for University of California, Berkeley.
- You will see the CalNet login screen.
- Log in with your individual CalNet credentials or your Special Purpose Account (SPA) from here. Click here if you need help logging on to a SPA account. WARNING: If your CalNet ID does not match your UC Berkeley email address, you will receive an error when you attempt to log into UCB Google Drive Collections (in the authentication/consent step below). This will also happen if you are working with a SPA account. Email firstname.lastname@example.org for help.
- Once you log on, you will see a CILogon Information Release page. Generally you can leave the default selection and click Accept. If you have logged into and used Globus before, you can skip ahead to the step below where you search for "UCB Google Drive Collections" in the File Manager.
- You will see a welcome screen acknowledging that you have logged in. There is an option to link other accounts if this is not the first time that you are using Globus, but generally you can click Continue.
- In the File Manager, click in the Collection search field, search for "UCB Google Drive Collections", and click on it.
- If this is the first time you are accessing this collection, you will be sent to a page titled "Authentication/Consent Required". You will be asked to consent to allow Globus services to access data on your behalf and to allow the Globus endpoint at UC Berkeley to manage your Google credential. Make sure to select Allow and Continue on the corresponding pages to provide consent and register your credential. On another screen you will also be prompted to log into Google and grant permission for the Globus endpoint operated at UC Berkeley to access data on Google Drive/bDrive. If you are logged into more than one account in your browser, be sure to select the correct one (e.g., email@example.com). See the Globus website here for additional details and examples (including screenshots).
- Once your credentials are registered and permission is granted, navigate back to the File Manager, where you should see your folders and documents under "My Drive" (the default). You can then navigate to the bDrive/Google Drive folder of interest in the Globus File Manager and transfer files in the same way as described in the "Using Globus with Savio" section above.
For additional and more general instructions, examples, and details on using Globus to access your files on (and transfer files to and from) Google Drive or Google Cloud, see the Globus.org documentation here.
Note that Google Drive imposes "rate limits" on how much data and how many files a user can transfer in any 24-hour period. (Currently, individual users can upload only 750 GB each day between "My Drive" and all shared drives. Users who reach the 750 GB limit, or who upload a file larger than 750 GB, cannot upload additional files that day; uploads that are in progress will complete. The maximum individual file size that you can upload or synchronize is 5 TB.) For more details, please see the Google documentation here. To help alleviate the restrictions from the rate limits, as well as to keep file counts low (and make data retrieval easier), we recommend that you archive your data using zip or tar before the transfer. For example, if a folder or subfolder contains many small files, bundle it into a single zip or tar archive first.
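As a minimal sketch of the archiving recommendation above, the following uses Python's standard tarfile module to bundle a folder of small files into a single compressed archive, and estimates how many days an upload would take under the 750 GB daily limit quoted above. The folder and file names are made up for the demonstration, which runs entirely in a temporary directory; the day estimate is a simple lower bound that ignores transfer speed and other activity on the account.

```python
import math
import tarfile
import tempfile
from pathlib import Path

DAILY_UPLOAD_LIMIT_GB = 750  # Google Drive daily upload limit quoted above


def archive_folder(folder, archive_path):
    """Bundle a folder and everything under it into one gzip-compressed tar file."""
    folder = Path(folder)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name
        tar.add(folder, arcname=folder.name)
    return Path(archive_path)


def days_to_upload(total_gb, daily_limit_gb=DAILY_UPLOAD_LIMIT_GB):
    """Minimum number of days needed to upload total_gb under the daily limit."""
    return math.ceil(total_gb / daily_limit_gb)


# Demonstration on throwaway data (hypothetical folder and file names).
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "run01"
    data.mkdir()
    for i in range(3):
        (data / f"sample_{i}.txt").write_text(f"record {i}\n")
    archive = archive_folder(data, Path(tmp) / "run01.tar.gz")
    with tarfile.open(archive) as tar:
        members = sorted(tar.getnames())

print(members)
print(days_to_upload(2000))  # days needed for 2000 GB at 750 GB per day
```

Transferring the single run01.tar.gz file instead of the individual files counts as one file against the limits, at the cost of having to unpack it before use.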
Transferring files between Savio and Box¶
To transfer files between Savio and your UC Berkeley Box account, follow these steps:
- If you aren't already logged in to Globus, follow the instructions in the previous sections for logging into Globus (whether for the first time or as a returning user), along with the authentication/consent steps.
- In the File Manager, click in the Collection search field, search for "UCB Box Collections", and click on it.
- If this is the first time you are accessing this collection, you will see a request for Authentication/Consent to allow access within the File Manager. Click Continue.
- On the next screen you will see that an Identity selection is required. Click on your account.
- The following page will confirm the access the Globus Web App will have to your accounts. Click Allow.
- You will be returned to the File Manager where you will see that Authentication is Required for your UC Berkeley Box account. Click Continue.
- On the next screen you will confirm that you want to Register a Credential for Globus to manage. Click Continue.
- On the next screen you will be asked to log in to your UC Berkeley Box account to grant access. Enter your Box account email address and password and click Authorize.
- On the next screen click Grant access to Box to confirm that Globus can access your Box account.
- You will then be returned to the File Manager, where you should now be able to see your UC Berkeley Box folders and documents. You can then navigate to the Box folders or files of interest in the Globus File Manager and transfer files in the same way as described in the "Using Globus with Savio" section above.
Transferring files between Savio and Wasabi¶
To transfer files between Savio and your Wasabi account, follow these steps:
- You’ll first need to set up a Wasabi account (if you don’t already have one) before you can use Globus to transfer files between your Wasabi S3 buckets and Savio. One way for UC Berkeley researchers to get started is to review the bIT Wasabi Cloud storage page, send an email to firstname.lastname@example.org requesting information about the UCB Wasabi service, and, once your Wasabi account has been set up, follow the instructions in the Wasabi Getting Started Guide.
- Now, assuming that you have set up your account and S3 buckets on Wasabi, and if you aren't already logged in to Globus, follow the instructions in the previous sections for logging into Globus (whether for the first time or as a returning user), along with the authentication/consent steps.
- In the File Manager or Globus UI collection search field, look for the collection "UCB S3 Wasabi Data" and click on that when it comes up.
- Then, navigate back to the Globus File Manager for the "UCB S3 Wasabi Data" collection. From there you can transfer files between the “UCB S3 Wasabi Data” collection and Savio in similar fashion as described in the instructions above regarding transferring data between Savio and a personal computer. However, if this is the first time you are using the Globus S3 Wasabi connector, you will first be sent to a page titled “Authentication/Consent Required” and asked to consent to allow the Globus endpoint at UC Berkeley to manage your Wasabi credential. The process is similar to that for accessing Google Drive/bDrive or Box in Globus, as described in previous sections. Click Continue.
- You will then also be sent to a page that includes the text “LBL Globus Web app would like to manage collections on UCB/BRC DTN01 Endpoint”. Click Allow.
- If this is the first time you are using the Globus S3 Wasabi storage connector and accessing the "UCB S3 Wasabi Data" collection, you will also be sent to a page where you must register your credentials. Here, enter the access key ID and secret access key that were generated when you set up your Wasabi account (see step #1). Click Continue.
- Now, you can navigate back to the “File Manager” page for the “UCB S3 Wasabi Data” collection. All your Wasabi buckets should now be visible in the File Manager and the files within these buckets should be ready to transfer to any other Globus endpoints/collections. You can now access and transfer files between the “UCB S3 Wasabi Data” collection and Savio via the “ucb#brc” endpoint on the Savio DTN. (See step #4).
Transferring files between Savio and AWS S3¶
To transfer files between Savio and your AWS S3 account, follow these steps:
- You’ll first need to set up a UC Berkeley AWS account (if you don’t already have one), or otherwise have your own private AWS account, before you can use Globus to transfer files between your AWS S3 buckets and Savio or another system. One way for UC Berkeley researchers to get started is to navigate to the Berkeley IT bCloud: Private and Public Cloud Services page and scroll down to the section “Get Started with a Public bCloud Service Account”. From there you can request a new account, or move an existing account under consolidated billing, by filling out the bCloud AWS Central On-boarding Google Form. The bCloud services team will then get in touch to finalize your bCloud AWS account setup; you can also contact them at email@example.com with any questions. In addition, you can directly contact and consult with Robert Amos (firstname.lastname@example.org), who is the Cloud Operations Manager at UC Berkeley and manages UCB AWS accounts, about costs, pricing, and related matters.
- Now, assuming that you have set up your AWS account and S3 buckets on AWS S3, note that in order to set up Globus to access an AWS S3 bucket, you’ll need to have an IAM access key ID and a secret key ready to go. Due to Globus’s implementation of the AWS S3 connector (see also here) you can only add a single IAM access key ID and secret key to your Globus configuration. However, you’ll have access to any buckets that the IAM access key ID is configured to have access to. Please also note that for your first time setting up the S3 connector you’ll have to go through various “consent” and “authorization” prompts, some of which are documented below. Giving consent is a standard part of the Globus process whereby you authorize Globus to perform additional privileged operations with the selected endpoint. If you’ve already given permissions to Globus for the S3 connector, you might not see all the consent steps described below.
- Now, assuming that you have set up your account and S3 buckets on AWS S3 and configured the IAM access permissions to those buckets, and if you aren't already logged in to Globus, follow the instructions in the previous sections for logging into Globus (whether for the first time or as a returning user), along with the authentication/consent steps.
- In the File Manager or collection search field under the "Collections" sidebar tab of the Globus UI, look for the collection "UCB AWS S3 Collections" and click on that when it comes up among the list of collections that appear.
- Now, click on the “Credentials” tab of the “UCB AWS S3 Collections” endpoint page. At this point, you may be asked to choose an identity for access and be given a choice of identities to log in with. In that case it is usually best to log in with an identity that includes your username as part of the ID (e.g., email@example.com or firstname.lastname@example.org), where "username" is your CalNet ID username.
- At this point (assuming that this is the first time you are accessing the UCB AWS S3 collection), you will be asked to consent to allow the Globus endpoint at your institution to manage your AWS S3 credential and to access data on your behalf. (See also the Globus documentation at https://docs.globus.org/how-to/access-aws-s3/ for additional examples and information.) Click on the Allow button.
- Next, register your AWS credentials with Globus by entering the AWS IAM access key ID and secret access key that were generated when you set up your AWS S3 account and buckets (see step #2 above).
- After you’ve entered the AWS IAM access key ID and secret access key, click the Continue button, and you’ll be taken back to the full “Credentials” tab where you can see your saved AWS access credentials.
- At this point you are set up to access the AWS S3 buckets with Globus. Click the “Overview” tab, and then the “Open in File Manager” button to see the AWS S3 buckets and data that are available using your AWS credentials.
- Next, you may need to provide authentication/consent for the Globus transfer service to manage data on this collection. Click Continue.
- On the next screen, you may again be asked to consent to allow Globus services to access data on your behalf. Click Allow.
- You can now use the Globus File Manager to access the AWS S3 buckets (and the data within) that you set up earlier. Back on the “File Manager” page, all of your AWS S3 buckets should now be visible, and the files within them should be ready to transfer to any other Globus endpoints/collections. You can transfer files between the "UCB AWS S3 Collections" collection and Savio via the “ucb#brc” endpoint on the Savio DTN, in the same way as described in the "Using Globus with Savio" section above.
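Because Globus can reach any bucket that the registered IAM access key is configured to reach, it can help to scope that key to just the bucket you intend to transfer. The following is an illustrative sketch of an IAM policy document for a single bucket, built as a Python dictionary and printed as JSON. The bucket name and helper name are hypothetical, and the list of S3 actions is an assumption about what a transfer tool typically needs; confirm the exact permissions against the Globus AWS S3 connector documentation before using it.

```python
import json


def s3_bucket_policy(bucket):
    """Build an IAM policy document granting access to a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # allow listing buckets and looking up their regions
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": "*",
            },
            {   # allow listing the contents of this one bucket
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:ListBucketMultipartUploads"],
                "Resource": bucket_arn,
            },
            {   # allow reading, writing, and deleting objects in this bucket
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListMultipartUploadParts",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


policy = s3_bucket_policy("my-research-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

If you only ever download from the bucket to Savio, dropping the write and delete actions from the last statement would narrow the key's access further.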
Transferring files between Savio and other locations¶
Instructions for transferring to other locations such as Google Cloud Storage and Cloudian are planned. Please contact us if you'd like help in the meantime.
Automating and Scheduling Data Transfers to and from Savio with Globus¶
Globus provides multiple methods to automate and schedule transfers, including the Globus Timer service (see here also), which allows you to schedule data workflows for a certain date and time with the option to repeat these workflows on a specified schedule. Keep in mind, however, that this service is not supported for high assurance collections at this time, so it can't be used with the Savio ucb#brc endpoint. Savio users need to use the ucb#brc-basic (non-High Assurance) endpoint in order to use the Globus Timer service to transfer data to and from Savio. Keep in mind also that both endpoints/collections need to be non-High Assurance in order for data to be transferred between them using the Globus Timer service.
It is also possible to automate transfers with the Globus service using scripts that make use of the Globus API or the Globus CLI. Examples of such scripts are available here. Globus users also now have an option to use application credentials or service accounts to automate data transfers.
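The linked example scripts are the authoritative reference. As a rough illustration of the CLI route only, the sketch below wraps two Globus CLI commands, `globus transfer` and `globus task wait`, in Python functions: one submits a recursive transfer and reads the task ID from the JSON output, and the other waits for the task to finish. The collection placeholders, paths, label, and function names are hypothetical, and the demonstration at the end uses a stand-in for subprocess.run so that it runs without the Globus CLI installed and submits nothing.

```python
import json
import subprocess


def submit_transfer(src, dst, label, run=subprocess.run):
    """Submit a recursive, checksum-synced transfer; return the Globus task ID."""
    cmd = ["globus", "transfer", "--recursive", "--sync-level", "checksum",
           "--label", label, "--format", "json", src, dst]
    result = run(cmd, check=True, capture_output=True, text=True)
    # With --format json, the CLI prints the submission response as JSON.
    return json.loads(result.stdout)["task_id"]


def wait_for_task(task_id, timeout_s=3600, run=subprocess.run):
    """Wait for the task; True if the CLI exits with status 0 before the timeout."""
    cmd = ["globus", "task", "wait", task_id,
           "--timeout", str(timeout_s), "--polling-interval", "30"]
    return run(cmd).returncode == 0


# Offline demonstration: fake_run stands in for subprocess.run and records
# the commands instead of executing them. Omit run=fake_run for real use.
calls = []


def fake_run(cmd, **kwargs):
    calls.append(cmd)
    return subprocess.CompletedProcess(cmd, 0, stdout='{"task_id": "demo-task"}')


task_id = submit_transfer("SRC_UUID:/data/", "DST_UUID:/backup/", "nightly-sync",
                          run=fake_run)
finished = wait_for_task(task_id, run=fake_run)
print(task_id, finished)
```

Passing the runner as a parameter keeps the command-building logic testable without network access; a scheduler such as cron could then call a script built on these functions after a one-time `globus login`.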