Globus at UC Berkeley
Globus is a data transfer and storage service that lets you quickly and easily move data between different resources (e.g., a personal computer, the Savio campus cluster, bDrive, and various others) and share data on those resources with others. UC Berkeley has a Globus subscription (the "High Assurance and HIPAA/BAA" subscription) that provides a wide variety of functionality to all campus affiliates.
The Globus user interface (UI) can be accessed through a browser, such as Google Chrome. We generally recommend using an “incognito” or “private” browser window when accessing the Globus UI so that other saved credentials are ignored and old login cookies are not sent. An incognito window also forgets your passwords when you close it at the end of a session. To open an incognito window in Google Chrome, click the three vertical dots in the upper right corner of the browser and select “New Incognito Window” from the drop-down menu. You’ll know you are incognito when the address bar at the top of the window turns black and the word “Incognito” appears in the top right of the browser window. For more instructions on this, please see here and here.
For information on how to use Globus, see Globus's Step-by-Step Getting Started Guide.
Data Transfer Between Various Resources
With Globus, you can transfer data between various resources. These include:
- Your personal computer, using Globus Connect Personal
- A lab or shared computer (either using Globus Connect Personal, setting up the computer as a Globus endpoint, and/or setting up Globus Connect Server version 5 (GCSv5) on the computer to share collections with users)
- Savio, including home directories, scratch directories, and condo storage
- Research IT’s SRDC secure computing environment
- Departmental computing facilities, including the Statistical Computing Facility and Econometrics Laboratory
- An AEoD virtual machine, using Globus Connect Personal
- bDrive/Google Drive
- Various other cloud platforms, including Wasabi, AWS, and Google Cloud Storage
Sharing Your Data with Collaborators
Users can set up guest collections (sometimes called shared endpoints) on a resource they have access to in order to share data with collaborators of their choosing. A guest collection gives the collaborator access to the specific directories that you choose; the collaborator does not need an account on the resource. Examples include:
- Sharing directories in your Savio home, scratch, or condo storage directory
- Sharing folders in bDrive / Google Drive
- Sharing directories/folders on various cloud resources, including Wasabi and Google Cloud Storage
UC Berkeley’s Globus High Assurance and HIPAA/BAA subscription also allows you to set up guest collections (for sharing data) on your personal or lab computer that are discoverable by other Globus users. We can upgrade any user at UC Berkeley to Globus Plus as part of the subscription. Globus Plus users can, for example, create guest collections on their personal computers (using Globus Connect Personal) and transfer files between Globus Connect Personal endpoints (e.g., on their personal computers). Moreover, Globus Plus allows Globus Connect Personal to be used on endpoint devices as identifiable endpoints. The subscription also allows researchers to install Globus Connect Server version 5 (GCSv5) on a lab computer, for example, so that they can set up endpoints and collections that other researchers can find and access. Note that you do not need Globus Plus to transfer files to or from a Globus server endpoint (e.g., a campus cluster or supercomputing center such as Savio) or to share files from a Globus server endpoint.
When do you need Globus Plus?
- To transfer data between two Globus Connect Personal endpoints (e.g. your desktop system and a laptop).
- To share data from a Globus Connect Personal endpoint (e.g. your laptop or a desktop system).
- To transfer data from/to a guest collection that is hosted on a Globus Connect Personal endpoint.
When do you not need Globus Plus?
- To transfer or share data between two Globus managed endpoints (e.g. two multi-user systems at different universities, each running a Globus server).
- To transfer data between a managed endpoint (e.g. Savio) and a Globus Connect Personal endpoint (e.g. your desktop).
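The rules in the two lists above can be condensed into a short decision helper. The following is only an illustrative sketch: the `Endpoint` class and both function names are hypothetical and are not part of Globus or any Globus library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of one side of a transfer."""
    personal: bool                  # True: Globus Connect Personal; False: managed server endpoint
    guest_collection: bool = False  # True: a guest collection hosted on this endpoint


def transfer_needs_globus_plus(source: Endpoint, destination: Endpoint) -> bool:
    """Apply the transfer rules from the lists above."""
    # A guest collection hosted on a Globus Connect Personal endpoint calls for Plus.
    if any(e.personal and e.guest_collection for e in (source, destination)):
        return True
    # So does a transfer between two Globus Connect Personal endpoints.
    return source.personal and destination.personal


def sharing_needs_globus_plus(host: Endpoint) -> bool:
    """Sharing from a Globus Connect Personal endpoint calls for Plus."""
    return host.personal


if __name__ == "__main__":
    savio = Endpoint(personal=False)
    laptop = Endpoint(personal=True)
    desktop = Endpoint(personal=True)
    print(transfer_needs_globus_plus(laptop, desktop))  # personal to personal
    print(transfer_needs_globus_plus(savio, desktop))   # managed to personal
```

For instance, a transfer from Savio to a desktop running Globus Connect Personal evaluates to `False`, while a laptop-to-desktop transfer evaluates to `True`.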
Note that if your collaborator needs Globus Plus to download data, and is not at UC Berkeley, we cannot provide Globus Plus to that person.
Also note that, by default, files on a Globus Connect Personal endpoint (e.g. your laptop or desktop) may not be shareable. You will need to configure that via the instructions at these links: Mac, Windows, Linux.
If you need to upgrade your Globus account to Globus Plus, please contact us at firstname.lastname@example.org to request a Globus Plus invite. For more details see “What is Globus Plus and How do I get Globus Plus?”
Getting help with Globus
Follow these links for help with:
- Downloading and installing Globus Connect Personal (free)
- Using Globus to transfer data to/from Savio
- Using Globus with your personal computer
- Using Globus with bDrive and other resources
- How to Access Your Files on AWS S3 with Globus
For the following, please contact email@example.com:
- Setting up a lab computer/facility as an endpoint that can manage collections that are accessible to and discoverable by collaborators.
- Setting up Globus Connect Personal on two personal computers to transfer data between them or to share data with collaborators.
UCB Globus Collections/Endpoints
UC Berkeley’s Globus High Assurance and HIPAA/BAA subscription provides users with access to premium storage connectors (which support storage systems such as Google Drive, AWS S3, Google Cloud Storage, Wasabi, and Cloudian) and thus to mapped collections, which are created by the endpoint administrator. Access to mapped collections requires an account on the endpoint’s host system. Mapped collections created on storage gateways that are flagged for high-assurance data are automatically configured for use with protected data.
The endpoint administrator specifies the identities required for access and an authentication assurance timeout period. If a user attempts access without having authenticated as required within the timeout period, the user will be prompted to authenticate with the required identity.
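The timeout rule described above can be pictured with a small sketch. This is not how Globus implements the check internally; the function name, the `auth_times` mapping, and the choice of a strict "older than the timeout" comparison are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def must_reauthenticate(auth_times, required_identity, timeout, now=None):
    """Return True if the user should be prompted to authenticate again.

    auth_times: hypothetical mapping of identity -> time of last authentication.
    required_identity: the identity the endpoint administrator requires.
    timeout: the authentication assurance timeout, as a timedelta.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    last_auth = auth_times.get(required_identity)
    if last_auth is None:
        # The user never authenticated with the required identity.
        return True
    # Assumption: the prompt appears once the last login is older than the timeout.
    return now - last_auth > timeout
```

Under this sketch, a user who logged in with the required identity ten minutes ago is not prompted when the timeout is thirty minutes, but is prompted when the timeout is five minutes or when only a different identity was used.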
The steps for discovering and using mapped collections to access data are described in the Globus how-to guide, "Find and use a mapped collection from the Globus web app".
The names of the mapped collections that users should use to transfer data and set up guest collections for the various services/connectors are listed in the table below. Users can search for these collections in the Globus web app when they wish to transfer files among collections, including to or from their own Globus Connect Personal endpoint, or to set up guest collections. Users can also bookmark the collections for later reuse without having to search for them again.
|UCB System||UCB Globus Collection/Endpoint Name(s)|
|Savio||UCB BRC Savio Posix Data|
|Google Cloud Storage||UCB Google Cloud Storage Collection|
|Wasabi||UCB S3 Wasabi Data|
|Amazon Web Services (AWS)||UCB AWS S3 Collections|
|Google Drive (bDrive)||UCB Google Drive Collections|
|Cloudian Storage||UCB Cloudian Storage Collections|
|Box||UCB Box Collections|
Note that the ucb#brc and UCB BRC Savio Posix Data endpoints/collections point to the same data on Savio. Keep in mind also that there may currently be multiple Savio Globus endpoints/collections with the same name (ucb#brc or similar). Users can also search for "8dfcaab6-fe4e-4376-9430-785631512d4a" (the Globus UUID associated with the ucb#brc collection) in the collections search box in the Globus UI, where the collection is labeled "Managed Mapped Collection on UCB/BRC DTN01 Endpoint". This is the new Globus Connect Server version 5 (GCSv5) connector, and it should work for all users.
Also note that in some instances, when users try to access the Savio Globus collections or the other endpoints/collections in the table above, they may be prompted to re-authenticate first, and they might be given a choice to log in with various identities. In that case it is usually best to log in with an identity that includes the username as part of the ID, e.g. firstname.lastname@example.org.
Automating and Scheduling Data Transfers with Globus
Globus provides multiple methods to automate and schedule transfers, including the Globus Timer service (see here also), which allows you to schedule data workflows for a certain date and time, with the option to repeat these workflows on a specified schedule. Keep in mind, however, that this service is not supported for high assurance collections at this time, so it can't be used with the Savio ucb#brc endpoint or with the other mapped collections listed in the table in the previous section (with the exception of "UCB AWS S3 Collections", which is a non-High Assurance collection). Savio users need to use the ucb#brc-basic (non-High Assurance) endpoint in order to use the Globus Timer service to transfer data to and from Savio. Keep in mind also that both endpoints/collections need to be non-High Assurance in order for data to be transferred between them using the Globus Timer service.
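The Globus Timer restriction above amounts to a simple check on both sides of a transfer. The sketch below hard-codes the High Assurance status of each collection exactly as stated in this section; it does not query Globus, and the `HIGH_ASSURANCE` table and `timer_supported` function are hypothetical names used only for illustration.

```python
# High Assurance status as described in the text above (not fetched from Globus).
HIGH_ASSURANCE = {
    "ucb#brc": True,
    "ucb#brc-basic": False,
    "UCB BRC Savio Posix Data": True,
    "UCB Google Cloud Storage Collection": True,
    "UCB S3 Wasabi Data": True,
    "UCB AWS S3 Collections": False,
    "UCB Google Drive Collections": True,
    "UCB Cloudian Storage Collections": True,
    "UCB Box Collections": True,
}


def timer_supported(source: str, destination: str) -> bool:
    """Return True only if neither collection is High Assurance.

    Raises KeyError for a collection name that is not in the table,
    so that unknown collections are never silently treated as eligible.
    """
    return not HIGH_ASSURANCE[source] and not HIGH_ASSURANCE[destination]


if __name__ == "__main__":
    for pair in [("ucb#brc-basic", "UCB AWS S3 Collections"),
                 ("ucb#brc", "UCB AWS S3 Collections")]:
        print(pair, timer_supported(*pair))
```

With this table, a scheduled transfer between ucb#brc-basic and "UCB AWS S3 Collections" is eligible, while any pairing that involves ucb#brc is not.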
It is also possible to automate transfers with the Globus service using scripts that make use of the Globus API or the Globus CLI. Examples of such scripts are available here. These scripts can also be run on a schedule using cron or some other system scheduler service.
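As a rough illustration of the CLI-based approach, the sketch below assembles a `globus transfer` command and prints it. It assumes that the Globus CLI is installed and that you have already authenticated with `globus login`; the destination UUID and both paths are placeholders, and the helper function names are hypothetical. The source UUID is the ucb#brc collection UUID quoted earlier on this page.

```python
import shlex
import subprocess


def build_transfer_command(source_id, source_path, dest_id, dest_path,
                           label, recursive=True, sync_level=None):
    """Build the argument list for a `globus transfer` invocation."""
    cmd = [
        "globus", "transfer",
        f"{source_id}:{source_path}",
        f"{dest_id}:{dest_path}",
        "--label", label,
    ]
    if recursive:
        cmd.append("--recursive")  # transfer a whole directory tree
    if sync_level is not None:
        # One of: exists, size, mtime, checksum
        cmd.extend(["--sync-level", sync_level])
    return cmd


def submit(cmd):
    """Run the command; raises CalledProcessError if the CLI reports failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_transfer_command(
        source_id="8dfcaab6-fe4e-4376-9430-785631512d4a",  # ucb#brc (from this page)
        source_path="/path/on/savio/",                     # placeholder path
        dest_id="00000000-0000-0000-0000-000000000000",    # placeholder UUID
        dest_path="/path/on/destination/",                 # placeholder path
        label="nightly-sync",
        sync_level="checksum",
    )
    # Print the command for inspection; call submit(command) to actually run it.
    print(shlex.join(command))
```

Once the printed command looks right, replacing the `print` call with `submit(command)` and invoking the script from a cron entry would run the transfer on a schedule.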