As of 2019, Savio users can set up projects to work with moderately sensitive data. Moderately sensitive data includes data in the P2 and P3 security classifications defined by UC Policy (IS-3) and documented by the campus Information Security Office, as well as NIH dbGaP data. Moderately sensitive data was formerly described as “PL1” data. For more information on UC Berkeley data classification protection levels, please see here.
Note that researchers must first have one of the regular Savio accounts. The P2/P3 storage will be associated with a Faculty Computing Allocation (FCA), Instructional Computing Allocation (ICA) or Condo computing allocation. Each project will be provided a separate group directory and project users will be provided with a scratch directory where sensitive data can be stored.
Support for P2/P3 data in Savio and AEoD is an important step towards broader support for secure and sensitive data, as part of our Secure Research Data and Compute (SRDC) initiative.
Here are the steps involved in setting up a project for working with sensitive data:
- Researchers can request a P2/P3 Project once they have a regular FCA or Condo account. (For information about creating these accounts, see these instructions).
- The research group should consult with Research Data Management (RDM) to determine whether Savio is an appropriate service based on the sensitivity of the data and the group's computational needs. To start this process, send an email to email@example.com. The PI or someone else in the PI’s group may initiate it.
- Complete the Savio P2/P3 Project Request form, which asks for the following information:
  - A description of the kind of P2/P3 data the research group plans to work with on Savio. Please include: (1) a description of the dataset(s); (2) the source of the dataset(s); (3) security and compliance requirements for the dataset(s); (4) the number and sizes of files; and (5) the anticipated duration of use of the dataset(s) on Savio.
  - A list of names of the Research IT or Information Security Policy (ISP) team members with whom the research group has discussed this data/project.
  - A description of the project PI and/or UCB faculty member who will be responsible for this project, including the PI’s name and contact information, along with a single-word name to identify the project.
  - A yes or no answer as to whether the P2/P3 project PI already has an FCA or Condo allocation and, if yes, whether all of the faculty members or PIs associated with a pooled allocation are involved in the P2/P3 data project request.
  - If the existing Savio project is a pooled allocation and not all PIs/faculty in the pool are working with the P2/P3 data, a new project will be set up for the P2/P3 data, and a name of 4 to 8 characters must be specified.
- A Research IT consultant from RDM or BRC will contact the requestor to acknowledge the form has been received and to ask any questions needed to determine whether Savio is the appropriate data management and computation platform.
- The PI will be asked to review and sign a Researcher Use Agreement (RUA) that outlines the PI’s responsibilities for using Savio with their sensitive research data.
- Research IT staff review these agreements with the PI. Both the PI and a responsible Research IT party sign and submit the RUA.
- Once approved, RIT/BRC staff will set up the appropriate storage locations:
  - For FCA accounts, each P2/P3 project will get a group folder with a 30 GB quota in /global/home/groups/pl1data/ on the home directory server, and each project user will be given access to it.
  - If the PI is a Condo owner (has contributed compute nodes to Savio), they will be given access to a 200 GB P2/P3 group folder under /global/home/groups/pl1data/.
  - Each P2/P3 user also gets access to a directory in the P2/P3 scratch space located at /global/scratch2/pl1data/. Note that this is separate from the scratch directory set up for their non-sensitive data (located at /global/scratch/).
  - Note that there are restrictions on how these directories must be used. For example, NIH dbGaP data stored in the P2/P3 group/project directories must be encrypted; sensitive data may be stored unencrypted only in the associated scratch directory. More information about the handling and use of encrypted data, including NIH data, can be found in the Researcher Use Agreement, in the document NIH Active Research workflow for NIH data for Savio, and in our documentation on sensitive data here.
- A Savio User Management spreadsheet will be created for the PI to specify which users should be members of the group and have access to the resources.
- The project PI is responsible for adding all P2/P3 users within their research group to the Savio User Management spreadsheet. Once a new P2/P3 user is added to the spreadsheet, BRC Savio administrators will process the request, grant the user's account access to the group directory where the P2/P3 data is stored, and provide updates. BRC Savio administrators will confirm approval of requests via email before provisioning account access to the restricted folder.
- All P2/P3 projects are set up with a special Unix group. P2/P3 users should use this Unix group and set the directory group ownership and permissions appropriately to limit access to P2/P3 datasets and files to relevant users only.
- P2/P3 project researchers must notify cluster administrators when a user is no longer a member of the project team using the covered system. The Savio User Management spreadsheet will be edited accordingly and the Savio accounts of those who are no longer active users will be deactivated.
- Principal Investigators are responsible for monitoring account access to P2/P3 data within the Savio environment. The Savio team will provide PIs with Linux command line syntax for checking group directory permissions to verify which user accounts have access. If changes are necessary, the PI will submit a request to BRC administrators to add or remove account access. These responsibilities are delineated in the Researcher Use Agreement (RUA) for using P2/P3 data in the Savio HPC environment.
- RIT will conduct a semi-annual email-based confirmation of active users.
- The PI and group are notified by RIT/BRC staff that everything is set up.
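The steps above ask P2/P3 users to set group ownership and permissions on their directories, and ask PIs to check which accounts have access. The sketch below shows one way to do both in Python. It is not the command-line syntax the Savio team provides: the function names, the 2770 mode, and the throwaway demo directory are illustrative assumptions, and the actual requirements are those in the Researcher Use Agreement.

```python
import grp
import os
import stat
import tempfile
from pathlib import Path


def describe_access(directory):
    """Summarize who can reach a directory: owning group, mode, listed members."""
    info = os.stat(directory)
    try:
        group = grp.getgrgid(info.st_gid)
        # gr_mem lists supplementary members only; users whose primary
        # group is this group do not appear here.
        group_name, members = group.gr_name, sorted(group.gr_mem)
    except KeyError:
        # The GID has no entry in the group database on this host.
        group_name, members = str(info.st_gid), []
    return {
        "group": group_name,
        "mode": stat.filemode(info.st_mode),
        "open_to_others": bool(info.st_mode & stat.S_IRWXO),
        "members": members,
    }


def restrict_to_group(directory, gid=None):
    """Give owner and group full access and remove all access for others.

    Mode 2770 is an assumed example; the setgid bit asks new files to
    inherit the directory's group.
    """
    if gid is not None:
        os.chown(directory, -1, gid)  # -1 leaves the owner unchanged
    os.chmod(directory, 0o2770)


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not on a real P2/P3 path.
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp) / "example_project"
        project_dir.mkdir()
        restrict_to_group(project_dir, os.getgid())
        print(describe_access(project_dir))
```

A report with `open_to_others` set to true, or with unexpected names in `members`, would be a cue to contact BRC administrators, since changes to account access go through them.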
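To keep an eye on the 30 GB (FCA) and 200 GB (Condo) group-directory quotas described above, a group could total the file sizes under its directory. This is a rough sketch under two assumptions: that 1 GB means 10^9 bytes, and that apparent file sizes are a fair proxy for what the quota system counts. The function and constant names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

# Assumed interpretation of the quotas: decimal gigabytes.
FCA_QUOTA_BYTES = 30 * 10**9
CONDO_QUOTA_BYTES = 200 * 10**9


def directory_usage_bytes(root):
    """Sum the apparent sizes of all files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # Skip files that vanish or cannot be read mid-scan.
                continue
    return total


def quota_fraction(root, quota_bytes):
    """Return usage as a fraction of the quota (1.0 means the quota is full)."""
    return directory_usage_bytes(root) / quota_bytes


if __name__ == "__main__":
    # Demonstrate on a throwaway directory, not on a real P2/P3 path.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sample.dat").write_bytes(b"\0" * 4096)
        used = directory_usage_bytes(tmp)
        print(f"{used} bytes used, {quota_fraction(tmp, FCA_QUOTA_BYTES):.2e} of quota")
```

The number the cluster's own quota tooling reports is authoritative; this estimate is only a convenience for spotting growth early.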