Working with Sensitive Data
Establishing a Moderately Sensitive (P2/P3) Project
Researchers can use Savio to work with moderately sensitive data, which campus classifies as P2/P3 data. To do so, you must create a special project, which uses memory, storage, and scratch resources separate from those used by other Savio jobs. You must set up your project before you upload your data.
Savio has been architected to comply with the requirements of many data use agreements. We can work with you when setting up your account to ensure the environment fits the requirements of your project’s data use agreement. Please note that the user is ultimately responsible for complying with the requirements of their data use agreement. In addition, we ask that users follow the policies described here even if not strictly required by the user's data use agreement, as these are the policies we have established for the P2/P3 service on Savio.
Highly Sensitive Data
Savio is not appropriate for highly sensitive (P4) data. Please see the Secure Research Data and Compute web page for information on our service that supports researchers working with highly sensitive data.
Getting an Account to Work with Sensitive Data
To start the process of creating a P2/P3 project, see Accounts for Sensitive Data. Once you have created your project, you’re ready to move your data into Savio.
Storing Sensitive Data
Each P2/P3 project will be provided with a project directory in `/global/home/groups` and a P2/P3 scratch directory with a 1 TB quota in `/global/scratch/p2p3`. Researchers who need more space can buy condo storage, which will also be available at `/global/scratch/p2p3`.
Sensitive data, including NIH dbGaP data, must be stored in either the P2/P3 project directory or the P2/P3 scratch directory. Researchers should not store sensitive data (either source or derived) in their user home directories. In addition, as discussed below, unencrypted data should never be present in the project directory.
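As a rough illustration of the storage rule above, the following Python sketch checks whether a path falls under one of the two roots named on this page. The function name `is_approved_location` is hypothetical, and the check is only a prefix test: it does not confirm that the directory belongs to your P2/P3 project, and it does not resolve symlinks.

```python
from pathlib import PurePosixPath

# Roots taken from this page; the layout beneath them is an assumption.
APPROVED_ROOTS = (
    PurePosixPath("/global/home/groups"),
    PurePosixPath("/global/scratch/p2p3"),
)


def is_approved_location(path: str) -> bool:
    """Return True if `path` sits somewhere beneath an approved root."""
    p = PurePosixPath(path)
    # Refuse relative paths and unresolved '..' components, which could
    # point outside the approved roots once the path is resolved.
    if not p.is_absolute() or ".." in p.parts:
        return False
    return any(root in p.parents for root in APPROVED_ROOTS)
```

Remember that even inside the project directory, only encrypted data is allowed; this check says nothing about whether a file is encrypted.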
Users are discouraged from setting up backups of any sensitive data. If a user chooses to arrange for backups or other copies outside the Savio environment, it is the sole responsibility of the user to comply with all requirements related to data security, encryption, and deletion. Research IT and the campus Information Security Office can help assess compliance with your granting agency requirements.
Encrypting Sensitive Data
The Researcher Use Agreement (RUA) for using P2/P3 data in the Savio HPC environment includes notes that make the researcher responsible for encrypting data:
No covered data in unencrypted form may ever be stored (even temporarily) in project directories, as these are not on self-encrypting drives.
Software (file-level) encryption or equivalent is required for all covered data in a long-term storage environment, and whenever the data is not being actively used in a research workflow. Researchers must agree that, to prevent unauthorized access to covered data, unencrypted data must not be left on the system during vacations, significant absences, or other periods of inactivity.
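This page does not prescribe a particular encryption tool. As one possible sketch of file-level encryption, assuming GnuPG (`gpg`) is available in your environment and that passphrase-based AES-256 satisfies your data use agreement, a small Python wrapper might look like this. The function names are hypothetical, and passphrase handling inside a non-interactive job is deliberately left out.

```python
import subprocess
from typing import List


def gpg_encrypt_command(src: str, dest: str) -> List[str]:
    """Build a gpg command that symmetrically encrypts `src` into `dest`."""
    return [
        "gpg",
        "--symmetric",              # passphrase-based (symmetric) encryption
        "--cipher-algo", "AES256",  # assumed cipher choice; check your agreement
        "--output", dest,
        src,
    ]


def encrypt_file(src: str, dest: str) -> None:
    """Run gpg; raises CalledProcessError if gpg exits with an error."""
    subprocess.run(gpg_encrypt_command(src, dest), check=True)
```

After confirming that the encrypted file can be decrypted, the unencrypted original still has to be deleted separately.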
Transferring Sensitive Data
Encrypted data will generally be transferred into Savio scratch either from the P2/P3 project space in the Savio environment or from an external data source. The data must be transferred into the P2/P3 scratch space designated for a user approved on the associated P2/P3 project.
Computing with Sensitive Data
Encrypted data must be decrypted in a normal compute job run according to the RUA; in particular, decryption must not be run on a login node. Decrypted data must be stored in the P2/P3 scratch space designated for a user approved on the associated P2/P3 project.
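One way to guard against accidentally decrypting on a login node is to have your decryption script check that it is running inside a scheduler job. The sketch below assumes the scheduler is Slurm, which sets the `SLURM_JOB_ID` environment variable inside a job; the helper names are hypothetical.

```python
import os
from typing import Mapping, Optional


def running_in_slurm_job(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if SLURM_JOB_ID is set, as it is inside a Slurm job."""
    env = os.environ if env is None else env
    return bool(env.get("SLURM_JOB_ID"))


def require_compute_job(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError unless the process is running inside a Slurm job."""
    if not running_in_slurm_job(env):
        raise RuntimeError(
            "Refusing to decrypt: not inside a Slurm job (is this a login node?)"
        )
```

Calling `require_compute_job()` at the top of a decryption script makes the script fail fast when it is launched outside a job. It is a convenience check, not a substitute for following the RUA.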
Analysis of data and derivatives is the active phase of research and may last for days, weeks, or even longer. It is not necessarily a single job in the cluster, and may represent a series of jobs in a workflow. The scheduling of these jobs is subject to scheduling constraints, resource availability, and other factors that the researcher cannot control. Certain steps in a research workflow may involve review of intermediate results, and subsequent resumption of analytic or other processing steps. So long as a researcher is in an active phase of ongoing work, data may remain in an unencrypted form in the P2/P3 scratch area.
For P2/P3 data sets in which the overhead for decryption would amount to a small percentage of the total compute time (e.g., less than 5% of the total), the data should be decrypted at the beginning of the computational workflow, and any unencrypted data should then be deleted as a clean-up task as soon as the associated computational workflow completes (noting that a computational workflow may consist of a series of compute jobs).
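The overhead test described above is simple arithmetic. The sketch below treats the 5% figure as the example threshold it is and assumes that the total compute time includes the decryption step; the function names are hypothetical.

```python
def decryption_overhead(decrypt_seconds: float, total_seconds: float) -> float:
    """Fraction of total compute time spent on decryption."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return decrypt_seconds / total_seconds


def decrypt_once_up_front(
    decrypt_seconds: float, total_seconds: float, threshold: float = 0.05
) -> bool:
    """True if decryption overhead is below the (example) 5% threshold."""
    return decryption_overhead(decrypt_seconds, total_seconds) < threshold
```

For instance, a 2-minute decryption in a 1-hour workflow is about 3.3% overhead, so `decrypt_once_up_front(120, 3600)` is true, while a 10-minute decryption in the same workflow (about 16.7%) is not.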
Important
As soon as the research workflow comes to an end, or if the researcher pauses or suspends the workflow for any substantive period (as determined by the PI), all unencrypted data must be deleted from the scratch filesystem. For data covered under the NIH guidelines for cloud providers, see also the document NIH Active Research workflow for NIH data for removing unencrypted data from scratch.
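The clean-up step can be scripted as the final task of a workflow. The following is a minimal sketch, assuming encrypted files can be recognized by their suffix (`.gpg` here, which is an assumption) and that everything else in the working directory is unencrypted and should go. The function name `remove_unencrypted` is hypothetical, and the code performs an ordinary delete, not a secure wipe.

```python
from pathlib import Path
from typing import List, Tuple


def remove_unencrypted(
    workdir: str, encrypted_suffixes: Tuple[str, ...] = (".gpg",)
) -> List[Path]:
    """Delete every file under `workdir` whose suffix is not an encrypted one.

    Returns the list of paths that were removed.
    """
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for path in sorted(Path(workdir).rglob("*")):
        if path.is_file() and path.suffix not in encrypted_suffixes:
            path.unlink()
            removed.append(path)
    return removed
```

Point this only at a directory dedicated to the workflow's decrypted data, and review the returned list to confirm that nothing unexpected was removed.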
Alert
If you are interested in using Open OnDemand for browser-based access to Jupyter notebooks, RStudio, or a virtual desktop with P2/P3 data, please contact us.