Working with Sensitive Data
Establishing a Moderately Sensitive (P3) Project
Researchers can use Savio to work with moderately sensitive data, which campus classifies as P3 data. Work on Savio involving P3 data uses memory, storage, and scratch resources that are separate from those used by standard compute jobs. Before accessing the secured Savio partition and uploading data to it, the project PI or manager must request and set up a P3 storage directory. If you are working with sensitive data that is subject to a data use agreement, we can work with you when setting up your account to confirm whether the Savio environment meets the requirements of that agreement. Please note that the user is ultimately responsible for complying with the requirements of their data use agreement. We ask that users follow the policies described here in addition to any applicable data use agreement, as these are the policies we have established for the P3 service on Savio.
Highly Sensitive Data
Savio is not appropriate for highly sensitive (P4) data. Please see the Secure Research Data and Computing web page for information on our service that supports researchers working with highly sensitive data.
Getting an Account to Work with Sensitive Data
To start the process of creating a Savio P3 project, see Accounts for Sensitive Data. Once you have created your project, you’re ready to move your data into Savio.
Storing Sensitive Data
Each Savio P3 project is provided with a project directory in /global/home/groups and a P3 scratch directory, with a 1 TB quota, in /global/scratch/p2p3 for storing sensitive data. Researchers who need more space can buy additional condo storage, which will also be available at /global/scratch/p2p3. Sensitive data (i.e., P3 data) must be stored in either the Savio P3 project directory or the P3 scratch directory. Researchers should not store sensitive data, whether source or derived, in their standard Savio user home directories.
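As a rough illustration of the storage rule above, the following Python sketch checks whether a path falls under one of the two locations named in this section. The helper name `is_in_p3_storage` and the example paths are hypothetical. The check is purely lexical: it does not resolve symbolic links, and it treats anything below /global/home/groups as a project directory, which is a simplifying assumption, since the actual P3 directory name is specific to each project. Treat it as a sanity check for scripts, not as an enforcement mechanism.

```python
import posixpath
from pathlib import PurePosixPath

# Parent locations named in this section. Project-specific subdirectory
# names are not known here, so only these parents are checked (assumption).
P3_ROOTS = (
    PurePosixPath("/global/home/groups"),
    PurePosixPath("/global/scratch/p2p3"),
)


def is_in_p3_storage(path: str) -> bool:
    """Return True if `path` lies strictly below one of the P3 roots.

    Hypothetical helper: the comparison is lexical only, performed after
    collapsing "." and ".." components. Symbolic links are not resolved.
    """
    normalized = PurePosixPath(posixpath.normpath(path))
    if not normalized.is_absolute():
        return False
    return any(root in normalized.parents for root in P3_ROOTS)


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for candidate in (
        "/global/scratch/p2p3/example_project/input.csv",
        "/global/home/users/example_user/input.csv",
    ):
        print(candidate, is_in_p3_storage(candidate))
```

A workflow script could call a check like this before writing any source or derived file, and refuse to write if it returns False.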
Users are discouraged from setting up backups of any sensitive data. If a user chooses to arrange for backups or other copies outside the Savio environment, it is the sole responsibility of the user to comply with all requirements related to data security, encryption, and deletion. Research IT and the campus Information Security Office (ISO) can help provide guidance on compliance with requirements that call for backups or disaster recovery.
Transferring Sensitive Data
P3 data will generally be transferred into Savio scratch either from the Savio P3 project space within the Savio environment or from an external data source. The data must be transferred into the designated Savio P3 scratch space by a user approved on the associated Savio P3 project. We recommend using the campus Globus endpoints to transfer P3 data to and from the Savio cluster; Globus transfers data in encrypted form and is covered at the campus level by a high-assurance BAA security subscription.
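For users who script their transfers, the sketch below shows one way a Globus transfer into P3 scratch might be assembled with the Globus command-line client. It only builds and prints the command; it does not run it. It assumes the Globus CLI is installed and that you are already logged in. The endpoint identifiers, paths, and the helper name `build_globus_transfer_command` are placeholders invented for illustration, so confirm the correct campus endpoint and your project's scratch path before using anything like this.

```python
import shlex
from typing import List


def build_globus_transfer_command(
    source_endpoint: str,
    source_path: str,
    dest_endpoint: str,
    dest_path: str,
    label: str,
) -> List[str]:
    """Assemble a `globus transfer` invocation as an argument list.

    Hypothetical helper: nothing is executed here. The caller decides
    whether to pass the result to subprocess.run().
    """
    return [
        "globus",
        "transfer",
        "--recursive",  # copy a directory tree rather than a single file
        "--label",
        label,
        f"{source_endpoint}:{source_path}",
        f"{dest_endpoint}:{dest_path}",
    ]


if __name__ == "__main__":
    # All identifiers and paths below are placeholders, not real values.
    command = build_globus_transfer_command(
        source_endpoint="SOURCE_ENDPOINT_ID",
        source_path="/path/to/source/data/",
        dest_endpoint="SAVIO_ENDPOINT_ID",
        dest_path="/global/scratch/p2p3/example_project/",
        label="P3 transfer example",
    )
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.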
Computing with Sensitive Data
Working with P3 data on Savio looks just like typical work on the cluster; the only difference is that the data are stored on a secured filesystem. Analysis of data and derivatives is the active phase of research and may last for days, weeks, or even longer. It is not necessarily a single job on the cluster and may consist of a series of jobs in a workflow. Certain steps in a research workflow may involve reviewing intermediate results and then resuming analytic or other processing steps. All data, derivatives, and intermediate results must remain within the secure storage directories during the active phase of research. Any P3 data that are additionally stored with software (file-level) encryption should be decrypted only on a compute node (e.g., during the lifetime of a compute batch job), and not on a login node.
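One way to make the "decrypt only on a compute node" rule harder to break by accident is to guard the decryption step in your own scripts. The sketch below is a minimal example of such a guard. It assumes jobs are launched through the Slurm scheduler, which sets the `SLURM_JOB_ID` environment variable inside a job. The function names are hypothetical, and the check is only a heuristic, since an environment variable can be set by hand.

```python
import os
from typing import Mapping, Optional


def inside_slurm_job(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the environment looks like a running Slurm job.

    Assumption: Slurm sets SLURM_JOB_ID for processes inside a job, and
    login-node shells do not have it set.
    """
    env = os.environ if env is None else env
    return bool(env.get("SLURM_JOB_ID"))


def require_compute_job(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise RuntimeError unless called from within a scheduled job."""
    if not inside_slurm_job(env):
        raise RuntimeError(
            "Refusing to decrypt: no SLURM_JOB_ID found, so this does not "
            "appear to be running inside a compute job."
        )


if __name__ == "__main__":
    try:
        require_compute_job()
    except RuntimeError as error:
        print(error)
    else:
        # Placeholder: invoke your project's own decryption step here,
        # writing output only to the secure P3 storage directories.
        print("Running inside a job; decryption step may proceed.")
```

Calling `require_compute_job()` at the top of a decryption script makes it stop with a clear message when it is started from a login node.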