Establishing a Moderately Sensitive (P2/P3) Project on Savio

Researchers can use Savio to work with moderately sensitive data, which campus classifies as P2/P3 data. To do so, you must create a special project that uses memory, storage, and scratch resources separate from those used by other Savio jobs. You must set up your project before you upload your data.

Savio has been architected to comply with the requirements of many data use agreements. For example, Savio’s design and its sensitive data policies are in line with NIH guidelines for cloud providers and satisfy NIH requirements for storage and handling of sensitive data. We can work with you when setting up your account to ensure the environment fits the requirements of your project’s data use agreement. Although the Savio infrastructure supports P2/P3 data, users remain responsible for handling that data according to the rules described on this page.

Savio is not appropriate for highly sensitive (P4) data. Please see Secure Research Data and Compute for information on the campus initiative to support researchers working with highly sensitive data.

Getting an Account to Work with Sensitive Data

To start the process of creating a P2/P3 project, see Accounts for Sensitive Data. Once you have created your project, you’re ready to move your data into Savio.

Secure Savio Workspaces

Every Savio P2/P3 project is provided with a secure group directory, and each project user is also allotted a secure scratch directory. All data used or created by the project must be kept in one of these two locations.

The following chart shows which actions a user may take on various Savio workspaces. These actions are explained more fully in the paragraphs below.

Actions allowed on Savio workspaces

| Workspace | Non-secure home directory | Non-secure scratch directory, non-secure group directory | Secure group directory | Secure scratch directory |
| --- | --- | --- | --- | --- |
| Example path | /global/home/USERNAME | /global/scratch, /global/home/group/GROUPNAME | .../group/GROUPNAME | /global/scratch2 |
| Direct ingest of sensitive data | No | No | Only if encrypted | Only if encrypted |
| Run job to process sensitive data while in decrypted state | No (not enough performance; content is not secure) | No | No | Yes |
| Store unencrypted content while "active" in workflow | No | No | No | Yes, but files that can be decrypted quickly should be cleaned up when the computational workflow ends |
| Store sensitive data while research is paused | No | No | Only if encrypted | Only if encrypted |

Note: Users are responsible for carefully managing file access permissions.
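
As one way to act on the note about file access permissions, the sketch below checks whether a file or directory grants any access to users outside its owner and group, and removes that access if so. It relies only on standard POSIX permission bits; the function names are hypothetical, and your project's data use agreement may call for stricter settings than this.

```python
import os
import stat


def other_access_bits(path):
    """Return the permission bits granted to 'other' users on path."""
    return stat.S_IMODE(os.stat(path).st_mode) & stat.S_IRWXO


def remove_other_access(path):
    """Clear all 'other' permission bits; owner and group bits are unchanged."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IRWXO)
```

A nonzero result from other_access_bits means that users outside the owner and group can read, write, or execute the path.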

Storing Sensitive Data

Sensitive data, including NIH dbGaP data, may be stored only in the shared group directory assigned to your P2/P3 project and in a user’s secure scratch directory. In the shared group directory, data must always be encrypted. In the user’s secure scratch directory, data may be stored in unencrypted form while research is active and must otherwise be encrypted. (See ‘Encrypting Sensitive Data’, below.) This applies to both source data and derived data. Researchers should not store sensitive data, even if encrypted, in their user home directories or in scratch space in their non-P2/P3 project environment.

Users are discouraged from setting up backups of any sensitive data. If users choose to arrange for backups or copies outside the Savio environment, they are responsible for proper data security, encryption, and deletion. Research IT and the campus Information Security Office can help assess compliance with your granting agency requirements.

Encrypting Sensitive Data

Sensitive data may never be stored (even temporarily) in unencrypted form in the shared group directory associated with your project. Software (file-level) encryption or equivalent is required for data in this directory.
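
The source does not name a specific encryption tool. As an illustration only, the sketch below assumes GnuPG (gpg) is available and builds commands for symmetric, file-level AES-256 encryption and for decryption. The function names are hypothetical; check your data use agreement for the algorithms and key handling it actually requires.

```python
import subprocess


def gpg_encrypt_command(plaintext_path, encrypted_path):
    """Build a gpg command that symmetrically encrypts one file with AES-256."""
    return [
        "gpg", "--symmetric", "--cipher-algo", "AES256",
        "--output", encrypted_path, plaintext_path,
    ]


def gpg_decrypt_command(encrypted_path, plaintext_path):
    """Build a gpg command that decrypts one file to plaintext_path."""
    return ["gpg", "--output", plaintext_path, "--decrypt", encrypted_path]


def run(command):
    """Run a gpg command (assumes gpg is installed); raise if it fails."""
    subprocess.run(command, check=True)
```

Under this approach, the plaintext side of either command would always be a path in the secure scratch directory, so that only the encrypted file ever reaches the shared group directory.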

So long as the researcher is actively engaged in analysis of data and derivatives (a process that may extend for days, weeks, or even longer), sensitive data may remain in unencrypted form in the secure scratch area. As soon as the research workflow comes to an end, or if the researcher pauses or suspends the workflow for any substantive period (such as vacation, significant absence, or other periods of inactivity), all unencrypted data must be deleted from the scratch filesystem. Unencrypted data left in the scratch filesystem beyond that point is at risk of unauthorized access.

It is the responsibility of the user to remove data from the secure scratch filesystem at the end of the active phase of work.
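
A minimal sketch of such a clean-up step follows. It assumes, purely for illustration, that encrypted files can be recognized by a ".gpg" suffix and that every other file under the given directory is unencrypted. The function names are hypothetical. The default is a dry run that only lists candidates, and deletion uses ordinary file removal rather than any secure-wipe technique.

```python
import os


def find_unencrypted(scratch_dir, encrypted_suffixes=(".gpg",)):
    """Return sorted paths of files whose names lack an encrypted suffix."""
    leftovers = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            if not name.endswith(tuple(encrypted_suffixes)):
                leftovers.append(os.path.join(root, name))
    return sorted(leftovers)


def clean_up(scratch_dir, dry_run=True):
    """List unencrypted-looking files; delete them only if dry_run is False."""
    leftovers = find_unencrypted(scratch_dir)
    if not dry_run:
        for path in leftovers:
            os.remove(path)
    return leftovers
```

Reviewing the list returned by the dry run before calling clean_up with dry_run=False reduces the chance of deleting results that have not yet been encrypted and saved.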

Staging and Transferring Sensitive Data

Data may be staged in encrypted form in the shared group directory prior to use. Data may also be transferred directly to scratch, in encrypted form, from an external source. The data must be transferred into the designated scratch space for a user approved on the project. Only approved users are allowed to work with sensitive data.

Computing with Sensitive Data

Sensitive data require decryption for use in active research workflows. These data must be decrypted at the beginning of the computational workflow. Decryption may be done only in the user’s secure scratch directory.

Current research projects often involve 10 to 100 TB of data or more, and both the size and number of datasets are increasing at a steady pace. Further, it can take several hours per TB to decrypt files. Given all this, it is not reasonable to decrypt datasets of this size before each computing job. For this reason, we allow researchers to keep their data decrypted on scratch storage throughout the active research workflow.
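
To make the scale concrete, the short calculation below estimates total decryption time for the dataset sizes mentioned above. The rate of 3 hours per TB is an assumed stand-in for "several hours per TB", not a measured Savio figure.

```python
def decryption_hours(dataset_tb, hours_per_tb):
    """Estimate hours needed to decrypt a dataset of the given size in TB."""
    return dataset_tb * hours_per_tb


# 3 hours per TB is an illustrative assumption, not a benchmark.
for size_tb in (10, 100):
    print(size_tb, "TB ->", decryption_hours(size_tb, 3), "hours")
# prints:
# 10 TB -> 30 hours
# 100 TB -> 300 hours
```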

However, for smaller datasets, where the overhead of decryption would amount to only a small percentage of the total compute time (e.g., less than 5% of the total), any unencrypted data must be deleted as a clean-up task as soon as the associated computational workflow completes (noting that a computational workflow may consist of a series of compute jobs).
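
The rule of thumb above can be sketched as a simple check. This is an interpretation, not official policy logic: it assumes the overhead is measured as decryption time divided by the workflow's compute time, and it uses the 5% figure given above as the default threshold. The function names are hypothetical.

```python
def decryption_overhead(decrypt_hours, compute_hours):
    """Return decryption time as a fraction of the workflow's compute time."""
    if compute_hours <= 0:
        raise ValueError("compute_hours must be positive")
    return decrypt_hours / compute_hours


def delete_after_each_workflow(decrypt_hours, compute_hours, threshold=0.05):
    """True if overhead is small enough that plaintext must be removed
    as soon as the computational workflow completes."""
    return decryption_overhead(decrypt_hours, compute_hours) < threshold
```

For example, 1 hour of decryption against 40 hours of compute gives an overhead of 2.5%, so the unencrypted copy would be deleted when the workflow completes. Whatever the result, all unencrypted data must still be deleted when the research ends or is paused.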

Researchers working with sensitive data are NOT permitted to use the following Savio nodes:

  • Visualization node, used to host interactive RealVNC sessions for any user needing more than a command line interface.
  • JupyterHub node, used to host web-based IPython notebook sessions.
