Working with Sensitive Data
Increasingly, data security frames our conversations about research data. Data security protects against the loss of confidentiality, integrity, and availability of data. Your data are valuable, and Research IT can help you take steps to protect them and the campus. As members of the campus workforce, we are all responsible for information security. (New policies and practices are coming soon!)
This page will help you determine the sensitivity of your data and what controls need to be in place to properly secure them. It also describes some of the protected environments that are available to you at UC Berkeley.
1. Determining the sensitivity of your data
Agreements and governing frameworks
The first step in determining the level of security appropriate for your data is to understand the agreements, regulations, and other considerations that govern how your data must be protected. These can include:
Committee for the Protection of Human Subjects (human subjects, "IRB") protocols
Data use agreements
Federal and state laws and regulations (e.g. FERPA, HIPAA)
Campus data classification standards (P1, P2, P3, P4)
General Data Protection Regulation (GDPR) for subjects located in the EU, and similar regulations in other countries
Intellectual property concerns
Privacy and ethical concerns (e.g. confidentiality, vulnerable research subjects)
If you are producing data
Consider the elements of your data:
What types of information do you collect? Think about all of the collection methods you will use, including machines, surveys, and so on.
Are any elements personally identifiable, confidential, or restricted-use in a way that could be "notice-triggering" (see, for example, [this UC Santa Cruz webpage on Personal Identity Information](https://its.ucsc.edu/security/pii.html))?
What protections do your informed consent agreement and Committee for the Protection of Human Subjects (IRB) protocol guarantee? This includes how you will gather consent and how you will collect, store, and share the data.
Have you signed or made any other agreements that control the security that's needed to protect the data?
If you are using existing data
Often there is an agreement between you and whoever owns the data. A common form is a data use agreement that specifies the terms under which you can and cannot use the data and the protections that are required. The data owner could be:
A federal agency (e.g., National Institutes of Health, Centers for Medicare and Medicaid Services, Department of Education, US Census Bureau)
A state agency (e.g., Department of Public Health or Social Services)
A local police, fire, corrections, or other office
Private companies, such as health care providers, marketing firms, social media platforms, research organizations, and research & development units
Publishers, archives, and other third parties who have curated the data or otherwise organized them in a useful form
Typically, it is the combination of all of the above factors that defines the protections you need to put in place. At a minimum, the campus data classification standard sets the bar. Research IT can help you understand how your data are being governed and how to protect them.
Classifying your data at UC Berkeley
UC Berkeley's Information Security Office provides a standard for classifying data sensitivity. This also encompasses the equipment on which the data are stored and computed. The data classification standard defines four protection levels, P1 - P4 (formerly PL0-PL3). The levels are characterized by the adverse impact that would be caused by unwanted disclosure or modification of the data. Notice how the frameworks discussed above -- IRB protocols, HIPAA rules, and the GDPR protections, for example -- all inform the standard.
The security office web site offers a step-by-step guideline for classifying data and a separate guide focusing on research data. Both are decision trees to help researchers identify the level appropriate to their data. Once you understand the agreements, regulations, and other frameworks that govern the sensitivity of your data, step through this guideline to determine their correct classification. Research IT can navigate this process with you and help you talk to the Information Security Office if needed.
2. Securing your computing and storage systems
Once you know the appropriate classification for your data, you can properly secure your research environment. Again, the Information Security Office provides direction via the Minimum Security Standards for Electronic Information (MSSEI). These standards define the technical, administrative, and physical controls that must be in place for systems that handle data at each protection level. If a Data Use Agreement or other document specifies controls that go beyond what campus requires, they can be added to the campus baseline.
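The idea that agreement-specific controls are layered on top of the campus baseline can be pictured as a set union. The sketch below is illustrative only: the function name `required_controls` and the control names are hypothetical placeholders, not taken from the MSSEI or from any real agreement.

```python
def required_controls(campus_baseline, *additional_control_sets):
    """Return the campus baseline plus any controls added by other documents.

    Nothing is ever removed from the baseline; extra controls from a
    Data Use Agreement or similar document are only added to it.
    """
    combined = set(campus_baseline)
    for extra in additional_control_sets:
        combined |= set(extra)
    return combined


# Hypothetical control names, for illustration only.
campus_baseline = {"encryption at rest", "access logging"}
dua_controls = {"encryption at rest", "no removable media"}

controls = required_controls(campus_baseline, dua_controls)
print(sorted(controls))
```

Because the result is a union, the campus baseline is always a subset of what you end up implementing, whatever the other documents require.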
The MSSEI also defines three categories of device or use: Individual, Privileged and Institutional. A workstation or laptop that handles a small or moderate number of records would generally be considered an Individual device. A machine on which the user must have administrator ('root') privileges is classified as a Privileged device. A server on which more than a thousand records are processed is deemed to be an Institutional device. Each ascending category carries a greater set of protection requirements.
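As a rough illustration, the three categories described above can be expressed as a small decision function. This is a simplified sketch of the paragraph, not the MSSEI's own definitions: the function and parameter names are hypothetical, and the rule that the highest applicable category wins when more than one description fits is an assumption.

```python
from enum import IntEnum


class DeviceCategory(IntEnum):
    """Categories in ascending order of protection requirements."""
    INDIVIDUAL = 1
    PRIVILEGED = 2
    INSTITUTIONAL = 3


# "more than a thousand records", per the paragraph above
INSTITUTIONAL_RECORD_THRESHOLD = 1000


def device_category(is_server, record_count, user_has_admin):
    """Pick a category from the simplified rules in the text.

    Assumption: when several descriptions apply, the highest category wins.
    """
    if is_server and record_count > INSTITUTIONAL_RECORD_THRESHOLD:
        return DeviceCategory.INSTITUTIONAL
    if user_has_admin:
        return DeviceCategory.PRIVILEGED
    return DeviceCategory.INDIVIDUAL


# A laptop holding a few hundred records, without administrator rights
print(device_category(is_server=False, record_count=200, user_has_admin=False).name)
```

Real determinations depend on the full MSSEI definitions, so treat the output as a starting point for a conversation with Research IT or the Information Security Office.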
Armed with the protection level (from the Data Classification Standard) and the device/use category, you can now move to the MSSEI Baseline Data Protection Profile Summary to see a list of the required controls.
Controls -- nearly four dozen in all -- range from removal of data that is not needed at the moment to security incident response plans and incident reporting training. The MSSEI provides detailed information about each.
Assessing your environment can be a daunting process to navigate. Research IT consultants have guided researchers through the process for years. Please contact us.
3. Resources for working with sensitive data
Research IT provides the following services for researchers working with data of varying sensitivity:
Highly sensitive compute and storage
The Secure Research Data and Compute (SRDC) platform is designed for researchers working with highly sensitive (P4) data. The SRDC platform includes high performance computing and virtual machines that allow interactive computing in a familiar desktop environment. Secure storage is available with both. Researchers working with medical information protected under HIPAA, biometric data used for authentication, genetic data, or government-issued ID numbers are strong candidates for SRDC.
SRDC virtual machines are available now. Contact Research IT to get started. The SRDC HPC cluster will be online in 2021.
Moderately sensitive compute and storage
High Performance Computing
Savio, UC Berkeley's high performance computing cluster, is available for P1 data. Researchers may request a separate account to use it for moderately sensitive (P2/P3) data. For example, researchers working with de-identified public health or human genetic data are strong candidates for Savio, especially if their analyses can run in parallel or benefit from access to a large number of processors.
Analytics Environment on Demand (AEoD), a Windows-friendly virtual machine service, may be used for computation over moderately sensitive (P2/P3) data. Researchers performing data analytics or geospatial analysis with moderately sensitive individually-identifiable human subjects research data, public health information, or data related to animal research are strong candidates for AEoD. AEoD is most suitable for researchers who need interactive computing with common software packages (Stata, ArcGIS, RStudio, etc.) in a familiar desktop environment. AEoD virtual machines are scaled to meet computational needs and are available in different sizes, from 2 to 20 cores and 4 to 256 GB of RAM, plus performant storage.
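The pairing of protection levels with services described in this section can be summarized as a simple lookup. The sketch below only restates what the text says (Savio for P1, a separate Savio account or AEoD for P2/P3, SRDC for P4). The names `SERVICES_BY_LEVEL` and `candidate_services` are hypothetical, and the table makes no claim about combinations the text does not mention.

```python
from typing import List

# Services named in this section, keyed by campus protection level.
SERVICES_BY_LEVEL = {
    "P1": ["Savio"],
    "P2": ["Savio (separate account)", "AEoD"],
    "P3": ["Savio (separate account)", "AEoD"],
    "P4": ["SRDC"],
}


def candidate_services(protection_level: str) -> List[str]:
    """Return the services this section lists for a protection level."""
    level = protection_level.strip().upper()
    if level not in SERVICES_BY_LEVEL:
        raise ValueError(f"Unknown protection level: {protection_level!r}")
    # Return a copy so callers cannot modify the lookup table.
    return list(SERVICES_BY_LEVEL[level])


print(candidate_services("P3"))
```

A lookup like this is only a first filter; which service fits best also depends on whether the work is interactive or batch-oriented and on the software involved.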
Research IT consultants are happy to meet with you to understand your data and computing needs and help match you to appropriate resources. Visit our office hours or get in touch to schedule an appointment.