Condo Cluster Service
Summary
BRC manages Savio, the new high-performance computational cluster for research computing. Designed as a turnkey computing resource, it features flexible usage and business models, and professional system administration. Unlike traditional clusters, Savio is a collaborative system wherein the majority of nodes are purchased and shared by the cluster users, known as condo owners.
The model for sustaining computing resources is premised on faculty and principal investigators (PIs) purchasing compute nodes (individual servers) from their grants or other available funds; these nodes are then added to the cluster. This allows PI-owned nodes to take advantage of the high-speed InfiniBand interconnect and high-performance Lustre parallel filesystem storage associated with Savio. Operating costs for managing and housing PI-owned compute nodes are waived in exchange for letting other users make use of any idle compute cycles on the PI-owned nodes. PI owners have priority access to computing resources equivalent to those purchased with their funds, but can access more nodes for their research if needed. This gives the PI much greater flexibility than owning a standalone cluster.
Program Details
Compute node equipment is purchased and maintained based on a 5-year lifecycle. PIs owning the nodes will be notified during year 4 that the nodes will have to be upgraded before the end of year 5. If the hardware is not upgraded by the end of year 5, the PI may donate the equipment to Savio or take possession of it (removal of the equipment from Savio and transfer to another location is at the PI's expense). Nodes left in the cluster after 5 years may be removed and disposed of at the discretion of the BRC program manager.
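The lifecycle dates can be worked out with simple date arithmetic. The following is a minimal sketch, assuming the 5-year clock starts on the date the nodes enter service (the text above does not say exactly when the clock starts); `add_years` and `lifecycle_milestones` are hypothetical helper names, not part of any BRC tooling.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lifecycle_milestones(in_service: date) -> dict:
    """Return key dates of the 5-year lifecycle.

    Assumption: year 1 begins on the in-service date, so year 4 spans
    the period from 3 to 4 years after that date.
    """
    return {
        "notice_window_start": add_years(in_service, 3),  # start of year 4
        "notice_window_end": add_years(in_service, 4),    # end of year 4
        "lifecycle_end": add_years(in_service, 5),        # end of year 5
    }


if __name__ == "__main__":
    for name, when in lifecycle_milestones(date(2020, 3, 1)).items():
        print(f"{name}: {when.isoformat()}")
```

Under this assumption, nodes entering service on 1 March 2020 would have a notice window from 1 March 2023 to 1 March 2024, and their lifecycle would end on 1 March 2025.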
Once a PI has decided to participate, the PI or their designee works with the HPC Services manager and IST teams to procure the desired number of compute nodes and allocate the needed storage. There is a 4-node minimum buy-in for any given compute pool, and all 4 nodes must be of the same type (Standard, HTC, Bigmem, or GPU). GPU nodes are the most expensive; therefore, a group that has already purchased the 4-node minimum of any other node type can purchase and add single GPU nodes to its Condo. Generally, procurement takes about three months from start to finish. In the interim, a test condo queue with a small allocation will be set up for the PI's users in anticipation of acquiring the new equipment. Users may submit jobs to the general queues on the cluster using their Faculty Computing Allowance. Such jobs are subject to general queue limitations, and guaranteed access to contributed cores is not provided until the purchased nodes are provisioned.
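The buy-in rule in the paragraph above can be expressed as a small check. This is only an illustrative sketch: `meets_buy_in` and the node-type labels are hypothetical, and it assumes that nodes a group already owns of the same type count toward the 4-node minimum, which the text does not state. The per-configuration minimums listed in the hardware tables are not modeled here.

```python
MIN_BUY_IN = 4  # nodes per compute pool, as stated in the paragraph above


def meets_buy_in(node_type: str, count: int, owned: dict) -> bool:
    """Check a proposed purchase against the general buy-in rule.

    node_type: one of "Standard", "HTC", "Bigmem", "GPU" (labels are illustrative)
    count:     number of nodes of that type in the proposed purchase
    owned:     mapping of node type -> nodes the group already has in its condo
    """
    if count < 1:
        return False
    # General rule: the pool of this type must reach the 4-node minimum.
    # Assumption: nodes already owned of the same type count toward it.
    if owned.get(node_type, 0) + count >= MIN_BUY_IN:
        return True
    # GPU exception: single GPU nodes may be added once the group holds
    # the 4-node minimum of some other node type.
    if node_type == "GPU":
        return any(n >= MIN_BUY_IN for t, n in owned.items() if t != "GPU")
    return False


if __name__ == "__main__":
    print(meets_buy_in("Standard", 4, {}))         # new pool of 4 nodes
    print(meets_buy_in("Standard", 2, {}))         # below the minimum
    print(meets_buy_in("GPU", 1, {"Standard": 4}))  # GPU exception applies
```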
Hardware Requirements
Note
Requirements for Condo Participation (Updated May 29, 2019)
Condo contributors are also required to purchase a 2 m EDR InfiniBand cable for each node purchased (currently priced at $100 per cable).
Warranty
Each system has a warranty of 5 years.
Basic specifications for the systems are listed below:
General Computing Node (96 GB RAM) | |
---|---|
Processors | Dual-socket, 20-core, 2.1 GHz Intel Xeon Gold 6230 processors (40 cores/node) |
Memory | 96 GB (6 x 16 GB) 2666 MHz DDR4 RDIMMs |
Interconnect | 100 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect |
Hard Drive | 1TB 7.2K RPM SATA HDD (Local swap and log files) |
Notes | These come in sets of 4, and the minimum buy-in is 4 nodes |
Current Approximate Price (with tax) | $24,970 for a Dell C6420 chassis with 4 nodes + $580 for 4 ea. 2 m EDR cables |
Big Memory Computing Node (384 GB RAM) | |
---|---|
Processors | Dual-socket, 16-core, 2.1 GHz Intel Skylake Xeon 6130 processors (32 cores/node) |
Memory | 384 GB (24 x 64 GB) DDR4 memory |
Interconnect | 100 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect |
Hard Drive | 2 TB SSD (Local swap and log files) |
Notes | These come in sets of 4, and the minimum buy-in is 4 nodes |
Current Approximate Price (with tax) | Please contact us for a current quote |
Very Large Computing Node (1.5 TB RAM) | |
---|---|
Processors | Dual-socket, 16-core, 2.3 GHz Intel Cascade Lake Xeon 5218 processors (32 cores/node) |
Memory | 1.5 TB (6 x 16 GB) 2666 MHz DDR4 RDIMMs |
Interconnect | 100 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect |
Hard Drive | 2 TB SSD (Local swap and log files) |
Notes | These can be purchased one by one, but the minimum buy-in is 2 nodes |
Current Approximate Price (with tax) | $18,900 per node + $100 for 1 ea. 2 m EDR cable |
GPU Computing Node Option 1 (V100) | |
---|---|
Processors | Dual-socket, 4-core, 2.6 GHz Intel Silver 4112 processors (8 cores/node) |
Memory | 192 GB (4 x 16 GB) 2400 MHz DDR4 RDIMMs |
Interconnect | 100 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect |
GPU | 2 ea. Nvidia Tesla V100 accelerator boards with NVLink |
Hard Drive | 500 GB 10K RPM SATA HDD (Local swap and log files) |
Notes | These can be purchased one by one, and the minimum buy-in is one node |
Current Approximate Price (with tax) | $25,500 for a single node + $100 for 1 ea. 2 m EDR cable |
GPU Computing Node Option 2 (2080 Ti) | |
---|---|
Processors | Dual-socket, 4-core, 2.6 GHz Intel Silver 4112 processors (8 cores/node) |
Memory | 96 GB (4 x 16 GB) 2400 MHz DDR4 RDIMMs |
Interconnect | 100 Gb/s Mellanox ConnectX5 EDR InfiniBand interconnect |
GPU | 4 ea. Nvidia GeForce RTX 2080 Ti accelerator boards |
Hard Drive | 512 GB SSD (Local swap and log files) |
Notes | These can be purchased one by one, but the minimum buy-in is 2 nodes |
Current Approximate Price (with tax) | $11,600 for a single node + $100 for 1 ea. 2 m EDR cable |
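As a rough illustration of how the listed prices combine, the sketch below totals hardware and cable costs for the configurations that have a price in the tables above. The dictionary keys and the `estimate_cost` function are hypothetical names; the figures are the approximate prices quoted above, which may be out of date, and the Big Memory node is omitted because no price is listed for it. A current quote from BRC should always take precedence.

```python
# Approximate prices (with tax) copied from the tables above.
# "unit" is the smallest purchasable quantity: a 4-node chassis for the
# general computing node, a single node for the other configurations.
PRICES = {
    "general":           {"nodes_per_unit": 4, "unit_price": 24_970, "cable_price": 580},
    "very_large_memory": {"nodes_per_unit": 1, "unit_price": 18_900, "cable_price": 100},
    "gpu_v100":          {"nodes_per_unit": 1, "unit_price": 25_500, "cable_price": 100},
    "gpu_2080ti":        {"nodes_per_unit": 1, "unit_price": 11_600, "cable_price": 100},
}


def estimate_cost(kind: str, nodes: int) -> int:
    """Return the approximate total (hardware + cables) in dollars."""
    entry = PRICES[kind]
    units, remainder = divmod(nodes, entry["nodes_per_unit"])
    if nodes <= 0 or remainder:
        raise ValueError(
            f"{kind} nodes are sold in multiples of {entry['nodes_per_unit']}"
        )
    return units * (entry["unit_price"] + entry["cable_price"])


if __name__ == "__main__":
    print(estimate_cost("general", 8))     # two 4-node chassis plus cables
    print(estimate_cost("gpu_2080ti", 2))  # two GPU nodes plus cables
```

For example, two general computing chassis (8 nodes) come to 2 x ($24,970 + $580) = $51,100 at the listed prices.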
Hardware Purchasing
Prospective condo owners should contact us for current pricing before purchasing any equipment, to ensure compatibility. If you are interested in other hardware configurations (e.g., HTC/Serial nodes), please contact us. BRC will assist with entering a compute node purchase requisition on behalf of UC Berkeley faculty.
Software
Prospective Condo owners should review the System Software section of the System Overview page to confirm that their applications are compatible with Savio's operating system, job scheduler and operating environment.
Storage
All institutional and condo users have a 10 GB home directory with backups. In addition, each research group is eligible to receive up to 200 GB of shared project space (30 GB for Faculty Computing Allowance accounts and 200 GB for Condo accounts) to hold research-specific application software that is shared among the users of the group. All users have access to the Savio high-performance scratch filesystem for non-persistent data. Users or projects needing more space for persistent data can also purchase additional performance-tier storage from IST at the current rate. For even larger storage needs, Condo partners may also take advantage of the Condo Storage service, which provides low-cost storage for very large data needs (minimum 25 TB).
Network
One Mellanox InfiniBand 36-port unmanaged leaf switch is used for every 24 compute nodes.
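The ratio above implies a simple sizing calculation. The sketch below, with the hypothetical helper `leaf_switches_needed`, computes how many leaf switches a given node count requires. Using 24 of the 36 ports for compute nodes leaves 12 ports per switch; that these serve as uplinks is an assumption, since the text does not say how they are used.

```python
import math

PORTS_PER_SWITCH = 36  # ports on each leaf switch
NODES_PER_SWITCH = 24  # compute nodes attached to each leaf switch


def leaf_switches_needed(nodes: int) -> int:
    """Return the number of leaf switches required for the given node count."""
    if nodes < 0:
        raise ValueError("node count cannot be negative")
    return math.ceil(nodes / NODES_PER_SWITCH)


# Ports per switch not used by compute nodes (assumed, not stated, to be uplinks).
REMAINING_PORTS = PORTS_PER_SWITCH - NODES_PER_SWITCH

if __name__ == "__main__":
    for n in (4, 24, 25, 100):
        print(f"{n} nodes -> {leaf_switches_needed(n)} leaf switch(es)")
```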
Charter Condo Contributors
The following is a list of all those who contributed Charter nodes to the Savio Condo, thus helping launch the Savio cluster:
Contributor | Affiliation |
---|---|
Eliot Quataert | Theoretical Astrophysics Center, Astronomy Department |
Eugene Chiang | Astronomy Department |
Chris McKee | Astronomy Department |
Richard Klein | Astronomy Department |
Uros Seljak | Physics Department |
Jon Arons | Astronomy Department |
Ron Cohen | Department of Chemistry, Department of Earth and Planetary Science |
John Chiang | Department of Geography and Berkeley Atmospheric Sciences Center |
Fotini Katopodes Chow | Department of Civil and Environmental Engineering |
Jasmina Vujic | Department of Nuclear Engineering |
Jasjeet Sekhon | Department of Political Science and Statistics |
Rachel Slaybaugh | Nuclear Engineering |
Massimiliano Fratoni | Nuclear Engineering |
Hiroshi Nikaido | Molecular and Cell Biology |
Donna Hendrix | Computation Genomics Research Lab |
Justin McCrary | Director D-Lab |
Alan Hubbard | Biostatistics, School of Public Health |
Mark van der Laan | Biostatistics and Statistics, School of Public Health |
Michael Manga | Department of Earth and Planetary Sciences |
Sol Hsiang | Goldman School of Public Policy |
Jeff Neaton | Physics |
Eric Neuscamman | College of Chemistry |
M. Alam Reza | Mechanical Engineering |
Elaine Tseng | UCSF School of Medicine |
Julius Guccione | UCSF Department of Surgery |
Ryan Lovett | Statistical Computing Facility |
David Limmer | College of Chemistry |
Doris Bachtrog | Integrative Biology |
Kranthi Mandadapu | College of Chemistry |
Kristin Persson | Department of Materials Science and Engineering |
Daryl Chrzan | Department of Materials Science and Engineering |
William Boos | Earth and Planetary Science |
Daniel Weisz | Department of Astronomy |
Peter Sudmant | Integrative Biology |
Priya Moorjani | Molecular and Cell Biology |
Faculty Perspectives
UC Berkeley Professor of Astrophysics Eliot Quataert speaks at the BRC Program Launch (22 May 2014) on the need for local high-performance computing (HPC) clusters, distinct from national resources such as those provided by NSF, DOE (NERSC), and NASA.
UC Berkeley Professor of Integrative Biology Rasmus Nielsen speaks at the BRC Program Launch (22 May 2014) about the transformative effect of using HPC in genomics research.