Savio scheduler configuration¶
Savio partitions¶
| Partition | Scheduler Allocation† | Service unit per core hour ratio |
|---|---|---|
| savio4_htc | Per core | 3.67 |
| savio4_gpu | Per core | 4.67 (18.68–37.36 / GPU)* |
| savio3 | Per node | 1.00 |
| savio3_bigmem | Per node | 2.67 |
| savio3_htc | Per core | 2.67 |
| savio3_xlmem | Per node | 4.67 |
| savio3_gpu | Per core | 3.67 (7.34–29.36 / GPU)* |
| savio2 | Per node | 0.75 |
| savio2_bigmem | Per node | 1.20 |
| savio2_htc | Per core | 1.20 |
| savio2_1080ti | Per core | 1.67 (3.34 / GPU) |
| savio2_knl | Per node | 0.40 |
* Understanding SU charges per GPU
SU charges per GPU for savio3_gpu and savio4_gpu depend on the specific GPU type requested, as resources are allocated per core and the number of CPU cores paired with each GPU varies across nodes in these partitions. For more details about the CPU:GPU ratios and available GPU hardware, please see our page on Savio's hardware configuration. To learn more about how to request specific GPUs, please see our documentation on specifying resource requests with Slurm.
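For example, using the ratios in the table above, a savio4_gpu job whose GPU is paired with 4 CPU cores is charged 4 × 4.67 = 18.68 SUs per GPU-hour, while one paired with 8 cores is charged 8 × 4.67 = 37.36 SUs per GPU-hour. The 4- and 8-core pairings here are only illustrative; check the hardware configuration page for the actual CPU:GPU ratio of the node type you request.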
† Per-core versus per-node scheduling
In many Savio partitions, nodes are assigned for exclusive access by your job. When running jobs in those partitions, you generally want to set your Slurm options and write your code to use all of the resources available on the nodes assigned to your job (e.g., either 32 or 40 cores and 96 GB of memory per node in the "savio3" partition).
The exceptions are the shared "HTC" and "GPU" partitions: savio2_htc, savio3_htc, savio4_htc, savio3_gpu, and savio4_gpu, where individual cores are assigned to jobs.
Savio is transitioning to per-core scheduling
With the Savio4 generation of hardware, we are moving away from per-node scheduling and towards per-core scheduling. There is no savio4 partition, just savio4_htc. All future partitions will feature shared nodes.
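As a concrete sketch, here are two alternative job-script excerpts (not a single script) showing how the resource request typically differs between the two allocation modes; the core counts are illustrative and should be matched to the actual node type you are assigned:

```bash
# Per-node partition (e.g., savio3): the whole node is assigned (and charged)
# to your job, so request all of its cores.
#SBATCH --partition=savio3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32      # or 40, depending on the node type

# Per-core partition (e.g., savio4_htc): request only the cores you need;
# the remaining cores can be assigned to other jobs.
#SBATCH --partition=savio4_htc
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # illustrative core count
```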
Overview of QoS Configurations for Savio¶
For details on specific Condo QoS configurations, see below.
| QoS | Accounts allowed | QoS Limits | Partitions |
|---|---|---|---|
| savio_normal | FCA*, ICA | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all |
| savio_debug | FCA*, ICA | 4 nodes max per job, 4 nodes in total, 3 hour (03:00:00) wallclock limit | all |
| savio_long | FCA*, ICA | 4 cores max per job, 24 cores in total, 10 day (10-00:00:00) wallclock limit | savio2_htc |
| Condo QoS | condos | specific to each condo, see next section | as purchased |
| savio_lowprio | condos, FCA** | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all |
* Including purchases of additional SUs for an FCA.
** FCAs can utilize savio_lowprio on the savio3_gpu and savio4_gpu partitions only.
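For illustration, a job submitted under an FCA with the default QoS might use a header along the following lines; the account name, job name, and executable are placeholders rather than real values:

```bash
#!/bin/bash
#SBATCH --job-name=example            # placeholder job name
#SBATCH --account=fc_example          # placeholder FCA account name
#SBATCH --partition=savio3
#SBATCH --qos=savio_normal
#SBATCH --nodes=1
#SBATCH --time=72:00:00               # at most the 72-hour wallclock limit
./my_program                          # placeholder executable
```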
QoS Configurations for Savio Condos¶
One can determine the resources available for a Condo by examining the Savio QoS configurations. The following invocation prints the relevant information:
sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26
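To narrow the output to a single condo, one option is to filter it with grep; here mygroup is a placeholder for the group's name as it appears in the QoS names:

```
sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26 | grep mygroup
```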
Savio4 HTC Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_minium | minium_htc4_normal | 224 cores max per group |
| co_chrzangroup | chrzangroup_htc4_normal | 448 cores max per group |
| co_moorjani | moorjani_htc4_normal | 616 cores max per group |
| co_haas | haas_htc4_normal | 224 cores max per group |
| co_dweisz | dweisz_htc4_normal | 672 cores max per group |
| co_aqmel2 | aqmel2_htc4_normal | 224 cores max per group |
| co_condoceder | condoceder_htc4_normal | 1120 cores max per group |
| co_genomicdata | genomicdata_htc4_normal | 224 cores max per group |
| co_stratflows | stratflows_htc4_normal | 224 cores max per group |
| co_kslab | kslab_htc4_normal | 224 cores max per group |
| co_chemqmc | chemqmc_savio4_normal | 448 cores max per group |
| co_rosalind | rosalind_htc4_normal | 448 cores max per group |
| co_armada2 | armada2_htc4_normal | 672 cores max per group |
| co_12monkeys | 12monkeys_htc4_normal | 448 cores max per group |
| co_aptelab | aptelab_htc4_normal | 448 cores max per group |
| co_gollner | gollner_htc4_normal | 224 cores max per group |
| co_demography | demography_htc4_normal | 448 cores max per group |
| co_moilab | moilab_htc4_normal | 672 cores max per group |
| co_carleton | carleton_htc4_normal | 224 cores max per group |
Savio4 GPU Condo QoS Configurations
| Account | QoS | QoS Limit | Required gres Type (*) |
|---|---|---|---|
| co_rail | rail_gpu4_normal | 208 GPUs max per group | A5000 |
| co_12monkeys | 12monkeys_gpu4_normal | 8 GPUs max per group | L40 |
| co_lucaslabl40s | lucaslabl40s_gpu4_normal | 16 GPUs max per group | L40 |
(*) Type required in the Slurm gres specification (of the form gres=gpu:<type>:<number of GPUs>, e.g., --gres=gpu:A5000:1) for regular savio4_gpu condo jobs (i.e., jobs not submitted under low priority).
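As a sketch, a member of the co_rail condo requesting a single A5000 GPU under its normal QoS might combine these options as follows; the CPU count is illustrative and should match the node's CPU:GPU ratio:

```bash
#SBATCH --account=co_rail
#SBATCH --partition=savio4_gpu
#SBATCH --qos=rail_gpu4_normal
#SBATCH --gres=gpu:A5000:1
#SBATCH --cpus-per-task=4             # illustrative; match the node's CPU:GPU ratio
```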
Savio3 Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_chemqmc | chemqmc_savio3_normal | 4 nodes max per group |
| co_laika | laika_savio3_normal | 4 nodes max per group |
| co_noneq | noneq_savio3_normal | 8 nodes max per group |
| co_aiolos | aiolos_savio3_normal | 36 nodes max per group 24:00:00 wallclock limit |
| co_jupiter | jupiter_savio3_normal | 12 nodes max per group |
| co_aqmodel | aqmodel_savio3_normal | 4 nodes max per group |
| co_esmath | esmath_savio3_normal | 4 nodes max per group |
| co_biostat | biostat_savio3_normal | 8 nodes max per group |
| co_fishes | fishes_savio3_normal | 4 nodes max per group |
| co_geomaterials | geomaterials_savio3_normal | 16 nodes max per group |
| co_kpmol | kpmol_savio3_normal | 40 nodes max per group |
| co_eisenlab | eisenlab_savio3_normal | 4 nodes max per group |
Savio3 Bigmem Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_genomicdata | genomicdata_bigmem3_normal | 1 node max per group |
| co_kslab | kslab_bigmem3_normal | 4 nodes max per group |
| co_moorjani | moorjani_bigmem3_normal | 1 node max per group |
| co_armada2 | armada2_bigmem3_normal | 14 nodes max per group |
Savio3 HTC Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_genomicdata | genomicdata_htc3_normal | 120 cores max per group |
| co_moorjani | moorjani_htc3_normal | 120 cores max per group |
| co_armada2 | armada2_htc3_normal | 80 cores max per group |
| co_songlab | songlab_htc3_normal | 160 cores max per group |
Savio3 Extra Large Memory Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_genomicdata | genomicdata_xlmem3_normal | 1 node max per group |
| co_rosalind | rosalind_xlmem3_normal | 2 nodes max per group |
Savio3 GPU Condo QoS Configurations
| Account | QoS | QoS Limit | Required gres Type (*) |
|---|---|---|---|
| co_nilah | nilah_gpu3_normal | 2 GPUs max per group | V100 |
| co_esmath | esmath_gpu3_normal | 16 GPUs max per group | GTX2080TI |
| co_rail | rail_gpu3_normal | 48 GPUs max per group | TITAN |
| co_jksim | jksim_gpu3_normal | 12 GPUs max per group | N/A |
| co_memprotmd | memprotmd_gpu3_normal | 2 GPUs max per group | A40 |
| co_dweisz | dweisz_gpu3_normal | 2 GPUs max per group | A40 |
| co_condoceder | condoceder_gpu3_normal | 4 GPUs max per group | A40 |
| co_noneq | noneq_gpu3_normal | 8 GPUs max per group | A40 |
| co_biohub | biohub_gpu3_normal | 4 GPUs max per group | A40 |
| co_armada2 | armada2_gpu3_normal | 20 GPUs max per group | A40 |
| co_cph20bnodes | cph200bnodes_gpu3_normal | 8 GPUs max per group | A40 |
(*) Type required in the Slurm gres specification (of the form gres=gpu:<type>:<number of GPUs>, e.g., --gres=gpu:A40:1) for regular savio3_gpu condo jobs (i.e., jobs not submitted under low priority).
Savio2 Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_biostat | biostat_savio2_normal | 20 nodes max per group |
| co_chemqmc | chemqmc_savio2_normal | 16 nodes max per group |
| co_dweisz | dweisz_savio2_normal | 8 nodes max per group |
| co_econ | econ_savio2_normal | 2 nodes max per group |
| co_hiawatha | hiawatha_savio2_normal | 40 nodes max per group |
| co_lihep | lihep_savio2_normal | 4 nodes max per group |
| co_mrirlab | mrirlab_savio2_normal | 4 nodes max per group |
| co_planets | planets_savio2_normal | 4 nodes max per group |
| co_stat | stat_savio2_normal | 2 nodes max per group |
| co_bachtrog | bachtrog_savio2_normal | 4 nodes max per group |
| co_noneq | noneq_savio2_normal | 8 nodes max per group |
| co_kranthi | kranthi_savio2_normal | 4 nodes max per group |
Savio2 Bigmem Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_laika | laika_bigmem2_normal | 4 nodes max per group |
| co_dweisz | dweisz_bigmem2_normal | 4 nodes max per group |
| co_aiolos | aiolos_bigmem2_normal | 4 nodes max per group 24:00:00 wallclock limit |
| co_bachtrog | bachtrog_bigmem2_normal | 4 nodes max per group |
| co_msedcc | msedcc_bigmem2_normal | 8 nodes max per group |
Savio2 HTC Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_rosalind | rosalind_htc2_normal | 8 nodes max per group |
Savio2 1080Ti Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_acrb | acrb_1080ti2_normal | 12 GPUs max per group |
| co_mlab | mlab_1080ti2_normal | 16 GPUs max per group |
Savio2 KNL Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_lsdi | lsdi_knl2_normal | 28 nodes max per group 5 running jobs max per user 20 total jobs max per user |
(Retired; GPUs no longer accessible) Savio2 GPU Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_acrb | acrb_gpu2_normal | 44 GPUs max per group |
| co_stat | stat_gpu2_normal | 8 GPUs max per group |
(Retired) Savio Condo QoS Configurations
| Account | QoS | QoS Limit |
|---|---|---|
| co_acrb | acrb_savio_normal | 8 nodes max per group |
| co_aiolos | aiolos_savio_normal | 12 nodes max per group 24:00:00 wallclock limit |
| co_astro | astro_savio_debug, astro_savio_normal | 4 nodes max per group (debug), 32 nodes max per group (normal) |
| co_dlab | dlab_savio_normal | 4 nodes max per group |
| co_nuclear | nuclear_savio_normal | 24 nodes max per group |
| co_praxis | praxis_savio_normal | 4 nodes max per group |
| co_rosalind | rosalind_savio_normal | 8 nodes max per group 4 nodes max per job per user |
CGRL Scheduler Configuration¶
The CGRL clusters use the Slurm scheduler to manage jobs. When submitting jobs via the sbatch or srun commands, use the following Slurm options:
- The settings for a job in Vector (note: you don't need to set the account): --partition=vector --qos=vector_batch
- The settings for a job in Rosalind (Savio2 HTC): --partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal (see the example script below)
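For example, a minimal Rosalind job script using these options might look like the following sketch; the job name, task count, time limit, and executable are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=rosalind-test       # placeholder job name
#SBATCH --partition=savio2_htc
#SBATCH --account=co_rosalind
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --ntasks=1                     # placeholder task count
#SBATCH --time=01:00:00                # placeholder time limit
./my_analysis                          # placeholder executable
```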
Checking allowed QoS
To check which QoS you are allowed to use, run sacctmgr -p show associations user=$USER
Here are the details for each CGRL partition and associated QoS.
| Partition | Scheduler Allocation | Account | Nodes | Node List | QoS | QoS Limit |
|---|---|---|---|---|---|---|
| vector | By Core | N/A | 11 | n00[00-03].vector0 n0004.vector0 n00[05-08].vector0 n00[09]-n00[10].vector0 | vector_batch | 48 cores max per job 96 cores max per user |
| savio2_htc | By Core | co_rosalind | 8 | n0[000-011].savio2, n0[215-222].savio2 | rosalind_htc2_normal | 8 nodes max per group |
| savio3_xlmem | By Node | co_rosalind | 2 | n0[000-003].savio3 | rosalind_xlmem3_normal | 2 nodes max per group |
| savio4_htc | By Core | co_rosalind | 8 | n0[170-177].savio4 | rosalind_htc4_normal | 448 cores max per group |