Savio scheduler configuration
Savio partitions
Partition | Nodes | Node Features | Nodes shared? | SU/core hour ratio |
---|---|---|---|---|
savio2 | 163 | savio2 or savio2_c24 or savio2_c28 | exclusive | 0.75 |
savio2_bigmem | 36 | savio2_bigmem or savio2_m128 | exclusive | 1.20 |
savio2_htc | 20 | savio2_htc | shared | 1.20 |
savio2_1080ti | 8 | savio2_1080ti | shared | 1.67 (3.34 / GPU) |
savio2_knl | 28 | savio2_knl | exclusive | 0.40 |
savio3 | 192 | savio3 or savio3_c40 | exclusive | 1.00 |
savio3_bigmem | 20 | savio3_bigmem or savio3_m384; (savio3_c40 for 40 cores) | exclusive | 2.67 |
savio3_htc | 24 | savio3_htc or savio3_c40 | shared | 2.67 |
savio3_xlmem | 4 | savio3_xlmem or savio3_c52 | exclusive | 4.67 |
savio3_gpu | 2 | savio3_gpu (2x V100) | shared | 3.67 |
savio3_gpu | 9 | 4rtx (4x GTX2080TI) | shared | 3.67 |
savio3_gpu | 6 | 8rtx (8x TITAN) | shared | 3.67 |
savio3_gpu | 16 | a40 (2x A40) | shared | 3.67 |
savio3_gpu | 6 | a40 (4x A40) | shared | 3.67 |
savio4_htc | 156 | savio4_m256 or savio4_m512 | shared | 3.67 |
savio4_gpu | 26 | a5000 (8x A5000) | shared | TBD |
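The SU/core hour ratio column can be read as a multiplier on core hours. The following is a minimal sketch of that arithmetic, assuming the charge is simply cores × wallclock hours × ratio. The table gives the ratios but not the full charging rules, so any whole-node charging on exclusive partitions is not modelled here. `su_charge` is a hypothetical helper, not a Savio command.

```shell
# Hypothetical helper: estimate the service units (SUs) charged for a job.
# Assumption (not stated on this page): SUs = cores * hours * ratio.
su_charge() {
    # awk does the floating-point multiplication that plain shell arithmetic cannot.
    awk -v c="$1" -v h="$2" -v r="$3" 'BEGIN { printf "%.2f\n", c * h * r }'
}

# 24 cores for 2 hours on savio2 (ratio 0.75 from the table above)
su_charge 24 2 0.75

# 4 cores for 10 hours on savio3_htc (ratio 2.67 from the table above)
su_charge 4 10 2.67
```

The arguments are, in order, the core count, the wallclock hours, and the ratio for the chosen partition.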
Overview of QoS Configurations for Savio
For details on specific Condo QoS configurations, see below.
QoS | Accounts allowed | QoS Limits | Partitions |
---|---|---|---|
savio_normal | FCA*, ICA | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all** |
savio_debug | FCA*, ICA | 4 nodes max per job, 4 nodes in total, 3 hour (03:00:00) wallclock limit | all** |
savio_long | FCA*, ICA | 4 cores max per job, 24 cores in total, 10 day (10-00:00:00) wallclock limit | savio2_htc |
Condo QoS | condos | specific to each condo, see next section | as purchased |
savio_lowprio | condos | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all |
(*) Including purchases of additional SUs for an FCA.
(**) Note that savio3 nodes (including the various bigmem, GPU, etc. nodes) are not yet available for use by FCAs or ICAs.
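The wallclock limits above are written in the `[D-]HH:MM:SS` form (for example 72:00:00 and 10-00:00:00). As a rough sketch, a requested time can be compared against a QoS limit by converting both to seconds. `walltime_seconds` and `fits_limit` are hypothetical helpers, and only the `[D-]HH:MM:SS` form shown on this page is handled.

```shell
# Hypothetical helper: convert a [D-]HH:MM:SS wallclock string to seconds.
# Other time formats that Slurm may accept are not handled here.
walltime_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

# Succeeds when the requested time ($1) does not exceed the limit ($2).
fits_limit() {
    [ "$(walltime_seconds "$1")" -le "$(walltime_seconds "$2")" ]
}

fits_limit "48:00:00" "72:00:00" && echo "48:00:00 fits within savio_normal"
fits_limit "4-00:00:00" "72:00:00" || echo "4-00:00:00 exceeds savio_normal"
fits_limit "4-00:00:00" "10-00:00:00" && echo "4-00:00:00 fits within savio_long"
```

A check like this only covers the wallclock limit; the node and core limits in the table apply separately.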
QoS Configurations for Savio Condos
One can determine the resources available for a Condo by examining the Savio QoS configurations. The following command prints the relevant information:
sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26
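The command above lists every QoS on the system. To narrow the listing to a single condo, the output can be filtered by QoS name. This is a sketch only: `show_condo_qos` is a hypothetical wrapper, and it assumes the default sacctmgr layout of one header line followed by one separator line.

```shell
# Hypothetical wrapper around the command above: keep the two header lines
# and only those rows whose QoS name contains the given string.
show_condo_qos() {
    sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26 |
        awk -v pat="$1" 'NR <= 2 || index($1, pat) > 0'
}

# Example (run on a login node): list the QoS entries for the co_rosalind condo.
# show_condo_qos rosalind
```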
(Retired) Savio Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_acrb | acrb_savio_normal | 8 nodes max per group |
co_aiolos | aiolos_savio_normal | 12 nodes max per group; 24:00:00 wallclock limit |
co_astro | astro_savio_debug | 4 nodes max per group |
co_astro | astro_savio_normal | 32 nodes max per group |
co_dlab | dlab_savio_normal | 4 nodes max per group |
co_nuclear | nuclear_savio_normal | 24 nodes max per group |
co_praxis | praxis_savio_normal | 4 nodes max per group |
co_rosalind | rosalind_savio_normal | 8 nodes max per group; 4 nodes max per job per user |
Savio2 Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_biostat | biostat_savio2_normal | 20 nodes max per group |
co_chemqmc | chemqmc_savio2_normal | 16 nodes max per group |
co_dweisz | dweisz_savio2_normal | 8 nodes max per group |
co_econ | econ_savio2_normal | 2 nodes max per group |
co_hiawatha | hiawatha_savio2_normal | 40 nodes max per group |
co_lihep | lihep_savio2_normal | 4 nodes max per group |
co_mrirlab | mrirlab_savio2_normal | 4 nodes max per group |
co_planets | planets_savio2_normal | 4 nodes max per group |
co_stat | stat_savio2_normal | 2 nodes max per group |
co_bachtrog | bachtrog_savio2_normal | 4 nodes max per group |
co_noneq | noneq_savio2_normal | 8 nodes max per group |
co_kranthi | kranthi_savio2_normal | 4 nodes max per group |
Savio2 Bigmem Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_laika | laika_bigmem2_normal | 4 nodes max per group |
co_dweisz | dweisz_bigmem2_normal | 4 nodes max per group |
co_aiolos | aiolos_bigmem2_normal | 4 nodes max per group; 24:00:00 wallclock limit |
co_bachtrog | bachtrog_bigmem2_normal | 4 nodes max per group |
co_msedcc | msedcc_bigmem2_normal | 8 nodes max per group |
Savio2 HTC Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_rosalind | rosalind_htc2_normal | 8 nodes max per group |
(Retired; GPUs no longer accessible) Savio2 GPU Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_acrb | acrb_gpu2_normal | 44 GPUs max per group |
co_stat | stat_gpu2_normal | 8 GPUs max per group |
Savio2 1080Ti Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_acrb | acrb_1080ti2_normal | 12 GPUs max per group |
co_mlab | mlab_1080ti2_normal | 16 GPUs max per group |
Savio2 KNL Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_lsdi | lsdi_knl2_normal | 28 nodes max per group; 5 running jobs max per user; 20 total jobs max per user |
Savio3 Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_chemqmc | chemqmc_savio3_normal | 4 nodes max per group |
co_laika | laika_savio3_normal | 4 nodes max per group |
co_noneq | noneq_savio3_normal | 8 nodes max per group |
co_aiolos | aiolos_savio3_normal | 36 nodes max per group; 24:00:00 wallclock limit |
co_jupiter | jupiter_savio3_normal | 12 nodes max per group |
co_aqmodel | aqmodel_savio3_normal | 4 nodes max per group |
co_esmath | esmath_savio3_normal | 4 nodes max per group |
co_biostat | biostat_savio3_normal | 8 nodes max per group |
co_fishes | fishes_savio3_normal | 4 nodes max per group |
co_geomaterials | geomaterials_savio3_normal | 16 nodes max per group |
co_kpmol | kpmol_savio3_normal | 40 nodes max per group |
co_eisenlab | eisenlab_savio3_normal | 4 nodes max per group |
Savio3 Bigmem Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_genomicdata | genomicdata_bigmem3_normal | 1 nodes max per group |
co_kslab | kslab_bigmem3_normal | 4 nodes max per group |
co_moorjani | moorjani_bigmem3_normal | 1 nodes max per group |
co_armada2 | armada2_bigmem3_normal | 14 nodes max per group |
Savio3 HTC Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_genomicdata | genomicdata_htc3_normal | 120 cores max per group |
co_moorjani | moorjani_htc3_normal | 120 cores max per group |
co_armada2 | armada2_htc3_normal | 80 cores max per group |
co_songlab | songlab_htc3_normal | 160 cores max per group |
Savio3 Extra Large Memory Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_genomicdata | genomicdata_xlmem3_normal | 1 nodes max per group |
co_rosalind | rosalind_xlmem3_normal | 2 nodes max per group |
Savio3 GPU Condo QoS Configurations
Account | QoS | QoS Limit | Required gres Type (*) |
---|---|---|---|
co_nilah | nilah_gpu3_normal | 2 GPUs max per group | V100 |
co_esmath | esmath_gpu3_normal | 16 GPUs max per group | GTX2080TI |
co_rail | rail_gpu3_normal | 48 GPUs max per group | TITAN |
co_jksim | jksim_gpu3_normal | 12 GPUs max per group | N/A |
co_memprotmd | memprotmd_gpu3_normal | 2 GPUs max per group | A40 |
co_dweisz | dweisz_gpu3_normal | 2 GPUs max per group | A40 |
co_condoceder | condoceder_gpu3_normal | 4 GPUs max per group | A40 |
co_noneq | noneq_gpu3_normal | 8 GPUs max per group | A40 |
co_biohub | biohub_gpu3_normal | 4 GPUs max per group | A40 |
co_armada2 | armada2_gpu3_normal | 20 GPUs max per group | A40 |
co_cph20bnodes | cph200bnodes_gpu3_normal | 8 GPUs max per group | A40 |
(*) Type required in the Slurm gres specification (of the form gres=gpu:<type>:<number of gpus>, e.g., --gres=gpu:A40:1) for regular savio3_gpu condo jobs (i.e., not submitted under low priority).
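To make the gres form concrete, here is a small sketch that assembles the option from a GPU type and count and combines it with the partition, account, and QoS values from the table above. `gres_option` is a hypothetical helper; other options a GPU job may need are not covered here.

```shell
# Hypothetical helper: build the Slurm gres option in the
# gpu:<type>:<number of gpus> form described above.
gres_option() {
    echo "--gres=gpu:$1:$2"
}

# Options for a one-GPU job in the co_noneq condo (values from the table above).
echo "--partition=savio3_gpu --account=co_noneq --qos=noneq_gpu3_normal $(gres_option A40 1)"
```

The printed string can be passed to sbatch or srun along with the rest of the job's options.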
Savio4 HTC Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_minium | minium_htc4_normal | 224 cores max per group |
co_chrzangroup | chrzangroup_htc4_normal | 448 cores max per group |
co_moorjani | moorjani_htc4_normal | 616 cores max per group |
co_haas | haas_htc4_normal | 224 cores max per group |
co_dweisz | dweisz_htc4_normal | 672 cores max per group |
co_aqmel2 | aqmel2_htc4_normal | 224 cores max per group |
co_condoceder | condoceder_htc4_normal | 1120 cores max per group |
co_genomicdata | genomicdata_htc4_normal | 224 cores max per group |
co_stratflows | stratflows_htc4_normal | 224 cores max per group |
co_kslab | kslab_htc4_normal | 224 cores max per group |
co_chemqmc | chemqmc_savio4_normal | 448 cores max per group |
co_rosalind | rosalind_htc4_normal | 448 cores max per group |
co_armada2 | armada2_htc4_normal | 672 cores max per group |
Savio4 GPU Condo QoS Configurations
Account | QoS | QoS Limit |
---|---|---|
co_rail | rail_gpu4_normal | 208 GPUs max per group |
CGRL Scheduler Configuration
The clusters use the SLURM scheduler to manage jobs. When submitting jobs with the sbatch or srun commands, use the following SLURM options:
- The settings for a job in Vector (Note: you don't need to set the "account"):
--partition=vector --qos=vector_batch
- The settings for a job in Rosalind (Savio2 HTC):
--partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal
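The Rosalind settings above can be placed in a batch script as `#SBATCH` directives. The following is a sketch only: the job name, task count, and time limit are illustrative choices, not values prescribed by this page.

```shell
#!/bin/bash
# Sketch of a batch script using the Rosalind (Savio2 HTC) settings listed above.
# The job name, task count, and time limit are illustrative placeholders.
#SBATCH --job-name=rosalind-example
#SBATCH --partition=savio2_htc
#SBATCH --account=co_rosalind
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Replace this line with the actual workload.
echo "Job started"
```

Saved under a name such as `job.sh` (a hypothetical file name), the script would be submitted with `sbatch job.sh`. For Vector, the partition and QoS lines would become `vector` and `vector_batch`, and the account line would be dropped.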
Alert
To check which QoS you are allowed to use, simply run "sacctmgr -p show associations user=$USER"
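The `-p` flag makes sacctmgr print pipe-delimited output, which is easy to trim down to the columns of interest. Here is a minimal sketch, assuming the header row contains columns named Account, Partition, and QOS. `list_my_qos` is a hypothetical helper.

```shell
# Hypothetical helper: read the pipe-delimited output of
#   sacctmgr -p show associations user=$USER
# on stdin and print "account partition qos" for each association.
# Columns are located by header name (assumed: Account, Partition, QOS)
# rather than by position.
list_my_qos() {
    awk -F'|' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[toupper($i)] = i
            next
        }
        { print $col["ACCOUNT"], $col["PARTITION"], $col["QOS"] }
    '
}

# Usage on a login node:
# sacctmgr -p show associations user=$USER | list_my_qos
```

If one of the assumed column names is missing from the header, awk falls back to printing the whole line for that column, so the raw output remains visible.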
Here are the details for each CGRL partition and associated QoS.
Partition | Account | Nodes | Node List | Node Feature | QoS | QoS Limit |
---|---|---|---|---|---|---|
vector | N/A | 11 | n00[00-03].vector0; n0004.vector0; n00[05-08].vector0; n00[09]-n00[10].vector0 | vector,vector_c12,vector_m96; vector,vector_c48,vector_m256; vector,vector_c16,vector_m128; vector,vector_c12,vector_m48 | vector_batch | 48 cores max per job; 96 cores max per user |
savio2_htc | co_rosalind | 8 | n0[000-011].savio2, n0[215-222].savio2 | savio2_htc | rosalind_htc2_normal | 8 nodes max per group |
savio3_xlmem | co_rosalind | 2 | n0[000-003].savio3 | savio3_xlmem | rosalind_xlmem3_normal | 2 nodes max per group |
savio4_htc | co_rosalind | 8 | n0[170-177].savio4 | savio4_htc | rosalind_htc4_normal | 448 cores max per group |