
Savio scheduler configuration

Savio partitions

| Partition | Scheduler Allocation† | Service unit per core hour ratio |
|---|---|---|
| savio4_htc | Per core | 3.67 |
| savio4_gpu | Per core | 4.67 (18.68–37.36 / GPU)* |
| savio3 | Per node | 1.00 |
| savio3_bigmem | Per node | 2.67 |
| savio3_htc | Per core | 2.67 |
| savio3_xlmem | Per node | 4.67 |
| savio3_gpu | Per core | 3.67 (7.34–29.36 / GPU)* |
| savio2 | Per node | 0.75 |
| savio2_bigmem | Per node | 1.20 |
| savio2_htc | Per core | 1.20 |
| savio2_1080ti | Per core | 1.67 (3.34 / GPU) |
| savio2_knl | Per node | 0.40 |
* Understanding SU charges per GPU

SU charges per GPU for savio3_gpu and savio4_gpu depend on the specific GPU requested, because resources are allocated per core and the number of CPU cores per GPU varies across nodes in these partitions. For more details about the CPU:GPU ratios and available GPU hardware, please see our page on Savio's hardware configuration. To learn more about how to request specific GPUs, please see our documentation on specifying resource requests with Slurm.
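
For illustration only, a savio3_gpu request for a single A40 might use directives like the following sketch. The account name fc_example is a placeholder, and --cpus-per-task=8 is an assumed CPU:GPU ratio; check the hardware configuration page for the ratio that applies to the GPU type you request.

```bash
#!/bin/bash
# Hypothetical savio3_gpu request for one A40 GPU under a faculty computing
# allowance. fc_example is a placeholder account, and --cpus-per-task=8 is an
# assumed CPU:GPU ratio; use the value documented for your GPU type.
#SBATCH --partition=savio3_gpu
#SBATCH --account=fc_example
#SBATCH --qos=savio_normal        # per the QoS overview below; verify with sacctmgr
#SBATCH --gres=gpu:A40:1
#SBATCH --cpus-per-task=8
#SBATCH --time=00:30:00

nvidia-smi    # report the GPU assigned to the job
```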

† Per-core versus per-node scheduling

In many Savio partitions, nodes are assigned for exclusive access by your job. When running jobs in those partitions, you generally want to set Slurm options and write your code to use all of the available resources on the nodes assigned to your job (e.g., either 32 or 40 cores and 96 GB of memory per node in the "savio3" partition).
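
For example, a minimal per-node job script for savio3 might look like the sketch below; fc_example is a placeholder account name.

```bash
#!/bin/bash
# Sketch of a per-node (exclusive) job on savio3 under a faculty computing
# allowance; fc_example is a placeholder account name.
#SBATCH --job-name=pernode-sketch
#SBATCH --partition=savio3
#SBATCH --account=fc_example
#SBATCH --qos=savio_normal
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# The node is allocated exclusively, so put every core to work
# (32 or 40 cores per savio3 node, depending on the node type).
srun --ntasks="$SLURM_CPUS_ON_NODE" hostname
```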

The exceptions are the shared "HTC" and "GPU" partitions: savio2_htc, savio3_htc, savio4_htc, savio3_gpu, and savio4_gpu, where individual cores are assigned to jobs.
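
In those shared partitions you request, and are charged for, only the cores you need, along the lines of this sketch (again with a placeholder account name and executable):

```bash
#!/bin/bash
# Sketch of a per-core job on the shared savio4_htc partition; fc_example is a
# placeholder account name and my_threaded_program a placeholder executable.
#SBATCH --job-name=percore-sketch
#SBATCH --partition=savio4_htc
#SBATCH --account=fc_example
#SBATCH --qos=savio_normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4     # charged for these 4 cores only, not the whole node
#SBATCH --time=00:30:00

export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"
./my_threaded_program
```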

Savio is transitioning to per-core scheduling

With the Savio4 generation of hardware, we are moving away from per-node scheduling and towards per-core scheduling. There is no savio4 partition, just savio4_htc. All future partitions will feature shared nodes.

Overview of QoS Configurations for Savio

For details on specific Condo QoS configurations, see below.

| QoS | Accounts allowed | QoS Limits | Partitions |
|---|---|---|---|
| savio_normal | FCA*, ICA | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all |
| savio_debug | FCA*, ICA | 4 nodes max per job, 4 nodes in total, 3 hour (03:00:00) wallclock limit | all |
| savio_long | FCA*, ICA | 4 cores max per job, 24 cores in total, 10 day (10-00:00:00) wallclock limit | savio2_htc |
| Condo QoS | condos | specific to each condo (see next section) | as purchased |
| savio_lowprio | condos, FCA** | 24 nodes max per job, 72 hour (72:00:00) wallclock limit | all |

* Including purchases of additional SUs for an FCA.

** FCAs can utilize savio_lowprio on the savio3_gpu and savio4_gpu partitions only.
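
For instance, a long-running single-core job submitted under savio_long (which is limited to savio2_htc) might be launched roughly as follows; fc_example and long_job.sh are placeholders.

```bash
# Sketch: submit a single-core job under the savio_long QoS, which allows up
# to a 10-day wallclock limit on savio2_htc. fc_example and long_job.sh are
# placeholders for your account and job script.
sbatch --partition=savio2_htc \
       --account=fc_example \
       --qos=savio_long \
       --ntasks=1 \
       --time=10-00:00:00 \
       long_job.sh
```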

QoS Configurations for Savio Condos

You can determine the resources available for a condo by examining the Savio QoS configurations. The following invocation prints the relevant information:

sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26
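
To narrow the output to a single QoS of interest, one option is to filter the listing by name, for example:

```bash
# Filter the QoS listing for one condo QoS (here rail_gpu4_normal, from the
# tables below); substitute the QoS name you are interested in.
sacctmgr show qos format=Name%24,Priority%8,GrpTRES%22,MinTRES%26 | grep rail_gpu4_normal
```
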
Savio4 HTC Condo QoS Configurations
Account QoS QoS Limit
co_minium minium_htc4_normal 224 cores max per group
co_chrzangroup chrzangroup_htc4_normal 448 cores max per group
co_moorjani moorjani_htc4_normal 616 cores max per group
co_haas haas_htc4_normal 224 cores max per group
co_dweisz dweisz_htc4_normal 672 cores max per group
co_aqmel2 aqmel2_htc4_normal 224 cores max per group
co_condoceder condoceder_htc4_normal 1120 cores max per group
co_genomicdata genomicdata_htc4_normal 224 cores max per group
co_stratflows stratflows_htc4_normal 224 cores max per group
co_kslab kslab_htc4_normal 224 cores max per group
co_chemqmc chemqmc_savio4_normal 448 cores max per group
co_rosalind rosalind_htc4_normal 448 cores max per group
co_armada2 armada2_htc4_normal 672 cores max per group
co_12monkeys 12monkeys_htc4_normal 448 cores max per group
co_aptelab aptelab_htc4_normal 448 cores max per group
co_gollner gollner_htc4_normal 224 cores max per group
co_demography demography_htc4_normal 448 cores max per group
co_moilab moilab_htc4_normal 672 cores max per group
co_carleton carleton_htc4_normal 224 cores max per group
Savio4 GPU Condo QoS Configurations
Account QoS QoS Limit Required gres Type (*)
co_rail rail_gpu4_normal 208 GPUs max per group A5000
co_12monkeys 12monkeys_gpu4_normal 8 GPUs max per group L40
co_lucaslabl40s lucaslabl40s_gpu4_normal 16 GPUs max per group L40

(*) The GPU type must be included in the Slurm gres specification (of the form gres=gpu:<type>:<number of GPUs>, e.g., --gres=gpu:A5000:1) for regular savio4_gpu condo jobs (i.e., jobs not submitted under low priority).
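
As a sketch of a complete condo submission using the co_rail entries from the table above: the A5000 type appears in the gres specification, while --cpus-per-task is a placeholder value, since the CPU:GPU ratio depends on the node type (see the hardware configuration page).

```bash
#!/bin/bash
# Sketch of a savio4_gpu condo job based on the co_rail row above;
# --cpus-per-task=4 is an assumed CPU:GPU ratio, not a documented value.
#SBATCH --partition=savio4_gpu
#SBATCH --account=co_rail
#SBATCH --qos=rail_gpu4_normal
#SBATCH --gres=gpu:A5000:1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

nvidia-smi
```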

Savio3 Condo QoS Configurations
Account QoS QoS Limit
co_chemqmc chemqmc_savio3_normal 4 nodes max per group
co_laika laika_savio3_normal 4 nodes max per group
co_noneq noneq_savio3_normal 8 nodes max per group
co_aiolos aiolos_savio3_normal 36 nodes max per group, 24:00:00 wallclock limit
co_jupiter jupiter_savio3_normal 12 nodes max per group
co_aqmodel aqmodel_savio3_normal 4 nodes max per group
co_esmath esmath_savio3_normal 4 nodes max per group
co_biostat biostat_savio3_normal 8 nodes max per group
co_fishes fishes_savio3_normal 4 nodes max per group
co_geomaterials geomaterials_savio3_normal 16 nodes max per group
co_kpmol kpmol_savio3_normal 40 nodes max per group
co_eisenlab eisenlab_savio3_normal 4 nodes max per group
Savio3 Bigmem Condo QoS Configurations
Account QoS QoS Limit
co_genomicdata genomicdata_bigmem3_normal 1 node max per group
co_kslab kslab_bigmem3_normal 4 nodes max per group
co_moorjani moorjani_bigmem3_normal 1 node max per group
co_armada2 armada2_bigmem3_normal 14 nodes max per group
Savio3 HTC Condo QoS Configurations
Account QoS QoS Limit
co_genomicdata genomicdata_htc3_normal 120 cores max per group
co_moorjani moorjani_htc3_normal 120 cores max per group
co_armada2 armada2_htc3_normal 80 cores max per group
co_songlab songlab_htc3_normal 160 cores max per group
Savio3 Extra Large Memory Condo QoS Configurations
Account QoS QoS Limit
co_genomicdata genomicdata_xlmem3_normal 1 node max per group
co_rosalind rosalind_xlmem3_normal 2 nodes max per group
Savio3 GPU Condo QoS Configurations
Account QoS QoS Limit Required gres Type (*)
co_nilah nilah_gpu3_normal 2 GPUs max per group V100
co_esmath esmath_gpu3_normal 16 GPUs max per group GTX2080TI
co_rail rail_gpu3_normal 48 GPUs max per group TITAN
co_jksim jksim_gpu3_normal 12 GPUs max per group N/A
co_memprotmd memprotmd_gpu3_normal 2 GPUs max per group A40
co_dweisz dweisz_gpu3_normal 2 GPUs max per group A40
co_condoceder condoceder_gpu3_normal 4 GPUs max per group A40
co_noneq noneq_gpu3_normal 8 GPUs max per group A40
co_biohub biohub_gpu3_normal 4 GPUs max per group A40
co_armada2 armada2_gpu3_normal 20 GPUs max per group A40
co_cph20bnodes cph200bnodes_gpu3_normal 8 GPUs max per group A40

(*) The GPU type must be included in the Slurm gres specification (of the form gres=gpu:<type>:<number of GPUs>, e.g., --gres=gpu:A40:1) for regular savio3_gpu condo jobs (i.e., jobs not submitted under low priority).

Savio2 Condo QoS Configurations
Account QoS QoS Limit
co_biostat biostat_savio2_normal 20 nodes max per group
co_chemqmc chemqmc_savio2_normal 16 nodes max per group
co_dweisz dweisz_savio2_normal 8 nodes max per group
co_econ econ_savio2_normal 2 nodes max per group
co_hiawatha hiawatha_savio2_normal 40 nodes max per group
co_lihep lihep_savio2_normal 4 nodes max per group
co_mrirlab mrirlab_savio2_normal 4 nodes max per group
co_planets planets_savio2_normal 4 nodes max per group
co_stat stat_savio2_normal 2 nodes max per group
co_bachtrog bachtrog_savio2_normal 4 nodes max per group
co_noneq noneq_savio2_normal 8 nodes max per group
co_kranthi kranthi_savio2_normal 4 nodes max per group
Savio2 Bigmem Condo QoS Configurations
Account QoS QoS Limit
co_laika laika_bigmem2_normal 4 nodes max per group
co_dweisz dweisz_bigmem2_normal 4 nodes max per group
co_aiolos aiolos_bigmem2_normal 4 nodes max per group, 24:00:00 wallclock limit
co_bachtrog bachtrog_bigmem2_normal 4 nodes max per group
co_msedcc msedcc_bigmem2_normal 8 nodes max per group
Savio2 HTC Condo QoS Configurations
Account QoS QoS Limit
co_rosalind rosalind_htc2_normal 8 nodes max per group
Savio2 1080Ti Condo QoS Configurations
Account QoS QoS Limit
co_acrb acrb_1080ti2_normal 12 GPUs max per group
co_mlab mlab_1080ti2_normal 16 GPUs max per group
Savio2 KNL Condo QoS Configurations
Account QoS QoS Limit
co_lsdi lsdi_knl2_normal 28 nodes max per group, 5 running jobs max per user, 20 total jobs max per user
(Retired; GPUs no longer accessible) Savio2 GPU Condo QoS Configurations
Account QoS QoS Limit
co_acrb acrb_gpu2_normal 44 GPUs max per group
co_stat stat_gpu2_normal 8 GPUs max per group
(Retired) Savio Condo QoS Configurations
Account QoS QoS Limit
co_acrb acrb_savio_normal 8 nodes max per group
co_aiolos aiolos_savio_normal 12 nodes max per group, 24:00:00 wallclock limit
co_astro astro_savio_debug 4 nodes max per group, 4 nodes max per job, 00:30:00 wallclock limit
co_astro astro_savio_normal 32 nodes max per group, 16 nodes max per job
co_dlab dlab_savio_normal 4 nodes max per group
co_nuclear nuclear_savio_normal 24 nodes max per group
co_praxis praxis_savio_normal 4 nodes max per group
co_rosalind rosalind_savio_normal 8 nodes max per group, 4 nodes max per job per user

CGRL Scheduler Configuration

The CGRL clusters use the Slurm scheduler to manage jobs. When submitting jobs via the sbatch or srun commands, use the following Slurm options (an example job script sketch follows the list):

  • The settings for a job in Vector (Note: you don't need to set the "account"): --partition=vector --qos=vector_batch
  • The settings for a job in Rosalind (Savio2 HTC): --partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal
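
For example, a minimal Rosalind (Savio2 HTC) job script might look like the following sketch; my_analysis.sh is a placeholder for your own command or script.

```bash
#!/bin/bash
# Sketch of a Rosalind (Savio2 HTC) job using the options listed above;
# my_analysis.sh is a placeholder.
#SBATCH --job-name=rosalind-sketch
#SBATCH --partition=savio2_htc
#SBATCH --account=co_rosalind
#SBATCH --qos=rosalind_htc2_normal
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

./my_analysis.sh
```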

Checking allowed QOS

To check which QoS you are allowed to use, run sacctmgr -p show associations user=$USER
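
The output is pipe-delimited and fairly wide; to see just the account, partition, and allowed QoS for your associations, you can request only those fields:

```bash
# List only the account, partition, and QoS columns for your own associations
# (pipe-delimited, one association per line).
sacctmgr -p show associations user=$USER format=Account,Partition,QOS
```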

Here are the details for each CGRL partition and associated QoS.

| Partition | Scheduler Allocation | Account | Nodes | Node List | QoS | QoS Limit |
|---|---|---|---|---|---|---|
| vector | By Core | N/A | 11 | n00[00-03].vector0, n0004.vector0, n00[05-08].vector0, n00[09]-n00[10].vector0 | vector_batch | 48 cores max per job, 96 cores max per user |
| savio2_htc | By Core | co_rosalind | 8 | n0[000-011].savio2, n0[215-222].savio2 | rosalind_htc2_normal | 8 nodes max per group |
| savio3_xlmem | By Node | co_rosalind | 2 | n0[000-003].savio3 | rosalind_xlmem3_normal | 2 nodes max per group |
| savio4_htc | By Core | co_rosalind | 8 | n0[170-177].savio4 | rosalind_htc4_normal | 448 cores max per group |