
Specifying Job Resources

This section details options for specifying the resource requirements for your jobs. We also provide a variety of example job scripts for setting up parallelization, low-priority jobs, jobs using fewer cores than available on a node, and long-running FCA jobs.

Per-core versus Per-node Scheduling

In many Savio partitions, nodes are assigned to your job for exclusive access. When running jobs in those partitions, you generally want, if possible, to set SLURM options and write your code to use all the available resources on the nodes assigned to your job (e.g., either 32 or 40 cores and 96 GB of memory per node in the "savio3" partition).

The exceptions are the "HTC" and "GPU" partitions: savio2_htc, savio3_htc, savio4_htc, savio3_gpu, and savio4_gpu, where individual cores are assigned to jobs.

Savio is transitioning to per-core scheduling

With the Savio4 generation of hardware, we are moving away from per-node scheduling and towards per-core scheduling. There is no savio4 partition, just savio4_htc.

Memory Available

Do not request memory explicitly

In most cases, please do not request memory using the memory-related flags available for sbatch and srun.

Per-node partitions

In partitions in which jobs are allocated entire nodes, by default the full memory on the node(s) will be available to your job. There is no need to pass any memory-related flags when you start your job.

Per-core (HTC and GPU) partitions

On the GPU and HTC partitions you get an amount of memory proportional to the number of cores your job requests relative to the number of cores on the node. For example, if the node has 64 GB and 8 cores, and you request 2 cores, you'll have access to 16 GB memory (1/4 of 64 GB).

If you need more memory than the default memory provided per core, you should request additional cores.

The savio4_htc partition has some nodes with 256 GB and some with 512 GB. By default, regardless of which node you end up on, your job will be allocated 4 GB per core. To request more memory, you have a few options:

  • You can request more cores -- enough that the number of cores times 4 GB per core covers your memory needs.
  • You can request that your job use the 512 GB nodes by adding -C savio4_m512; your job will then be allocated 8 GB per core. If you need more memory, request more cores.
  • Users of savio4_htc condos may request more than the default memory per core by using the --mem-per-cpu flag. However, such users should be aware that this will reduce the resources available for use by other jobs in the condo. For an extreme example, a condo job requesting a few cores and all the memory purchased by the condo will prevent any other jobs from running in the condo, because the memory available to the condo is fully allocated. Thus, there is little difference between using --mem-per-cpu and simply requesting additional cores when running savio4_htc condo jobs. The --mem flag should only be used under special circumstances, as it requests memory per node and users don't generally control the number of cores allocated per node.
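
For example, to get roughly 32 GB of memory for a single-process job in savio4_htc, you could request 8 cores (8 cores x 4 GB per core). Here is a minimal sketch; the account name and the command at the end are placeholders:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=<account_name>
#SBATCH --partition=savio4_htc
#SBATCH --ntasks=1
## 8 cores x 4 GB per core = 32 GB by default:
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
## Command(s) to run:
echo "hello world"

Adding -C savio4_m512 to a script like this would instead place the job on the 512 GB nodes, where each core comes with 8 GB of memory.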

GPU Jobs

Running jobs on Savio that require GPU resources requires some extra resource specification: the type and number of GPUs must be specified. We include example job scripts which may be helpful to reference or adapt for your use.

The key things to remember are:

  • Submit to a partition with nodes with GPUs (e.g., savio3_gpu).
  • Include the --gres flag.
  • Request multiple CPU cores for each GPU requested via --cpus-per-task, using the CPU:GPU ratio for your GPU type given in the table below.
  • You can request multiple GPUs with syntax like this (in this case for two GPUs): --gres=gpu:2.
  • You can request a particular type of GPU with syntax like this (in this case requesting two A5000 RTX GPUs): --gres=gpu:A5000:2.
    • This is required for most condos in savio3_gpu and savio4_gpu (when running regular condo jobs rather than low-priority jobs); the GPU types available for specific condos are detailed here.
    • This is also required for regular FCA jobs in savio3_gpu and savio4_gpu (when not using the low priority queue) as discussed here.
  • If using an FCA to request an A40 or V100 GPU, you also need to explicitly specify the QoS via -q a40_gpu3_normal or -q v100_gpu3_normal, respectively.
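
For example, here is a minimal sketch of an FCA job script requesting a single A40 GPU in savio3_gpu (an 8:1 CPU:GPU ratio, so eight CPU cores); the account name and the final command are placeholders to adapt for your own use:

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --account=<account_name>
#SBATCH --partition=savio3_gpu
#SBATCH --qos=a40_gpu3_normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
## Eight CPU cores per A40 GPU (8:1 ratio):
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:A40:1
#SBATCH --time=01:00:00
## Command(s) to run:
nvidia-smi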

This table shows the ratio of CPU cores to GPUs that you should request when submitting GPU jobs, as well as the GPU types available:

Partition | GPU type | GPUs per node | CPU:GPU ratio | FCA QoS available
savio4_gpu | A5000 | 8 | 4:1 | a5k_gpu4_normal, savio_lowprio
savio4_gpu | L40 | 8 | 8:1 | savio_lowprio
savio3_gpu | GTX2080TI | 4 | 2:1 | gtx2080_gpu3_normal, savio_lowprio
savio3_gpu | TITAN | 8 | 4:1 | savio_lowprio
savio3_gpu | V100 | 2 | 4:1 | v100_gpu3_normal, savio_lowprio
savio3_gpu | A40 | 2 | 8:1 | a40_gpu3_normal, savio_lowprio
savio2_1080ti | 1080TI | 4 | 2:1 | savio_normal

Submitting jobs to savio3_gpu is a bit complicated because the savio3_gpu partition contains a variety of GPU types, as indicated in the relevant rows above.

Note that only a subset of the savio3_gpu and savio4_gpu GPUs are available for regular-priority FCA use:

  • 28 GTX2080TI GPUs
  • 2 V100 GPUs
  • 16 A40 GPUs
  • 136 A5000 GPUs (savio4_gpu)

Job not starting because of QOSGrpCpuLimit or QOSGrpGRES

If you've submitted a job to savio3_gpu or savio4_gpu under an FCA and squeue indicates it is pending with the REASON of QOSGrpCpuLimit or QOSGrpGRES, that indicates that all of the GPUs of the type you requested (or the CPUs that are allocated proportional to GPUs) that are available for FCA use are already being used.
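
To see the REASON for a pending job, you can ask squeue to print it; for example (a sketch, with <job_id> a placeholder for your job's ID):

squeue -j <job_id> -o "%.12i %.12P %.8T %R"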

Job not starting because of QOSMinGRES

If you've submitted a job to savio3_gpu or savio4_gpu under an FCA and squeue indicates it is pending with the REASON of QOSMinGRES, that indicates one of the following:

  • You forgot to provide the GPU type in your --gres flag.
  • You've requested a GPU type that is not available for FCA use except via low priority (currently this should only apply to TITAN GPUs in savio3_gpu and L40 GPUs in savio4_gpu).
  • You haven't requested a QoS when requesting an A40 or V100 GPU on savio3_gpu.

Job not starting because of QOSMaxCpuPerUserLimit

If you've submitted a job to savio4_gpu under an FCA and squeue indicates it is pending with the REASON of QOSMaxCpuPerUserLimit, note that each FCA user can use at most 16 CPUs (corresponding to 4 GPUs given the 4:1 CPU:GPU ratio for the A5000 GPUs) with the A5000 GPU nodes in savio4_gpu.

Additional savio3_gpu GPUs (including TITAN GPUs) and savio4_gpu GPUs can be accessed by FCA users through the low priority queue.

Do not modify CUDA_VISIBLE_DEVICES

The environment variable CUDA_VISIBLE_DEVICES will be set by the Slurm scheduler to reference the GPUs actually available to your job, not all of the GPUs on the node. If necessary, in your code, you should refer to the GPU(s) starting with 0 (and then 1 if you request two GPUs, etc.). Manually setting the CUDA_VISIBLE_DEVICES variable to a value outside the range of GPUs made available to your job can cause GPU access failures in programs that use this variable.
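
For example, inside a job script you can print the variable to see which GPU(s) Slurm has made available to your job (a minimal sketch):

echo $CUDA_VISIBLE_DEVICES    # lists only the GPU(s) assigned to this job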

Requesting specific node features

On some partitions of the Savio cluster, users can make more fine-grained decisions about the specific hardware they request. For example, on savio4_htc, there are nodes with either 256 GB or 512 GB of RAM. One way to specifically request the 512 GB nodes is to add the resource constraint flag -C savio4_m512 to the allocation request. The table below details all of the available node features that can be passed to the constraint option (-C or --constraint).

Partition | Node features
savio2 | savio2, savio2_c24, savio2_c28
savio2_bigmem | savio2_bigmem, savio2_m128
savio3 | savio3, savio3_c40
savio3_bigmem | savio3_bigmem, savio3_m384, savio3_c40
savio3_htc | savio3_htc, savio3_c40
savio3_xlmem | savio3_xlmem, savio3_c52
savio3_gpu | savio3_gpu (V100), 4rtx (GTX2080TI), 8rtx (TITAN), a40 (A40)
savio4_htc | savio4_m256, savio4_m512
savio4_gpu | a5000 (A5000), L40 (L40)
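
For example, here is a minimal sketch of a script that uses a constraint to request one of the 40-core savio3 nodes; the account name and the command at the end are placeholders:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=<account_name>
#SBATCH --partition=savio3
## Equivalent to -C savio3_c40; restricts the job to the 40-core savio3 nodes:
#SBATCH --constraint=savio3_c40
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --time=00:30:00
## Command(s) to run:
echo "hello world"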

Parallelization

When submitting parallel code, you usually need to specify the number of tasks, nodes, and CPUs to be used by your job in various ways. For any given use case, there are generally multiple ways to set the options to achieve the same effect; these examples try to illustrate what we consider to be best practices.

The key options for parallelization are:

  • --nodes (or -N): indicates the number of nodes to use
  • --ntasks-per-node: indicates the number of tasks (i.e., processes) to run on each node
  • --cpus-per-task (or -c): indicates the number of cpus to be used for each task

In addition, in some cases it can make sense to use the --ntasks (or -n) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general --cpus-per-task will be 1 except when running threaded code.  

Note that if the various options are not set, SLURM will in some cases infer what the value of the option needs to be given other options that are set and in other cases will treat the value as being 1. So some of the options set in the example below are not strictly necessary, but we give them explicitly to be clear.

Here's an example script that requests an entire Savio3 node and specifies 32 cores per task.

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=<account_name>
#SBATCH --partition=savio3
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --time=00:00:30
## Command(s) to run:
echo "hello world"

Only the partition, time, and account flags are required.
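
For comparison, here is a sketch of a script for a multi-process (e.g., MPI) job spanning two savio3 nodes with 32 tasks per node; the module names and the executable are placeholders you would replace with your own:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=<account_name>
#SBATCH --partition=savio3
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
## Command(s) to run (module and executable names are placeholders):
module load gcc openmpi
mpirun ./my_mpi_program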

Running many small tasks in parallel

Many users have multiple jobs that each use only a single core or a small number of cores and therefore cannot take advantage of all the cores on a Savio node. There are tools that allow one to automate the parallelization of such jobs, in particular allowing one to group tasks into a single SLURM submission to take advantage of the multiple cores on a given Savio node. For this purpose, we recommend the use of the community-supported GNU parallel tool.

When you need to run many similar jobs with the same resource requirements, another option that can help orchestrate the submissions is Slurm's job array feature. We detail the use of Slurm job arrays here.
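
As a rough illustration of the job array approach, here is a minimal sketch; the account name, array range, and the command and input file names are placeholders:

#!/bin/bash
#SBATCH --job-name=array_test
#SBATCH --account=<account_name>
#SBATCH --partition=savio3_htc
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
## Run 20 independent array tasks, numbered 1 through 20:
#SBATCH --array=1-20
## Command(s) to run; each array task gets its own SLURM_ARRAY_TASK_ID:
./process_input.sh input_${SLURM_ARRAY_TASK_ID}.dat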

Long Running Jobs

Most jobs running under an FCA have a maximum time limit of 72 hours (three days). However, users can run jobs that use a small number of cores in the long queue, via the savio2_htc partition and the savio_long QoS.

A given job in the long queue can use no more than 4 cores and run for at most 10 days. Collectively, across the entire Savio cluster, at most 24 cores are available for long-running jobs, so your job may sit in the queue for a while before it starts.

We provide an example job script for such jobs.
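
As a rough sketch (the account name and command are placeholders), such a script might look like the following:

#!/bin/bash
#SBATCH --job-name=long_test
#SBATCH --account=<account_name>
#SBATCH --partition=savio2_htc
#SBATCH --qos=savio_long
#SBATCH --ntasks=1
## At most 4 cores per long-queue job:
#SBATCH --cpus-per-task=4
## Up to the 10-day maximum (days-hours:minutes:seconds):
#SBATCH --time=10-00:00:00
## Command(s) to run:
./my_long_running_analysis.sh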

Tip

Most condos do not have a time limit. Whether there is a limit is up to the PI who owns the condo.