Scheduler Examples

Here we show some example job scripts that allow for various kinds of parallelization, jobs that use fewer cores than available on a node, GPU jobs, low-priority condo jobs, and long-running FCA jobs.

1. Threaded/OpenMP job script

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Request one node:
#SBATCH --nodes=1
#
# Specify one task:
#SBATCH --ntasks-per-node=1
#
# Number of processors for single task needed for use case (example):
#SBATCH --cpus-per-task=4
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./a.out

Here --cpus-per-task should be no more than the number of cores on a Savio node in the partition you request. You may want to experiment with the number of threads for your job to determine the optimal number, as computational speed does not always increase with more threads. Note that if --cpus-per-task is fewer than the number of cores on a node, your job will not make full use of the node. Strictly speaking, the --nodes and --ntasks-per-node arguments are optional here because they default to 1.
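
You would submit any of these example scripts with sbatch, e.g., "sbatch myjob.sh", where myjob.sh is whatever you named the file. If you are unsure how many cores the nodes in a given partition provide, one way to check is with sinfo (a minimal sketch; replace partition_name with the actual partition name):

# Show the partition name, CPUs per node, and node count for a partition:
sinfo -p partition_name -o "%P %c %D"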

2. Simple multi-core job script (multiple processes on one node)

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Request one node:
#SBATCH --nodes=1
#
# Specify number of tasks for use case (example):
#SBATCH --ntasks-per-node=20
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
./a.out

This job script would be appropriate for multi-core R, Python, or MATLAB jobs. In the commands that launch your code and/or within your code itself, you can reference the SLURM_NTASKS environment variable to dynamically identify how many tasks (i.e., processing units) are available to you.
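
For instance, the command section of the script above could pass the task count to your program (a minimal sketch; my_analysis.py and its --workers flag are hypothetical placeholders for your own program and its interface):

## Command(s) to run (example):
# SLURM_NTASKS equals --ntasks-per-node here, since a single node is requested.
python my_analysis.py --workers "$SLURM_NTASKS"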

Here the number of CPUs used by your code at any given time should be no more than the number of cores on a Savio node.

For a way to run many individual jobs on one or more nodes (more jobs than cores), see this information on our ht_helper tool.

3. MPI job script

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Number of MPI tasks needed for use case (example):
#SBATCH --ntasks=40
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load gcc openmpi
mpirun ./a.out

As noted in the introduction, for all partitions except for savio2_htc and savio2_gpu, you probably want to set the number of tasks to be a multiple of the number of cores per node in that partition, thereby making use of all the cores on the node(s) to which your job is assigned.

This example assumes that each task will use a single core; otherwise there could be resource contention amongst the tasks assigned to a node.
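
If you prefer to pass the task count to mpirun explicitly rather than relying on it to read the allocation from Slurm, a minimal variant of the launch lines is:

module load gcc openmpi
# Launch exactly as many MPI ranks as were requested via --ntasks:
mpirun -np "$SLURM_NTASKS" ./a.out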

4. Alternative MPI job script

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Number of nodes needed for use case:
#SBATCH --nodes=2
#
# Tasks per node based on number of cores per node (example):
#SBATCH --ntasks-per-node=20
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load gcc openmpi
mpirun ./a.out

This alternative explicitly specifies the number of nodes, tasks per node, and CPUs per task rather than simply specifying the number of tasks and having SLURM determine the resources needed. As before, assuming one CPU per task, one would generally want the number of tasks per node to equal the number of cores on a node so that the job makes full use of each node.
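
As a quick sanity check, you could echo the allocation before the mpirun line (a sketch using standard Slurm environment variables that are set for this request):

# Report the allocation: 2 nodes x 20 tasks per node = 40 MPI tasks in this example.
echo "Nodes: $SLURM_JOB_NUM_NODES, tasks per node: $SLURM_NTASKS_PER_NODE, total tasks: $SLURM_NTASKS"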

5. Hybrid OpenMP+MPI job script

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Number of nodes needed for use case (example):
#SBATCH --nodes=2
#
# Tasks per node based on --cpus-per-task below and number of cores
# per node (example):
#SBATCH --ntasks-per-node=4
#
# Processors per task needed for use case (example):
#SBATCH --cpus-per-task=5
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
module load gcc openmpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./a.out

Here we request a total of 8 (= 2 x 4) MPI tasks, with 5 cores per task, for a total of 40 cores. This would make use of all the cores on two 20-core nodes in the "savio" partition.
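
A quick arithmetic check inside the job script can confirm this layout (a sketch; it simply multiplies the values Slurm sets from the request above):

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# 8 MPI tasks x 5 OpenMP threads per task = 40 cores across two 20-core nodes.
echo "Total cores in use: $(( SLURM_NTASKS * OMP_NUM_THREADS ))"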

6. Jobs scheduled on a per-core basis (jobs that use fewer cores than available on a node)

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=savio2_htc
#
# Number of tasks needed for use case (example):
#SBATCH --ntasks=4
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
./a.out

In the savio2_htc pool you are only charged for the actual number of cores used, so the notion of making best use of resources by saturating a node is not relevant.

7. GPU job script

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=savio2_gpu
#
# Number of nodes:
#SBATCH --nodes=1
#
# Number of tasks (one for each GPU desired for use case) (example):
#SBATCH --ntasks=1
#
# Processors per task (always request a total number of processors equal to twice the number of GPUs):
#SBATCH --cpus-per-task=2
#
# Number of GPUs; this can be in the format "gpu:[1-4]" or "gpu:K80:[1-4]", with the GPU type included:
#SBATCH --gres=gpu:1
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run (example):
./a.out

To help the job scheduler effectively manage the use of GPUs, your job submission script must request two CPUs for each GPU you will use. Jobs that do not request a minimum of two CPUs for every GPU will be rejected by the scheduler.

Here's how to request two CPUs for each GPU: the total number of CPUs requested is the product of two settings, the number of tasks ("--ntasks=") and the number of CPUs per task ("--cpus-per-task=").

For instance, in the above example, one GPU was requested via "--gres=gpu:1", and the required total of two CPUs was thus requested via the combination of "--ntasks=1" and "--cpus-per-task=2". Similarly, if your job script requests four GPUs via "--gres=gpu:4" and uses "--ntasks=8", it should also include "--cpus-per-task=1" in order to request the required total of eight CPUs.

Note that the GPU count in the "--gres=gpu:[1-4]" specification must be between 1 and 4, because this resource is associated with a node and each savio2_gpu node has 4 GPUs. If you wish to use more than 4 GPUs, the "--gres" specification gives the number of GPUs to use per node requested. For example, to use eight GPUs, your job script should include options to the effect of "--gres=gpu:4", "--nodes=2", "--ntasks=8", and "--cpus-per-task=2".
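
For example, the eight-GPU request just described would look like the following fragment in a job script (a sketch of just the relevant directives from the combination above):

# Two nodes with four GPUs each, i.e., eight GPUs in total:
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
# Eight tasks at two CPUs each = 16 CPUs, i.e., two CPUs per GPU:
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=2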

8. Long-running jobs (up to 10 days and 4 cores per job)

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# QoS: must be savio_long for jobs > 3 days
#SBATCH --qos=savio_long
#
# Partition:
#SBATCH --partition=savio2_htc
#
# Number of tasks needed for use case (example):
#SBATCH --ntasks=2
#
# Processors per task:
#SBATCH --cpus-per-task=1
#
# Wall clock limit (7 days in this case):
#SBATCH --time=7-00:00:00
#
## Command(s) to run (example):
./a.out

A given job in the long queue can use no more than 4 cores and can run for a maximum of 10 days. Collectively, across the entire Savio cluster, at most 24 cores are available for long-running jobs, so your job may sit in the queue for a while before it starts.

In the savio2_htc pool you are only charged for the actual number of cores used, so the notion of making best use of resources by saturating a node is not relevant.
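
To see where a pending long-running job sits, you can check the queue; the --start option asks the scheduler for its estimated start times (a minimal sketch using standard Slurm commands):

# List your jobs, with estimated start times for pending jobs:
squeue -u $USER --start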

9. Low-priority jobs

Low-priority jobs can only be run using condo accounts. Jobs run in a condo account will use that account's default QoS (generally savio_normal) unless you specify otherwise. To use the low-priority queue, you need to specify the low-priority QoS, as follows.

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Quality of Service:
#SBATCH --qos=savio_lowprio
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run:
echo "hello world"