Savio Service Units

Usage of computing time on the UC Berkeley campus's Savio high-performance computing cluster is tracked in abstract measurement units called "Service Units."

Note

Tracking usage is only relevant for Faculty Computing Allowance (FCA) users. Condo usage is instead managed by constraining jobs that run under a condo to use no more resources than were purchased for that condo.

A Service Unit (abbreviated as "SU") is equivalent to one “core hour”: that is, the use of one processor core, for one hour of wall-clock time, on one of Savio's standard, current generation compute nodes.

The video below covers node types on Savio as of fall 2019:

Calculating Service Units Used by a Compute Job

As described in more detail below, the cost (in "Service Units") of running any particular compute job on Savio is calculated via a straightforward formula, which simply multiplies together the following three values:

How many processor cores are reserved for use by the job. When using certain pools of compute nodes, your job will be charged with using all the cores on that node, even if your job only actually uses a portion of the cores.
How long the job takes to run (in ordinary "wall-clock" time).
The scaling factor for the pool of compute nodes ("partition") on which the job is run.

For instance, if you run a computational job on the savio2 pool of nodes via an sbatch or srun command and reserve one compute node (which means you're effectively reserving all 24 of that node's cores), and your job runs for one hour, that job will use 24 Service Units; i.e., 24 cores x 1 hour x a scaling factor of 1.00 = 24 Service Units. A charge of 24 Service Units would then be made against your group's Faculty Computing Allowance scheduler account (with an account name like fc_projectname). So if you started out with 300,000 Service Units before running this job, you would have 299,976 Service Units remaining afterward for running additional jobs under the fc_projectname account.

Similarly, if you run a computational job on the savio2_htc pool of nodes, and reserve just 5 processor cores (since, when using the cluster's High Throughput Computing nodes, you can optionally schedule the use of just individual cores, rather than entire nodes), and your job runs for 10 hours, that job will use 60 Service Units; i.e., 5 cores x 10 hours x a scaling factor of 1.20 = 60 Service Units.
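The two worked examples above can be reproduced with a few lines of Python. This is only an illustrative sketch of the formula as described in the text; the function name service_units and its parameter names are made up for this example and are not part of any Savio tool.

```python
def service_units(cores: int, hours: float, scaling_factor: float) -> float:
    """Cores reserved x wall-clock hours x the partition's scaling factor."""
    return round(cores * hours * scaling_factor, 2)


# One full savio2 node (24 cores) for one hour, scaling factor 1.00 (from the text).
savio2_job = service_units(cores=24, hours=1, scaling_factor=1.00)

# Five cores on savio2_htc for ten hours, scaling factor 1.20 (from the text).
htc_job = service_units(cores=5, hours=10, scaling_factor=1.20)

# Balance left after the savio2 job, starting from 300,000 SUs.
remaining = 300_000 - savio2_job

print(savio2_job, htc_job, remaining)
```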

Scheduling Nodes vs. Cores

When you schedule jobs on Savio, depending on the pool of compute nodes (scheduler partition) on which you're running them, you may be automatically provided with exclusive access to entire nodes (including all of their cores), or you may be able to request access just to one or more individual cores on those nodes. When a job you run is provided with exclusive access to an entire node, please note that your account will be charged for using all of that node's cores.

Thus, for example, if you run a job for one hour on a standard 24-core compute node on the savio2 partition, because jobs are given exclusive access to entire nodes on that partition, your job will always use 24 core hours, even if it actually requires just a single core or a few cores. Accordingly, your account will be charged 24 Service Units for one hour of computational time on a savio2 node.

For that reason, if you plan to run single-core jobs (or any other jobs requiring fewer than the total number of cores on a node) you have two recommended options:

When running on a partition that always gives you exclusive access to entire nodes, bundle up multiple, smaller jobs that only require a single core (or a small number of cores) into one, larger job. This allows you to use many or all of the cores on that node for the duration of that job.
Alternatively, run your jobs on a partition that offers per-core scheduling of jobs and is appropriately suited for your jobs. When doing so, make sure that your job script file also specifies exactly how many cores your job needs to use.
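To see why bundling matters, the following sketch compares the core hours charged for 24 single-core, one-hour tasks run three ways. It assumes a 24-core node as in the savio2 example above; the helper charged_cores is hypothetical, and rounding a multi-node request up to whole nodes is an inference from the text rather than a documented rule. Multiply the results by the partition's scaling factor to get Service Units.

```python
def charged_cores(requested_cores: int, cores_per_node: int, exclusive: bool) -> int:
    """Cores billed for one job.

    On an exclusive-node partition every core of each reserved node is billed;
    on a per-core partition only the requested cores are billed.
    """
    if exclusive:
        nodes = -(-requested_cores // cores_per_node)  # ceiling division
        return nodes * cores_per_node
    return requested_cores


TASKS = 24          # number of single-core tasks (illustrative)
HOURS_PER_TASK = 1  # each task is assumed to run for one hour
NODE_CORES = 24     # standard savio2 node, per the text

# 24 separate single-core jobs, each given a whole node.
separate = TASKS * charged_cores(1, NODE_CORES, exclusive=True) * HOURS_PER_TASK

# One bundled job that runs all 24 tasks at once on a single node.
bundled = charged_cores(TASKS, NODE_CORES, exclusive=True) * HOURS_PER_TASK

# 24 single-core jobs on a partition with per-core scheduling.
per_core = TASKS * charged_cores(1, NODE_CORES, exclusive=False) * HOURS_PER_TASK

print(separate, bundled, per_core)
```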

Scaling of Service Units

When you're using types of compute nodes other than Savio's current generation of standard nodes (at this writing, the nodes in the savio3 partition are "standard nodes"), your account will be charged more than, or less than, one Service Unit per core hour of compute time. These scaled values primarily reflect the varying costs of acquiring and replacing different types of nodes in the cluster. When using older pools of standard compute nodes, with earlier generations of hardware, your account will use less than one SU per core hour, while when using higher-cost nodes, such as Big Memory or Graphics Processing Unit (GPU) nodes, it will use more than one SU per core hour.

As with other types of compute nodes, charges for the use of Savio's GPU nodes are based on the number of CPU processor cores used, rather than on the number of GPUs used. Because all jobs using GPUs must request the use of at least two CPU cores for each GPU requested, the effective cost of using one GPU on Savio will be a minimum of 5.34 Service Units per hour (2 CPU cores x 2.67 SUs per CPU core) for savio2_gpu. The same reasoning applies to the other GPU partitions; note that for certain GPU types in savio3_gpu and savio4_gpu, users must request four or eight CPUs for each GPU requested.
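The minimum hourly GPU cost follows directly from the required CPU-to-GPU ratio. Here is a small sketch of that arithmetic, using the savio2_gpu figures quoted above; the function name is hypothetical, and rates for the other GPU partitions are deliberately not filled in because they are not given here.

```python
def min_gpu_su_per_hour(gpus: int, cpus_per_gpu: int, su_per_cpu_hour: float) -> float:
    """Minimum SUs charged per hour: GPUs x required CPU cores per GPU x per-core rate."""
    return round(gpus * cpus_per_gpu * su_per_cpu_hour, 2)


# savio2_gpu: at least 2 CPU cores per GPU, 2.67 SUs per CPU core hour (from the text).
one_gpu = min_gpu_su_per_hour(gpus=1, cpus_per_gpu=2, su_per_cpu_hour=2.67)
print(one_gpu)
```

For GPU types that require four or eight CPUs per GPU, pass that ratio as cpus_per_gpu together with the partition's rate from the scheduler configuration overview.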

Service Unit charge rates for each Savio partition can be found in our scheduler configuration overview. For more detailed information about each pool of compute nodes, check out our hardware configuration overview.

Viewing Your Service Units

You can view how many Service Units have been used to date under a Faculty Computing Allowance as well as an MOU allocation, or by a particular user account on Savio, either via our check_usage.sh script (see below) or the My BRC User Portal (by navigating to the FCA or MOU allocation project page, as well as the "Allocation Detail" page, within the portal).

Savio provides the check_usage.sh command line tool for you to check cluster usage by user or account.

Running check_usage.sh -E will report total usage by the current user, as well as a breakdown of their usage within each of their related project accounts, since the most recent reset/introduction date (normally June 1st of each year). To check usage for another user on the system, add the -u sampleusername option (substituting an actual user name for sampleusername in this example).

You can check usage for a project's account, rather than for an individual user's account, with the -a sampleprojectname option to this command (substituting an actual account name for sampleprojectname in this example).

Also, when checking usage for either users or accounts, you can display usage during a specified time period by adding start date (-s) and/or end date (-e) options, as in -s YYYY-MM-DD and -e YYYY-MM-DD (substituting actual Year-Month-Day values for YYYY-MM-DD in these examples). Run check_usage.sh -h for more information and additional options.
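If you call check_usage.sh from a script, a small helper can assemble the argument list from the options described above (-E, -u, -a, -s, -e). This is a sketch only: the helper and its parameter names (including "breakdown" for -E) are invented for illustration, and it does nothing beyond building the list and checking the YYYY-MM-DD date format.

```python
from datetime import datetime
from typing import List, Optional


def _iso_date(value: str) -> str:
    """Validate a date string and return it in YYYY-MM-DD form (raises ValueError otherwise)."""
    return datetime.strptime(value, "%Y-%m-%d").strftime("%Y-%m-%d")


def check_usage_args(
    user: Optional[str] = None,
    account: Optional[str] = None,
    breakdown: bool = False,
    start: Optional[str] = None,
    end: Optional[str] = None,
) -> List[str]:
    """Build an argument list for check_usage.sh using only options named in the text."""
    args = ["check_usage.sh"]
    if breakdown:
        args.append("-E")
    if start:
        args += ["-s", _iso_date(start)]
    if end:
        args += ["-e", _iso_date(end)]
    if user:
        args += ["-u", user]
    if account:
        args += ["-a", account]
    return args


print(" ".join(check_usage_args(account="fc_sampleprojectname", breakdown=True, start="2018-06-01")))
```

The resulting list can be handed to subprocess.run on a Savio login node; run check_usage.sh -h there to confirm the options available.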

When checking usage for accounts that have overall usage limits (such as Faculty Computing Allowances), the value of the Service Units field is color-coded to help you see at a glance how much computational time is still available: green means your project has used less than 50% of its available SUs; yellow means your project has used more than 50% but less than 100% of its available SUs; and red means your project has used 100% or more of its available SUs (and has likely been disabled). Note that if you specify a start date and/or end date with the -s and/or -e options, the output will not be color-coded.
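The color thresholds above amount to a simple classification. The sketch below mirrors them; it is not the logic of check_usage.sh itself, and because the text does not say which color applies at exactly 50%, that boundary is treated as yellow here by assumption.

```python
def usage_color(used_su: float, allocation_su: float) -> str:
    """Classify usage against an allocation using the thresholds described in the text."""
    if allocation_su <= 0:
        raise ValueError("allocation must be positive")
    fraction = used_su / allocation_su
    if fraction >= 1.0:
        return "red"      # 100% or more used; the account has likely been disabled
    if fraction >= 0.5:
        return "yellow"   # assumption: exactly 50% is counted as yellow
    return "green"        # less than 50% used


print(usage_color(92852.12, 300000))
```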

Here are a couple of output samples from running this command line tool with user and project options, respectively, along with some tips on interpreting that output:

check_usage.sh -E -u sampleusername
Usage for USER sampleusername [2016-06-01T00:00:00, 2016-08-17T18:18:37]: 38 jobs, 1311.40 CPUHrs, 1208.16 SUs used
Usage for USER sampleusername in ACCOUNT co_samplecondoname [2016-06-01T00:00:00 2016-08-17T18:18:37]: 23 jobs, 857.72 CPUHrs, 827.59 SUs
Usage for USER sampleusername in ACCOUNT fc_sampleprojectname [2016-06-01T00:00:00 2016-08-17T18:18:37]: 15 jobs, 453.68 CPUHrs, 380.57 SUs

Total usage from June 1, 2016 through the early evening of August 17, 2016 by the sampleusername cluster user consists of 38 jobs run, using approximately 1,311 CPU hours, and resulting in usage of approximately 1,208 Service Units. (The total number of Service Units is less than the total number of CPU hours in this example, because some jobs were run on older or otherwise less expensive hardware pools (partitions) which cost less than one Service Unit per CPU hour.)

Of that total usage, 23 jobs were run under the Condo project account co_samplecondoname, using approximately 858 CPU hours and 828 Service Units (but this tracking does not affect usage of a condo), and 15 jobs were run under the Faculty Computing Allowance project account fc_sampleprojectname, using approximately 454 CPU hours and 381 Service Units.
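As a quick consistency check, the per-account figures in the sample output add up to the user's totals, and the ratio of SUs to CPU hours gives the average charge rate across the jobs. The sketch below uses Decimal so that the two-decimal figures sum exactly; the numbers are copied from the sample output above.

```python
from decimal import Decimal

# (jobs, CPU hours, SUs) per account, copied from the sample output.
per_account = {
    "co_samplecondoname": (23, Decimal("857.72"), Decimal("827.59")),
    "fc_sampleprojectname": (15, Decimal("453.68"), Decimal("380.57")),
}

total_jobs = sum(jobs for jobs, _, _ in per_account.values())
total_cpu_hours = sum(cpu for _, cpu, _ in per_account.values())
total_su = sum(su for _, _, su in per_account.values())

# Average SUs charged per CPU hour; below 1 when cheaper partitions were used.
average_rate = total_su / total_cpu_hours

print(total_jobs, total_cpu_hours, total_su, round(average_rate, 3))
```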

check_usage.sh -a fc_sampleprojectname
Usage for ACCOUNT fc_sampleprojectname [2016-06-01T00:00:00, 2016-08-17T18:19:15]: 156 jobs, 85263.80 CPUHrs, 92852.12 SUs used from an allocation of 300000 SUs.

Usage from June 1, 2016 through the early evening of August 17, 2016 by all cluster users of the Faculty Computing Allowance account fc_sampleprojectname consists of 156 jobs run, using a total of approximately 85,264 CPU hours, and resulting in usage of approximately 92,852 Service Units. (The total number of Service Units is greater than the total number of CPU hours in this example, because some jobs were run on hardware pools (partitions) which cost more than one Service Unit per CPU hour.) The total Faculty Computing Allowance allocation for this project's account is 300,000 Service Units, so there are approximately 207,148 Service Units still available for running jobs during the remainder of the current Allowance year (June 1 to May 31): 300,000 total Service Units granted, less 92,852 used to date. The total of 92,852 Service Units used to date is colored green, because this project's account has used less than 50% of its total Service Units available.
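If you want to compute the remaining balance programmatically, one option is to parse the account line with a regular expression. The pattern below is inferred solely from the sample output shown on this page, so treat it as an assumption that may need adjusting if the tool's output format differs.

```python
import re

# Pattern inferred from the sample ACCOUNT line on this page (an assumption).
ACCOUNT_LINE = re.compile(
    r"Usage for ACCOUNT (?P<account>\S+) \[.*?\]: "
    r"(?P<jobs>\d+) jobs, (?P<cpu_hours>[\d.]+) CPUHrs, "
    r"(?P<used>[\d.]+) SUs(?: used)? from an allocation of (?P<allocation>\d+) SUs"
)


def parse_account_usage(line: str) -> dict:
    """Extract job count, CPU hours, SUs used, allocation, and remaining SUs."""
    match = ACCOUNT_LINE.search(line)
    if match is None:
        raise ValueError("line does not look like an ACCOUNT usage line")
    used = float(match["used"])
    allocation = int(match["allocation"])
    return {
        "account": match["account"],
        "jobs": int(match["jobs"]),
        "cpu_hours": float(match["cpu_hours"]),
        "used_su": used,
        "allocation_su": allocation,
        "remaining_su": round(allocation - used, 2),
    }


sample = (
    "Usage for ACCOUNT fc_sampleprojectname "
    "[2016-06-01T00:00:00, 2016-08-17T18:19:15]: 156 jobs, 85263.80 CPUHrs, "
    "92852.12 SUs used from an allocation of 300000 SUs."
)
info = parse_account_usage(sample)
print(info["remaining_su"])
```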

To also view individual usages by each cluster user of the Faculty Computing Allowance project account fc_sampleprojectname, you can add a -E option to the above command; e.g., check_usage.sh -E -a fc_sampleprojectname.

Finally, if your Faculty Computing Allowance has become completely exhausted, the output from running the check_usage.sh command line tool will by default show only information for the period of time after your job scheduler account was disabled; for example:

Usage for ACCOUNT fc_sampleprojectname [2017-04-05T11:00:00, 2017-04-24T17:19:12]: 3 jobs, 0.00 CPUHrs, 0.00 SUs from an allocation of 0 SUs.
ACCOUNT fc_sampleprojectname has exceeded its allowance. Allocation has been set to 0 SUs.
Usage for USER sampleusername in ACCOUNT fc_sampleprojectname [2017-04-05T11:00:00, 2017-04-24T17:19:12]: 0 jobs, 0.00 CPUHrs, 0.00 (0%) SUs

To display the more meaningful information about the earlier usage that resulted in the Faculty Computing Allowance becoming exhausted, use the start date (-s) option and specify the most recent June 1st (the first day of the current Allowance year) as that start date. E.g., to view usage for an Allowance that became exhausted anytime during the 2018-19 Allowance year, use a start date of June 1, 2018:

check_usage.sh -E -s 2018-06-01 -a fc_sampleprojectname
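The start date to pass with -s can be derived from today's date. This short sketch assumes, as stated above, that the Allowance year normally begins on June 1st; the function name is made up for illustration.

```python
from datetime import date


def allowance_year_start(today: date) -> date:
    """Return the most recent June 1st on or before the given date."""
    june_first = date(today.year, 6, 1)
    if today >= june_first:
        return june_first
    return date(today.year - 1, 6, 1)


# A date in the 2018-19 Allowance year maps to a start date of 2018-06-01.
print(allowance_year_start(date(2019, 3, 15)).isoformat())
```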

Note

When a job is submitted, Slurm estimates the maximum number of Service Units that the job could use, based on its time limit. That maximum is reflected in the "SUs used" output of check_usage.sh while the job is running. After the job finishes, unused SUs are refunded to the account.
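The effect of this up-front estimate can be illustrated with the same cores x hours x scaling factor formula. The sketch below is an assumption about how the numbers relate, not a description of the scheduler's internals, and the job sizes used are hypothetical.

```python
def reserved_charged_refund(cores: int, time_limit_hours: float,
                            actual_hours: float, scaling_factor: float):
    """Return (SUs shown while running, SUs finally charged, SUs refunded)."""
    reserved = cores * time_limit_hours * scaling_factor
    # A job cannot run past its time limit, so cap the billed hours there.
    charged = cores * min(actual_hours, time_limit_hours) * scaling_factor
    return round(reserved, 2), round(charged, 2), round(reserved - charged, 2)


# Hypothetical job: 24 cores, 10-hour time limit, finishes after 4 hours, scaling 1.00.
reserved, charged, refund = reserved_charged_refund(24, 10, 4, 1.00)
print(reserved, charged, refund)
```

One practical consequence is that requesting a time limit close to the job's real run time keeps the temporarily reserved SUs small.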

Getting More Compute Time

Are you running low on Service Units under your Faculty Computing Allowance? Or have you exhausted them entirely? There are a number of options for getting more computing time for your project.