Overview

To submit and run jobs, cancel jobs, and check the status of jobs on the Savio cluster, you’ll use the Simple Linux Utility for Resource Management (SLURM), an open-source resource manager and job scheduling system. (SLURM manages jobs, job steps, nodes, partitions (groups of nodes), and other entities on the cluster.)

There are several basic SLURM commands you’ll likely use often:

  • sbatch - Submit a job to the batch queue system, e.g., sbatch myjob.sh, where myjob.sh is a SLURM job script. (On this page, you can find both a simple, introductory example of a job script, as well as many other examples of job scripts for specific types of jobs you might run. Adapting and modifying one of these examples is the quickest way to get started with running batch jobs.)
  • srun - Submit an interactive job to the batch queue system
  • scancel - Cancel a job, e.g., scancel 123, where 123 is a job ID
  • squeue - Check the current jobs in the batch queue system, e.g., squeue -u $USER to view your own jobs
  • sinfo - View the status of the cluster's compute nodes, including how many nodes - of what types - are currently available for running jobs.
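
For example, a typical sequence of these commands on a login node might look like the following (the job ID and partition name are placeholders; substitute your own):

sbatch myjob.sh     # submit a batch job; SLURM prints "Submitted batch job <jobid>"
squeue -u $USER     # list your pending and running jobs
scancel 123         # cancel job 123 (use your own job ID)
sinfo -p savio2     # show node status for the savio2 partition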

Charges for running jobs

When running your SLURM batch or interactive jobs on the Savio cluster under a Faculty Computing Allowance account (i.e. a scheduler account whose name begins with fc_), your usage of computational time is tracked (in effect, “charged” for, although no costs are incurred) via abstract measurement units called “Service Units.” (Please see Service Units on Savio for a description of how this usage is calculated.) When all of the Service Units provided under an Allowance have been exhausted, no more jobs can be run under that account. To check your usage or total usage under an FCA, please use our check_usage.sh script.
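
For instance, you can run the script from a login node as follows (fc_yourgroup is a placeholder account name; the -a flag, if available in the installed version of the script, reports usage for an entire account rather than just your own):

check_usage.sh                  # summarize your own usage
check_usage.sh -a fc_yourgroup  # summarize usage for a whole FCA account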

Please also note that, when running jobs on many of Savio’s pools of compute nodes, you are provided with exclusive access to those nodes, and thus are “charged” for using all of that node’s cores. For example, if you run a job for one hour on a standard 24-core compute node on the savio2 partition, your job will always be charged for using 24 core hours, even if it requires just a single core or a few cores. (For more details, including information on ways you can most efficiently use your computational time on the cluster, please see the Scheduling Nodes v. Cores section of Service Units on Savio.)

Usage tracking does not affect jobs run under a Condo account (i.e. a scheduler account whose name begins with co_), which has no Service Unit-based limits.

Key options to set when submitting your jobs

When submitting a job, the three key options required are the account you are submitting under, the partition you are submitting to, and a maximum time limit for your job. Under some circumstances, a Quality of Service (QoS) is also needed.

  • Account: Each job must be submitted as part of an account, which determines which resources you have access to and how your use will be charged. Note that this account name, which you will use in your SLURM job script files, is different from your Linux account name on the cluster. For instance, for a hypothetical example user lee who has access to both the physics condo and to a Faculty Computing Allowance, their available accounts for running jobs on the cluster might be named co_physics and fc_lee, respectively. (See below for a command that you can run to find out what account name(s) you can use in your own job script files.) Users with access to only a single account (often an FCA) do not need to specify an account.
  • Partition: Each job must be submitted to a particular partition, which is a collection of similar or identical machines that your job will run on. The different partitions on Savio include older and newer generations of standard compute nodes, "big memory" nodes, nodes with Graphics Processing Units (GPUs), etc. (See below for a command that you can run to find out what partitions you can use in your own job script files.) For most users, the combination of the account and partition they choose will determine the constraints set on their job, such as job size limit, time limit, etc. Jobs submitted within a partition will be allocated to that partition's set of compute nodes based on the scheduling policy, until all resources within that partition are exhausted.
  • QoS: A QoS is a classification that determines what kind of resources your job can use. For most users, your use of a given account and partition implies a particular QoS, and therefore most users do not need to specify a QoS for standard computational jobs. However there are circumstances where a user would specify the QoS. For instance, there is a QoS option that you can select for running test jobs when you're debugging your code, which further constrains the resources available to your job and thus may reduce its cost. As well, Condo users can select a "lowprio" QoS which can make use of unused resources on the cluster, in exchange for these jobs being subject to termination when needed, in order to free resources for higher priority jobs. (See below for a command that you can run to find out what QoS options you can use in your own job script files.)
  • Time: A maximum time limit for the job is required under all conditions. When running your job under a QoS that does not have a time limit (such as jobs submitted by the users of some of the cluster's Condos under their priority access QoS), you can specify a sufficiently long time limit value, but this parameter should not be omitted. Jobs submitted without providing a time limit will be rejected by the scheduler.

You can view the accounts you have access to, partitions you can use, and the QoS options available to you using the sacctmgr command:

sacctmgr -p show associations user=$USER

This will return output such as the following for a hypothetical example user lee who has access to both the physics condo and to a Faculty Computing Allowance. Each line of this output indicates a specific combination of an account, a partition, and QoSes that you can use in a job script file, when submitting any individual batch job:

Cluster|Account|User|Partition|...|QOS|Def QOS|GrpTRESRunMins|
brc|co_physics|lee|savio2_1080ti|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio2_knl|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio2_bigmem|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio2_gpu|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio2_htc|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio_bigmem|...|savio_lowprio|savio_lowprio||
brc|co_physics|lee|savio2|...|physics_savio2_normal,savio_lowprio|physics_savio2_normal||
brc|co_physics|lee|savio|...|savio_lowprio|savio_lowprio||
brc|fc_lee|lee|savio2_1080ti|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio2_knl|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio2_bigmem|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio2_gpu|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio2_htc|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio_bigmem|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio2|...|savio_debug,savio_normal|savio_normal||
brc|fc_lee|lee|savio|...|savio_debug,savio_normal|savio_normal||

The Account, Partition, and QOS fields indicate which partitions and QoSes you have access to under each of your account(s). The Def QoS field identifies the default QoS that will be used if you do not explicitly identify a QoS when submitting a job. Thus as per the example above, if the user lee submitted a batch job using their fc_lee account, they could submit their job to either the savio2_gpu, savio2_htc, savio2_bigmem, savio2, savio, or savio_bigmem partitions. (And when doing so, they could also choose either the savio_debug or savio_normal QoS, with a default of savio_normal if no QoS was specified.)

If you are running your job in a condo, be sure to note which of the line(s) of output associated with the condo account (those beginning with "co_") have a lowprio Def QoS and which have a normal Def QoS. The lines with a normal QoS (such as the co_physics line for the savio2 partition in the example above, whose Def QoS is physics_savio2_normal) are the ones to which you have priority access, while those with a lowprio QoS are the ones to which you have only low-priority access. Thus, in the above example, the user lee should select the co_physics account and the savio2 partition when they want to run jobs with normal priority, using the resources available via their condo membership.

You can find more details on the hardware specifications for the machines in the various partitions here for the Savio and CGRL (Vector/Rosalind) clusters.

You can find more details on each partition and the QoS available in those partitions here for the Savio and CGRL (Vector/Rosalind) clusters.

A standard fair-share policy with a decay half-life of 14 days (2 weeks) is enforced.

Low priority jobs via a condo account

As a condo contributor, you are entitled to use the extra resources that are available on the Savio cluster (across all partitions). This is done through a low-priority QoS, "savio_lowprio", to which your account is automatically subscribed when the account is created; you do not need to request it explicitly. When using this QoS you are no longer limited by your condo size: you have access to the broader compute resource, limited only by the size of each partition. The per-job limit is 24 nodes and a runtime of 3 days. Additionally, these jobs can run on all types and generations of hardware in the Savio cluster, not just those matching the condo contribution. At this time, there is no limit or allowance applied to this QoS, so condo contributors are free to use as much of this resource as they need.

However, this QoS does not receive as high a priority as the general QoSes (such as "savio_normal" and "savio_debug") or the condo QoSes, and it is subject to preemption when the other QoSes become busy. This has two implications:

  1. When the system is busy, any job submitted with this QoS will remain pending and yield to other jobs with higher priorities.
  2. When the system is busy and higher-priority jobs are pending, the scheduler will preempt jobs running under this lower-priority QoS. At submission time, you can choose whether a preempted job should simply be killed or should be automatically requeued after it is killed. Please note that, since preemption can happen at any time, it is very beneficial if your job can checkpoint/restart by itself when you choose to requeue it. Otherwise, you may need to verify data integrity manually before running the job again.

We provide an example job script for such jobs; a minimal sketch follows.
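
Here is a minimal sketch of such a script (the account and partition names are placeholders; the --requeue line is optional and only worthwhile if your application can checkpoint and restart itself):

#!/bin/bash
# Job name:
#SBATCH --job-name=lowprio_test
#
# Account (a condo account, i.e., one beginning with co_):
#SBATCH --account=co_account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Quality of Service:
#SBATCH --qos=savio_lowprio
#
# Automatically requeue this job if it is preempted (omit if your code cannot checkpoint/restart):
#SBATCH --requeue
#
# Wall clock limit:
#SBATCH --time=00:30:00
#
## Command(s) to run:
echo "hello world"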

Long-running jobs via an FCA

Most jobs running under an FCA have a maximum time limit of 72 hours (three days). However, users can run jobs using a small number of cores in the long queue, via the savio2_htc partition and the savio_long QoS.

A given job in the long queue can use no more than 4 cores and run for at most 10 days. Collectively, across the entire Savio cluster, at most 24 cores are available for long-running jobs, so your job may sit in the queue for a while before it starts.

We provide an example job script for such jobs; a minimal sketch follows.
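
Here is a minimal sketch of such a script (the account name is a placeholder; the time limit uses SLURM's days-hours:minutes:seconds format to request the 10-day maximum):

#!/bin/bash
# Job name:
#SBATCH --job-name=long_test
#
# Account (an FCA, i.e., one beginning with fc_):
#SBATCH --account=fc_account_name
#
# Partition:
#SBATCH --partition=savio2_htc
#
# Quality of Service:
#SBATCH --qos=savio_long
#
# Number of cores (at most 4 in the long queue):
#SBATCH --cpus-per-task=4
#
# Wall clock limit (up to 10 days):
#SBATCH --time=10-00:00:00
#
## Command(s) to run:
echo "hello world"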

Submitting your job

This section provides an overview of how to run your jobs in batch (i.e., non-interactive or background) mode and in interactive mode.

In addition to the key options of account, partition, and QoS, your job script files can also contain options to request various numbers of cores, nodes, and/or computational tasks. And there are a variety of additional options you can specify in your batch files, if desired, such as email notification options when a job has completed. These are all described further below.

Batch jobs

When you want to run one of your jobs in batch (i.e. non-interactive or background) mode, you’ll enter an sbatch command. As part of that command, you will also specify the name of, or filesystem path to, a SLURM job script file; e.g., sbatch myjob.sh

A job script specifies where and how you want to run your job on the cluster, and ends with the actual command(s) needed to run your job. The job script file looks much like a standard shell script (.sh) file, but also includes one or more lines that specify options for the SLURM scheduler; e.g.

#SBATCH --some-option-here

Although these lines start with hash signs (#), and thus are regarded as comments by the shell, they are nonetheless read and interpreted by the SLURM scheduler.

Here is a minimal example of a job script file that includes the required account, partition, and time options, as well as a QoS specification. It will run unattended for up to 30 seconds on one of the compute nodes in the partition_name partition, and will simply print out the words "hello world":

#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=partition_name
#
# Quality of Service:
#SBATCH --qos=qos_name
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run:
echo "hello world"

In this and other examples, account_name, partition_name, and qos_name are placeholders for actual values you will need to provide. See Key options to set, above, for information on what values to use in your own job script files.

See Job submission with specific resource requirements, below, for a set of example job script files, each illustrating how to run a specific type of job.

See Finding output to learn where output from running your batch jobs can be found. If errors occur when running your batch job, this is the first place to look for these.

Interactive jobs

In some instances, you may need to use software that requires user interaction, rather than running programs or scripts in batch mode. To do so, you must first start an instance of an interactive shell on a Savio compute node, within which you can then run your software on that node. To run such an interactive job on a compute node, you'll use srun. Here is a basic example that launches an interactive bash shell on a compute node and includes the required account, partition, and time options:

[user@ln001 ~]$ srun --pty -A account_name -p partition_name -t 00:00:30 bash -i

Once your job starts, the Linux prompt will change and indicate you are on a compute node rather than a login node:

srun: job 669120 queued and waiting for resources
srun: job 669120 has been allocated resources
[user@n0047 ~]$
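
When you are finished, exit the shell to end the interactive job and release the node:

[user@n0047 ~]$ exit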

CGRL jobs

  • The settings for a job in Vector (note: you don't need to set the "account"; see the sketch after this list): --partition=vector --qos=vector_batch
  • The settings for a job in Rosalind (Savio1): --partition=savio --account=co_rosalind --qos=rosalind_savio_normal
  • The settings for a job in Rosalind (Savio2 HTC): --partition=savio2_htc --account=co_rosalind --qos=rosalind_htc2_normal
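
As an illustration, here is a minimal sketch of a job script using the Vector settings above (the job name, time limit, and command are placeholders):

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=vector
#SBATCH --qos=vector_batch
#SBATCH --time=00:30:00
## Command(s) to run:
echo "hello world"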

Job submission with specific resource requirements

This section details options for specifying the resource requirements for your jobs. We also provide a variety of example job scripts for setting up parallelization, low-priority jobs, jobs using fewer cores than available on a node, and long-running FCA jobs.

Remember that nodes are assigned for exclusive access by your job, except in the “savio2_htc” and “savio2_gpu” partitions. So, if possible, you generally want to set SLURM options and write your code to use all the available resources on the nodes assigned to your job (e.g., 20 cores and 64 GB memory per node in the “savio” partition).

Memory available

Also note that in all partitions except for GPU and HTC partitions, by default the full memory on the node(s) will be available to your job. On the GPU and HTC partitions you get an amount of memory proportional to the number of cores your job requests relative to the number of cores on the node. For example, if the node has 64 GB and 8 cores, and you request 2 cores, you’ll have access to 16 GB memory. If you need more memory than that, you should request additional cores. Please do not request memory using the memory-related flags available for sbatch and srun.
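
For instance, continuing the hypothetical example above (a node with 64 GB and 8 cores), a job needing roughly 32 GB would request 4 of the node's 8 cores rather than using a memory flag:

# Request more cores to get access to more memory on HTC/GPU partitions
# (here, 4 of the hypothetical node's 8 cores -> about half of its 64 GB):
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4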

Parallelization

When submitting parallel code, you usually need to specify the number of tasks, nodes, and CPUs to be used by your job in various ways. For any given use case, there are generally multiple ways to set the options to achieve the same effect; these examples try to illustrate what we consider to be best practices.

The key options for parallelization are:

  • --nodes (or -N): indicates the number of nodes to use
  • --ntasks-per-node: indicates the number of tasks (i.e., processes one wants to run on each node)
  • --cpus-per-task (or -c): indicates the number of cpus to be used for each task

In addition, in some cases it can make sense to use the --ntasks (or -n) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general --cpus-per-task will be 1 except when running threaded code.

Note that if the various options are not set, SLURM will in some cases infer what the value of the option needs to be given other options that are set and in other cases will treat the value as being 1. So some of the options set in the example below are not strictly necessary, but we give them explicitly to be clear.

Here’s an example script that requests an entire Savio node and specifies 20 cores per task.

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=account_name
#SBATCH --partition=savio
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:00:30
## Command(s) to run:
echo "hello world"

Only the partition, time, and account flags are required. And, strictly speaking, if you do not specify an account, your default account will be used, so in some cases that flag can be omitted.
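
As a further illustration, here is a hedged sketch of a job that specifies only the total number of tasks and lets the scheduler decide how many nodes to use; the module name and the mpirun launcher are assumptions, so substitute whatever your application actually requires:

#!/bin/bash
#SBATCH --job-name=test_mpi
#SBATCH --account=account_name
#SBATCH --partition=savio2
#SBATCH --ntasks=48           # total tasks; the scheduler determines the node count
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
## Command(s) to run:
module load openmpi           # assumed module name; check `module avail`
mpirun ./my_mpi_program       # hypothetical executable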

Additional job submission options

Here are some additional options that you can incorporate as needed into your own scripts. For the full set of available options, please see the SLURM documentation on the sbatch command.

Output options

Output from running a SLURM batch job is, by default, placed in a log file named slurm-%j.out, where the job's ID is substituted for %j; e.g., slurm-478012.out. This file will be created in your current directory, i.e., the directory from which you entered the sbatch command. Also by default, both command output and error output (to stdout and stderr, respectively) are combined in this file.

To specify alternate files for command and error output use:

  • --output: destination file for stdout
  • --error: destination file for stderr

Email Notifications

By specifying your email address, you can have the SLURM scheduler email you when the status of your job changes. Valid values for the notification type are BEGIN, END, FAIL, REQUEUE, and ALL, and multiple values can be separated by commas.

The required options for email notifications are:

  • --mail-type: when you want to be notified
  • --mail-user: your email address

Submission with output and email options

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=account_name
#SBATCH --partition=savio
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:00:30
#SBATCH --output=test_job_%j.out
#SBATCH --error=test_job_%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=jessie.doe@berkeley.edu
## Command(s) to run:
echo "hello world"
QoS options

While your job will use a default QoS, generally savio_normal, you can specify a different QoS, such as the debug QoS for short jobs, using the --qos flag, e.g.,

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=account_name
#SBATCH --partition=savio
#SBATCH --qos=savio_debug
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:00:30
## Command(s) to run:
echo "hello world"

Bundling tasks into a single job to use all the cores on a node

Many users have multiple jobs that each use only a single core or a small number of cores and therefore cannot take advantage of all the cores on a Savio node. There are two tools that allow one to automate the parallelization of such jobs, in particular allowing one to group jobs into a single SLURM submission to take advantage of the multiple cores on a given Savio node.

For this purpose, we recommend the use of the community-supported GNU parallel tool. One can instead use Savio’s HT Helper tool, but for users not already familiar with either tool, we recommend GNU parallel.
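
As a rough sketch (assuming GNU parallel is available on your path, e.g., via a module, and that task_list.txt is a hypothetical file containing one command per line), a bundled job might look like this:

#!/bin/bash
#SBATCH --job-name=bundle
#SBATCH --account=account_name
#SBATCH --partition=savio2
#SBATCH --nodes=1
#SBATCH --time=01:00:00
## Command(s) to run:
# You may first need to load a module providing GNU parallel; check `module avail`.
# Run one command per line of task_list.txt, keeping as many tasks running
# concurrently as there are cores on the node.
parallel --jobs "$SLURM_CPUS_ON_NODE" < task_list.txt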

Job arrays

Job arrays allow many jobs to be submitted simultaneously with the same requirements. Within each job the environment variable $SLURM_ARRAY_TASK_ID is set and can be used to alter the execution of each job in the array. Note that as is true for regular jobs, each job in the array will be allocated one or more entire nodes (except for the HTC or GPU partitions), so job arrays are not a way to bundle multiple tasks to run on a single node.

By default, output from job arrays is placed in a series of log files named slurm-%A_%a.out, where %A is the overall job ID and %a is the task ID.

For example, the following script would write “I am task 0” to array_job_XXXXXX_task_0.out, “I am task 1” to array_job_XXXXXX_task_1.out, etc, where XXXXXX is the job ID.

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=account_name
#SBATCH --partition=savio
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=20
#SBATCH --time=00:00:30
#SBATCH --array=0-31
#SBATCH --output=array_job_%A_task_%a.out
#SBATCH --error=array_job_%A_task_%a.err
## Command(s) to run:
echo "I am task $SLURM_ARRAY_TASK_ID"

Checkpointing/Restarting your jobs

Checkpointing/restarting is the technique whereby an application's state is stored in the filesystem, allowing the user to restart computation from this saved state and thereby minimize the loss of computation time, for example when the job reaches its allowed wall clock limit, is preempted, or encounters software/hardware faults. By checkpointing or saving intermediate results frequently, you won't lose as much work if your jobs are preempted or otherwise terminated prematurely.

If you wish to do checkpointing, your first step should always be to check if your application already has such capabilities built-in, as that is the most stable and safe way of doing it. Applications that are known to have some sort of native checkpointing include: Abaqus, Amber, Gaussian, GROMACS, LAMMPS, NAMD, NWChem, Quantum Espresso, STAR-CCM+, and VASP.

If your program does not natively support checkpointing, it may also be worthwhile to consider generic, application-agnostic checkpoint/restart solutions. One example is DMTCP (Distributed MultiThreaded CheckPointing), a checkpoint-restart solution that works outside the flow of user applications, enabling their state to be saved without modifying the applications themselves. You can find its reference quick-start documentation here.

Monitoring the status of running batch jobs

To monitor a running job, you need to know the SLURM job ID of that job, which can be obtained by running

squeue -u $USER

In the commands below, substitute the job ID for “$your_job_id”.

If you suspect your job is not running properly, or you simply want to understand how much memory or how much CPU the job is actually using on the compute nodes, Savio provides a script “wwall” to check that.

The following provides a snapshot of the status of the node(s) that the job is running on:

wwall -j $your_job_id

while

wwall -j $your_job_id -t

provides a text-based user interface (TUI) for monitoring the status of those nodes as the job progresses. To exit the TUI, enter "q" to quit out of the interface and return to the command line.

Alternatively, you can login to the node your job is running on as follows:

srun --jobid=$your_job_id --pty /bin/bash

This runs a shell in the context of your existing job. Once on the node, you can run top, htop, ps, or other tools.

You can also see a “top”-like summary for all nodes by running wwtop from a login node. You can use the page up and down keys to scroll through the nodes to find the node(s) your job is using. All CPU percentages are relative to the total number of cores on the node, so 100% usage would mean that all of the cores are being fully used.

Checking finished jobs

There are several commands you can use to better understand how a finished job ran.

First of all, you should look for the SLURM output and error files that may be created in the directory from which you submitted the job. Unless you have specified your own names for these files, they will be named slurm-<jobid>.out (and slurm-<jobid>.err if you directed error output to a separate file).

After a job has completed (or been terminated/cancelled), you can review the maximum memory used via the sacct command.

sacct -j <JOB_ID> --format=JobID,JobName,MaxRSS,Elapsed

MaxRSS will show the maximum amount of memory that the job used in kilobytes.

You can check all the jobs that you ran within a time window as follows:

sacct -u <your_user_name> --starttime=2019-09-27 --endtime=2019-10-04 \
   --format JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode,Start,End,NodeList

Please see man sacct for a list of the output columns you can request, as well as the SLURM documentation for the sacct command here.

Migrating from other schedulers to SLURM

We provide some tips on migrating to SLURM for users familiar with Torque/PBS.
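
For quick reference, a few common Torque/PBS commands and their SLURM equivalents:

qsub job.sh      ->  sbatch job.sh
qstat -u $USER   ->  squeue -u $USER
qdel <jobid>     ->  scancel <jobid>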

For users coming from other schedulers, such as Platform LSF, SGE/OGE, or LoadLeveler, please use this link to find a quick reference.
