ht_helper Script

Note

GNU parallel is a widely-used, community-supported alternative to HT Helper. You're welcome to use HT Helper, but if you're just getting started with aggregating tasks into a single job, we suggest using GNU parallel instead.

Overview

If you have a large number of short computational tasks that you would like to perform on the cluster, Savio’s HT Helper tool allows you to easily run all of those tasks as a single Savio job submission that efficiently makes use of all the CPU cores that your job requests. Typical applications for which HT Helper is suitable include parameter/configuration scanning, stratified analyses, and divide and conquer approaches. This type of computation is called High Throughput Computing (but note that this is not directly related to the HTC nodes on Savio, although one can use HT Helper in the HTC partition).

Using HT Helper has these benefits:

  • it uses all the cores on a node even if each computational task is serial (uses one core) or only needs a few cores,
  • it systematically processes many computational tasks as a single job for ease of management, and
  • it avoids overloading the scheduler with thousands of jobs (the scheduler is not designed to handle that kind of load).

The basic idea of HT Helper is to start up a single job using the “ht_helper.sh” script and to cycle through all of your computational tasks within that one SLURM job. (More technically, the “ht_helper.sh” script fires off a FIFO mini scheduler within the real SLURM scheduler allocation and then cycles through all the tasks within the allocation by using the mini scheduler.) Note that the individual tasks could be either serial or parallel.

For example, you might have 1000 serial tasks and use HT Helper to complete those tasks on two Savio compute nodes with a total of 48 cores. At any given time, 48 of the tasks will be active, and when a task finishes, HT Helper will dispatch the next task to the core that has become available.

Setting up and submitting an HT Helper job

To use *ht_helper.sh* we need a *taskfile*. Often this taskfile will have a single line and we will tell ht_helper.sh to run that line multiple times.  

Here’s an example of a taskfile.

./compute.py --id=$HT_TASK_ID --size=1000 --path=exp_output1

Each task will be uniquely identified by a different id using the --id flag, which is passed into the Python code file, *compute.py*. HT Helper will set the HT_TASK_ID environment variable to a different value for each task as discussed below. Note that ht_helper.sh will work without any use of unique identifiers, but in many cases it will be natural to write your code to rely on a unique identifier to distinguish what each task is supposed to do.

Next we submit our SLURM job with a job script containing a single call to ht_helper.sh that will fire off the execution of all of the tasks in the taskfile for us.
#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#
# Partition:
#SBATCH --partition=savio2
#
# Tasks per node
#SBATCH --ntasks-per-node=24
#
# Nodes
#SBATCH --nodes=2
#
# Wall clock limit:
#SBATCH --time=00:00:30
#
## Command(s) to run:
module load gcc openmpi # or module load intel openmpi
ht_helper.sh -m "python/2.7" -t taskfile -r 500
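
Assuming the job script above is saved as, say, job.sh, it is submitted in the usual way:

sbatch job.sh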


In all cases, you need to load the openmpi module for ht_helper.sh to work.

Here we have asked HT Helper to run 500 tasks using the -r flag. HT_TASK_ID will be set to 0,1,2,...,499 for these tasks. If we want to number the tasks differently, we can use the -i flag, e.g., "-i 1-400,501-600" if we want id values from 1 to 400 and also from 501 to 600.
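For example, the ht_helper.sh line in the job script might become something like the following (a sketch; here we assume -i is used in place of -r):

ht_helper.sh -m "python/2.7" -t taskfile -i 1-400,501-600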

Note that if you wanted each task to use more than one core, you would need to use the SLURM --cpus-per-task flag. (Note that in most cases you do not need to set the -n flag of ht_helper -- that is only used for ht_helper tasks that are MPI jobs, in which case -n should give the number of MPI processes you want to run for each ht_helper task.)
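For example, on savio2 nodes (which have 24 cores each), giving each task 4 cores might look like the following sketch, replacing the --ntasks-per-node line in the job script above (adjust the numbers for your partition and your tasks):

#SBATCH --ntasks-per-node=6
#SBATCH --cpus-per-task=4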

Please see the Running Your Jobs page for details on SLURM job submission with multiple tasks and multiple CPUs per task. More information on the ht_helper.sh flags can be found by running

ht_helper.sh -h

You can also have multiple lines in your taskfile if you need to have different syntax for the different tasks. Later in this document we show how you can create such a taskfile programmatically (i.e., generating the task file using a script), to avoid manually typing the file yourself.
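For example, a taskfile with explicit ids and two different matrix sizes (matching the programmatic example later in this document) might look like:

./compute.py --id 1 --size 1000 -p exp_output1
./compute.py --id 2 --size 1000 -p exp_output1
./compute.py --id 1 --size 2000 -p exp_output2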

An example computation

Here we’ll see an example Python script that executes an individual task. The script needs to parse the input arguments given in the taskfile and operate based on the id of the individual task for which it is being called. Also note that we write the output for each task to a separate file (as a simple way to avoid collisions in writing to a single output file; see below for an alternative) and then we can post-process the files to collect our results.

#!/usr/bin/env python
import argparse
import numpy as np

def calculate(i, n, m, sd):
    # function to carry out the core computation
    np.random.seed(i)  # careful, this doesn't guarantee truly independent draws across the various calls to calculate() with different 'i' values
    mat = np.random.normal(m, sd, size = (n, n))
    C = mat.T.dot(mat)
    vals = np.linalg.eigvalsh(C)
    out1 = sum(np.log(vals))
    out2 = vals[n-1] / vals[0]
    return(out1, out2)

if __name__ == '__main__':
    # parse the input arguments to the script
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--id', help='ID of run')
    parser.add_argument('-n', '--size', default=1000, help='matrix dimension')
    parser.add_argument('-m', '--mean', default=0,
        help='mean of matrix elements')
    parser.add_argument('-s', '--sd', default=1,
        help='standard deviation of matrix elements')
    parser.add_argument('-p', '--path', default='.',
        help='path to write output files to')
    args = parser.parse_args()
    # carry out the computation for this task based on 'id'
    out = calculate(int(args.id), int(args.size), float(args.mean), float(args.sd))
    # write the output for this task to its own file
    file = open(args.path + "/output" + args.id + ".txt", "w")
    file.write("%s,%s\n" % out)
    file.close()

We could also omit the passing of the --id flag in the task file and the parsing of the --id flag in the Python code and instead read the environment variable HT_TASK_ID directly in the Python session and pass that value along to the calculate() function.
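For example, the --id parsing could be dropped and replaced with something like the following sketch inside the __main__ block (recall that ht_helper.sh starts HT_TASK_ID at 0):

import os

# read the task id that ht_helper.sh sets in the environment for this task
task_id = int(os.environ['HT_TASK_ID'])
out = calculate(task_id, int(args.size), float(args.mean), float(args.sd))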

Here's how we might post-process in this simple situation:

cat exp_output1/* >> exp_output1_final

If you have a large number of tasks, you may not want to have one output file per task. If you'd like to have all the tasks write to a common file, you'll need to deal with the fact that multiple tasks may try to write to that file at the same time, which can cause problems. Therefore, you'll want each task to lock the file while writing to it, so that other tasks cannot modify it at the same time. Here's an example Python function that writes output to the file while holding such a lock.

import fcntl

def writeText(txt):
    """
    Write text to a shared file, locking the file during the write.
    Example usage: writeText("Here is the output from my current task.")
    """
    lg = open('fileName', 'a')
    # lock the file so other tasks can't write to it at the same time
    fcntl.flock(lg.fileno(), fcntl.LOCK_EX)
    # seek to the end of the file
    lg.seek(0, 2)
    # write the entry
    lg.write(txt + "\n")
    # close the file (this also releases the lock)
    lg.close()
    return
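
For example, compute.py could append its results to a shared file (instead of writing one file per task) with a call along these lines, assuming writeText and the fcntl import above are included in the script:

writeText("%s,%s,%s" % (args.id, out[0], out[1]))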

Generating your taskfile programmatically

Generally if one has a taskfile with many lines, one would programmatically generate the taskfile. Here’s some example Python code that creates a taskfile with 100 tasks, 50 of one type and 50 of another type, where the id goes from 1 to 50 for each group of tasks (rather than using HT_TASK_ID as done previously).

m = 50
n = 1000
file = open("taskfile", "w")
for i in range(1, m + 1):
    file.write("./compute.py --id " + str(i) + " --size " + str(n) + " -p exp_output1\n")
n = 2000
for i in range(1, m + 1):
    file.write("./compute.py --id " + str(i) + " --size " + str(n) + " -p exp_output2\n")
file.close()

Additional details for using ht_helper.sh

Here’s how to see the various options you can use with ht_helper.sh in your job submission script.

[user@ln001 ~]$ ht_helper.sh -h
Usage: /global/home/groups/allhands/bin/ht_helper.sh [-hLv] [-e variables] [-f hostfile] [-i list] [-l launcher] [-m modules] [-n # of slots per task] [-o launcher options] [-p # of parallel tasks] [-r # of repeat] [-s sleep] [-t taskfile] [-w workdir]
   -e    provide env variables to be populated for tasks (comma separated)
   -f    provide a hostfile with list of slots, one per line
   -h    this help page
   -i    provide list of tasks from the taskfile to run, e.g., 1-3,5,7-9
   -l    override system launcher (mpirun only for now)
   -L    log task stdout/stderr to individual files
   -m    provide env modules to be loaded for tasks (comma separated)
   -n    provide number of slots per task; this would indicate the number of MPI processes per ht_helper task -- if the tasks are not MPI jobs this should be omitted or set to 1
   -o    provide extra launcher options, e.g., "-mca btl openib,sm,self"
   -p    provide number of parallel tasks
   -r    provide repeat number for taskfile
   -s    interval between checks (default to 60s)
   -t    provide a taskfile with list of tasks, one per line (required)
         task could be a binary executable, or a script
         multiple steps within the same task can be semicolon separated, but they have to remain on the same line
         env variable HT_TASK_ID (starts from 0) can be used with individual tasks
   -v    verbose mode
   -w    provide work directory (default to current directory)

If you are running MPI-type tasks, please make sure not to include the mpirun command in the taskfile. Instead, you only need to provide the actual executable and its input options. If mpirun command-line options are required, please provide them via the "-o" option. If you are running parallel tasks, please make sure to turn off any CPU affinity settings to avoid conflicts and serious oversubscription of CPUs.

The next important parameter is the "-n" option: how many MPI processes you want to allocate for each task. The default value is "1", which is appropriate for serial or non-MPI threaded tasks. If you are running short-duration tasks (less than a few minutes), you may also want to reduce the default mini scheduler check interval from 60 seconds to a smaller value with the "-s" option.
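For example, if each task is a 4-process MPI computation, the taskfile line and the ht_helper.sh call might look roughly like the following sketch (my_mpi_program is a hypothetical executable; note the absence of mpirun in the taskfile):

# taskfile line (no mpirun prefix)
./my_mpi_program --input=case_$HT_TASK_ID

# ht_helper.sh call in the job script: 4 MPI processes per task,
# checking for free slots every 30 seconds
ht_helper.sh -t taskfile -n 4 -r 100 -s 30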

Please do not specify a hostfile with the "-f" option, as that may conflict with the default SLURM allocation (the -f flag is provided because HT Helper works even without a scheduler).

To get familiar with using HT Helper, you may want to turn on "-L" (log output from each task to an individual file) and "-v" (verbose mode) options so that you can better understand how it works. After you are familiar with the process, you can choose which options to use; we recommend "-v".
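For example, the earlier ht_helper.sh call with per-task logging and verbose output turned on could be:

ht_helper.sh -m "python/2.7" -t taskfile -r 500 -L -v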