
GNU parallel

GNU Parallel is a shell tool for executing jobs in parallel on one or multiple computers. It's a helpful tool for automating the parallelization of multiple (often serial) jobs, in particular allowing one to group jobs into a single SLURM submission to take advantage of the multiple cores on a given Savio node.

A job can be a single-core serial task, a multi-core job, or an MPI application. A job can also be a command that reads from a pipe. The typical input is a list of parameters, one set per job; GNU parallel splits this input and pipes it into the commands in parallel. GNU parallel makes sure the output from the commands is the same as you would get had you run the commands sequentially, and output names can easily be tied to input file names for simple post-processing. This makes it possible to use the output from GNU parallel as input for other programs.

Below we'll show basic usage of GNU parallel and then provide an extended example illustrating submission of a Savio job that uses GNU parallel.

For full documentation see the GNU parallel man page and GNU parallel tutorial.

Basic usage

To motivate usage of GNU parallel, consider how you might automate running multiple individual tasks using a simple bash for loop. In this case, our example command involves copying a file. We will copy file1.in to file1.out, file2.in to file2.out, etc.

for (( i=1; i <= 3; i++ )); do
    cp file${i}.in file${i}.out
done

That's fine, but it won't run the tasks in parallel. Let's use GNU parallel to do it in parallel:

parallel -j 2 cp file{}.in file{}.out ::: 1 2 3
ls file*out
# file1.out  file2.out  file3.out

Based on -j 2, parallel uses two cores to process the three tasks, starting the third task as soon as a core frees up after finishing the first or second. The ::: syntax separates the input values (1 2 3) from the command being run. Each input value is substituted for {} and the cp command is run.

Some bells and whistles

We can use multiple inputs per task, distinguishing the inputs by {1}, {2}, etc.:

parallel --link -j 2 cp file{1}.in file{2}.out ::: 1 2 3 ::: 4 5 6
ls file*out
# file4.out  file5.out  file6.out

Note that --link is needed so that 1 is paired with 4, 2 with 5, etc., instead of forming all possible combinations of the two sets.
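
For comparison, leaving out --link makes GNU parallel form every combination of the two input sources. Here's a quick check using echo so no files are touched (the output order may vary, since lines are printed as jobs finish):

parallel -j 2 echo {1}{2} ::: 1 2 ::: a b
# 1a
# 1b
# 2a
# 2b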

Of course in many contexts we don't want to have to write out all the input numbers. We can instead generate them using seq:

parallel -j 2 cp file{}.in file{}.out ::: `seq 3`
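
Bash brace expansion works just as well for generating the numbers; the shell expands {1..3} to 1 2 3 before parallel sees it, while the bare {} is left alone:

parallel -j 2 cp file{}.in file{}.out ::: {1..3}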

We can use a file containing the list of inputs as a "task list" instead of using the ::: syntax. Here we'll also illustrate the special syntax {.} to remove the filename extension.

parallel -j 2 -a task.lst cp {} {.}.out

task.lst looks like this; it should have the parameter(s) for separate tasks on separate lines:

file1.in
file2.in
file3.in
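
Since {.} strips the .in extension, the copies again end up named file1.out, file2.out, and file3.out:

ls file*out
# file1.out  file2.out  file3.out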

Next we could use a shell script instead of putting the command inline:

parallel -j 2 -a task.lst bash mycp.sh {} {.}.out
# copying file1.in to file1.out
# copying file2.in to file2.out
# copying file3.in to file3.out

Here's what mycp.sh looks like:

#!/bin/bash 
echo copying ${1} to ${2}
cp ${1} ${2}

We can also parallelize an arbitrary set of commands, rather than running the same command on a set of inputs.

parallel -j 2 < commands.lst
# hello
# wilkommen
# hola

Not surprisingly, here's the content of commands.lst:

echo hello
echo wilkommen
echo hola

Finally, let's see how we would use GNU parallel within the context of a SLURM batch job.

To parallelize on one node, using all the cores on the node that are available to the SLURM job:

module load parallel/20220522
parallel -j $SLURM_CPUS_ON_NODE < commands.lst

To parallelize across all the cores on multiple nodes we need to use the --slf flag:

module load parallel/20220522
echo $SLURM_JOB_NODELIST |sed s/\,/\\n/g > hostfile
parallel -j $SLURM_CPUS_ON_NODE --slf hostfile < commands.lst

Request the same number of cores on each node

When using multiple nodes on partitions with per-core scheduling (e.g., savio4_htc, savio3_htc, savio3_gpu), you should request the same number of cores on each node, because the parallel -j syntax specifies a single number for how many jobs to run on each node. You can do this using --ntasks-per-node (combined with --cpus-per-task if your code is threaded), rather than --ntasks. For partitions scheduled on a per-node basis, this usually isn't a concern because most partitions have nodes that all have the same number of cores, but there are exceptions, such as savio3, which mixes nodes with 32 and 40 cores.
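
For example, here's a sketch of a per-core request asking for eight cores on each of two nodes (savio4_htc is one of the per-core partitions mentioned above; the account is a placeholder and eight is an arbitrary choice):

#SBATCH --account=account_name
#SBATCH --partition=savio4_htc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8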

Working directory when using --slf

When using multiple nodes, the working directory will be your home directory, unless you specify otherwise using the --wd flag (see below for example usage), and NOT the directory from which parallel was called.
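
For example, to run from the directory the job was submitted from, you could pass SLURM's $SLURM_SUBMIT_DIR to --wd (a sketch based on the multi-node command above; the extended example below uses an explicit path instead):

parallel -j $SLURM_CPUS_ON_NODE --slf hostfile --wd $SLURM_SUBMIT_DIR < commands.lst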

Warning messages when using --slf and --progress

If you use the --progress flag with the --slf flag, you'll probably see a warning like this:

parallel: Warning: Could not figure out number of cpus on n0021.savio1 (). Using 1.

This occurs because GNU parallel tries to count the cores on each node and this process fails if the parallel module is not loaded on all the nodes available to your job. This should not be a problem for your job, provided you set the -j flag to explicitly tell GNU parallel how many jobs to run in parallel on each node. (Also note that you can silence the warning by adding module load parallel/20220522 to your .bashrc file so that GNU parallel is on your PATH on all the nodes in your Slurm allocation.)
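
That is, your ~/.bashrc would contain a line like the following (using the same module version loaded elsewhere in this document):

# in ~/.bashrc, so GNU parallel is found on all allocated nodes
module load parallel/20220522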

Extended example

Here we'll put it all together (and include even more useful syntax) to parallelize use of the bioinformatics software BLAST across multiple biological input sequences.

Here's our example task list, task.lst:

../blast/data/protein1.faa
../blast/data/protein2.faa
<snip>

Here's run-blast.sh, the script that runs BLAST on a single input file:

#!/bin/bash
blastp -query $1 -db ../blast/db/img_v400_PROT.00 -out $2  -outfmt 7 -max_target_seqs 10 -num_threads $3

Now let's use GNU parallel in the context of a SLURM job script:

#!/bin/bash
#SBATCH --job-name=job-name
#SBATCH --account=account_name
#SBATCH --partition=partition_name
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2
#SBATCH --time=2:00:00

## Command(s) to run (example):
module load bio/blast-plus/2.14.1-gcc-11.4.0
module load parallel/20220522

export WDIR=/your/desired/path
cd $WDIR

# set number of jobs based on number of cores available and number of threads per job
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))

echo $SLURM_JOB_NODELIST |sed s/\,/\\n/g > hostfile

parallel --jobs $JOBS_PER_NODE --slf hostfile --wd $WDIR --joblog task.log --resume --progress -a task.lst sh run-blast.sh {} output/{/.}.blst $SLURM_CPUS_PER_TASK

Some things to notice:

  • Here BLAST will use multiple threads for each job, via the SLURM_CPUS_PER_TASK variable, which SLURM sets based on the -c (or --cpus-per-task) flag.
  • We programmatically determine how many jobs to run on each node, accounting for the threading.
  • Setting the working directory with --wd is optional; without that your home directory will be used (if using multiple nodes via --slf) or the current working directory will be used (if using one node).
  • The --resume and --joblog flags allow you to easily restart interrupted work without redoing already completed tasks.
  • The --progress flag displays progress information as the tasks run.
  • In this case, only one of the three inputs to run-blast.sh is provided in the task list. The second argument is determined from the first, after discarding the path and file extension, and the third is constant across tasks, as illustrated below.
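
To see how the replacement strings expand, you can use GNU parallel's --dry-run flag, which prints each command without running it. Here {} is the full input line and {/.} is that input with the directory and extension removed; the literal 2 simply stands in for $SLURM_CPUS_PER_TASK:

parallel --dry-run sh run-blast.sh {} output/{/.}.blst 2 ::: ../blast/data/protein1.faa
# sh run-blast.sh ../blast/data/protein1.faa output/protein1.blst 2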