GNU parallel
GNU parallel is a shell tool for executing jobs in parallel on one or more computers. It's a helpful tool for automating the parallelization of multiple (often serial) jobs, in particular by allowing one to group jobs into a single SLURM submission to take advantage of the multiple cores on a given Savio node.
A job can be a single-core serial task, a multi-core job, or an MPI application. A job can also be a command that reads from a pipe. The typical input is a list of the parameters needed by the full set of jobs; GNU parallel then splits that input and pipes it into the commands in parallel. GNU parallel makes sure the output from the commands is the same as you would get had you run the commands sequentially, and output names can easily be tied to input file names for simple post-processing. This makes it possible to use output from GNU parallel as input for other programs.
Below we'll show basic usage of GNU parallel and then provide an extended example illustrating submission of a Savio job that uses GNU parallel.
For full documentation see the GNU parallel man page and GNU parallel tutorial.
Basic usage¶
To motivate usage of GNU parallel, consider how you might automate running multiple individual tasks using a simple bash for loop. In this case, our example command involves copying a file: we will copy `file1.in` to `file1.out`, `file2.in` to `file2.out`, etc.
for (( i=1; i <= 3; i++ )); do
cp file${i}.in file${i}.out
done
That's fine, but it won't run the tasks in parallel. Let's use GNU parallel to do it in parallel:
parallel -j 2 cp file{}.in file{}.out ::: 1 2 3
ls file*out
# file1.out file2.out file3.out
Based on `-j 2`, that will use two cores to process the three tasks, starting the third task when a core becomes free from having finished either the first or second task. The `:::` syntax separates the input values (`1 2 3`) from the command being run. Each input value is substituted in place of `{}` and the `cp` command is run.
Some bells and whistles¶
We can use multiple inputs per task, distinguishing the inputs by `{1}`, `{2}`, etc.:
parallel --link -j 2 cp file{1}.in file{2}.out ::: 1 2 3 ::: 4 5 6
ls file*out
# file4.out file5.out file6.out
Note that `--link` is needed so that 1 is paired with 4, 2 with 5, etc., instead of generating all possible pairs from the two sets.
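For comparison, here's a minimal sketch of what happens without `--link` (using `echo` rather than `cp` so no files are needed, and `-k` to keep the output in input order): all possible combinations of the two input sets are generated.
parallel -k -j 2 echo {1} {2} ::: 1 2 ::: 4 5
# 1 4
# 1 5
# 2 4
# 2 5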
Of course in many contexts we don't want to have to write out all the input numbers. We can instead generate them using `seq`:
parallel -j 2 cp file{}.in file{}.out ::: `seq 3`
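Equivalently, since GNU parallel reads its arguments from standard input when no `:::` (or `-a`) is given, we could pipe the values in; this is the "command that reads from a pipe" usage mentioned above:
seq 3 | parallel -j 2 cp file{}.in file{}.out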
We can use a file containing the list of inputs as a "task list" instead of using the `:::` syntax. Here we'll also illustrate the special syntax `{.}`, which removes the filename extension.
parallel -j 2 -a task.lst cp {} {.}.out
`task.lst` looks like this; it should have the parameter(s) for separate tasks on separate lines:
file1.in
file2.in
file3.in
Next we could use a shell script instead of putting the command inline:
parallel -j 2 -a task.lst bash mycp.sh {} {.}.out
# copying file1.in to file1.out
# copying file2.in to file2.out
# copying file3.in to file3.out
Here's what `mycp.sh` looks like:
#!/bin/bash
echo copying ${1} to ${2}
cp ${1} ${2}
We could also parallelize an arbitrary set of commands, rather than running the same command on a set of inputs.
parallel -j 2 < commands.lst
# hello
# wilkommen
# hola
Not surprisingly, here's the content of `commands.lst`:
echo hello
echo wilkommen
echo hola
Finally, let's see how we would use GNU parallel within the context of a SLURM batch job.
To parallelize on one node, using all the cores on the node that are available to the SLURM job:
module load parallel/20220522
parallel -j $SLURM_CPUS_ON_NODE < commands.lst
To parallelize across all the cores on multiple nodes, we need to use the `--slf` flag:
module load parallel/20220522
echo $SLURM_JOB_NODELIST |sed s/\,/\\n/g > hostfile
parallel -j $SLURM_CPUS_ON_NODE --slf hostfile < commands.lst
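Note that if `$SLURM_JOB_NODELIST` comes back in Slurm's compressed form (e.g., something like `n00[21-23].savio1`), the `sed` substitution above won't produce one hostname per line. A sketch of an alternative in that situation is to let Slurm expand the list:
# expand the (possibly compressed) node list into one hostname per line
scontrol show hostnames $SLURM_JOB_NODELIST > hostfile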
Request the same number of cores on each node
When using multiple nodes on partitions with per-core scheduling (e.g., `savio4_htc`, `savio3_htc`, `savio3_gpu`), you should request the same number of cores on each node, because the `parallel -j` syntax specifies a single number for how many cores to use per node. You can do this using `--ntasks-per-node` (or `--cpus-per-task` if your code is not threaded), rather than `--ntasks`. For partitions scheduled on a per-node basis, this often won't be a concern because most partitions have nodes that all have the same number of cores, but there are exceptions such as `savio3`, which is a mix of nodes with 32 or 40 cores.
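For example, a minimal sketch of the relevant header lines (the partition and counts here are just placeholders):
#SBATCH --partition=savio3_htc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8   # same number of cores on each of the 2 nodes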
Working directory when using `--slf`
When using multiple nodes, the working directory will be your home directory, unless you specify otherwise using the `--wd` flag (see below for example usage), and NOT the directory from which `parallel` was called.
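As a minimal sketch (the path is just a placeholder; the extended example below shows `--wd` in a full job script):
parallel -j $SLURM_CPUS_ON_NODE --slf hostfile --wd /your/desired/path < commands.lst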
Warning messages when using `--slf` and `--progress`
If you use the `--progress` flag with the `--slf` flag, you'll probably see a warning like this:
parallel: Warning: Could not figure out number of cpus on n0021.savio1 (). Using 1.
This occurs because GNU parallel tries to count the cores on each node, and that fails if the `parallel` module is not loaded on all the nodes available to your job. This should not be a problem for your job, provided you set the `-j` flag to explicitly tell GNU parallel how many jobs to run in parallel on each node. (Also note that you can silence the warning by adding `module load parallel/20220522` to your .bashrc file so that GNU parallel is on your PATH on all the nodes in your Slurm allocation.)
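In other words, the line to add to your `~/.bashrc` is simply the same module load used above:
# ensure GNU parallel is on the PATH on every node in the allocation
module load parallel/20220522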
Extended example¶
Here we'll put it all together (and include even more useful syntax) to parallelize use of the bioinformatics software BLAST across multiple biological input sequences.
Here's our example task list, `task.lst`:
../blast/data/protein1.faa
../blast/data/protein2.faa
<snip>
Here's the script we'll use to run BLAST on a single input file, `run-blast.sh`:
#!/bin/bash
blastp -query $1 -db ../blast/db/img_v400_PROT.00 -out $2 -outfmt 7 -max_target_seqs 10 -num_threads $3
Now let's use GNU parallel in the context of a SLURM job script:
#!/bin/bash
#SBATCH --job-name=job-name
#SBATCH --account=account_name
#SBATCH --partition=partition_name
#SBATCH --nodes=2
#SBATCH --cpus-per-task=2
#SBATCH --time=2:00:00
## Command(s) to run (example):
module load bio/blast-plus/2.14.1-gcc-11.4.0
module load parallel/20220522
export WDIR=/your/desired/path
cd $WDIR
# set number of jobs based on number of cores available and number of threads per job
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
echo $SLURM_JOB_NODELIST |sed s/\,/\\n/g > hostfile
parallel --jobs $JOBS_PER_NODE --slf hostfile --wd $WDIR --joblog task.log --resume --progress -a task.lst sh run-blast.sh {} output/{/.}.blst $SLURM_CPUS_PER_TASK
Some things to notice:
- Here BLAST will use multiple threads for each job, based on the SLURM_CPUS_PER_TASK variable that is set based on the `-c` (or `--cpus-per-task`) SLURM flag.
- We programmatically determine how many jobs to run on each node, accounting for the threading.
- Setting the working directory with `--wd` is optional; without it, your home directory will be used (if using multiple nodes via `--slf`) or the current working directory will be used (if using one node).
- The `--resume` and `--joblog` flags allow you to easily restart interrupted work without redoing already-completed tasks.
- The `--progress` flag causes a progress bar to be displayed.
- In this case, only one of the three inputs to `run-blast.sh` is provided in the task list. The second argument is determined from the first via the `{/.}` replacement string, which discards the path and file extension, and the third is constant across tasks (see the sketch after this list).
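As a quick illustration of how `{/.}` behaves (using `echo`, so no BLAST run is involved):
parallel echo {/.} ::: ../blast/data/protein1.faa
# protein1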