
Parallel and Multi-thread

GNU Parallel

Official: https://www.gnu.org/software/parallel/ 
Download: http://ftp.gnu.org/gnu/parallel/ 

Install

# CentOS 7
yum group install "Development Tools"
wget http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
tar xjf parallel-latest.tar.bz2
cd parallel-*
./configure --prefix=/usr/local
make
make install

Keep --prefix=/usr/local if you want man to find the manual page for the command (man parallel).

Usage

Use case: I commonly use GNU Parallel to compress a large file with bzip2. Files with 40 million rows (8 GB) compress to 400 MB bz2 files.

cat largefile.csv | /usr/local/bin/parallel --pipe -k bzip2 --best > largefile.bz2
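
The command above works because --pipe splits standard input into blocks on line boundaries and runs one bzip2 per block, while -k keeps the outputs in input order. bzip2 can decompress the concatenated streams as a single file. Below is a minimal sketch of the same idea without GNU Parallel, assuming split, bzip2 and mktemp are available. The function name chunked_bzip2 and its argument are hypothetical, chosen for illustration. Unlike parallel, it starts all chunks at once instead of limiting the number of jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compress stdin in independent chunks and
# concatenate the results, mimicking the output of `parallel --pipe -k`.
# Usage: chunked_bzip2 LINES_PER_CHUNK < input > output.bz2
chunked_bzip2() {
    local lines_per_chunk=$1 tmpdir chunk
    tmpdir=$(mktemp -d) || return 1
    # Split on line boundaries so that no row is cut in half
    split -l "$lines_per_chunk" - "$tmpdir/chunk."
    # Compress every chunk in the background, then wait for all of them
    for chunk in "$tmpdir"/chunk.*; do
        bzip2 --best "$chunk" &
    done
    wait
    # The glob expands in sorted order, which preserves the input order
    cat "$tmpdir"/chunk.*.bz2
    rm -rf "$tmpdir"
}
```

For example, chunked_bzip2 1000000 < largefile.csv > largefile.bz2 compresses in chunks of one million rows, and bzip2 -dc largefile.bz2 | wc -l should then report the original row count. With the default two-letter suffixes of split, this sketch handles at most 676 chunks.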

 

With Bash
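# One or more *.lst files, each listing one file path per line, such as
# cat 1.lst
# /path/to/file1
# /path/to/file2
# /path/to/file3
#

files=("$@")
## An arbitrary limiting factor so that there are some free processes
## in case I want to run something else
num_processes=3
i=0
#
echo "Parallel processing with $num_processes processes ($(date "+%F %T"))"
for lst in "${files[@]}"
do
    # Read the paths into an array, skipping comment lines and blank lines
    mapfile -t paths < <(sed -e '/^#/d' -e '/^$/d' "$lst")
    for f in "${paths[@]}"
    do
        # Before starting each batch of $num_processes jobs, wait for the previous batch
        ((i=i%num_processes)); ((i++==0)) && wait
        yourshell.sh "$f" &
    done
    # Pause before moving on to the next list
    sleep 120
done
# Wait for the last batch to finish before exiting
wait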
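
The loop above runs jobs in batches, so each batch takes as long as its slowest job. A rolling pool that starts a new job as soon as one finishes can be sketched with xargs -P, which both GNU and BSD xargs support although it is not part of POSIX. The function name run_lists and its arguments are hypothetical, chosen for illustration. This is a sketch under those assumptions, not a drop-in replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run CMD once per path listed in the given *.lst
# files, keeping at most NUM_PROCESSES jobs running at any time.
# Usage: run_lists NUM_PROCESSES CMD LIST...
run_lists() {
    local num_processes=$1 cmd=$2
    shift 2
    # Drop comment lines and blank lines, then hand one path per job to xargs.
    # -I{} passes each input line as a single argument; xargs still treats
    # quotes and backslashes in the input specially.
    sed -e '/^#/d' -e '/^$/d' "$@" |
        xargs -P "$num_processes" -I{} "$cmd" {}
}
```

For example, run_lists 3 yourshell.sh *.lst keeps three copies of yourshell.sh running until every path has been processed. With GNU Parallel installed, piping the same sed output into parallel -j 3 yourshell.sh {} should behave similarly.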