How to make sort use multiple threads to run faster?

sort supports --parallel=N to run N threads. However, I observed that it uses only around 100% CPU (roughly one core), even though the command allows N threads.

The command is as follows:

cat large-file | sort --parallel `nproc`

where nproc reports 16 on my machine.

How to make sort use multiple threads to run faster?

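This is possibly caused by the pipe |: sort receives the data as a stream and cannot know the total input size in advance, so it may choose a small default buffer, which limits how much work the threads can share. You can likely make --parallel more effective by setting a larger buffer size for sort explicitly, like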

cat large-file | sort --parallel=`nproc` -S 20G

-S, --buffer-size=SIZE
    use SIZE for main memory buffer

as described in the sort manual: https://www.systutorials.com/docs/linux/man/1-sort/
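
To try this on a small scale, here is a minimal sketch, assuming GNU coreutils (sort, nproc, seq). The file names under a temporary directory and the 64M buffer size are made up for illustration; they stand in for large-file and the 20G used above. It sorts the same input once through a pipe with an explicit buffer size and once by letting sort open the file itself, which may also help because sort can then see the input size, and checks that both give the same result.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)

# Sample input: 100000 lines in descending order (stand-in for large-file).
seq 100000 -1 1 > "$tmpdir/sample.txt"

# Use as many threads as there are available processing units.
threads=$(nproc)

# Variant 1: piped input, with an explicit main-memory buffer size.
cat "$tmpdir/sample.txt" | sort --parallel="$threads" -S 64M > "$tmpdir/piped.out"

# Variant 2: sort reads the file directly instead of a stream.
sort --parallel="$threads" -S 64M "$tmpdir/sample.txt" > "$tmpdir/direct.out"

# Both variants should produce identical output.
cmp "$tmpdir/piped.out" "$tmpdir/direct.out" && echo "outputs match"
```

On a real large file, prefixing each sort command with time and watching CPU usage in top should show whether the extra threads are actually used. Pick a buffer size that fits in your machine's free memory, and remove "$tmpdir" when done.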

Eric Ma

Eric is a systems guy. Eric is interested in building high-performance and scalable distributed systems and related technologies. The views or opinions expressed here are solely Eric's own and do not necessarily represent those of any third parties.
