Linux Checksum Tools: Performance Comparison
When verifying file integrity on Linux, you’ll typically reach for one of several checksum utilities. A natural question: which one is fastest? The answer depends heavily on whether I/O or CPU is your bottleneck.
Disk I/O Dominates on Most Systems
In realistic scenarios with typical storage, disk I/O is almost always the limiting factor. Test this with a large file:
$ ls -lh largefile.bin
-rw-r--r-- 1 user user 15G Dec 15 10:28 largefile.bin
Testing three common checksums:
$ time sha256sum largefile.bin
a1b2c3d4e5f6... largefile.bin
real 1m21.143s
user 0m21.647s
sys 0m4.668s
$ time md5sum largefile.bin
e2e649030c795ffa9f33a99bcb39dde7 largefile.bin
real 1m27.392s
user 0m25.563s
sys 0m3.936s
$ time b2sum largefile.bin
abc123def456... largefile.bin
real 1m18.205s
user 0m19.341s
sys 0m4.782s
The differences are marginal — all three take roughly 80-90 seconds. Measure raw I/O performance:
$ time dd if=largefile.bin of=/dev/null bs=1M
15000+0 records in
15000+0 records out
15728640000 bytes (16 GB, 15 GiB) copied, 80.4203 s, 199 MB/s
real 1m20.447s
user 0m0.202s
sys 0m7.091s
The disk I/O alone accounts for 80 seconds; the checksum computation is nearly invisible on top of it. On a spinning disk or SATA SSD, disk I/O is the bottleneck, not the algorithm.
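A quick way to confirm this from the `time` output alone is to compare CPU time (user + sys) against wall-clock time. A sketch using the sha256sum figures from the run above (substitute your own numbers):

```shell
# Bottleneck check: if user+sys is far below real (wall) time, the
# process spent most of its life waiting on I/O, not hashing.
real=81.143   # 1m21.143s from the sha256sum run above
user=21.647
sys=4.668
awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { printf "CPU busy for %.0f%% of wall time\n", 100 * (u + s) / r }'
```

Here the CPU was busy for only about a third of the run; the remaining two thirds were spent waiting on the disk.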
CPU-Limited Benchmarking
If you have fast NVMe or high-throughput storage, disk I/O becomes less of a constraint. Algorithm efficiency then matters. Create a test file in a RAM disk to eliminate I/O latency:
$ sudo mkdir -p /mnt/ramdisk
$ sudo mount -t tmpfs -o size=8G,mode=1777 tmpfs /mnt/ramdisk
$ head -c 3G /dev/zero > /mnt/ramdisk/testfile
$ for algo in md5sum sha256sum b2sum; do
echo "=== $algo ==="
time $algo /mnt/ramdisk/testfile
done
Results on a modern 12-core system:
=== md5sum ===
real 0m5.103s
user 0m4.697s
sys 0m0.409s
=== sha256sum ===
real 0m8.451s
user 0m8.082s
sys 0m0.372s
=== b2sum ===
real 0m3.205s
user 0m2.931s
sys 0m0.482s
Throughput in this CPU-limited scenario (3 GB read from the RAM disk):
- b2sum: ~937 MB/s
- md5sum: ~587 MB/s
- sha256sum: ~355 MB/s
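These throughput figures are just file size divided by wall time. A sketch of the arithmetic, treating the 3 GB file as 3000 MB and using the real times from the runs above (differences of a few MB/s from the listed numbers are rounding):

```shell
# Derive MB/s from the 3000 MB test file and each tool's wall time.
size_mb=3000
for pair in "md5sum 5.103" "sha256sum 8.451" "b2sum 3.205"; do
  set -- $pair   # $1 = tool, $2 = elapsed seconds
  printf '%-10s ~%s MB/s\n' "$1" \
    "$(awk -v s="$size_mb" -v t="$2" 'BEGIN { printf "%.0f", s / t }')"
done
```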
Algorithm Security Properties
When choosing a checksum tool, speed shouldn’t be your primary concern. Security properties matter more:
- MD5: Cryptographically broken. Vulnerable to collision attacks. Don’t use for integrity verification of untrusted sources.
- SHA-1: Theoretically broken. Still useful for non-adversarial integrity checks, but avoid for security-sensitive work.
- SHA-256: Industry standard. No known practical attacks. Good choice for most scenarios.
- BLAKE2b/BLAKE3: Modern, fast, and cryptographically secure. Excellent for new projects and high-performance requirements.
Practical Commands
For most file verification tasks:
$ sha256sum myfile
For high-performance environments (NVMe, fast storage):
$ b2sum myfile
For recursive directory verification:
$ find /path -type f -exec sha256sum {} + > checksums.txt
$ sha256sum -c checksums.txt
To parallelize checksum computation across multiple files:
$ parallel sha256sum {} ::: file1 file2 file3 file4
Or with GNU xargs:
$ find /path -type f -print0 | xargs -0 -P 4 -I {} sha256sum {}
The -P 4 flag uses 4 parallel processes. Adjust based on your CPU core count.
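Putting these pieces together, here is a sketch that scans a directory, batches several files per sha256sum invocation to amortize process startup, and scales the worker count to the machine with nproc (the `dir` and `out` variables are illustrative; adjust to taste):

```shell
# Hash every file under $dir in parallel, one worker per CPU core.
# -n 16 passes up to 16 files per sha256sum invocation.
dir=.                                  # directory to scan; adjust as needed
out=$(mktemp)                          # write sums outside the scanned tree
find "$dir" -type f -print0 \
  | xargs -0 -P "$(nproc)" -n 16 sha256sum > "$out"
echo "checksums written to $out"
```

Writing the output file outside the scanned tree avoids the pipeline picking up its own half-written checksum file.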
Verifying Checksums in Batch
When you have a pre-generated checksum file (e.g., from a download):
$ sha256sum -c SHA256SUMS
file1: OK
file2: OK
file3: FAILED
Note that each tool verifies only its own algorithm's sums (sha256sum -c will not check MD5 or BLAKE2 lines), so use the tool matching the algorithm the checksum file was generated with. To skip entries for files you don't have locally:
$ sha256sum -c SHA256SUMS --ignore-missing
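For example, --ignore-missing lets verification succeed when the checksum file lists entries you never downloaded. A self-contained sketch using throwaway files:

```shell
# Create sums for two files, remove one, then verify what remains.
tmp=$(mktemp -d)
printf 'a\n' > "$tmp/file1"
printf 'b\n' > "$tmp/file2"
( cd "$tmp" && sha256sum file1 file2 > SHA256SUMS )
rm "$tmp/file2"                        # simulate a file we never fetched
( cd "$tmp" && sha256sum -c SHA256SUMS --ignore-missing )
# prints "file1: OK" and exits 0; without --ignore-missing, the missing
# file2 would be reported and the exit status would be nonzero.
```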
When Speed Actually Matters
If you’re computing checksums on multi-terabyte datasets frequently:
Upgrade your storage first. Move from spinning disk to NVMe. This provides 5-10x performance gains and typically costs less than the salary time spent optimizing code.
Use parallel processing. Leverage multiple CPU cores for independent files. The parallel command or GNU xargs -P scales naturally.
Choose BLAKE2b or BLAKE3. These provide 2-3x better throughput than SHA-256 on modern CPUs without sacrificing cryptographic security. BLAKE3 is particularly attractive for new projects—it’s even faster than BLAKE2b and supports parallel hashing of large files.
For very large single files on modern NVMe, consider BLAKE3’s native parallel mode:
$ b3sum largefile.bin # Uses all CPU cores automatically
The Bottom Line
On typical systems with mechanical or SATA storage, your checksum tool choice has minimal impact. Disk I/O is the overwhelming bottleneck. Only when storage approaches 1GB/s throughput does algorithm efficiency become meaningful. In those cases, BLAKE2b/BLAKE3 pull ahead, but SHA-256 remains a practical choice for most deployments.
Comments

On your system the bottleneck is the disk, though with the advance of SSDs this is increasingly moving up the stack. In that case the computational overhead becomes significant. You can quantify it on your system by avoiding the disk with something like:
for chk in crc32 md5sum sha1sum; do time head -c 1G /dev/zero | $chk; done
Note that sha1sum and md5sum use system-specific instructions for significant speedups on systems configured --with-openssl (as is the default on Arch, Fedora, CentOS 7, and Gentoo, at least).
Hi Pádraig,
That’s a good point.
I did some tests on the same system (Fedora 22 x86-64) by computing checksums on a file under /dev/shm/. You can find the results at http://www.systutorials.com/136737/which-checksum-tool-on-linux-is-faster/#what-if-i.2Fo-was-not-the-bottleneck . crc32 turns out to be the fastest one.
Hi Eric,
I hope you know that collisions exist for crc32, md5sum, and even SHA-0 checksums, but not yet for SHA-1, which you actually used. Since I learned of these collision problems, I only use sha1sum or better (sha224, sha256, sha384, or sha512) for my verifications when I can.
http://preshing.com/20110504/hash-collision-probabilities/
http://www.mathstat.dal.ca/~selinger/md5collision/
https://en.wikipedia.org/wiki/SHA-0
Nice and informative website by the way.
Hi Simard,
Thanks!
Although this post is mainly talking about the speed, that’s a good point taking the collisions into consideration. I will add a note in the post mentioning your comment.
‘sum -s filename’ is significantly faster than all of these.
uni@box:~$ ls -lh kali-linux-1.0.3-i386.iso
-rwxrwxrwx 1 uni uni 2.3G Jun 22 2013 kali-linux-1.0.3-i386.iso
uni@box:~$ time crc32 kali-linux-1.0.3-i386.iso
bd3a7323
real 0m12.701s
user 0m5.263s
sys 0m1.033s
uni@box:~$ time sum kali-linux-1.0.3-i386.iso
11559 2387392
real 0m4.270s
user 0m3.986s
sys 0m0.280s
uni@box:~$ time sum -s kali-linux-1.0.3-i386.iso
47724 4774784 kali-linux-1.0.3-i386.iso
real 0m1.241s
user 0m0.972s
sys 0m0.268s
uni@box:~$ sum --version | head -1
sum (GNU coreutils) 8.21
uni@box:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Stepping: 7
CPU MHz: 1674.878
BogoMIPS: 6600.22
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
Nice to know these numbers, desromic!
The `cksum`/`sum -s`, which is a CRC tool like `crc32`, seems much faster than `crc32`.
Note that the CRC algorithms share the same problem of being “useless as secure indicator of intentional manipulation of the data”, as discussed in
Simard’s comment http://www.systutorials.com/136737/which-checksum-tool-on-linux-is-faster/#comment-76996 and in the discussion at http://www.derkeiler.com/Newsgroups/sci.crypt/2003-07/1451.html .
For many years I have found md5sum to consistently be faster than sha1sum, so I was very surprised when I read this article.
I just tried it again on a file of size 295G and got this:
md5sum: real 10m20.952s
sha1sum (same file): real 15m15.332s
This is consistent with what I seem to always see.
Thanks for sharing the numbers. My inference is that it depends on the machine. The CPU, memory, and disks matter together (assuming reasonably good optimization is already applied in the implementations). Suppose the disk I/O (e.g. from an SSD) is fast enough to keep up with the CPU and memory; then the main factors are the CPU and memory. MD5’s digest and internal state are smaller, so it likely puts less pressure on the memory and memory-bus systems. However, modern CPU architectures and software implementations seem to have better optimizations for sha1sum (SHA) computation.
I no longer have the machine I originally ran the tests on. But would you like to run the tests from the post on your machine, to see how it performs, and share the machine details with us? Thanks.