9 Comments

  1. On your system the bottleneck is the disk, though that ceiling keeps rising as SSDs advance, and once the disk is fast enough the computational overhead becomes significant. You can quantify that overhead on your system by taking the disk out of the loop with something like:

    for chk in crc32 md5sum sha1sum; do time head -c 1G /dev/zero | $chk; done

    Note that sha1sum and md5sum use system-specific instructions for significant speedups on systems where coreutils was configured --with-openssl (as is the default on Arch, Fedora, CentOS 7 and Gentoo, at least).

  2. Hi Eric,

    I hope you know that collisions exist for CRC32, MD5 and even SHA-0, but not yet for SHA-1, which is the one you actually used. Since I learned about these collision problems, I only use sha1sum or stronger (SHA-224, SHA-256, SHA-384 or SHA-512) for my verifications when I can.

    http://preshing.com/20110504/hash-collision-probabilities/
    http://www.mathstat.dal.ca/~selinger/md5collision/
    https://en.wikipedia.org/wiki/SHA-0

    Nice and informative website by the way.

    1. Hi Simard,

      Thanks!

      Although this post is mainly about speed, taking collisions into consideration is a good point. I will add a note to the post mentioning your comment.

  3. ‘sum -s filename’ is significantly faster than all of these.

    uni@box:~$ ls -lh kali-linux-1.0.3-i386.iso
    -rwxrwxrwx 1 uni uni 2.3G Jun 22 2013 kali-linux-1.0.3-i386.iso
    uni@box:~$ time crc32 kali-linux-1.0.3-i386.iso
    bd3a7323

    real 0m12.701s
    user 0m5.263s
    sys 0m1.033s
    uni@box:~$ time sum kali-linux-1.0.3-i386.iso
    11559 2387392

    real 0m4.270s
    user 0m3.986s
    sys 0m0.280s
    uni@box:~$ time sum -s kali-linux-1.0.3-i386.iso
    47724 4774784 kali-linux-1.0.3-i386.iso

    real 0m1.241s
    user 0m0.972s
    sys 0m0.268s
    uni@box:~$ sum --version | head -1
    sum (GNU coreutils) 8.21
    uni@box:~$ lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 42
    Stepping: 7
    CPU MHz: 1674.878
    BogoMIPS: 6600.22
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 6144K
    NUMA node0 CPU(s): 0-3

  4. One note: CRC algorithms have the same problem of being “useless as a secure indicator of intentional manipulation of the data”, as discussed in

    Simard’s comment (http://www.systutorials.com/136737/which-checksum-tool-on-linux-is-faster/#comment-76996) and in the discussion at http://www.derkeiler.com/Newsgroups/sci.crypt/2003-07/1451.html:

    While properly designed CRC’s are good at detecting random errors in
    the data (due to e.g. line noise), the CRC is useless as a secure
    indicator of intentional manipulation of the data. And this is
    because it’s not hard at all to modify the data to produce any CRC
    you desire (e.g. the same CRC as the original data, to try to
    disguise your data manipulation).

  5. For many years I have found md5sum to be consistently faster than sha1sum, so I was very surprised when I read this article.
    I just tried it again on a 295G file and got this:

    md5sum:  real 10m20.952s
    sha1sum: real 15m15.332s (same file)

    This is consistent with what I always seem to see.

    1. Thanks for sharing the numbers. My guess is that it depends on the machine. The CPU, memory and disks all matter together (assuming the implementation is already reasonably optimized). If we assume the disk (e.g. an SSD) is fast enough to keep up with the CPU and memory, the main factors are the CPU and memory. MD5's digest is smaller, so md5sum likely puts less pressure on memory and the memory bus. However, modern CPU architectures and software implementations seem to have better optimizations for SHA-1 (sha1sum) computation.

      I no longer have the machine I ran the original test on. Would you like to run the test from the post on your machine to see how it performs, and share the machine details with us? Thanks.
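The approach in the first comment, feeding data from /dev/zero so the disk drops out of the measurement, can be wrapped in a small script that also reports throughput. This is a sketch under stated assumptions: the function name bench_checksums and its size argument are hypothetical, it relies on GNU head (-c) and GNU date (%N for nanoseconds), and it uses cksum (a CRC-based coreutils tool that reads standard input) in place of crc32, which is not part of coreutils.

```shell
#!/bin/sh
# Hypothetical helper: bench_checksums SIZE_MIB
# Streams SIZE_MIB MiB of zeros from /dev/zero into each checksum tool, so no
# disk I/O is involved, and prints elapsed seconds and throughput per tool.
# Assumes GNU head (-c) and GNU date (%N); awk does the floating-point math.
bench_checksums() {
  size_mib=${1:-1024}
  bytes=$((size_mib * 1024 * 1024))
  for tool in cksum md5sum sha1sum sha256sum sha512sum; do
    # Skip tools that are not installed on this system.
    command -v "$tool" >/dev/null 2>&1 || continue
    start=$(date +%s%N)   # nanoseconds since the epoch (GNU date)
    head -c "$bytes" /dev/zero | "$tool" >/dev/null
    end=$(date +%s%N)
    awk -v t="$tool" -v ns="$((end - start))" -v mib="$size_mib" 'BEGIN {
      s = ns / 1e9
      if (s <= 0) s = 1e-9   # guard against a zero reading on tiny inputs
      printf "%-10s %8.3f s %10.1f MiB/s\n", t, s, mib / s
    }'
  done
}
```

For example, `bench_checksums 1024` pushes 1 GiB through each tool. Since the data comes from memory, the ranking reflects only the hash implementations on that machine and build, which is exactly where the md5sum vs sha1sum results in the thread diverge.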
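On the accidental-collision question in Simard's comment: the standard birthday approximation says that with K random inputs and a BITS-bit hash, the chance of at least one collision is about 1 - exp(-K(K-1) / (2 * 2^BITS)). The sketch below evaluates that formula; the helper name collision_probability is hypothetical. Note that it only models chance collisions among random data. The MD5 and SHA-0 collisions Simard links to are constructed deliberately, and for that threat, as the fourth comment says about CRCs, the width of the checksum alone does not protect you.

```shell
#!/bin/sh
# Hypothetical helper: collision_probability BITS K
# Birthday approximation p = 1 - exp(-K(K-1) / (2 * 2^BITS)): the probability
# that at least two of K random inputs share the same BITS-bit hash value.
collision_probability() {
  awk -v bits="$1" -v k="$2" 'BEGIN {
    n = 2 ^ bits   # number of possible hash values
    printf "%.4f\n", 1 - exp(-k * (k - 1) / (2 * n))
  }'
}
```

For a 32-bit checksum such as CRC32, `collision_probability 32 77163` gives 0.5000, so roughly 77,000 random files already have even odds of sharing a checksum, while `collision_probability 128 1000000000` rounds to 0.0000.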
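Regarding why `sum -s` is so fast in the third comment: GNU sum's -s option selects the System V algorithm, which simply adds up every byte and folds the 32-bit total into 16 bits. The default BSD algorithm (the plain `sum` run) rotates the running 16-bit checksum before adding each byte, so every step depends on the previous one; that serial dependency is one plausible reason for the gap in the timings, though it has not been profiled here. The sketch below (hypothetical helper sysv_sum, built on od and awk) recomputes what `sum -s` prints. It also shows why this is no defense against tampering: the sum ignores byte order, so "abc" and "cba" give the same result. The second number is the size in 512-byte blocks, whereas plain `sum` reports 1 KiB blocks, which is why 4774784 is exactly twice 2387392 in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: sysv_sum FILE
# Recomputes the two numbers `sum -s` prints (System V algorithm): a 16-bit
# checksum and the size in 512-byte blocks. Illustrative only: od + awk is far
# slower than sum itself.
sysv_sum() {
  od -An -v -tu1 "$1" | awk '
    { for (i = 1; i <= NF; i++) { s += $i; n++ } }   # plain sum of every byte
    END {
      s = s % 4294967296                # wrap like an unsigned 32-bit counter
      r = s % 65536 + int(s / 65536)    # fold the high 16 bits into the low 16
      c = r % 65536 + int(r / 65536)    # fold the possible carry once more
      printf "%d %d\n", c, int((n + 511) / 512)   # block count, rounded up
    }'
}
```

For a small file, `sysv_sum file` should match the first two fields of `sum -s file`.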
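To put the two timings from the fifth comment on a common scale, they can be converted to throughput. This sketch assumes "295G" means 295 GiB (302,080 MiB), which the comment does not state; the helper name mib_per_s is hypothetical and accepts durations in the "10m20.952s" form that `time` prints.

```shell
#!/bin/sh
# Hypothetical helper: mib_per_s SIZE_MIB DURATION
# DURATION uses the format printed by `time`, e.g. 10m20.952s or 4.270s.
mib_per_s() {
  awk -v size="$1" -v d="$2" 'BEGIN {
    m = 0
    i = index(d, "m")
    if (i > 0) { m = substr(d, 1, i - 1); d = substr(d, i + 1) }   # split off minutes
    sub(/s$/, "", d)                                                 # drop trailing "s"
    printf "%.1f\n", size / (m * 60 + d)
  }'
}
```

With that assumption, `mib_per_s 302080 10m20.952s` gives 486.5 for md5sum and `mib_per_s 302080 15m15.332s` gives 330.0 for sha1sum, so md5sum was about 1.47 times faster on that machine.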
