The MaybeBoost Hash Library: Performance

At this time there are no assembly- or intrinsic-using implementations of any of the algorithms. A few portable changes that proved effective for the author have been applied, but they may or may not help with any particular compiler setup.

Currently the implementations run at about half the speed of GNU coreutils or better. Here are the author's numbers for hashing a 244.25 MiB file (a plasma rendered in GIMP) 8 times using the example/hashsum.cpp program:
Algorithm   GNU coreutils   Boost.Hash [1]   Slower by
SHA-1       9.406s          16.119s          73%
MD5         6.330s          7.750s           23%
SHA-512     12.483s         16.466s          32%

[1] Best time from the table below

Here are a few general suggestions if you require high throughput.

Play with your Optimizer Settings

Compiler flags make a big difference, and not always in an intuitive way. In the author's measurements, going from -O1 to -O3 can double throughput (SHA-1), but going from -O2 to -O3 can also halve it (SHA-512).

Here are the author's numbers for hashing a 244.25 MiB file (a plasma rendered in GIMP) 8 times using the example/hashsum.cpp program compiled with an SVN version of g++ 4.5.1:
Algorithm                     -O1       -O2       -O3
Adler-32                      1.996s    2.053s    2.013s
CRC-32/PNG                    12.946s   8.050s    8.093s
CubeHash16/32-512             17.369s   16.559s   15.896s
CubeHash16/32-512 (No SSE2)   39.894s   45.487s   43.747s
MD5                           8.220s    7.750s    7.890s
SHA-1                         32.344s   18.512s   16.119s
SHA-512                       29.121s   16.466s   31.011s
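
To check how your own compiler behaves, time the same workload at each optimization level. The following is a minimal sketch, not the library's code: it assumes a plain Adler-32 loop as a stand-in for a real algorithm, and the names adler32_update and time_hashing are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain Adler-32 (as defined in RFC 1950), continuing from a previous
// checksum. Start from 1 for a fresh hash. This is a stand-in workload,
// not the library's implementation.
inline std::uint32_t adler32_update(std::uint32_t adler,
                                    const unsigned char* data, std::size_t n) {
    const std::uint32_t mod = 65521;  // largest prime below 2^16
    std::uint32_t a = adler & 0xFFFFu;
    std::uint32_t b = adler >> 16;
    for (std::size_t i = 0; i < n; ++i) {
        a = (a + data[i]) % mod;
        b = (b + a) % mod;
    }
    return (b << 16) | a;
}

// Hashes `buf` `repeats` times and returns the elapsed wall-clock seconds.
// Each pass continues from the previous checksum, so every pass depends on
// the one before it and the optimizer cannot collapse the loop to one pass.
inline double time_hashing(const std::vector<unsigned char>& buf, int repeats,
                           std::uint32_t& checksum) {
    assert(repeats > 0);
    checksum = 1;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        checksum = adler32_update(checksum, buf.data(), buf.size());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Build the same source once per flag setting (for example with -O1, -O2, and -O3) and compare the returned times; only measurements on your own compiler and hardware are meaningful.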

Try Providing Pointers for Input

Each hash algorithm defines the endianness used to convert the provided bits into words. When that byte order matches that of the host (Little-Endian for MD4, MD5, and CubeHash; Big-Endian for SHA, SHA-1, and SHA-2) and the input type is sized appropriately, the endianness can be handled with a simple memcpy.

For most users (those on x86 or amd64), this means that MD5 will perform best when given char*s (or unsigned char*s) for input.
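
The difference between the two paths can be sketched as follows. This is an illustration under assumptions, not the library's code: the function names are hypothetical, and it shows only the little-endian, 32-bit-word case that applies to MD5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when the host stores the least significant byte first.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Portable path: assemble every little-endian word with shifts.
// Works on any host, but touches each byte individually.
inline void load_le32_portable(const unsigned char* in, std::uint32_t* out,
                               std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        out[i] = (std::uint32_t(in[4 * i]))
               | (std::uint32_t(in[4 * i + 1]) << 8)
               | (std::uint32_t(in[4 * i + 2]) << 16)
               | (std::uint32_t(in[4 * i + 3]) << 24);
    }
}

// Fast path: when the algorithm's byte order matches the host's, the bytes
// are already laid out as the words the algorithm wants, so one memcpy does
// the whole conversion. Otherwise fall back to the portable path.
inline void load_le32(const unsigned char* in, std::uint32_t* out,
                      std::size_t words) {
    assert(in != nullptr && out != nullptr);
    if (host_is_little_endian())
        std::memcpy(out, in, words * sizeof(std::uint32_t));
    else
        load_le32_portable(in, out, words);
}
```

Both functions produce the same words on every host; the memcpy path is only available when the input is a pointer to bytes, which is why pointer input can be faster than a generic iterator.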

Use Random-Access Iterators or the _n Functions

Because most hash algorithms are block-oriented internally, the preprocessing stage can skip an internal buffering stage if it knows that a full block is available in the input.

Random-access iterators will automatically use the length, but if you're buffering from something else and have the length available, provide it.
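
One way such a shortcut can be arranged is dispatch on the iterator category. The sketch below is an assumption about the general technique, not the library's actual interface: block_counter, process, and process_impl are hypothetical names, the counters stand in for calls to a real compression function, and the 64-byte block is the size used by MD5 and SHA-1.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

const std::size_t block_bytes = 64;  // 512-bit blocks, as in MD5 and SHA-1

// Records how the input was consumed instead of actually hashing it.
struct block_counter {
    std::size_t direct = 0;    // full blocks taken straight from the input
    std::size_t buffered = 0;  // full blocks assembled byte by byte
    std::size_t tail = 0;      // leftover bytes that do not fill a block
};

// Length unknown: copy into a buffer and check after every byte whether
// a block has been filled.
template <typename Iter>
void process_impl(Iter first, Iter last, block_counter& c,
                  std::input_iterator_tag) {
    unsigned char buffer[block_bytes];
    std::size_t filled = 0;
    for (; first != last; ++first) {
        buffer[filled++] = static_cast<unsigned char>(*first);
        if (filled == block_bytes) {
            ++c.buffered;  // a real implementation would compress `buffer` here
            filled = 0;
        }
    }
    c.tail = filled;
}

// Length known up front: the number of full blocks follows from one
// subtraction, so the per-byte buffering stage can be skipped.
template <typename Iter>
void process_impl(Iter first, Iter last, block_counter& c,
                  std::random_access_iterator_tag) {
    assert(first <= last);
    const std::size_t n = static_cast<std::size_t>(last - first);
    c.direct += n / block_bytes;  // a real implementation would compress each
                                  // block straight from the input here
    c.tail = n % block_bytes;
}

// Entry point: picks the overload that matches the iterator's category.
template <typename Iter>
void process(Iter first, Iter last, block_counter& c) {
    process_impl(first, last, c,
                 typename std::iterator_traits<Iter>::iterator_category());
}
```

Given 150 bytes, the random-access overload reports 2 direct blocks and a 22-byte tail, while the input-iterator overload reports the same 2 blocks as buffered. A function that takes an explicit length can use the known count in the same way.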



Copyright Scott McMurray 2010

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt).