So there is no silver-bullet mechanism for testing the effectiveness of hash functions. Things like the avalanche effect are very important. Here we present the results of a brute force approach to testing the hash collision vulnerabilities of various hash functions. The hash functions to be tested are FNV (32bit), lookup3 (32bit), lookup3 (64bit), murmurhash2 (32bit), murmurhash2_160 (160bits), SHA1 (160bits), SuperFastHash (32bits), and SuperFastHash_64 (64bits). We test each hash function by generating 2^32 unique 24 byte strings, each of which gets hashed. The 2^32 24 byte strings are created by generating all possible 32bit unsigned integers and repeating six times; so the third unique 24 byte string will be 0x000000020000000200000002000000020000000200000002 and so on. We then record the value of each resulting 32bit hash in a 2^32 element array (read: buckets). At the end of the test we determine which percentage of array elements are empty. This percentage is also the percentage of hash collision. If a hash function returns multiple 32bit hashes then we test each of the resulting hash values as a separate test, e.g. a 64bit hash function would be tested twice; once for the lower 32bit hash result and once for the upper 32bit hash result. In addition, we test combinations of hash results in the same test by cycling through the hash results (read: round robin), e.g. a 64bit hash function would use the lower 32bit hash result for unique 24 byte string #1, and the upper 32bit hash result for unique 24 byte string #2, and the lower 32bit hash result for unique 24 byte string #3, and so on. Here are the results:
The best result is 36.8% hash collision for lookup3 (32bit; 4.4 million hashes/second), lookup3 (64bit; 4.4 million hashes/second), murmurhash2 (160bit; but not the round robin; 4.0 million hashes/second), and SHA1 (160bit; 0.9 million hashes/second). If anybody cares to comment, I’d be interested in understanding the math behind why it’s not possible to get a result with a lower hash collision rate than 36.8% for a large sample like this.