[ACCEPTED]-Checksumming: CRC or hash?-error-detection

Accepted answer
Score: 13

Only you can say whether 1-2-32 is good enough 51 or not for your application. The error 50 detection performance between a CRC-n and 49 n bits from a good hash function will be 48 very close to the same, so pick whichever 47 one is faster. That is likely to be the 46 CRC-n.


The above "That is likely to be 45 the CRC-n" is only somewhat likely. It 44 is not so likely if very high performance 43 hash functions are used. In particular, CityHash appears 42 to be very nearly as fast as a CRC-32 calculated 41 using the Intel crc32 hardware instruction! I 40 tested three CityHash routines and the Intel 39 crc32 instruction on a 434 MB file. The crc32 instruction 38 version (which computes a CRC-32C) took 37 24 ms of CPU time. CityHash64 took 55 ms, CityHash128 36 60 ms, and CityHashCrc128 50 ms. CityHashCrc128 35 makes use of the same hardware instruction, though 34 it does not compute a CRC.

In order to get 33 the CRC-32C calculation that fast, I had 32 to get fancy with three crc32 instructions on 31 three separate buffers in order to make 30 use of the three arithmetic logic units 29 in parallel in a single core, and then writing 28 the inner loop in assembler. CityHash is 27 pretty damned fast. If you don't have the 26 crc32 instruction, then you would be hard-pressed 25 to compute a 32-bit CRC as fast as a CityHash64 24 or CityHash128.

Note however that the CityHash 23 functions would need to be modified for 22 this purpose, or an arbitrary choice would 21 need to be made in order to define a consistent 20 meaning for the CityHash value on large 19 streams of data. The reason is that those 18 functions are not set up to accept buffered 17 data, i.e. feeding the functions a chunk 16 at a time and expecting to get the same 15 result as if the entire set of data were 14 fed to the function at once. The CityHash 13 functions would need to modified to update 12 an intermediate state.

The alternative, and 11 what I did for the quick and dirty testing, is 10 to use the Seed versions of the functions 9 where I would use the CityHash from the 8 previous buffer as the seed for the next 7 buffer. The problem with that is that the 6 result is then dependent on the buffer size. If 5 you feed CityHash different size buffers 4 with this approach, you get different hash 3 values.

Another Update four years later:

Even faster is the xxhash family. I would now 2 recommend that over a CRC for a non-cryptographic 1 hash.

Score: 0

Putting aside "performance" issues; you 2 might want to consider using one of the 1 SHA-2 functions (say SHA-256).

More Related questions