Google has open-sourced an internal project aimed at improving the speed of hashing for certain internal applications. The project, entitled “CityHash”, is written in C++ and provides hashing of strings suitable for hash tables and lookups.
The code provides two functions, CithHash64 and CityHash128, producing 64- and 128-bit hashes of strings respectively. “These functions aren’t suitable for cryptography, but our experience so far shows that they’re great for, say, hash tables” writes Geoff Pike and Jyrki Alakuijala, the two developers who worked on the project.
The code takes advantage of features found in modern CPUs that Google use in their datacenters, as well as what you would find in PCs and laptops; these include 64-bit registers, parallelism at the instruction level, and quick access to unaligned memory. The code is littered with comments hinting to the optimisation that has been performed on the code, such as: “Bitwise right rotate. Normally this will compile to a single instruction, especially if the shift is a manifest constant.”
“The disadvantage of our approach is that the code is more complicated than most popular alternatives. We decided to optimize for speed rather than simplicity and even included special cases for short inputs” write the developers. “Under real-life conditions we expect CityHash64 to outperform previous work by at least 30% in speed, and perhaps as much as a factor of two.”
They are encouraging developers looking for this kind of functionality to give the code a go in their application.
Comments