Universal Hash Functions
Mathematics Accomplishment | 1977 - 1979
IBM researchers: J. Lawrence Carter, Mark Wegman
Where the work was done: IBM T.J. Watson Research Center
What we accomplished: Develops a set of efficient hash functions allowing linear time retrieval of data, which is useful in compilers, databases, cryptography and many other things. Hash functions compute a hash value for something big, and typically produce a value a lot shorter than the item being hashed. So it's a many-to-one function. At the same time, however, we want any two things that are different to map to different hash values. Thus there can be no worse- case guarantees about the number of collisions a given hash function has for an arbitrary set of items to be hashed. Universal Hash functions solves this conundrum by introducing classes of hash functions with the property that for any given set most of them will have few collisions.
Related links:
- "Universal Classes of Hash Functions," Journal of Computer and System Sciences 18, p.143-154, 1979
- "This paper gives an input independent average linear time algorithm for storage and retrieval on keys. The algorithm makes a random choice of hash function from a suitable class of hash functions. Given any sequence of inputs the expected time (averaging over all functions in the class) to store and retrieve elements is linear in the length of the sequence. The number of references to the data base required by the algorithm for any input is extremely close to the theoretical minimum for any possible hash function with randomly distributed inputs."
- 1977 extended abstract in ACM STOC [Symposium on the Theory of Computation]
Image credit: Data Structures: Hashing by Uri Zwick
BACK TO IBM RESEARCH ACCOMPLISHMENTS