Copyright © Philip M. Parker, INSEAD. Terms of Use.

| Domain | Definition |
Computing | Hash function |
Math | A function that maps keys to integers, usually to get an even distribution on a smaller set of values. (references) |
Source: compiled by the editor from various references; see credits. | |
(From Wikipedia, the free Encyclopedia)
A hash function is a function that converts an input from a (typically) large domain into an output in a (typically) smaller range (the hash value, often a subset of the integers). Hash functions vary in the domain of their inputs and the range of their outputs and in how patterns and similarities of input data affect output data. Hash functions are used in hash tables, cryptography and data processing.
Formally speaking, a hash function is defined by its domain (typically variable length strings of bytes), its range (typically fixed length bit-sequences) and the defining function (let's call this function H). The key desired characteristics of a hash function in general are that H(x) ≠ H(y) implies x ≠ y and that H(x) = H(y) probably implies that x = y. Unfortunately, the use of probably here is just too inexact to let us succeed. If the set of all possible values of H(x) is much smaller than the set of all possible values of x, then this latter condition cannot be true if all sequences are equally likely. Thus, the second condition is often restated in different ways to make it plausible. For instance, cryptographic hashes use the second condition in the form that given x, it is computationally very difficult to find y such that H(x) = H(y). Additionally, it is desirable that if we are given x and H(x + s) where + can be bit changing or concatenation, then we can't find s short of exhaustive enumeration. In practice, for most applications other than error correction, cryptographic hashes serve very nicely. The MD5 and SHA-1 algorithms are two of the most popular algorithms although any cryptosystem can be used to create a hash function (as, indeed, any cryptographically secure hash can be used to create a cryptosystem).
For error correction, a distribution of likely perturbations is assumed at least approximately. Perturbations to a string are then classified into large (improbable) and small (probable) errors. The second criterion is then restated so that if we are given H(x) and x+s, then we can compute x efficiently if s is small. Such hash functions are known as error correction codes. Important sub-class of these correction codes are cyclic redundancy checks and Reed-Solomon codes.
For audio identification such as finding out whether an MP3 file matches one of a list of known items, one could use a conventional hash function such as MD5, but this would be very sensitive to highly likely perturbations such as time-shifting, CD read errors, different compression algorithms or implementations or changes in volume. Using something like MD5 is useful as a first pass to find exactly identical files, but another more advanced algorithm is required to find all identical items. Contrary to an assumption often voiced by people who don't follow the industry carefully, hashing algorithms do exist that are robust to these minor differences. Most of the algorithms available are not extremely robust, but some are so robust that they can identify music played on loud-speakers in a noisy room.
The term "hash" apparently comes by way of analogy with its standard meaning in the physical world, to "chop and mix." In the SHA-1 algorithm, for example, the domain is "flattened" and "chopped" into "words" which are then "mixed" with one another using carefully chosen mathematical functions. The range ("hash value") is made to be a definite size, 160 bits (which may be either smaller or larger than the domain), through the use of modular addition.
Hash tables, a major application for hash functions, speed up the lookup of a record of information, given a key (note: this use of the word key is separate from its meaning as a code for encrypting data). For example, an alphabetic string might be used to look up an employee record in a database system.
Hashing can provide almost direct access to records, which means that, on average, a lookup can require just one or two probes into the memory or file containing the records. (At the other extreme is the worst case, in which lookup proceeds by comparing the key with each record in turn -- this linear search can require examining all the records.)
Assuming the key to be a string of bytes, the hash function should be an index into the records that has random distribution over expected strings. Otherwise, there will be more key "collisions" and hence degradation of lookup time. If, for example, a key is alphabetic, then each byte can have only 26 of its possible 256 values. So simple functions will not probe all record indices evenly.
When speed and good distribution are important, and the key is an array of bytes, one good function to use is Pearson Hashing.
See also: message digest, HMAC, cryptography
Source: adapted by the editor from Wikipedia, the free encyclopedia under a copyleft GNU Free Documentation License (GFDL) from the article "Hash function."
Crosswords: HASH FUNCTION |
| Specialty definitions using "HASH FUNCTION": Certificate Authority, coalesced hashing ♦ digital signature, direct chaining hashing, double hashing, dynamic hashing ♦ extendible hashing ♦ hash heap, hash table ♦ Keyed-Hashing Message Authentication ♦ message digest function, multiplication method ♦ Pearson's hash, perfect hashing, primary clustering ♦ separate chaining hashing, System Account Manager. (references) |
| The following statistics estimate the number of searches per day across the major English-language search engines as identified by various trade publications. Hyperlinks lead to commercial use of the expression at Amazon.com. |
| Expression | Frequency per Day |
hash function | 27 |
| Source: compiled by the editor from various references; see credits. | |
Scrabble® Enable2K-Verified Anagrams | |
| Words within the letters "a-c-f-h-h-i-n-n-o-s-t-u" | |
-3 letters: chthonian, countians, fountains, functions, stanchion. | |
-4 letters: actinons, anchusin, auctions, canonist, cautions, chitosan, contains, continua, countian, factions, factious, fontinas, fountain, function, nonfacts, sanction, sonantic, unchains, unctions, unfaiths. | |
-5 letters: acinous, actinon, actions, anoints, atonics, auction, canthus, cantons, catfish, cations, caution, chanson, chaunts, chitons, conatus, confits, contain, cushion, faction, fanions, fashion, fontina, fuchsia, fuchsin, fustian, hunnish. | |
| Source: compiled by the editor from various references; see credits. SCRABBLE® is a registered trademark. All intellectual property rights in and to the game are owned in the U.S.A and Canada by Hasbro Inc., and throughout the rest of the world by J.W. Spear & Sons Limited of Maidenhead, Berkshire, England, a subsidiary of Mattel Inc. Mattel and Spear are not affiliated with Hasbro. | |
Hexadecimal (or equivalents, 770AD-1900s) (references)48 41 53 48      46 55 4E 43 54 49 4F 4E |
| Leonardo da Vinci (1452-1519; backwards) (references)
|
Binary Code (1918-1938, probably earlier) (references)01001000 01000001 01010011 01001000 00100000 01000110 01010101 01001110 01000011 01010100 01001001 01001111 01001110 |
HTML Code (1990) (references)H A S H   F U N C T I O N |
ISO 10646 (1991-1993) (references)0048 0041 0053 0048      0046 0055 004E 0043 0054 0049 004F 004E |
Encryption (beginner's substitution cypher): (references)4235534224055483754434948 |
| 1. Crosswords 2. Expressions: Internet 3. Anagrams 4. Orthography | 5. Bibliography |
Copyright © Philip M. Parker, INSEAD. Terms of Use.