# Features of Hash

## What are the requirements for an excellent hash algorithm?

a) The original data cannot be deduced from the hash value. The MD5 example above shows this clearly: the mapped data has no visible relationship to the original data.

b) A small change in the input data results in a completely different hash value, while the same data always produces the same hash value. In the example, we changed only one character, yet almost the entire hash value changed.
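To make property b) concrete, here is a minimal sketch using Python's standard `hashlib` module. The input strings `"hello world"` and `"hello worle"` and the helper names `md5_hex` and `differing_hex_digits` are illustrative assumptions, not the example the article refers to.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two inputs that differ in a single character (illustrative values).
original = md5_hex("hello world")
changed = md5_hex("hello worle")

print(original)
print(changed)
print(differing_hex_digits(original, changed), "of 32 hex digits differ")

# The same input always produces the same hash value.
print(md5_hex("hello world") == original)
```

Running it should show two digests that look unrelated even though the inputs differ by one character, while hashing the same input twice gives identical results.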

c) The hash algorithm should execute efficiently, so that the hash value can be calculated quickly even for long texts.

d) The collision probability of the hash algorithm is small.
Because hashing maps values from the input space into the hash space, and the hash space is much smaller than the input space, the drawer principle guarantees that some different inputs are mapped to the same output. A good hash algorithm therefore needs to keep the probability of such conflicts as small as possible.

There are ten apples on the table. Put these ten apples into nine drawers. No matter how you distribute them, at least one drawer will contain at least two apples. This phenomenon is what we call the “drawer principle”. Stated generally: “If each drawer represents a set and each apple represents an element, then when n+1 elements are placed into n sets, at least one set must contain at least two elements.” The drawer principle is also known as the pigeonhole principle. It is an important principle in combinatorics.
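The drawer principle can be checked with a small simulation. This is only a sketch: the function name `fullest_drawer` and the random placement strategy are assumptions made for illustration, and a simulation demonstrates the principle rather than proving it.

```python
import random


def fullest_drawer(num_apples: int, num_drawers: int, rng: random.Random) -> int:
    """Drop each apple into a randomly chosen drawer and return the
    number of apples in the fullest drawer."""
    drawers = [0] * num_drawers
    for _ in range(num_apples):
        drawers[rng.randrange(num_drawers)] += 1
    return max(drawers)


rng = random.Random(42)  # fixed seed so repeated runs behave the same way

# Ten apples, nine drawers: every trial ends with a drawer holding
# at least two apples, however the apples happen to be placed.
results = [fullest_drawer(10, 9, rng) for _ in range(1000)]
print(min(results) >= 2)
```

The same reasoning applies to hashing: when there are more possible inputs than hash values, at least two inputs must share a hash value.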

## Solutions to Hash Collisions

As mentioned earlier, a hash algorithm is bound to produce conflicts, so what should we do when a hash conflict occurs? The most commonly used approaches are the chain address method and the open address method.

The chain address method (also known as separate chaining) uses an array of linked lists to store the data. When a hash conflict occurs, the new element is appended to the end of the linked list at that position.

Schematic diagram of chain address method

The process of the chain address method is as follows:
When adding an element, first calculate the hash value of the element's key to determine its position in the array. If that position is empty, the element is stored there directly. When a conflict occurs, the element is linked to the elements already stored at that position, forming a linked list. The characteristic of this linked list is that all elements on it map to the same position in the array. The Java data structure HashMap uses this method to deal with conflicts. In JDK 1.8, when the number of elements on a linked list exceeds 8 (and the table itself is large enough), the list is converted to a red-black tree to speed up lookups.
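The insertion process described above can be sketched as follows. This is a simplified illustration, not how Java's HashMap is implemented: the class name `ChainedHashTable` is hypothetical, each chain is a plain Python list rather than a true linked list, and there is no resizing or red-black tree conversion.

```python
class ChainedHashTable:
    """A minimal hash table that resolves conflicts by chaining."""

    def __init__(self, num_buckets: int = 8):
        # One chain (here a Python list) per array position.
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key) -> int:
        # Map the key's hash value to a position in the array.
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # empty slot or conflict: append to chain

    def get(self, key, default=None):
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def chain_length(self, key) -> int:
        """Length of the chain that `key` maps to."""
        return len(self._buckets[self._index(key)])


table = ChainedHashTable(num_buckets=4)
# In CPython, small integers hash to themselves, so 1, 5 and 9
# all map to position 1 in a table with 4 buckets.
table.put(1, "one")
table.put(5, "five")
table.put(9, "nine")
print(table.get(5), table.chain_length(1))
```

Lookups walk the chain at the computed position, so their cost grows with the chain length. That is the motivation for replacing long chains with a balanced tree.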

## Open address method

The open address method means that an array of size M holds N key-value pairs, where M > N, and the empty slots in the array are used to resolve conflicts. All methods based on this strategy are collectively referred to as “open address” hash tables. The linear detection method (more commonly called linear probing) is a widely used implementation of an “open address” hash table. Its core idea is that when a conflict occurs, the following slots in the table are checked one after another until an empty slot is found or the entire table has been searched. Simply put: once a conflict occurs, look for the next empty hash table address. As long as the hash table is large enough, an empty hash address can always be found.
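The idea described above can be sketched as follows. The class name `LinearProbingTable` is hypothetical, and the sketch leaves out deletion and resizing, which a practical open address table would need. The slots are visited in the order (h0 + i) mod m for i = 0, 1, 2, ..., where h0 is the key's home position and m is the array size.

```python
class LinearProbingTable:
    """A minimal open address hash table that uses linear probing."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity: int = 8):
        self._slots = [self._EMPTY] * capacity

    def _probe(self, key):
        """Yield slot indices in probe order: (h0 + i) mod m."""
        m = len(self._slots)
        start = hash(key) % m
        for i in range(m):
            yield (start + i) % m

    def put(self, key, value) -> int:
        """Store the pair and return the index of the slot that was used."""
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is self._EMPTY or slot[0] == key:
                self._slots[index] = (key, value)
                return index
        raise RuntimeError("hash table is full")

    def get(self, key, default=None):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is self._EMPTY:
                return default  # reached an empty slot: key is absent
            if slot[0] == key:
                return slot[1]
        return default


table = LinearProbingTable(capacity=8)
# In CPython, small integers hash to themselves, so 3, 11 and 19
# all have home position 3 in a table with 8 slots.
print(table.put(3, "a"), table.put(11, "b"), table.put(19, "c"))
```

Each conflicting key lands in the next free slot after its home position, and the probe sequence wraps around to the start of the array when it reaches the end.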

The mathematical description of the linear detection method (linear probing) is h(k, i) = (h(k, 0) + i) mod m, where m is the size of the table and i indicates which round of probing is currently being performed. When i = 1, the slot immediately after h(k, 0) is probed; when i = 2, the slot after that, and so on. The method