This is an old revision of the document!
Table of Contents
Chess - Programming - Zobrist Hashing
Zobrist Hashing is a technique to transform a board position of arbitrary size into a number of a set length, with an equal distribution over all possible numbers.
- The main purpose of Zobrist hash codes in chess programming is to get an almost unique index number for any chess position, with a very important requirement that two similar positions generate entirely different indices.
- These index numbers are used for faster and more space efficient Hash tables or databases, e.g. transposition tables and opening books.
Initialization
At program initialization, we generate an array of pseudorandom numbers:
- One number for each piece at each square.
- One number to indicate the side to move is black.
- Four numbers to indicate the castling rights, though usually 16 (2^4) are used for speed.
- Eight numbers to indicate the file of a valid En Passant square, if any.
This results in an array with 781 (12*64 + 1 + 4 + 8) random numbers.
- Since pawns do not occur on first and eighth rank, one might be fine with 12*64 though.
- There are even proposals and implementations to use overlapping keys from unaligned access up to an array of only 12 numbers for every piece and to rotate that number by square.
Programs usually implement their own Pseudorandom number generator (PRNG), both for better quality random numbers than standard library functions, and also for reproducibility.
- This means that whatever platform the program is run on, it will use the exact same set of Zobrist keys.
- This is also useful for things like opening books, where the positions in the book can be stored by hash key and be used portably across machines, considering endianess.
Runtime
To get the Zobrist hash code of a certain position, the hash key is initialized by xoring all random numbers linked to the given feature.
For example, the starting position:
[Hash for White Rook on a1] xor [White Knight on b1] xor [White Bishop on c1] xor ... ( all pieces ) ... xor [White castling short] xor [White castling long] xor ... ( all castling rights )
NOTE: The fact that xor-operation is own inverse and can be undone by using the same xor-operation again, is often used by chess engines.
- It allows a fast incremental update of the hash key during make or unmake moves.
Another example, for a White Knight that jumps from b1 to c3 capturing a Black Bishop, these operations are performed:
[Original Hash of position] xor [Hash for White Knight on b1] ... ( removing the knight from b1 ) ... xor [Hash for Black Bishop on c3] ( removing the captured bishop from c3 ) ... xor [Hash for White Knight on c3] ( placing the knight on the new square ) ... xor [Hash for Black to move] ( change sides)
What size the hash keys should have
Smaller hash keys are faster and more space efficient, while larger ones reduce the risk of a hash collision.
- A collision occurs if two positions map the same key.
- Usually 64 bit is used as a standard size in modern chess programs.
Collisions
Using too small a hash code, such as only 32 bits, may result in different positions being assigned the same hash code.
Programs have various strategies for deciding what to do when a clash occurs.
- Some programs attempt to store the position at the next location.
- If that location is found to be occupied also, the program can give up or it can replace the older of the two entries just examined.
- Most programs try more than once to find an unoccupied location.
A supposedly good hashing function workaround for clashing:
- Given a 64 bit (or whatever word size you want) take the low order “n” bits where “n” is log2 (tablesize) and use this as the “first” entry (x) to probe.
- Take the next “m” bits (for Cray Blitz, m=9 and use this as a “vector stride” (y) and then load eight words, table(x), table(x+y), table(x+2y), …, table(x+7y).
- One of these will be the one chosen to be replaced by the current store, or else one of these will be the “match” for the current lookup.
- The only minor difficulty to handle is that you do not dimension the table for (n) entries [table(n)] but, rather, we have to dimension the table larger [table(n+7*512)] to handle the positions where we hash to near the end of the table, and the vector loads would then step beyond the end…
- No, there is not a “modulus” mechanism for vector loads to make this load “wrap around” to the beginning.
Remember that a table entry contains a “key”, score, score type, best move, etc.
- This key must EXACTLY match the hash key before we consider this a “hit”.
- This means that the 64 bit hash key must match completely, even though we only used the low order “n” bits of this key to probe into the table.
- In fact, since we used the low order “n” bits to prob into the table, we do not even bother storing them since we “know” what they were already (which saves table space.)
NOTE: It is highly recommended to use 64-bit keys to not have to deal with collisions.
- A value of 48 bits has shown to be just as safe as 64, but 48 is not a reasonable size as 64 bit signatures are the natural signature length for 64 bit processors.