chess:programming:huffman:example
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
chess:programming:huffman:example [2021/10/29 13:00] – peter | chess:programming:huffman:example [2021/10/29 13:56] (current) – [Stepping through the code] peter | ||
---|---|---|---|
Line 54: | Line 54: | ||
<WRAP info> | <WRAP info> | ||
- | **NOTE: | + | **NOTE: |
+ | |||
+ | * It uses the frequency of the letters in the text to determine the optimal code for each letter. | ||
+ | |||
+ | The text here can be encoded in 3 * 12 + 4 * 5 + 5 * 4 + 19 * 1 + 12 * 4 + 4 * 5 + 17 * 3 + 4 * 4 = 230 bits. | ||
* That is a savings of 1 bit. | * That is a savings of 1 bit. | ||
Line 104: | Line 108: | ||
</ | </ | ||
+ | |||
+ | <WRAP info> | ||
+ | **NOTE: | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Label each internal node ==== | ||
+ | |||
+ | Label each internal node recursively with the sum of the values of its children, starting at the leaves. | ||
+ | |||
+ | The tree looks like this: | ||
+ | |||
+ | < | ||
+ | _77 | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | d _58 | ||
+ | | ||
+ | / | ||
+ | / \ | ||
+ | / | ||
+ | / \ | ||
+ | | ||
+ | / | ||
+ | / | ||
+ | | ||
+ | | ||
+ | / | ||
+ | | ||
+ | 12 | ||
+ | b f | ||
+ | 4 4 | ||
+ | </ | ||
+ | |||
+ | <WRAP info> | ||
+ | **NOTE: | ||
+ | |||
+ | * The number of bits needed to encode the data is the the sum, for each character, of the number of bits in its code times its frequency. | ||
+ | * Let T be the tree, C be the set of characters c that comprise the alphabet, and f(c) be the frequency of character c. | ||
+ | * Since the number of bits is the same as the depth in the binary tree, we can express the sum in terms of dT, the depth of character c in the tree: < | ||
+ | total_bits_for_a_character = f(c) * dT | ||
+ | </ | ||
+ | * For example: | ||
+ | * **a** had frequency of 12 and depth of 3, so total_bits = 36. | ||
+ | * **b** had frequency of 4 and depth of 5, so total_bits = 20. | ||
+ | * **c** had frequency of 5 and depth of 4, so total_bits = 20. | ||
+ | * **d** had frequency of 19 and depth of 1, so total_bits = 19. | ||
+ | * **e** had frequency of 12 and depth of 4, so total_bits = 48. | ||
+ | * **f** had frequency of 4 and depth of 5, so total_bits = 20. | ||
+ | * **(spaces)** had frequency of 17 and depth of 3, so total_bits = 51. | ||
+ | * **.** had frequency of 4 and depth of 4, so total_bits = 16. | ||
+ | |||
+ | * This gives a total of 36 + 20 + 20 + 19 + 48 + 20 + 51 + 16 = 230 bits. | ||
+ | * This is the sum we want to minimize. | ||
+ | * Lets call it the cost, B(T) of the tree. | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | ===== Build the low-cost tree ===== | ||
+ | |||
+ | < | ||
+ | Huffman (C) | ||
+ | n = the size of C | ||
+ | |||
+ | // Insert all the elements of C into Q, using the value of the node as the priority. | ||
+ | for i in 1..n-1 do | ||
+ | z = a new tree node | ||
+ | x = Extract-Minimum (Q) | ||
+ | y = Extract-Minimum (Q) | ||
+ | left node of z = x | ||
+ | right node of z = y | ||
+ | f[z] = f[x] + f[y] | ||
+ | Insert (Q, z) | ||
+ | end for | ||
+ | | ||
+ | return Extract-Minimum (Q) as the complete tree | ||
+ | </ | ||
+ | |||
+ | <WRAP info> | ||
+ | **NOTE: | ||
+ | |||
+ | * **B(T)** is defined as above; | ||
+ | * it can be stored efficiently in an array indexed by characters. | ||
+ | * It is extended as needed to accommodate the values of internal tree nodes. | ||
+ | * **C** is the set of characters represented as leaf tree nodes. | ||
+ | * **Q** is a priority queue of tree nodes where we can quickly extract the minimum element; | ||
+ | * this can be done with a heap where the heap property is reversed. | ||
+ | * We build the tree in a bottom up manner, starting with the individual characters and ending up with the root of the tree as the only element of the queue. | ||
+ | |||
+ | At first, the queue contains all the leaf nodes as a " | ||
+ | |||
+ | * Then the two nodes with least value are grabbed out of the queue, joined by a new tree node, and put back on the queue. | ||
+ | * After n-1 iterations, there is only one node left in the queue: the root of the tree. | ||
+ | |||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Stepping through the code using the Huffman algorithm ==== | ||
+ | |||
+ | Here are the contents of **Q** after each step through the for loop: | ||
+ | |||
+ | Initially, all nodes are leaf nodes. | ||
+ | |||
+ | We stick all 8 in Q: | ||
+ | |||
+ | < | ||
+ | (space) | ||
+ | 17 12 4 5 | ||
+ | </ | ||
+ | |||
+ | We join two of the nodes with least value; now there are 7 things in Q: | ||
+ | |||
+ | < | ||
+ | 8 | ||
+ | / \ | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | Then the next two with least value, Q has 6 elements: | ||
+ | |||
+ | < | ||
+ | | ||
+ | / \ | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | Now the two nodes with least values are the two trees we just made, and Q has 5 elements: | ||
+ | |||
+ | < | ||
+ | 17 | ||
+ | __/ \__ | ||
+ | | ||
+ | / \ | ||
+ | | ||
+ | / \ / \ | ||
+ | | ||
+ | | ||
+ | < | ||
+ | |||
+ | ---- | ||
+ | |||
+ | The next two with least value, Q has 4 elements: | ||
+ | |||
+ | < | ||
+ | 17 | ||
+ | __/ \__ | ||
+ | | ||
+ | / | ||
+ | | ||
+ | / \ / \ | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | The next two with least value, three items left: | ||
+ | |||
+ | < | ||
+ | 34 | ||
+ | ______/ | ||
+ | | ||
+ | / (space) | ||
+ | 17 17 | ||
+ | __/ \__ | ||
+ | | ||
+ | / | ||
+ | | ||
+ | / \ / \ e a d | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | The next two with least value, two big trees left: | ||
+ | |||
+ | < | ||
+ | 34 | ||
+ | ______/ | ||
+ | | ||
+ | / (space) | ||
+ | 17 17 | ||
+ | __/ | ||
+ | | ||
+ | / | ||
+ | | ||
+ | / \ / \ e a | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | Finally, we join the whole thing up: | ||
+ | |||
+ | < | ||
+ | 77 | ||
+ | | ||
+ | / \ | ||
+ | 34 43 | ||
+ | ______/ | ||
+ | | ||
+ | / (space) | ||
+ | 17 | ||
+ | __/ | ||
+ | | ||
+ | / \ | ||
+ | | ||
+ | / \ / \ | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | <WRAP info> | ||
+ | **NOTE:** At each point, we chose the joining that would force less frequent characters to be deeper in the tree. | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | So an optimal prefix code is: | ||
+ | |||
+ | < | ||
+ | 01 (space) | ||
+ | 101 a | ||
+ | 0010 b | ||
+ | 0011 c | ||
+ | 11 d | ||
+ | 100 e | ||
+ | 0000 f | ||
+ | 0001 . | ||
+ | </ | ||
+ | |||
+ | And | ||
+ | |||
+ | < | ||
+ | B(T) = 17 * 2 + 12 * 3 + 4 * 4 + 5 * 4 + 19 * 2 + 12 * 3 + 4 * 4 = 196 bits | ||
+ | </ | ||
+ | |||
+ | <WRAP info> | ||
+ | **NOTE: | ||
+ | </ | ||
+ |
chess/programming/huffman/example.1635512435.txt.gz · Last modified: 2021/10/29 13:00 by peter