User Tools

Site Tools


chess:programming:huffman:example

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
chess:programming:huffman:example [2021/10/29 13:08] peterchess:programming:huffman:example [2021/10/29 13:56] (current) – [Stepping through the code] peter
Line 149: Line 149:
 <WRAP info> <WRAP info>
 **NOTE:**  The root node has value 77, which is just the number of characters. **NOTE:**  The root node has value 77, which is just the number of characters.
 +
 +  * The number of bits needed to encode the data is the the sum, for each character, of the number of bits in its code times its frequency.
 +  * Let T be the tree, C be the set of characters c that comprise the alphabet, and f(c) be the frequency of character c.
 +  * Since the number of bits is the same as the depth in the binary tree, we can express the sum in terms of dT, the depth of character c in the tree: <code>
 +total_bits_for_a_character = f(c) * dT
 +</code>
 +  * For example:
 +    * **a** had frequency of 12 and depth of 3, so total_bits = 36.
 +    * **b** had frequency of 4 and depth of 5, so total_bits = 20.
 +    * **c** had frequency of 5 and depth of 4, so total_bits = 20.
 +    * **d** had frequency of 19 and depth of 1, so total_bits = 19. 
 +    * **e** had frequency of 12 and depth of 4, so total_bits = 48.
 +    * **f** had frequency of 4 and depth of 5, so total_bits = 20.
 +    * **(spaces)** had frequency of 17 and depth of 3, so total_bits = 51.
 +    * **.** had frequency of 4 and depth of 4, so total_bits = 16.
 +
 +  * This gives a total of 36 + 20 + 20 + 19 + 48 + 20 + 51 + 16 = 230 bits.
 +    * This is the sum we want to minimize.
 +    * Lets call it the cost, B(T) of the tree.
 </WRAP> </WRAP>
  
 +----
 +
 +===== Build the low-cost tree =====
 +
 +<code>
 +Huffman (C)
 +  n = the size of C
 +
 +  // Insert all the elements of C into Q, using the value of the node as the priority.
 +  for i in 1..n-1 do
 +    z = a new tree node
 +    x = Extract-Minimum (Q)
 +    y = Extract-Minimum (Q)
 +    left node of z = x
 +    right node of z = y
 +    f[z] = f[x] + f[y]
 +    Insert (Q, z)
 +  end for
 +  
 +  return Extract-Minimum (Q) as the complete tree
 +</code>
 +
 +<WRAP info>
 +**NOTE:**  In the following algorithm, 
 +
 +  * **B(T)** is defined as above;
 +    * it can be stored efficiently in an array indexed by characters.
 +    * It is extended as needed to accommodate the values of internal tree nodes.
 +  * **C** is the set of characters represented as leaf tree nodes.
 +  * **Q** is a priority queue of tree nodes where we can quickly extract the minimum element;
 +    * this can be done with a heap where the heap property is reversed.
 +    * We build the tree in a bottom up manner, starting with the individual characters and ending up with the root of the tree as the only element of the queue.
 +
 +At first, the queue contains all the leaf nodes as a "forest" of singleton binary trees.
 +
 +  * Then the two nodes with least value are grabbed out of the queue, joined by a new tree node, and put back on the queue.
 +  * After n-1 iterations, there is only one node left in the queue: the root of the tree.
 +
 +</WRAP>
 +
 +----
 +
 +==== Stepping through the code using the Huffman algorithm ====
 +
 +Here are the contents of **Q** after each step through the for loop:
 +
 +Initially, all nodes are leaf nodes.
 +
 +We stick all 8 in Q:
 +
 +<code>
 +(space)  a    b    c    d    e    f    .
 +  17    12    4    5   19   12    4    4
 +</code>      
 +
 +We join two of the nodes with least value; now there are 7 things in Q:
 +
 +<code>
 +                 8
 +                / \   (space) a    b    c    d    e
 +                    17   12    4    5   19   12
 +                 4
 +</code>
 +
 +----
 +
 +Then the next two with least value, Q has 6 elements:
 +
 +<code>                                       
 +                                    9 
 +                / \   (space) a      / \        e
 +                    17   12         19   12
 +                                4   5
 +</code>
 +
 +----
 +
 +Now the two nodes with least values are the two trees we just made, and Q has 5 elements:
 +
 +<code>
 +                          17
 +                      __/   \__
 +                   __/         \__
 +                  /               \
 +                                 
 +                / \               / \        e  (space)  a
 +                                 19   12    17    12
 +                               5
 +<code>
 +
 +----
 +
 +The next two with least value, Q has 4 elements:
 +
 +<code>
 +                          17
 +                      __/   \__
 +                   __/         \__
 +                  /                            24
 +                                            /  \
 +                / \               / \        e    a   (space)
 +                                 19   12    12    17
 +                               5
 +</code>
 +
 +----
 +
 +The next two with least value, three items left:
 +
 +<code>
 +                                            34
 +                                    ______/   \_____
 +                             ______/                \_____
 +                            /                            (space)
 +                          17                               17
 +                      __/   \__
 +                   __/         \__
 +                  /                                    24
 +                                                    /  \
 +                / \               / \                  e    a      d
 +                                              12    12     19
 +                               5
 +</code>
 +
 +----
 +
 +The next two with least value, two big trees left:
 +
 +<code>
 +                                            34
 +                                    ______/   \_____
 +                             ______/                \_____
 +                            /                          (space)
 +                          17                             17
 +                      __/   \__                             43
 +                   __/         \__                         /  \
 +                  /                                    24    d
 +                                                    /  \  19
 +                / \               / \                  e    a      
 +                                              12    12
 +                               5
 +</code>
 +
 +Finally, we join the whole thing up:
 +
 +<code>
 +                                                                  77
 +                                               __________________/  \_______
 +                                              /                             \
 +                                            34                               43
 +                                    ______/   \_____                        /  \
 +                             ______/                \_____                24    d
 +                            /                          (space)           /  \  19
 +                          17                             17               a  
 +                      __/   \__                                        12  12
 +                   __/         \__ 
 +                  /               \
 +                                 9
 +                / \               / \
 +                               c
 +                               5
 +</code>
 +
 +<WRAP info>
 +**NOTE:** At each point, we chose the joining that would force less frequent characters to be deeper in the tree.
 +</WRAP>
 +
 +----
 +
 +So an optimal prefix code is:
 +
 +<code>
 +01 (space)
 +101 a
 +0010 b
 +0011 c
 +11 d
 +100 e
 +0000 f
 +0001 .
 +</code>
 +
 +And 
 +
 +<code>
 +B(T) = 17 * 2 + 12 * 3 + 4 * 4 + 5 * 4 + 19 * 2 + 12 * 3 + 4 * 4 = 196 bits
 +</code>
 +
 +<WRAP info>
 +**NOTE:**  A savings of 15% in the size of the data.
 +</WRAP>
  
chess/programming/huffman/example.1635512914.txt.gz · Last modified: 2021/10/29 13:08 by peter

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki