User Tools

Site Tools


chess:programming:huffman:example

This is an old revision of the document!


Chess - Programming - Huffman - Example

Consider the following sentence:

dead beef cafe deeded dad.  dad faced a faded cab.  dad acceded.  dad be bad.

NOTE: This sentence is 77 characters long.

  • There are 12 a's, 4 b's, 5 c's, 19 d's, 12 e's, 4 f's, 17 spaces, and 4 periods.

Using a Fixed-length code

A fixed-length code could be used:

codecharacter
000(space)
001a
010b
011c
100d
101e
110f
111.

NOTE: Then the sentence, which is of length 77, consumes 77 * 3 = 231 bits.

  • Each character is taking up 3 bits.

Using a Variable-length code

Alternative, a Variable length code is used like this:

codecharacter
100(space)
110a
11110b
1110c
0d
1010e
11111f
1011.

NOTE: The optimal number of bits needed to represent each letter is obtained by the Huffman Algorithm.

  • It uses the frequency of the letters in the text to determine the optimal code for each letter.

The text here can be encoded in 3 * 12 + 4 * 5 + 5 * 4 + 19 * 1 + 12 * 4 + 4 * 5 + 17 * 3 + 4 * 4 = 230 bits.

  • That is a savings of 1 bit.
  • It does not seem like much, but it is a start.

This code must be a prefix code, where we can distinguish where one code stops and another starts;

  • One code may not be a prefix of another code or there will be confusion.

If the characters have non-uniform frequency distributions, then finding such a code can lead to great savings in storage space.

  • A process with this effect is called data compression.
  • This can be applied to any data where the frequency distribution is known or can be computed, not just sentences in languages.
  • Examples are computer graphics, digitized sound, binary executables, etc.

A prefix code can be represented as a binary tree, where the leaves are the characters and the codes are derived by tracing a path from root to leaf, using 0 when we go left and 1 when we go right.


Represent the Variable-length code in a tree

The Variable-length code above would be represented by this tree:

                                _@_
                             _/     \_
                           _/         \_
                         _/             \_
                       _/                 \_
                     _/                     \_
                   _/                         \_
                  d                            _@_
                                              /   \
                                             /     \ 
                                            /       \
                                           /         \
                                          /           \
                                         _@_          _@_
                                        /   \        /   \
                                       /     \      /     \
                                   (space)    @    a       @
                                             / \          / \
                                            /   \        /   \
                                           e    "."     c     @
                                                             / \
                                                            b   f

NOTE: In this tree, the code for e is found by going right, left, right, left, i.e., 1010.


Label each internal node

Label each internal node recursively with the sum of the values of its children, starting at the leaves.

The tree looks like this:

                                _77
                             _/     \_
                           _/         \_
                         _/             \_
                       _/                 \_
                     _/                     \_
                   _/                         \_
                  d                            _58
                 19                           /   \
                                             /     \ 
                                            /       \
                                           /         \
                                          /           \
                                         _33          _25
                                        /   \        /   \
                                       /     \      /     \
                                   (space)    16   a      13
                                     17      / \  12      / \
                                            /   \        /   \
                                           e    "."     c     8
                                          12     4      5    / \
                                                            b   f
                                                            4   4

NOTE: The root node has value 77, which is just the number of characters.

chess/programming/huffman/example.1635512914.txt.gz · Last modified: 2021/10/29 13:08 by peter

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki