This is an old revision of the document!
Table of Contents
Chess - Programming - Huffman - Example
Consider the following sentence:
dead beef cafe deeded dad. dad faced a faded cab. dad acceded. dad be bad.
NOTE: This sentence is 77 characters long.
- There are 12 a's, 4 b's, 5 c's, 19 d's, 12 e's, 4 f's, 17 spaces, and 4 periods.
Using a Fixed-length code
A fixed-length code could be used:
000 | (space) |
001 | a |
010 | b |
011 | c |
100 | d |
101 | e |
110 | f |
111 | . |
NOTE: Then the sentence, which is of length 77, consumes 77 * 3 = 231 bits.
- Each character is taking up 3 bits.
Using a Variable-length code
Alternative, a Variable length code is used like this:
100 | (space) |
110 | a |
11110 | b |
1110 | c |
0 | d |
1010 | e |
11111 | f |
1011 | . |
NOTE: Then the text can be encoded in 3 * 12 + 4 * 5 + 5 * 4 + 19 * 1 + 12 * 4 + 4 * 5 + 17 * 3 + 4 * 4 = 230 bits.
- That a savings of 1 bit.
- It doesn't seem like much, but it is a start.
This code must be a prefix code, where we can distinguish where one code stops and another starts;
- One code may not be a prefix of another code or there will be confusion.
If the characters have non-uniform frequency distributions, then finding such a code can lead to great savings in storage space.
- A process with this effect is called data compression.
- This can be applied to any data where the frequency distribution is known or can be computed, not just sentences in languages.
- Examples are computer graphics, digitized sound, binary executables, etc.
A prefix code can be represented as a binary tree, where the leaves are the characters and the codes are derived by tracing a path from root to leaf, using 0 when we go left and 1 when we go right.
- For example, the code above would be represented by this tree:
_@_ _/ \_ _/ \_ _/ \_ _/ \_ _/ \_ _/ \_ d _@_ / \ / \ / \ / \ / \ _@_ _@_ / \ / \ / \ / \ (space) @ a @ / \ / \ / \ / \ e "." c @ / \ b f