Friday, March 30, 2012

How does Huffman Code work?


  • How does Huffman Code work?

http://www.youtube.com/watch?v=0PahtaFK640




  • Huffman coding

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression.
http://en.wikipedia.org/wiki/Huffman_coding



  • How does Compression Work?

There are two main types of compression methods - lossless compression and lossy compression. Huffman Tree compression is a method by which the original data can be reconstructed perfectly from the compressed data, and hence it belongs to the lossless compression family.

http://www.sfu.ca/~vwchu/howcompression.html



  • Text Compression with Huffman Coding
https://www.youtube.com/watch?v=ZdooBTdW5bM



  • A simple example of Huffman coding on a string


You need a basic understanding of the binary tree and priority queue data structures.

Let’s say we have the string “beep boop beer!”, which in its current form occupies 1 byte of memory for each character. That means that in total it occupies 15*8 = 120 bits of memory.

(In this application we’ll output to the console a string of 40 characters, each 0 or 1, representing the encoded version of the string in bits. For the result to actually occupy 40 bits, that string would have to be converted into real bits using bitwise operations, which we won’t discuss now.)
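For readers curious about that last step, here is a minimal sketch of packing a string of 0 and 1 characters into real bytes. It is not taken from the linked article; the function name `pack_bits` and the zero-padding scheme are assumptions made for illustration.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    Returns (data, padding), where padding is the number of zero bits
    appended so the length is a multiple of 8. A decoder needs that
    number to know where the real data ends.
    """
    padding = (8 - len(bits) % 8) % 8
    padded = bits + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


# A made-up 40-character bit string, used only to show the size.
example_bits = "10" * 20
data, padding = pack_bits(example_bits)
print(len(data), padding)
```

A 40-character bit string packs into 5 bytes with no padding, compared with 15 bytes for the original text.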

To obtain the code for each element depending on its frequency, we’ll need to build a binary tree such that each leaf of the tree contains a symbol (a character from the string). The tree will be built from the leaves to the root, meaning that the least frequent elements will be farther from the root than the more frequent ones. You’ll see soon why we chose to do this.

To build the tree this way we’ll use a priority queue with a slight modification: the element with the lowest priority value is the most important. That means the least frequent elements will be the first ones we get from the queue.

First, we calculate the frequency of each character.
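The counting step can be sketched in a few lines of Python. This is an illustration using the standard library's `collections.Counter`, not the code from the linked article.

```python
from collections import Counter

text = "beep boop beer!"

# Map each character to the number of times it occurs in the string.
freq = Counter(text)

# Print the characters from least to most frequent.
for ch, n in sorted(freq.items(), key=lambda item: item[1]):
    print(repr(ch), n)
```

For this string, 'e' occurs 4 times and 'b' 3 times; 'p', 'o' and the space occur twice each; 'r' and '!' occur once.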

Then we create a binary tree node for each character and insert it into the priority queue with its frequency as the priority.
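The queue-driven tree construction described above might look like the following sketch. It uses Python's `heapq` module as the priority queue (the smallest item comes out first, which matches the "least frequent first" rule). The function name `huffman_codes`, the tuple-based node representation, and the tie-breaking counter are assumptions for illustration, not the linked article's implementation. It also assumes a non-empty input string.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each character of text to its bit string."""
    freq = Counter(text)
    tiebreak = count()  # keeps heap entries comparable when frequencies tie

    # Heap entries are (frequency, tiebreak, node). A node is either a
    # single character (leaf) or a (left, right) tuple (internal node).
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two least frequent nodes until one tree remains.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    _, _, root = heap[0]

    # Walk the tree: going left appends "0", going right appends "1".
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol input gets "0"

    walk(root, "")
    return codes


text = "beep boop beer!"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(len(encoded))  # 40
```

The exact bit patterns depend on how ties between equal frequencies are broken, but the total encoded length for this string is 40 bits either way, matching the figure quoted earlier.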



http://en.nerdaholyc.com/huffman-coding-on-a-string/
