Skip to content

Latest commit

 

History

History
108 lines (80 loc) · 3.19 KB

README.md

File metadata and controls

108 lines (80 loc) · 3.19 KB

Huffman Coding

This project is C++ implementation of a simple lossless compression algorithm, based on the Huffman Coding. It was made as a practical exam for the Data Structures (INF 213) course of the Computer Science major of the Federal University of Viçosa (UFV).

Compiling

To compile the code, just run the Makefile:

~$ make

If it doesn't work, try to compile the files manualy with the following commands:

~$ g++ main.cpp -std=c++11 -O3 -c
~$ g++ Huffman.cpp -std=c++11 -O3 -c
~$ g++ main.o Huffman.o -std=c++11 -O3 -o huffman

If you updated the code and got a 'up to date' message when trying to run the Makefile, try:

~$ make clear

or

~$ rm main.o Huffman.o huffman

Running

Compressing files

To compress a file, run:

~$ ./huffman c input_file_name compressed_output_name

Decompressing files

To decompress a file, run:

~$ ./huffman d compressed_file_name decompressed_output_name

Algorithm and Implementation

Proposed by David A. Huffman in 19511, the Huffman Coding an optimal prefix code commonly used for lossless data compression. It generates a variable-length code to represent each symbol a of sequence, based on their frequency of occurrence in the given stream or on their probability of appearing in it. The implementation receives a file as input, counts the number of occurrences of each one of the 256 ASCII symbols, builds a tree of frequencies and generates a set of binary codes based on such tree. After that, it creates a file composed by the frequencies of each symbol and the encoded symbols of the original file. To decompress the created files, the implementation uses the frequencies to rebuild the tree and reverse engineers the symbols using the codes.

Example

A simple example of how the algorithm would encode the string "ABBBABBBCBDB".

The figure below represents a possible tree generated by the algorithm.

Each one of the leaves represents a symbol of the string and the path from root to them describes their binary code, with each movement to left child being a 1 and each movement to the right child being a 0. By the way, note how the most frequent symbols stay on the leaves closer to the root, creating a shorter code for them.

Result:

Symbol Frequency ASCII Code Huffman Code
A 2 1000001 01
B 8 1000010 1
C 1 1000011 001
D 1 1000100 000

References

1. HUFFMAN, David A. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, v. 40, n. 9, p. 1098-1101, 1952.

Author