Huffman compression

An algorithm to compress data using Huffman encode

About it

Current program is based of Huffman encode which ensure a good data compression. It can be use for both encoding and decoding a file.

How to use it

It has an CLI using an intuitive menu:

Read from file - reads the data from a file
Encode data - generates the Huffman encoding of the data
Write data - writes the input file encoded
Decode data - writes the input file decoded
Display data - shows how the symbols were encoded
Display entropy - shows the Shannon entropy along with the current encode entropy
Test the files - compares two given files and shows the content differences

To encode a file you would read from a file, encode the data and then write to another file.

Note that the output file should be a binary one!

To decode a file simply run decode data.

How to run

Compile, here I'm using gcc.

After compiling, lunch the executable, depending on the os:

Windows

gcc test.c util.c code.c encode.c menu.c main.c -o main
main.exe

Linux

gcc test.c util.c code.c encode.c menu.c main.c -o main -lm
./main

How it works

It reads all the symbols from a file and store them in a data struct array. The it adds all the missing nodes from the Huffman tree, and start applying the algorithm to encode.
After completing the tree the program writes the dictionary as key - value pair, with the key being the symbol itself and the value its apparation probability.
From a given file, all the characters are then written encoded in a buffer (for this project I choose a 32 bits array - int) and when the buffer is full it is written to the output file.

When decoding a file, it first reads the dictionary and with it parses all the file and writes the decoded version to an output file.

Tech specs

The entire program has its functions documtented as comments above the their definitions.

A probably confusing thing would be the adding of a char* that represents the encoding of a symbol to a bit array that I will explain below.

All the functionality is implemented via add_to_buffer function that takes 3 parameters:

int buffer - where it stores the bits
int buffer_size - that represents how many bits we have written
char* to_encode - the string that needs to be added to buffer

The function checks how many bits we can write from the to_encode parameter and based on that it returns coresponding.

The add part is done by first shifting the current index to left by 1 so we "allocate" a new bit in which we can write. The second part, the writing, implies bitwise or '|' with the current value of the to_encode parameter.

Assume that for the letter a we have the encode "10011" store as char* and our buffer is 0 and so its index.

Buffer representation: (for simplicity assume that our it has only 8 bits)

+-----------+

| 0000 0000 |

+-----------+

We take the first char of out string (from left to rigth) which is '1' and we shift out number by 1 and the bitwe oring the value with 1.

+-----------+

| 0000 0001 |

+-----------+

Now we continue the same way and the result would be:

+-----------+

| 0001 0011 |

+-----------+

Performance

I have tested the program with a big file - 365 MB - and the compression rate would be ~60%. The time required to all reading the file, encoding and decoding it was, on my machine, ~12 minutes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Huffman compression

An algorithm to compress data using Huffman encode

About it

How to use it

How to run

How it works

Tech specs

Performance

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Headers		Headers
README.md		README.md
code.c		code.c
encode.c		encode.c
main.c		main.c
menu.c		menu.c
test.c		test.c
util.c		util.c

w-i-l/huffman-text-compression

Folders and files

Latest commit

History

Repository files navigation

Huffman compression

An algorithm to compress data using Huffman encode

About it

How to use it

How to run

How it works

Tech specs

Performance

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages