Chapter 55: Huffman Coding

Huffman Coding – Let’s learn this topic properly, slowly and clearly — like we are sitting together and I’m teaching you from scratch with a notebook and pencil, drawing trees, writing bits, and counting everything so you really understand how it works and why it’s so clever.

Huffman Coding is one of the most beautiful and most practical compression algorithms ever invented. It is used everywhere — in ZIP files, JPEG images, MP3 audio, PNG images, PDFs, network protocols, and many more places.

1. What problem does Huffman Coding solve?

Most text files and data contain some characters that appear very frequently and others that appear very rarely.

Examples:

  • In English text: space, e, t, a, o, i, n appear very often, while z, q, x, j appear very rarely
  • In source code: ; { } ( ) appear very often, while rare symbols like @ # $ appear rarely

Normal encoding (like ASCII) uses fixed 8 bits for every character — even for very rare ones.

Huffman Coding solves this by giving:

  • Short codes (few bits) to frequent characters
  • Long codes (more bits) to rare characters

The result: total number of bits needed is much smaller than fixed-length encoding.

2. Core Idea – Variable-length prefix codes

Huffman Coding creates a variable-length code where:

  • More frequent characters get shorter bit sequences
  • Less frequent characters get longer bit sequences
  • No code is a prefix of any other code (very important!)

This last property is called the prefix-free property (such codes are called prefix codes) — it means you can decode the bit stream unambiguously, without any markers between codes.

Example (very simple):

Character   Frequency   Huffman code
space       very high   0
e           high        10
t           high        110
z           very low    1111

You can see: frequent letters get short codes, rare letters get long codes.
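To see why the prefix-free property matters, here is a tiny sketch in Python of a greedy decoder using the toy table above (the table values and bit string are purely illustrative). Because no code is a prefix of another, the moment the bits read so far match a code, that match must be the right one.

```python
# Toy prefix-free table from the example above (illustrative values only)
table = {"0": " ", "10": "e", "110": "t", "1111": "z"}

bits = "1001101111"  # "e" + " " + "t" + "z" packed together, with no separators
decoded, buffer = [], ""
for bit in bits:
    buffer += bit
    # Because no code is a prefix of another, the first time the buffer
    # matches a code we can safely emit that character and start over.
    if buffer in table:
        decoded.append(table[buffer])
        buffer = ""
print("".join(decoded))  # prints "e tz"
```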

3. How Huffman Coding builds the codes (the algorithm)

The algorithm builds a binary tree (Huffman tree) using a greedy strategy.

Steps:

  1. Count the frequency of each character in the text
  2. Create a min-heap (priority queue) where each node contains:
    • a character (or combined characters)
    • its frequency
  3. While there is more than one node in the heap:
    • Remove the two nodes with the smallest frequencies
    • Create a new parent node with frequency = sum of the two children
    • Make the two nodes its left and right children (convention: left = 0, right = 1)
    • Put the new parent node back into the heap
  4. When only one node remains → that is the root of the Huffman tree
  5. Traverse the tree from root to each leaf:
    • Every left branch = 0
    • Every right branch = 1
    • The path from root to leaf = Huffman code for that character
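If you want to see these five steps as code, here is a minimal sketch in Python using the standard heapq module and collections.Counter. The function name huffman_codes and the layout of the heap entries are my own choices for illustration, not a fixed API.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {character: bit string} table for `text` (illustrative helper)."""
    freq = Counter(text)                                  # Step 1: count frequencies
    # Step 2: heap entries are (frequency, tie-breaker, tree); a tree is either
    # a single character (leaf) or a (left, right) pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    # Step 3: repeatedly merge the two smallest nodes
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    # Steps 4-5: the last node is the root; walk to every leaf,
    # appending 0 for a left branch and 1 for a right branch.
    codes = {}
    def walk(node, path):
        if isinstance(node, str):                         # leaf holds a single character
            codes[node] = path or "0"                     # edge case: only one unique symbol
        else:
            walk(node[0], path + "0")
            walk(node[1], path + "1")
    walk(heap[0][2], "")
    return codes
```

The unique integer tie-breaker in each heap entry keeps heapq from ever comparing two subtrees when frequencies are equal; which of two equal-frequency nodes comes out first only affects the 0/1 labels, not the code lengths.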

4. Complete Worked Example (very detailed)

Text to compress: “abacabad”

Step 1: Count frequencies

Character   Count
a           4
b           2
c           1
d           1

Step 2: Create the initial min-heap (a priority queue keyed by frequency)

Nodes:

  • c : 1
  • d : 1
  • b : 2
  • a : 4

Step 3: Build the tree

  1. Take the two smallest: c (1) and d (1). Create a parent node with frequency = 1 + 1 = 2. Tree so far:

           (2)
          /   \
        c:1   d:1

     Put this node (2) back into the heap.

     Heap now: 2 (c-d), b (2), a (4)

  2. Take the two smallest: c-d (2) and b (2). Create a parent node with frequency = 2 + 2 = 4. Tree now:

             (4)
            /   \
          (2)   b:2
          /  \
        c:1  d:1

     Put node 4 back.

     Heap now: 4 (c-d-b), a (4)

  3. Take the two remaining nodes: 4 (c-d-b) and a (4). Create the root node with frequency = 4 + 4 = 8. Final tree:

               (8)
              /   \
            (4)   a:4
           /   \
         (2)   b:2
         /  \
       c:1  d:1

Step 4: Assign codes (left = 0, right = 1)

  • From root to a: right → 1
  • From root to b: left → right → 01
  • From root to c: left → left → left → 000
  • From root to d: left → left → right → 001

Final Huffman codes:

Character   Code
a           1
b           01
c           000
d           001

Step 5: Encode the message “abacabad”

a b a c a b a d → 1 01 1 000 1 01 1 001

Bit string: 10110001011001

Total bits = 1+2+1+3+1+2+1+3 = 14 bits

Normal ASCII (8 bits per char × 8 chars) = 64 bits

Savings: from 64 bits → 14 bits ≈ 78% compression
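Putting the pieces together, here is a short usage sketch that assumes the huffman_codes helper sketched in section 3. Note that the exact 0/1 labels can differ from the hand-worked codes above, depending on how ties are broken while building the tree, but the code lengths (and therefore the 14-bit total) come out the same.

```python
text = "abacabad"
codes = huffman_codes(text)                  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
encoded = "".join(codes[ch] for ch in text)

print(codes)
print(encoded, "->", len(encoded), "bits")            # 14 bits
print("fixed 8-bit encoding ->", 8 * len(text), "bits")  # 64 bits
```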

5. Time Complexity of Huffman Coding

Step                              Time Complexity   Explanation
Count frequencies                 O(n)              Scan the string once
Build initial min-heap            O(σ)              σ = number of unique symbols
Build Huffman tree (σ-1 merges)   O(σ log σ)        Each merge uses heap operations costing ≈ log σ
Generate codes (DFS traversal)    O(σ)              Visit each node once
Encode the message                O(n)              Replace each character with its code

Total time complexity: O(n + σ log σ), where σ is the number of distinct characters (usually very small, e.g. at most 256 for byte data)

In practice → almost linear in the length of the text

6. Summary – What you should remember & say in interviews

  • Huffman Coding is a greedy compression algorithm that produces an optimal prefix code: variable-length codes based on character frequencies, with shorter codes for more frequent characters.
  • It builds a binary tree by repeatedly combining the two nodes with the smallest frequencies.
  • The path from the root to a leaf gives the Huffman code for that character (left = 0, right = 1).
  • Time complexity is O(n + σ log σ), where n is the text length and σ is the number of unique symbols.
  • It is optimal among all prefix codes for a given set of frequencies.
  • It is very widely used: ZIP, JPEG, MP3, PNG, PDF, network protocols, and more.

Where can you go from here?

Very common next topics:

  • How to decode a Huffman-encoded bit stream
  • Why Huffman is optimal (proof intuition)
  • Huffman vs Shannon-Fano coding
  • Adaptive Huffman coding (when frequencies change)
  • Building Huffman tree with priority queue (step-by-step code dry run)
  • Huffman coding for real compression (handling EOF, tree transmission)

