Chapter 18: DSA Hash Tables
DSA Hash Tables – Let’s understand this topic properly, as if I’m sitting next to you with a whiteboard, explaining it slowly and clearly: real-life examples, visuals in words, code, common pitfalls, interview patterns, and, most importantly, why hash tables are one of the most powerful and frequently used data structures in real programming.
What is a Hash Table?
A Hash Table (also called Hash Map or Dictionary) is a data structure that provides very fast lookup, insertion, and deletion — usually in average O(1) time.
Core idea in one simple sentence:
We use a magic function (called hash function) that converts any key into an array index, so we can directly jump to the place where the value is stored or should be stored.
It’s like a super-fast phonebook:
- Key = person’s name
- Value = phone number
- You don’t search page by page — you calculate exactly which page the name should be on
Real-life examples where Hash Tables are used everywhere
| Real-world usage | What is the Key? | What is the Value? |
|---|---|---|
| Dictionary / spell checker | Word | Meaning / spelling suggestions |
| Login system | Username | Password hash / user object |
| Phone contacts | Name or number | Contact details |
| Browser cache | URL | Cached webpage / image |
| Database indexing (simplified) | ID / email | Record location |
| Counting frequencies | Character / word | How many times it appears |
| Two Sum / Subarray problems | Number | Index where it appeared |
| Memoization in DP | State (string / tuple) | Already computed result |
| Python dict, Java HashMap, C++ unordered_map | Anything hashable | Anything |
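The frequency-counting row above is a one-pass pattern you can sketch directly with Python's built-in dict (itself a hash table under the hood):

```python
def char_frequency(text):
    """Count how many times each character appears, using a dict as a hash table."""
    freq = {}
    for ch in text:
        freq[ch] = freq.get(ch, 0) + 1  # O(1) average lookup + insert per character
    return freq

print(char_frequency("banana"))  # {'b': 1, 'a': 3, 'n': 2}
```

The same key → count idea works unchanged for words, numbers, or any hashable key.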
How Hash Table Works – Step by Step (very important)
- You give a key (string, number, tuple, object, etc.)
- Hash function calculates a number (hash code)
- Hash code → modulo array size → gives array index (bucket)
- Go directly to that index in the array
- Store / retrieve the value there
Simple visual:
```
Key: "apple"
        ↓ hash function
hash code: 4873921
        ↓ 4873921 % 10 = 1   ← array index
array[1] contains ("apple", 150)   ← key-value pair
```
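You can watch steps 2–3 directly in Python, with the built-in `hash()` standing in for the hash function. Note that Python salts string hashes per process, so the exact numbers vary from run to run; only the "index fits the array" property is guaranteed:

```python
size = 10
key = "apple"

hash_code = hash(key)      # step 2: key → large integer (salted per process)
index = hash_code % size   # step 3: modulo table size → bucket index

print(hash_code, index)    # exact values vary run to run
assert 0 <= index < size   # but the index always lands inside the array
```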
The most important part: Collision
Collision = when two different keys get the same array index
Example:
```
"apple"  → hash → 4873921 % 10 = 1
"banana" → hash → 9231741 % 10 = 1
```
Both want to go to index 1 → collision!
How we solve collisions (two main methods)
1. Separate Chaining (most common in teaching & practice)
Each array position (bucket) contains a linked list (or sometimes tree in modern implementations).
```
array size = 10

Index:    0     1                  2    3   ...   9
         [ ]   [apple → banana]   [ ]  [ ]       [ ]
                (linked list)
```
- Insert: add to the linked list at that index
- Search: hash → go to index → linear search in the small linked list
Time complexity (average case): O(1 + α)
- α = load factor = number of elements / array size
- If the load factor is kept low (e.g. < 0.7), the linked lists stay very small → almost O(1)
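Since the load factor is just a ratio, the resize check is a single comparison. A minimal sketch (the 0.7 threshold is an illustrative choice; real implementations pick their own):

```python
def should_resize(num_elements, array_size, threshold=0.7):
    """Return True once the load factor α = elements / buckets crosses the threshold."""
    return num_elements / array_size > threshold

print(should_resize(5, 10))  # False  (α = 0.5 → chains stay short)
print(should_resize(8, 10))  # True   (α = 0.8 → time to grow the table)
```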
2. Open Addressing (Linear Probing, Quadratic Probing, Double Hashing)
No linked lists — we look for the next free slot in the array.
Linear Probing example:
```
"apple" wants index 1 → taken
         try index 2 → taken
         try index 3 → free → put it here
```
Main variants:
- Linear Probing → next slot (index + 1, +2, +3…)
- Quadratic Probing → index + 1², +2², +3²…
- Double Hashing → use second hash function for step size
- Advantage: better cache performance (probes walk sequential memory)
- Disadvantage: clustering (many collisions → long probe sequences)
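A minimal linear-probing insert and search could look like this sketch (deletion and resizing are omitted on purpose: a real open-addressing table also needs tombstones and rehashing):

```python
EMPTY = None

def probe_insert(table, key, value):
    """Linear probing: start at hash(key) % size, walk forward to a free or matching slot."""
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        if table[i] is EMPTY or table[i][0] == key:
            table[i] = (key, value)
            return i
        i = (i + 1) % n              # next slot, wrapping around the array
    raise RuntimeError("table is full")

def probe_search(table, key):
    """Follow the same probe sequence; an EMPTY slot means the key was never inserted."""
    n = len(table)
    i = hash(key) % n
    for _ in range(n):
        if table[i] is EMPTY:
            return None
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % n
    return None

table = [EMPTY] * 8
probe_insert(table, "apple", 150)
probe_insert(table, "banana", 80)
print(probe_search(table, "apple"))  # 150
print(probe_search(table, "mango"))  # None
```

Swapping the `i = (i + 1) % n` step for a quadratic offset or a second hash function gives you quadratic probing and double hashing respectively.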
Hash Table Operations & Time Complexity
| Operation | Average Case | Worst Case (bad hash) | With good hash & load factor control |
|---|---|---|---|
| Insert | O(1) | O(n) | O(1) |
| Search / Get | O(1) | O(n) | O(1) |
| Delete | O(1) | O(n) | O(1) |
| Space | O(n) | O(n) | O(n) |
Important: The average O(1) is only possible with:
- Good hash function (uniform distribution)
- Reasonable load factor (usually resize when > 0.7–0.8)
- Collision resolution strategy
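Resizing means allocating a bigger array and re-inserting every pair, because each bucket index depends on the table size. A sketch for separate chaining:

```python
def rehash(old_table):
    """Double the bucket count and re-insert every (key, value) pair at its new index."""
    new_size = len(old_table) * 2
    new_table = [[] for _ in range(new_size)]
    for bucket in old_table:
        for key, value in bucket:
            # The index changes because the modulus changed
            new_table[hash(key) % new_size].append((key, value))
    return new_table

old = [[("a", 1)], [("b", 2)], []]
new = rehash(old)
print(len(new))                  # 6 buckets instead of 3
print(sum(len(b) for b in new))  # still 2 pairs, just redistributed
```

This O(n) cost happens only occasionally, so inserts stay O(1) amortized.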
Simple Hash Table Implementation (Python style – educational)
```python
class HashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # separate chaining

    def _hash(self, key):
        # Very simple hash for illustration (real hash functions are much better)
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        # If the key already exists, update it
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        # Otherwise append a new pair
        self.table[index].append((key, value))

    def get(self, key):
        index = self._hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None

    def delete(self, key):
        index = self._hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                del self.table[index][i]
                return True
        return False

    def display(self):
        for i, chain in enumerate(self.table):
            print(f"Bucket {i}: {chain}")


# Usage
ht = HashTable(5)
ht.insert("apple", 150)
ht.insert("banana", 80)
ht.insert("orange", 60)
ht.insert("grapes", 120)  # likely collision

ht.display()
# Example output (exact buckets vary, since Python salts string hashes per process):
# Bucket 0: []
# Bucket 1: [('apple', 150), ('grapes', 120)]
# Bucket 2: [('banana', 80)]
# Bucket 3: []
# Bucket 4: [('orange', 60)]

print(ht.get("apple"))  # 150
ht.delete("banana")
```
Very Common Interview Questions on Hash Tables
| Level | Question examples (very frequently asked) |
|---|---|
| Easy | Implement hash map / two sum |
| Easy | Count frequency of elements |
| Medium | Valid Anagram / Group Anagrams |
| Medium | Longest Substring Without Repeating Characters |
| Medium | LRU Cache (hash map + doubly linked list) |
| Medium | Subarray Sum Equals K |
| Medium | Contains Duplicate II / III |
| Hard | Design HashMap (without built-in) |
| Hard | Design Twitter / TinyURL |
| Hard | Word Pattern / Isomorphic Strings |
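The easiest entry in that table, Two Sum, is the canonical hash-map pattern: store each number's index as you go, and check for the complement with an O(1) average lookup instead of a nested loop:

```python
def two_sum(nums, target):
    """Return indices of two numbers that add up to target, or None."""
    seen = {}                       # number → index where it appeared
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:      # O(1) average hash lookup
            return [seen[complement], i]
        seen[num] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

The same "store what you've seen, look up what you need" idea drives Subarray Sum Equals K, Contains Duplicate II, and many other problems in the list.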
Summary – Hash Table in one line each
- Provides average O(1) lookup, insert, delete
- Uses hash function to convert key → array index
- Handles collisions with separate chaining (most common) or open addressing
- Most used data structure after arrays in real programming
- Python dict, Java HashMap, C++ unordered_map are all hash tables
- Load factor and good hash function are critical for performance
- Basis for countless medium & hard interview problems
Now that you understand hash tables properly…
Which direction would you like to go next?
Very natural follow-ups:
- How to implement LRU Cache (very famous interview question)
- Two Sum / 3Sum / 4Sum using hash map
- Group Anagrams – classic hash table problem
- Longest Substring Without Repeating Characters
- What makes a good hash function?
- Hash Table vs TreeMap / Balanced BST (when to use which)
Just tell me which one you want — I’ll explain it with the same level of detail and examples! 😄
