Chapter 20: DSA Hash Maps
DSA Hash Maps – Let’s understand this topic properly, as if I’m sitting next to you with a notebook: explaining slowly and clearly, with real-life analogies, memory visuals, step-by-step examples, code, common mistakes, and most importantly, when and why you should actually use hash maps in real coding and interviews.
What is a Hash Map?
A Hash Map (also called Hash Table, Dictionary, Map, Associative Array) is a data structure that stores key-value pairs and gives you extremely fast access to the value using the key.
In simple words:
You give it a key (like a word), and it instantly tells you the value (like the meaning) — usually in average O(1) time.
It is one of the most frequently used data structures after arrays in real-world programming.
Real-life analogy everyone understands
Imagine a huge library with millions of books, but instead of searching shelf by shelf:
- Every book has a unique catalog number (the key)
- There is a magic index book that directly tells you exactly which shelf & position the book is on
- You go straight to that position and get the book instantly
That magic index book is exactly what a hash map does using a hash function.
Core Concept – How Hash Map Works
- You have a key (string, number, object, etc.)
- A hash function converts the key into a number (hash code)
- That number is taken modulo the size of the internal array → this gives a bucket/index
- You go directly to that bucket
- Store / retrieve the value there
Simple example flow:
```
Key: "apple"
        ↓ hash function
7492813
        ↓ 7492813 % 16 = 5   ← array index / bucket
        ↓
Go to array[5] → find the pair ("apple", 150), or store it there
```
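The flow above can be sketched in a few lines of Python. The `toy_hash` function here (sum of character codes) is a deliberately simplified stand-in, not how real hash maps hash strings:

```python
# Toy illustration of key → hash code → bucket index.
# toy_hash is a simplified stand-in for a real hash function.

def toy_hash(key: str) -> int:
    """Turn a string key into a (not very good) integer hash code."""
    return sum(ord(ch) for ch in key)

NUM_BUCKETS = 16

key = "apple"
hash_code = toy_hash(key)          # 530 for "apple"
bucket = hash_code % NUM_BUCKETS   # 530 % 16 = 2 → bucket index

print(key, "→ hash", hash_code, "→ bucket", bucket)
```

Real hash functions scramble the key far more thoroughly, but the modulo step at the end is exactly this.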
The most important problem: Collision
Collision = when two different keys produce the same bucket/index
Example:
```
"apple"  → hash → ... % 16 = 5
"banana" → hash → ... % 16 = 5
```
Both want to go to bucket 5 → collision!
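You can force a collision on purpose with the same toy hash from before: since it just sums character codes, any two anagrams (like "listen" and "silent") hash identically and land in the same bucket:

```python
# Two different keys, same bucket → collision.
# toy_hash (sum of character codes) is our simplified stand-in hash.

def toy_hash(key: str) -> int:
    return sum(ord(ch) for ch in key)

NUM_BUCKETS = 8

for key in ("listen", "silent"):
    print(key, "→ bucket", toy_hash(key) % NUM_BUCKETS)

# Both print the same bucket index — the map must now resolve the collision.
```

This is also why weak hash functions are dangerous: an attacker who can predict collisions can degrade a hash map to its O(n) worst case.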
Two Main Ways to Handle Collisions
1. Separate Chaining (most common, most used in teaching & real libraries)
Each bucket contains a linked list (or sometimes a balanced tree when list gets long).
```
Internal array (size = 16):

Index:   0    1    2       3    4    5                             6   ...
        [ ]  [ ]  [keyA]  [ ]  [ ]  ["apple":150 → "banana":80]  [ ]
                                     (a small linked list)
```
- Insert: append to the linked list at that index
- Get: hash → go to index → linear search in small list
Average time → O(1 + α), where α = load factor = number of elements / number of buckets. If the load factor is kept reasonable (below ~0.7–0.8), the chains stay very short → almost O(1).
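A minimal separate-chaining map can be sketched in Python like this. The class name `ChainedHashMap` is illustrative (not a standard-library type), and plain Python lists stand in for the linked lists:

```python
# Minimal separate-chaining hash map sketch.
# Each bucket is a list of [key, value] pairs standing in for a linked list.

class ChainedHashMap:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash → modulo → the one bucket this key can live in
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:              # key already present → overwrite
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])      # new key → append to the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):   # linear search in one short chain
            if k == key:
                return v
        return default

m = ChainedHashMap()
m.put("apple", 150)
m.put("banana", 80)
m.put("apple", 99)                       # overwrites the old value
print(m.get("apple"), m.get("banana"), m.get("mango", -1))
```

Note that `get` only ever scans one bucket's chain, which is why a controlled load factor keeps lookups near O(1).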
2. Open Addressing (Linear Probing, Quadratic Probing, Double Hashing)
No linked lists — we find the next free slot in the array.
Linear Probing example:
```
"apple" wants index 5 → already occupied
         → try index 6 → occupied
         → try index 7 → free → put it here
```
Trade-off:
- Better cache performance (sequential memory access)
- Suffers from primary clustering (long probe sequences)
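Linear probing can be sketched just as briefly. This is insert-and-lookup only; real deletion needs "tombstone" markers and is omitted to keep the idea visible:

```python
# Minimal linear-probing sketch (insert and lookup only; deletion,
# which requires tombstones, is omitted). Table size is illustrative.

SIZE = 8
table = [None] * SIZE                    # each slot holds (key, value) or None

def put(key, value):
    i = hash(key) % SIZE
    for _ in range(SIZE):                # probe at most SIZE slots
        if table[i] is None or table[i][0] == key:
            table[i] = (key, value)
            return
        i = (i + 1) % SIZE               # slot taken → try the next one
    raise RuntimeError("table is full")

def get(key, default=None):
    i = hash(key) % SIZE
    for _ in range(SIZE):
        if table[i] is None:             # empty slot → key was never inserted
            return default
        if table[i][0] == key:
            return table[i][1]
        i = (i + 1) % SIZE
    return default

put("apple", 150)
put("banana", 80)
print(get("apple"), get("banana"), get("mango", -1))
```

The `(i + 1) % SIZE` step is the "walk to the next slot" from the diagram above; it is also what causes primary clustering when many keys pile into neighboring slots.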
Most mainstream libraries use separate chaining (e.g. Java's HashMap, which even upgrades long chains to balanced trees) or open addressing (e.g. Python's dict).
Hash Map Operations & Time Complexity
| Operation | Average Case | Worst Case (bad hash / high load) | Typical in practice with resizing |
|---|---|---|---|
| put / insert | O(1) | O(n) | O(1) |
| get / access | O(1) | O(n) | O(1) |
| remove / delete | O(1) | O(n) | O(1) |
| containsKey | O(1) | O(n) | O(1) |
| size | O(1) | O(1) | O(1) |
Very important note: The famous average O(1) only holds when:
- Hash function distributes keys uniformly
- Load factor is controlled (most implementations auto-resize when load > ~0.7–0.75)
- Good collision resolution
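The auto-resize idea can be modeled directly: track entries / buckets, and when it passes a threshold (~0.75 here), double the bucket count and rehash everything. This is a simplified model with made-up names (`ResizingMap`, `LOAD_LIMIT`), not any particular library's implementation:

```python
# Sketch of load-factor-triggered resizing: when count / num_buckets
# exceeds LOAD_LIMIT, double the buckets and re-insert every entry.

class ResizingMap:
    LOAD_LIMIT = 0.75

    def __init__(self):
        self.num_buckets = 4
        self.buckets = [[] for _ in range(self.num_buckets)]
        self.count = 0

    def put(self, key, value):
        bucket = self.buckets[hash(key) % self.num_buckets]
        for pair in bucket:
            if pair[0] == key:           # existing key → overwrite, no growth
                pair[1] = value
                return
        bucket.append([key, value])
        self.count += 1
        if self.count / self.num_buckets > self.LOAD_LIMIT:
            self._resize()

    def _resize(self):
        old_items = [p for b in self.buckets for p in b]
        self.num_buckets *= 2                      # double the capacity
        self.buckets = [[] for _ in range(self.num_buckets)]
        for key, value in old_items:               # rehash every entry
            self.buckets[hash(key) % self.num_buckets].append([key, value])

m = ResizingMap()
for i in range(10):
    m.put(i, i * i)
print(m.num_buckets)   # grew from 4 → 8 → 16 as the load factor was exceeded
```

Each individual resize is O(n), but because the capacity doubles, resizes become rarer as the map grows, so inserts stay O(1) amortized.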
Very Simple Example – Real usage
Problem: Count frequency of each number in array
```python
# Using a dictionary (the hash map in Python)
arr = [1, 3, 1, 4, 1, 2, 3, 4, 2, 1, 5]

frequency = {}
for num in arr:
    if num in frequency:
        frequency[num] += 1
    else:
        frequency[num] = 1

print(frequency)
# Output: {1: 4, 3: 2, 4: 2, 2: 2, 5: 1}
```
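This counting pattern is so common that Python ships it in the standard library as `collections.Counter`, which is itself a dict (hash map) subclass:

```python
# The same frequency count with collections.Counter (a dict subclass).
from collections import Counter

arr = [1, 3, 1, 4, 1, 2, 3, 4, 2, 1, 5]
frequency = Counter(arr)
print(frequency[1], frequency[5])
# Output: 4 1
```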
Same thing in Java (HashMap)
```java
Map<Integer, Integer> freq = new HashMap<>();
for (int num : arr) {
    freq.put(num, freq.getOrDefault(num, 0) + 1);
}
```
Very Common Interview Questions using Hash Map
| Level | Question examples (very very common) |
|---|---|
| Easy | Two Sum |
| Easy | Contains Duplicate |
| Easy | Valid Anagram |
| Easy | Word Pattern |
| Medium | Group Anagrams |
| Medium | Longest Substring Without Repeating Characters |
| Medium | Subarray Sum Equals K |
| Medium | LRU Cache (HashMap + Doubly Linked List) |
| Medium | Design Twitter |
| Medium | Design Underground System |
| Hard | Minimum Window Substring |
Hash Map vs Other Data Structures – Quick Comparison
When to choose Hash Map instead of others:
| Structure | Lookup time | Ordered? | Allows duplicates? | Use when you need… |
|---|---|---|---|---|
| Hash Map | O(1) avg | No | Keys no, values yes | Fast key → value lookup |
| TreeMap / SortedMap | O(log n) | Yes | Keys no | Sorted keys + fast lookup |
| Array / List | O(n) search | Yes | Yes | Index-based access |
| Linked List | O(n) | Yes | Yes | Frequent insert/delete |
Summary – Hash Map in one line each
- Stores key → value pairs
- Gives average O(1) lookup, insert, delete
- Uses hashing + collision resolution (usually chaining)
- No order guarantee (unless special variant like LinkedHashMap)
- Python dict, Java HashMap, C++ unordered_map
- One of the most powerful & most used data structures in interviews and real code
- Foundation for countless easy → hard problems
Would you like to continue with any specific Hash Map topic next?
Popular next topics:
- Two Sum (classic first HashMap problem)
- Group Anagrams (very common pattern)
- LRU Cache (famous design question)
- Longest Substring Without Repeating Characters
- How HashMap resizes internally
- HashMap vs TreeMap vs LinkedHashMap (when to choose which)
