Chapter 19: DSA Hash Sets
DSA Hash Sets – Let’s understand this topic properly, like I’m sitting next to you explaining it slowly and clearly with examples, visuals in words, code, real use-cases, common mistakes, and most importantly — how HashSet is different from HashMap and when you should prefer one over the other.
What is a Hash Set?
A HashSet is a data structure that stores only unique elements and gives you very fast membership checking (whether an element exists or not).
In simple words:
HashSet = Hash Table that only keeps keys (no values attached)
It is exactly a HashMap where the value is irrelevant — we only care whether the key is present or not.
Core Properties of HashSet
| Property | Meaning / Explanation |
|---|---|
| Uniqueness | No duplicates allowed (adding duplicate does nothing) |
| No order guarantee | Elements are not stored in insertion order (except in some special implementations) |
| Very fast lookup | Average O(1) time to check if element exists |
| Very fast insert / delete | Average O(1) |
| Uses hashing internally | Same mechanism as HashMap |
| Does not allow null key | (in most languages — Java, C++ yes; Python set allows None) |
Real-life examples where HashSet is used
| Situation | Why HashSet is perfect |
|---|---|
| Check if username already exists | Fast duplicate check |
| Remove duplicates from a list | Add all elements to set → size = unique count |
| Find common elements between two lists | Put one list in set → check second list fast |
| Detect if array contains duplicate | Try to add → if already present → duplicate |
| Find elements that appear only once | Use set + frequency logic |
| Track visited nodes in graph traversal | Avoid revisiting nodes |
| Validate if string has all unique characters | Add chars to set → compare sizes |
HashSet vs HashMap – Very Important Comparison
| Feature / Question | HashSet | HashMap / Dictionary |
|---|---|---|
| Stores | Only keys | Key-Value pairs |
| Allows duplicate keys? | No | No (keys unique) |
| Allows duplicate values? | — (no values) | Yes |
| Can get value for key? | No | Yes |
| Main operation | contains() / membership test | get() / put() |
| Typical use case | Uniqueness, existence check | Mapping, storing data |
| Internal implementation | Hash table with dummy values | Hash table with key-value nodes |
| Python name | set() | dict() |
| Java name | HashSet | HashMap |
| C++ name | unordered_set | unordered_map |
How HashSet Works Internally (same as HashMap)
- You add an element (say 42)
- Hash function computes hash code
- hash code % table_size → gives bucket/index
- If bucket is empty → store element
- If collision → handled by chaining (linked list) or probing
- When you check contains(42) → same hash → go to same bucket → search in small list
Visual (Separate Chaining):
|
0 1 2 3 4 5 6 7 8 9 |
table size = 7 Index: 0 1 2 3 4 5 6 [] [17→51] [42] [] [19] [] [88] |
All elements in same bucket form a small linked list.
HashSet Operations & Time Complexity
| Operation | Average Case | Worst Case (bad hash / high load) | Typical in practice |
|---|---|---|---|
| add / insert | O(1) | O(n) | O(1) |
| contains / search | O(1) | O(n) | O(1) |
| remove / delete | O(1) | O(n) | O(1) |
| size | O(1) | O(1) | O(1) |
| iteration | O(n) | O(n) | O(n) |
Simple HashSet Examples (Python – easiest to read)
Example 1: Remove duplicates from list
|
0 1 2 3 4 5 6 7 8 9 10 |
numbers = [4, 2, 4, 8, 2, 9, 1, 4, 7, 2] unique = set(numbers) print(unique) # {1, 2, 4, 7, 8, 9} print(len(unique)) # 6 |
Example 2: Check if array has duplicates
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
def has_duplicates(arr): seen = set() for num in arr: if num in seen: return True seen.add(num) return False print(has_duplicates([1, 2, 3, 4])) # False print(has_duplicates([1, 2, 2, 4])) # True |
Example 3: Find common elements between two arrays
|
0 1 2 3 4 5 6 7 8 9 10 11 12 |
a = [1, 2, 3, 4, 5] b = [4, 5, 6, 7, 8] set_a = set(a) common = [x for x in b if x in set_a] print(common) # [4, 5] |
Example 4: Check if string has all unique characters
|
0 1 2 3 4 5 6 7 8 9 10 |
def is_unique(s): return len(s) == len(set(s)) print(is_unique("hello")) # False print(is_unique("world")) # True |
Very Common Interview Questions on HashSet
| Level | Question examples (very frequently asked) |
|---|---|
| Easy | Contains Duplicate |
| Easy | Remove duplicates from array |
| Easy | Single Number (all appear twice except one) |
| Medium | Valid Anagram |
| Medium | Intersection of Two Arrays |
| Medium | Happy Number (detect cycle using set) |
| Medium | Longest Substring Without Repeating Characters |
| Medium | First Repeating Element |
| Hard | Longest Consecutive Sequence |
| Hard | Design Underground System (average journey time) |
Summary – HashSet in one line each
- Stores only unique elements
- Provides average O(1) contains / insert / delete
- No values — only keys (unlike HashMap)
- No order guarantee (unordered)
- Perfect when you only care about existence / uniqueness
- Python set, Java HashSet, C++ unordered_set
- One of the most useful data structures for interviews
- Basis for many easy to medium problems, and some famous hard ones
Would you like to go deeper into any part?
Very natural next topics:
- Longest Consecutive Sequence (classic HashSet problem)
- Contains Duplicate II (sliding window + set)
- Happy Number (cycle detection with set)
- Intersection / Union / Difference operations on sets
- HashSet vs HashMap detailed comparison with examples
- How HashSet is implemented internally (open addressing vs chaining)
Just tell me which one you want next — I’ll explain it in the same detailed, human-teacher style! 😄
