Chapter 19: Database Normalization
One of the most important theoretical chapters in the entire SQL course!
Database Normalization is the process of organizing data in a database to reduce redundancy (duplicate data) and prevent data anomalies (problems when inserting, updating, or deleting data).
Think of normalization like organizing a messy cupboard:
- You don’t want the same shirt in 5 different places (redundancy).
- You don’t want to tear the cupboard apart every time you want to add a new shirt (anomalies).
We follow a series of normal forms (rules) — each one stricter than the previous — to make the database cleaner and safer.
Today we’ll cover the four most important normal forms that almost every professional database aims for:
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form – a stronger version of 3NF)
And finally, we’ll talk about Denormalization — when and why we intentionally break these rules.
Let’s use a realistic example: an online bookstore order system with a badly designed table that we’ll normalize step by step.
Bad Starting Table – “Orders” (completely unnormalized)
| order_id | customer_name | customer_email | customer_phone | book_titles | book_prices | author_names | order_date | total_amount |
|---|---|---|---|---|---|---|---|---|
| 101 | Raj Sharma | raj@mumbai.com | 9876543210 | 2 States, The White Tiger | 299, 399 | Chetan Bhagat, Jhumpa Lahiri | 2026-01-15 | 698 |
| 102 | Priya Patel | priya@gmail.com | 9123456789 | The Immortals of Meluha, Wise and Otherwise | 349, 199 | Amish Tripathi, Sudha Murty | 2026-01-16 | 548 |
Problems already visible:
- Multiple books in one cell (repeating groups)
- Customer info repeated if same customer orders again
- Hard to update one book price without affecting others
- Hard to find all orders by one author
Let’s fix this step by step.
1. First Normal Form (1NF) – Eliminate Repeating Groups
Rule for 1NF:
- Each column must contain atomic (single, indivisible) values
- No repeating groups or multi-valued columns
- Each row must have a unique primary key
How to achieve 1NF: Split multi-valued columns into separate rows.
After 1NF – “Orders” table becomes:
| order_id | customer_name | customer_email | customer_phone | book_title | book_price | author_name | order_date | total_amount |
|---|---|---|---|---|---|---|---|---|
| 101 | Raj Sharma | raj@mumbai.com | 9876543210 | 2 States | 299 | Chetan Bhagat | 2026-01-15 | 698 |
| 101 | Raj Sharma | raj@mumbai.com | 9876543210 | The White Tiger | 399 | Jhumpa Lahiri | 2026-01-15 | 698 |
| 102 | Priya Patel | priya@gmail.com | 9123456789 | The Immortals of Meluha | 349 | Amish Tripathi | 2026-01-16 | 548 |
| 102 | Priya Patel | priya@gmail.com | 9123456789 | Wise and Otherwise | 199 | Sudha Murty | 2026-01-16 | 548 |
Still problems:
- Customer info (name, email, phone) repeated for every book in the order
- Total_amount repeated for every row of the same order
- Hard to update customer phone without updating multiple rows
2. Second Normal Form (2NF) – Eliminate Partial Dependencies
Rule for 2NF:
- Must be in 1NF
- All non-key attributes must depend on the entire primary key (no partial dependency)
In our table, the primary key is composite: (order_id, book_title) But customer_name, customer_email, customer_phone depend only on order_id (partial dependency).
How to achieve 2NF: Split into two tables:
Table 1: Orders (order-level info)
| order_id | customer_name | customer_email | customer_phone | order_date | total_amount |
|---|---|---|---|---|---|
| 101 | Raj Sharma | raj@mumbai.com | 9876543210 | 2026-01-15 | 698 |
| 102 | Priya Patel | priya@gmail.com | 9123456789 | 2026-01-16 | 548 |
Table 2: Order_Items (item-level info)
| order_id | book_title | book_price | author_name |
|---|---|---|---|
| 101 | 2 States | 299 | Chetan Bhagat |
| 101 | The White Tiger | 399 | Jhumpa Lahiri |
| 102 | The Immortals of Meluha | 349 | Amish Tripathi |
| 102 | Wise and Otherwise | 199 | Sudha Murty |
Foreign key: order_id in Order_Items references Orders
Now it’s in 2NF — no partial dependencies.
3. Third Normal Form (3NF) – Eliminate Transitive Dependencies
Rule for 3NF:
- Must be in 2NF
- No transitive dependencies — non-key columns should not depend on other non-key columns
In our Orders table: customer_name, customer_email, customer_phone depend on each other (all depend on customer, but customer isn’t a key).
Transitive dependency: order_id → customer_email → customer_phone
How to achieve 3NF: Split customers into their own table.
Table 1: Customers
| customer_id | customer_name | customer_email | customer_phone |
|---|---|---|---|
| 1 | Raj Sharma | raj@mumbai.com | 9876543210 |
| 2 | Priya Patel | priya@gmail.com | 9123456789 |
Table 2: Orders (now normalized)
| order_id | customer_id | order_date | total_amount |
|---|---|---|---|
| 101 | 1 | 2026-01-15 | 698 |
| 102 | 2 | 2026-01-16 | 548 |
Table 3: Order_Items (unchanged)
Now it’s in 3NF — no transitive dependencies.
4. Boyce-Codd Normal Form (BCNF) – Stronger 3NF
Rule for BCNF:
- Must be in 3NF
- For every dependency X → Y, X must be a superkey (candidate key)
Example of violation: Suppose we have a table:
Table: Student_Courses
| student_id | course_id | instructor_name |
|---|---|---|
| 101 | CS101 | Prof. Sharma |
| 102 | CS101 | Prof. Sharma |
| 103 | MATH202 | Prof. Patel |
Here: course_id → instructor_name (one course has one instructor) But course_id is not a key (student_id + course_id is the key). This is 3NF but not BCNF.
Fix for BCNF: Split into:
Courses
| course_id | instructor_name |
|---|---|
| CS101 | Prof. Sharma |
| MATH202 | Prof. Patel |
Student_Courses
| student_id | course_id |
|---|---|
| 101 | CS101 |
| 102 | CS101 |
| 103 | MATH202 |
Now every determinant is a key → BCNF.
Denormalization – When We Intentionally Break Normalization
Normalization reduces redundancy but can slow down queries (lots of JOINs) and make some operations complex.
Denormalization means intentionally adding redundancy for performance.
Common reasons to denormalize:
- Speed up frequent read-heavy queries (e.g., reporting, dashboards)
- Reduce number of JOINs
- Support NoSQL-like queries in RDBMS
- Caching aggregated data
Example – Denormalized Order Summary Instead of joining Orders + Order_Items + Customers every time, we create:
denormalized_orders_report
| order_id | customer_name | customer_email | total_books | total_amount | order_date |
|---|---|---|---|---|---|
| 101 | Raj Sharma | raj@mumbai.com | 2 | 698 | 2026-01-15 |
| 102 | Priya Patel | priya@gmail.com | 2 | 548 | 2026-01-16 |
We duplicate customer info and pre-calculate total_books → super fast reads, but updates must maintain consistency (usually via triggers or application logic).
Trade-off summary:
| Aspect | Normalized (3NF/BCNF) | Denormalized |
|---|---|---|
| Storage | Less (no duplicates) | More (redundancy) |
| Insert/Update/Delete | Safer, fewer anomalies | Risk of inconsistency |
| Read performance | Slower (many JOINs) | Faster (fewer JOINs) |
| Best for | OLTP systems (transactions) | OLAP / reporting / data warehouses |
