Chapter 19: Database Normalization

One of the most important theoretical chapters in the entire SQL course!

Database Normalization is the process of organizing data in a database to reduce redundancy (duplicate data) and prevent data anomalies (problems when inserting, updating, or deleting data).

Think of normalization like organizing a messy cupboard:

  • You don’t want the same shirt in 5 different places (redundancy).
  • You don’t want to tear the cupboard apart every time you want to add a new shirt (anomalies).

We follow a series of normal forms (rules) — each one stricter than the previous — to make the database cleaner and safer.

Today we’ll cover the four most important normal forms that almost every professional database aims for:

  1. 1NF (First Normal Form)
  2. 2NF (Second Normal Form)
  3. 3NF (Third Normal Form)
  4. BCNF (Boyce-Codd Normal Form – a stronger version of 3NF)

And finally, we’ll talk about Denormalization — when and why we intentionally break these rules.

Let’s use a realistic example: an online bookstore order system with a badly designed table that we’ll normalize step by step.

Bad Starting Table – “Orders” (completely unnormalized)

order_id customer_name customer_email customer_phone book_titles book_prices author_names order_date total_amount
101 Raj Sharma raj@mumbai.com 9876543210 2 States, The White Tiger 299, 399 Chetan Bhagat, Jhumpa Lahiri 2026-01-15 698
102 Priya Patel priya@gmail.com 9123456789 The Immortals of Meluha, Wise and Otherwise 349, 199 Amish Tripathi, Sudha Murty 2026-01-16 548

Problems already visible:

  • Multiple books in one cell (repeating groups)
  • Customer info repeated if same customer orders again
  • Hard to update one book price without affecting others
  • Hard to find all orders by one author

Let’s fix this step by step.

1. First Normal Form (1NF) – Eliminate Repeating Groups

Rule for 1NF:

  • Each column must contain atomic (single, indivisible) values
  • No repeating groups or multi-valued columns
  • Each row must have a unique primary key

How to achieve 1NF: Split multi-valued columns into separate rows.

After 1NF – “Orders” table becomes: 

order_id customer_name customer_email customer_phone book_title book_price author_name order_date total_amount
101 Raj Sharma raj@mumbai.com 9876543210 2 States 299 Chetan Bhagat 2026-01-15 698
101 Raj Sharma raj@mumbai.com 9876543210 The White Tiger 399 Jhumpa Lahiri 2026-01-15 698
102 Priya Patel priya@gmail.com 9123456789 The Immortals of Meluha 349 Amish Tripathi 2026-01-16 548
102 Priya Patel priya@gmail.com 9123456789 Wise and Otherwise 199 Sudha Murty 2026-01-16 548

Still problems:

  • Customer info (name, email, phone) repeated for every book in the order
  • Total_amount repeated for every row of the same order
  • Hard to update customer phone without updating multiple rows

2. Second Normal Form (2NF) – Eliminate Partial Dependencies

Rule for 2NF:

  • Must be in 1NF
  • All non-key attributes must depend on the entire primary key (no partial dependency)

In our table, the primary key is composite: (order_id, book_title) But customer_name, customer_email, customer_phone depend only on order_id (partial dependency).

How to achieve 2NF: Split into two tables:

Table 1: Orders (order-level info)

order_id customer_name customer_email customer_phone order_date total_amount
101 Raj Sharma raj@mumbai.com 9876543210 2026-01-15 698
102 Priya Patel priya@gmail.com 9123456789 2026-01-16 548

Table 2: Order_Items (item-level info)

order_id book_title book_price author_name
101 2 States 299 Chetan Bhagat
101 The White Tiger 399 Jhumpa Lahiri
102 The Immortals of Meluha 349 Amish Tripathi
102 Wise and Otherwise 199 Sudha Murty

Foreign key: order_id in Order_Items references Orders

Now it’s in 2NF — no partial dependencies.

3. Third Normal Form (3NF) – Eliminate Transitive Dependencies

Rule for 3NF:

  • Must be in 2NF
  • No transitive dependencies — non-key columns should not depend on other non-key columns

In our Orders table: customer_name, customer_email, customer_phone depend on each other (all depend on customer, but customer isn’t a key).

Transitive dependency: order_id → customer_email → customer_phone

How to achieve 3NF: Split customers into their own table.

Table 1: Customers

customer_id customer_name customer_email customer_phone
1 Raj Sharma raj@mumbai.com 9876543210
2 Priya Patel priya@gmail.com 9123456789

Table 2: Orders (now normalized)

order_id customer_id order_date total_amount
101 1 2026-01-15 698
102 2 2026-01-16 548

Table 3: Order_Items (unchanged)

Now it’s in 3NF — no transitive dependencies.

4. Boyce-Codd Normal Form (BCNF) – Stronger 3NF

Rule for BCNF:

  • Must be in 3NF
  • For every dependency X → Y, X must be a superkey (candidate key)

Example of violation: Suppose we have a table:

Table: Student_Courses

student_id course_id instructor_name
101 CS101 Prof. Sharma
102 CS101 Prof. Sharma
103 MATH202 Prof. Patel

Here: course_id → instructor_name (one course has one instructor) But course_id is not a key (student_id + course_id is the key). This is 3NF but not BCNF.

Fix for BCNF: Split into:

Courses

course_id instructor_name
CS101 Prof. Sharma
MATH202 Prof. Patel

Student_Courses

student_id course_id
101 CS101
102 CS101
103 MATH202

Now every determinant is a key → BCNF.

Denormalization – When We Intentionally Break Normalization

Normalization reduces redundancy but can slow down queries (lots of JOINs) and make some operations complex.

Denormalization means intentionally adding redundancy for performance.

Common reasons to denormalize:

  • Speed up frequent read-heavy queries (e.g., reporting, dashboards)
  • Reduce number of JOINs
  • Support NoSQL-like queries in RDBMS
  • Caching aggregated data

Example – Denormalized Order Summary Instead of joining Orders + Order_Items + Customers every time, we create:

denormalized_orders_report

order_id customer_name customer_email total_books total_amount order_date
101 Raj Sharma raj@mumbai.com 2 698 2026-01-15
102 Priya Patel priya@gmail.com 2 548 2026-01-16

We duplicate customer info and pre-calculate total_books → super fast reads, but updates must maintain consistency (usually via triggers or application logic).

Trade-off summary:

Aspect Normalized (3NF/BCNF) Denormalized
Storage Less (no duplicates) More (redundancy)
Insert/Update/Delete Safer, fewer anomalies Risk of inconsistency
Read performance Slower (many JOINs) Faster (fewer JOINs)
Best for OLTP systems (transactions) OLAP / reporting / data warehouses

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *