Chapter 21: SQL Best Practices and Real-World Examples
This final chapter brings everything together and shows you how to write SQL the way a senior data engineer or analyst writes it on real production systems.
By now you know the syntax. This chapter is about writing clean, fast, safe, and maintainable SQL that won’t break when your table grows from 1,000 to 100 million rows.
We’ll cover three big areas:
- Writing efficient queries – The golden rules that make your queries run fast
- Common patterns you’ll use every day (finding duplicates, ranking, deduplication, etc.)
- Working with large datasets – Tips for billion-row tables, data warehouses, and performance
Let’s dive in with real-world examples from our bookshop database (and imagine it now has 10 million books and 50 million orders).
1. Writing Efficient Queries – The Golden Rules
| Rule # | Rule | Why it matters | Bad (slow) example | Good (fast) example |
|---|---|---|---|---|
| 1 | Never use `SELECT *` | Returns unnecessary columns → more I/O, slower network | SELECT * FROM books WHERE price < 200; | SELECT title, price, stock FROM books WHERE price < 200; |
| 2 | Filter as early as possible | Reduce rows before JOINs and aggregations | SELECT author_id, COUNT(*) FROM books GROUP BY author_id HAVING author_id > 100; | SELECT author_id, COUNT(*) FROM books WHERE author_id > 100 GROUP BY author_id; |
| 3 | Use proper indexes | Avoid full table scans | No index on price → scans 10M rows | CREATE INDEX idx_books_price ON books(price); |
| 4 | Avoid functions on indexed columns | Prevents index usage | WHERE YEAR(published_date) = 2025 | WHERE published_date BETWEEN '2025-01-01' AND '2025-12-31' |
| 5 | Use EXISTS instead of IN (subqueries) | Can stop scanning as soon as a match is found | WHERE author_id IN (SELECT …) | WHERE EXISTS (SELECT 1 FROM … WHERE …) |
| 6 | Prefer explicit JOIN syntax over comma joins | Easier to read, optimizer-friendly | FROM books, authors WHERE books.author_id = authors.author_id | FROM books INNER JOIN authors ON books.author_id = authors.author_id |
| 7 | Limit the result set early | Especially for paginated reports | SELECT … ORDER BY price DESC | SELECT … ORDER BY price DESC LIMIT 10 OFFSET 20 |
| 8 | Use covering indexes | Index contains all columns needed → no table lookup | Index on price alone → extra lookup to fetch title | CREATE INDEX idx_books_price_title ON books(price, title); → a query on price and title reads the index alone |
| 9 | Avoid correlated subqueries when possible | Runs once per row → slow on large data | WHERE price > (SELECT AVG(b2.price) FROM books b2 WHERE b2.author_id = books.author_id) | Pre-aggregate in a CTE, then JOIN (see the sketch below) |
| 10 | Always test with EXPLAIN | See whether the query uses indexes | — | EXPLAIN SELECT … |
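To make rule 9 concrete, here is a sketch against our bookshop schema. The correlated version recomputes an author's average price once per book; the CTE version computes each average exactly once and joins. (Some optimizers can decorrelate this automatically, but don't rely on it.)

```sql
-- Slow: the subquery runs once per row of books
SELECT b.title, b.price
FROM books b
WHERE b.price > (SELECT AVG(b2.price)
                 FROM books b2
                 WHERE b2.author_id = b.author_id);

-- Fast: compute each author's average once, then join
WITH AuthorAvg AS (
    SELECT author_id, AVG(price) AS avg_price
    FROM books
    GROUP BY author_id
)
SELECT b.title, b.price
FROM books b
INNER JOIN AuthorAvg aa ON b.author_id = aa.author_id
WHERE b.price > aa.avg_price;
```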
Real-world example – Efficient top 10 bestsellers report
```sql
-- Efficient version
WITH RecentSales AS (
    SELECT book_id, SUM(quantity) AS total_sold
    FROM order_items
    WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
    GROUP BY book_id
)
SELECT
    b.title,
    a.name AS author,
    b.price,
    rs.total_sold,
    ROUND(b.price * rs.total_sold, 2) AS revenue
FROM RecentSales rs
INNER JOIN books b ON rs.book_id = b.book_id
INNER JOIN authors a ON b.author_id = a.author_id
ORDER BY rs.total_sold DESC
LIMIT 10;
```
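Rules 3 and 10 in action: before shipping this report, check that the CTE's date filter can actually use an index. A sketch, assuming MySQL; the index name and column order are suggestions, not part of the original schema:

```sql
-- A covering index for the RecentSales CTE: filter on order_date,
-- group by book_id, and read quantity straight from the index
CREATE INDEX idx_order_items_date_book
    ON order_items (order_date, book_id, quantity);

-- Confirm the plan uses it (look for the index name, not a full scan)
EXPLAIN
SELECT book_id, SUM(quantity) AS total_sold
FROM order_items
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY book_id;
```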
2. Common Patterns You’ll Use Every Day
Here are the top 10 patterns that appear in almost every real project.
Pattern 1: Find duplicates
```sql
-- Find duplicate ISBNs
SELECT isbn, COUNT(*) AS count
FROM books
GROUP BY isbn
HAVING COUNT(*) > 1;

-- Find duplicate emails (with full rows)
WITH Duplicates AS (
    SELECT email, COUNT(*) AS cnt
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
)
SELECT c.*
FROM customers c
INNER JOIN Duplicates d ON c.email = d.email
ORDER BY c.email;
```
Pattern 2: Deduplicate (keep only one row)
```sql
-- Keep the latest row for each duplicate email.
-- MySQL won't let a DELETE select from the same table directly,
-- so wrap the subquery in a derived table.
DELETE FROM customers
WHERE customer_id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MAX(customer_id) AS keep_id
        FROM customers
        GROUP BY email
    ) AS keepers
);
```
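A DELETE on production data deserves a dry run first. This preview lists exactly the rows the statement above would remove; run it, eyeball the result, then delete:

```sql
-- Dry run: these are the rows the DELETE would remove
SELECT c.*
FROM customers c
WHERE c.customer_id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MAX(customer_id) AS keep_id
        FROM customers
        GROUP BY email
    ) AS keepers
);
```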
Pattern 3: Ranking / Top-N per group
```sql
-- Top 3 most expensive books per author.
-- A window function can't be filtered in the same query's WHERE,
-- so rank in a CTE and filter outside it. (Also avoid the reserved
-- word RANK as an alias.)
WITH Ranked AS (
    SELECT
        author_name,
        title,
        price,
        ROW_NUMBER() OVER (PARTITION BY author_id ORDER BY price DESC) AS price_rank
    FROM book_catalog
)
SELECT author_name, title, price
FROM Ranked
WHERE price_rank <= 3;
```
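ROW_NUMBER() returns exactly three rows per author even when prices tie. If tied books should all make the cut, swap in DENSE_RANK(); everything else stays the same:

```sql
-- DENSE_RANK keeps ties: books at the same price share a rank
WITH Ranked AS (
    SELECT
        author_name,
        title,
        price,
        DENSE_RANK() OVER (PARTITION BY author_id ORDER BY price DESC) AS price_rank
    FROM book_catalog
)
SELECT author_name, title, price
FROM Ranked
WHERE price_rank <= 3;
```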
Pattern 4: Running / cumulative totals
```sql
SELECT
    order_date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY order_date) AS running_total
FROM (
    SELECT order_date, SUM(total_amount) AS daily_revenue
    FROM orders
    GROUP BY order_date
) daily
ORDER BY order_date;
```
Pattern 5: Previous / next row comparison
```sql
SELECT
    order_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY order_date) AS previous_day,
    total_amount - LAG(total_amount) OVER (ORDER BY order_date) AS day_over_day_change
FROM daily_sales;
```
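One detail to remember: the first row has no previous day, so LAG() returns NULL there. If a zero baseline reads better in the report, LAG() accepts a default as its third argument:

```sql
SELECT
    order_date,
    total_amount,
    -- offset 1, default 0 when there is no previous row
    LAG(total_amount, 1, 0) OVER (ORDER BY order_date) AS previous_day
FROM daily_sales;
```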
Pattern 6: Gap analysis (missing sequence)
```sql
-- Assumes a helper table numbers(N) holding the digits 0–9.
-- The select alias dt can't be used in the same level's WHERE,
-- so generate the dates in a derived table and filter one level up.
WITH AllDays AS (
    SELECT dt
    FROM (
        SELECT DATE('2026-01-01') + INTERVAL (a.N + b.N * 10 + c.N * 100) DAY AS dt
        FROM numbers a
        CROSS JOIN numbers b
        CROSS JOIN numbers c
    ) gen
    WHERE dt <= CURDATE()
)
SELECT dt AS missing_day
FROM AllDays
LEFT JOIN orders o ON DATE(o.order_date) = dt
WHERE o.order_id IS NULL;
```
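If you are on MySQL 8+ (or any database with recursive CTEs), you can skip the numbers table entirely and generate the calendar with WITH RECURSIVE. A minimal sketch of the same gap check:

```sql
WITH RECURSIVE AllDays AS (
    SELECT DATE('2026-01-01') AS dt
    UNION ALL
    SELECT dt + INTERVAL 1 DAY
    FROM AllDays
    WHERE dt < CURDATE()
)
SELECT dt AS missing_day
FROM AllDays
LEFT JOIN orders o ON DATE(o.order_date) = dt
WHERE o.order_id IS NULL;
```

Note that MySQL caps recursion at 1,000 iterations by default (cte_max_recursion_depth), which covers roughly three years of days; raise it for longer ranges.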
3. Working with Large Datasets – Tips for 100M+ Row Tables
| Tip | Why | How |
|---|---|---|
| Partitioning | Split huge tables by date/range | PARTITION BY RANGE (YEAR(order_date)) |
| Columnar storage (ClickHouse, Snowflake, BigQuery) | Great for analytics | Switch to columnar DB for reporting |
| Materialized views | Pre-compute aggregates | CREATE MATERIALIZED VIEW mv_daily_sales AS … |
| Incremental loading | Don’t reprocess everything | Use WHERE updated_at > LAST_RUN |
| Batch processing | Avoid locking entire table | UPDATE … WHERE id BETWEEN 1 AND 10000 in loop |
| Use approximate aggregates | Fast on huge data | APPROX_COUNT_DISTINCT() in BigQuery |
| Avoid DELETE on large tables | Slow + locks | Use TRUNCATE or soft-delete + partition drop |
| Sharding | Scale horizontally | Split data across multiple DB instances |
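Two of these tips in SQL form: range partitioning (so old data is dropped, never deleted row by row) and batched deletes (so no single statement locks the table for minutes). A MySQL-style sketch; the partition boundaries and the batch size of 10,000 are assumptions to tune:

```sql
-- Partition orders by year; the partition key must be part of the PK
CREATE TABLE orders_partitioned (
    order_id     BIGINT NOT NULL,
    customer_id  BIGINT NOT NULL,
    order_date   DATE   NOT NULL,
    total_amount DECIMAL(10, 2),
    PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p2025 VALUES LESS THAN (2026),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Retiring a year of data is a metadata operation, not a million deletes
ALTER TABLE orders_partitioned DROP PARTITION p2024;

-- On a non-partitioned table, delete in small batches instead;
-- rerun until the statement affects 0 rows
DELETE FROM order_items
WHERE order_date < '2024-01-01'
LIMIT 10000;
```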
Real-world example – Handling 1 billion order_items
```sql
-- Efficient monthly sales summary (BigQuery style)
CREATE OR REPLACE TABLE monthly_sales AS
SELECT
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month,
    COUNT(*) AS order_count,
    APPROX_COUNT_DISTINCT(customer_id) AS unique_customers,
    SUM(total_amount) AS revenue
FROM `project.dataset.order_items`
WHERE order_date >= '2025-01-01'
GROUP BY year, month;
```
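The incremental-loading tip from the table above looks like this in practice: track a watermark per job and only touch rows changed since the last successful run. The etl_watermarks table and the other names here are hypothetical, for illustration only:

```sql
-- Hypothetical watermark table: etl_watermarks(job, last_run_at)
INSERT INTO daily_sales_delta (order_date, revenue)
SELECT order_date, SUM(total_amount)
FROM orders
WHERE updated_at > (SELECT last_run_at
                    FROM etl_watermarks
                    WHERE job = 'daily_sales')
GROUP BY order_date;

-- Advance the watermark only after the load succeeds
UPDATE etl_watermarks
SET last_run_at = CURRENT_TIMESTAMP
WHERE job = 'daily_sales';
```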
That’s Chapter 21: SQL Best Practices and Real-World Examples complete! You’ve gone from zero to writing production-ready SQL.
You can confidently:
- Write fast, clean queries
- Handle real-world messy data
- Optimize for scale
- Tackle SQL interview questions with confidence
