Chapter 19: PostgreSQL SELECT DISTINCT

Part 1: What is SELECT DISTINCT?

The SELECT DISTINCT clause removes duplicate rows from your query results, returning only unique combinations of the selected columns.

The Basic Syntax

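The general shape of the statement can be sketched as follows. The names column1, column2, and table_name are placeholders, not real objects.

```sql
-- Return each unique (column1, column2) combination exactly once.
SELECT DISTINCT column1, column2
FROM table_name;
```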

When you use DISTINCT, PostgreSQL compares each row of your result set and eliminates duplicates, showing each unique combination only once.

DISTINCT vs. ALL

By default, SELECT behaves as SELECT ALL and keeps every row, duplicates included (though we rarely write ALL explicitly):

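As a small illustration, the statements below differ only in the quantifier. The orders table and customer_city column are assumed names, taken from the result tables shown later in this chapter.

```sql
-- ALL is the default: these two statements are equivalent and keep duplicates.
SELECT ALL customer_city FROM orders;
SELECT customer_city FROM orders;

-- DISTINCT removes duplicate rows from the result.
SELECT DISTINCT customer_city FROM orders;
```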

Part 2: Setting Up Our Example Data

Let’s create a rich dataset to explore all the nuances of DISTINCT. Imagine we’re running an online store and tracking customer orders:

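The sample rows are not reproduced here. A table shape consistent with the columns that appear in the results throughout this chapter might look like the sketch below. The table name orders, the order_id key, and every column type are assumptions.

```sql
-- Hypothetical schema inferred from the result columns used in this chapter.
CREATE TABLE orders (
    order_id         SERIAL PRIMARY KEY,    -- assumed surrogate key
    customer_name    VARCHAR(100) NOT NULL,
    customer_city    VARCHAR(50),
    customer_state   CHAR(2),
    product_category VARCHAR(50),
    product_name     VARCHAR(100),
    quantity         INTEGER,
    order_date       DATE,
    order_status     VARCHAR(20),
    shipping_method  VARCHAR(20)            -- nullable; NULLs are explored in Part 5
);
```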

Part 3: Basic DISTINCT Usage

1. Distinct Values in a Single Column

Let’s find all the unique customer states:

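A query along these lines would produce that list, assuming the table is named orders:

```sql
-- One row per distinct state, however many orders each state has.
SELECT DISTINCT customer_state
FROM orders;
```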

Result:

customer_state
TX

Since all our customers are in Texas, we get just one row. Not very exciting, but it shows how DISTINCT collapses duplicates.

2. Distinct Customer Cities

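A minimal sketch of the query, again assuming a table named orders. The ORDER BY is added here so the output order is predictable.

```sql
-- One row per distinct city, sorted alphabetically.
SELECT DISTINCT customer_city
FROM orders
ORDER BY customer_city;
```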

Result:

customer_city
Austin
Dallas
Houston
San Antonio

Even though Austin appears many times in the table, DISTINCT shows it only once.

3. Counting Distinct Values

Often, we want to know how many distinct values exist:

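COUNT(DISTINCT ...) does this in a single pass over the table. The sketch below uses the column aliases from the result and assumes the table is named orders.

```sql
-- Each COUNT(DISTINCT ...) counts unique non-NULL values of one column.
SELECT COUNT(DISTINCT customer_city)    AS unique_cities,
       COUNT(DISTINCT customer_state)   AS unique_states,
       COUNT(DISTINCT product_category) AS unique_categories,
       COUNT(DISTINCT shipping_method)  AS shipping_methods
FROM orders;
```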

Result:

unique_cities unique_states unique_categories shipping_methods
4 1 8 2

This tells us we have customers in 4 different cities (all in Texas), ordering products from 8 categories, using 2 shipping methods.

Part 4: DISTINCT on Multiple Columns

Here’s where DISTINCT gets really powerful. When you specify multiple columns, DISTINCT considers the combination of values across those columns.

1. Unique Customer Locations

Find all unique combinations of customer city and state:

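A sketch of the two-column form, assuming the table is named orders:

```sql
-- Rows are duplicates only if BOTH city and state match.
SELECT DISTINCT customer_city, customer_state
FROM orders
ORDER BY customer_city, customer_state;
```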

Result:

customer_city customer_state
Austin TX
Dallas TX
Houston TX
San Antonio TX

This shows each unique city-state pair. If we later had customers from different states with the same city name, we’d see multiple rows for that city.

2. Unique Product Orders

Find all unique combinations of product category and product name:

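Assuming the same orders table, the query might be written as:

```sql
-- One row per distinct (category, product) pair.
SELECT DISTINCT product_category, product_name
FROM orders
ORDER BY product_category, product_name;
```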

Result:

product_category product_name
Accessories Mouse Pad
Art Paint Set
Beauty Lotion
Beauty Perfume
Books Python Book
Books SQL Guide
Clothing Jeans
Clothing T-Shirt
Electronics Headphones
Electronics Laptop
Electronics Mouse
Home Blender
Sports Yoga Mat

Even though “Mouse” appears in multiple orders, it is listed only once here because DISTINCT collapses every repeated (Electronics, Mouse) row into a single row.

3. Understanding the Combination Logic

This is crucial: DISTINCT looks at the entire row of selected columns. For example:

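A three-column sketch, assuming the table is named orders:

```sql
-- A row is dropped only when city, state, AND category all repeat.
SELECT DISTINCT customer_city, customer_state, product_category
FROM orders
ORDER BY customer_city, customer_state, product_category;
```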

Partial Result:

customer_city customer_state product_category
Austin TX Accessories
Austin TX Art
Austin TX Books
Austin TX Clothing
Austin TX Electronics
Dallas TX Beauty
Dallas TX Electronics

This shows which product categories have been ordered from each city.

Part 5: DISTINCT with NULL Values

NULL values have special behavior with DISTINCT. In an ordinary comparison, NULL = NULL does not evaluate to true, but for DISTINCT purposes PostgreSQL treats all NULLs as equal.

Let’s add some NULL values to see this:

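The rows below are a sketch. The names, dates, and statuses mirror the "Null Test" rows that appear in later results, while the city, state, and category values, and the orders table itself, are assumptions.

```sql
-- Three orders with no shipping method recorded.
INSERT INTO orders
    (customer_name, customer_city, customer_state, product_category,
     product_name, quantity, order_date, order_status, shipping_method)
VALUES
    ('Null Test 1', 'Dallas', 'TX', 'Test', 'Item 1', 1, '2024-03-01', 'Pending', NULL),
    ('Null Test 2', 'Dallas', 'TX', 'Test', 'Item 2', 1, '2024-03-02', 'Pending', NULL),
    ('Null Test 3', 'Dallas', 'TX', 'Test', 'Item 3', 1, '2024-03-03', 'Pending', NULL);

-- All NULL shipping methods collapse into a single row.
SELECT DISTINCT shipping_method
FROM orders;
```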

Result:

shipping_method
Express
Standard
NULL

Notice: all three NULL values collapsed into a single NULL row. This is important – DISTINCT treats all NULLs as identical.

Part 6: DISTINCT ON – PostgreSQL’s Special Feature

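PostgreSQL offers a powerful extension to standard SQL: DISTINCT ON. It keeps only the first row for each unique value of the specified columns, where “first” is determined by the ORDER BY clause. Without an ORDER BY, which row is kept is unpredictable.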

Basic DISTINCT ON Syntax

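In outline, the statement has the shape below. The names key_column, other_column, sort_column, and table_name are placeholders.

```sql
-- Keep one row per key_column value: the one that sorts first
-- according to the remaining ORDER BY expressions.
SELECT DISTINCT ON (key_column) key_column, other_column
FROM table_name
ORDER BY key_column, sort_column DESC;
```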

⚠️ Critical: The DISTINCT ON expressions must match the leftmost ORDER BY expressions. Any further ORDER BY expressions decide which row is kept within each group.

Example 1: Latest Order for Each Customer

Find the most recent order for each customer:

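A sketch of that query, using the columns from the result and assuming the table is named orders:

```sql
-- For each customer, rows are sorted newest first and only the first is kept.
SELECT DISTINCT ON (customer_name)
       customer_name, order_date, product_name, quantity, order_status
FROM orders
ORDER BY customer_name, order_date DESC;
```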

Result:

customer_name order_date product_name quantity order_status
Alice Johnson 2024-02-01 Mouse Pad 1 Shipped
Bob Smith 2024-01-18 SQL Guide 1 Delivered
Carol White 2024-01-22 Python Book 1 Delivered
David Brown 2024-01-25 Headphones 1 Shipped
Eva Green 2024-02-05 Blender 1 Processing
Frank Black 2024-02-10 Yoga Mat 2 Pending
Grace Lee 2024-02-12 Mouse 3 Shipped
Henry Ford 2024-02-15 Mouse 1 Delivered
Iris West 2024-02-18 T-Shirt 2 Delivered
Jack Ryan 2024-02-20 Jeans 1 Delivered
Kate Moss 2024-02-22 Perfume 1 Pending
Leo King 2024-02-25 Lotion 2 Processing
Mona Lisa 2024-02-28 Paint Set 1 Shipped
Null Test 1 2024-03-01 Item 1 1 Pending
Null Test 2 2024-03-02 Item 2 1 Pending
Null Test 3 2024-03-03 Item 3 1 Pending

For Alice Johnson, we see her most recent order (Mouse Pad on 2024-02-01) instead of her earlier orders.

Example 2: Most Expensive Product in Each Category

First, let’s add price information:

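One way to do this is sketched below. Only the prices that appear in this chapter's results are filled in, since the remaining prices are not shown; the orders table, the price column name, and its type are assumptions.

```sql
-- Add a nullable price column.
ALTER TABLE orders ADD COLUMN price NUMERIC(10, 2);

-- Set the prices that are visible in the chapter's results.
UPDATE orders
SET price = CASE product_name
    WHEN 'Mouse Pad'   THEN 12.99
    WHEN 'Paint Set'   THEN 34.99
    WHEN 'Perfume'     THEN 79.99
    WHEN 'Python Book' THEN 54.99
    WHEN 'Jeans'       THEN 59.99
    WHEN 'Laptop'      THEN 999.99
    WHEN 'Blender'     THEN 129.99
    WHEN 'Yoga Mat'    THEN 24.99
    WHEN 'Item 1'      THEN 9.99
    ELSE price         -- other products keep their current value (NULL here)
END;
```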

Now find the most expensive product in each category:

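A possible form of the query, assuming an orders table with a price column. NULLS LAST is included as a precaution: in a descending sort PostgreSQL places NULLs first by default, so a product without a price would otherwise win its category.

```sql
-- Within each category, sort by price (highest first) and keep the top row.
SELECT DISTINCT ON (product_category)
       product_category, product_name, price
FROM orders
ORDER BY product_category, price DESC NULLS LAST;
```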

Result:

product_category product_name price
Accessories Mouse Pad 12.99
Art Paint Set 34.99
Beauty Perfume 79.99
Books Python Book 54.99
Clothing Jeans 59.99
Electronics Laptop 999.99
Home Blender 129.99
Sports Yoga Mat 24.99
Test Item 1 9.99

For each category, we get the product with the highest price.

Part 7: DISTINCT vs. GROUP BY

Often, you can achieve similar results with DISTINCT and GROUP BY. Understanding the difference helps you choose the right tool.

Same Result, Different Approaches

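The pair of queries can be sketched like this, assuming the table is named orders:

```sql
-- Approach 1: DISTINCT
SELECT DISTINCT customer_city, customer_state
FROM orders;

-- Approach 2: GROUP BY on the same columns, with no aggregates
SELECT customer_city, customer_state
FROM orders
GROUP BY customer_city, customer_state;
```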

Both return the same unique city-state combinations.

When GROUP BY is Better

GROUP BY becomes essential when you need aggregations:

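A sketch matching the result columns, assuming the table is named orders. The ROUND call is a guess based on the four decimal places in the output.

```sql
-- One row per city/state pair, with aggregates computed over each group.
SELECT customer_city,
       customer_state,
       COUNT(*)                AS order_count,
       ROUND(AVG(quantity), 4) AS avg_quantity
FROM orders
GROUP BY customer_city, customer_state
ORDER BY customer_city;
```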

Result:

customer_city customer_state order_count avg_quantity
Austin TX 8 1.3750
Dallas TX 3 1.0000
Houston TX 3 1.3333
San Antonio TX 1 2.0000

Performance Considerations

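  • For plain uniqueness checks, PostgreSQL often produces the same or a very similar plan for DISTINCT and GROUP BY, typically a hash aggregate or a sort followed by duplicate removal

  • GROUP BY is the natural choice when you also need aggregates, but it can be used for uniqueness alone

  • For large datasets, compare both forms with EXPLAIN ANALYZE rather than assuming one is faster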

Part 8: Real-World Examples

Example 1: Customer Analytics Dashboard

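As an illustration only, a dashboard summary could be built from several COUNT(DISTINCT ...) expressions. The choice of metrics, their aliases, and the orders table name are assumptions.

```sql
-- Headline numbers for a dashboard, computed in one pass.
SELECT COUNT(DISTINCT customer_name)    AS total_customers,
       COUNT(DISTINCT customer_city)    AS cities_served,
       COUNT(DISTINCT product_category) AS categories_sold,
       COUNT(DISTINCT product_name)     AS products_sold,
       COUNT(*)                         AS total_orders
FROM orders;
```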

Example 2: Inventory Planning Report

Find which product categories are sold in each city:

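A sketch that matches the result columns, assuming the table is named orders. STRING_AGG with DISTINCT builds the comma-separated list without repeating a category.

```sql
-- Per city: how many distinct categories were ordered, and which ones.
SELECT customer_city,
       COUNT(DISTINCT product_category) AS category_count,
       STRING_AGG(DISTINCT product_category, ', '
                  ORDER BY product_category) AS categories
FROM orders
GROUP BY customer_city
ORDER BY customer_city;
```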

Result:

customer_city category_count categories
Austin 5 Accessories, Art, Books, Clothing, Electronics
Dallas 3 Beauty, Electronics, Test
Houston 3 Beauty, Clothing, Home
San Antonio 1 Sports

Example 3: Shipping Method Analysis

Analyze shipping methods by city and status:

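One possible shape for this analysis is shown below. The aliases methods_used and order_count and the orders table name are assumptions.

```sql
-- Per city and order status: distinct shipping methods used and order volume.
-- COUNT(DISTINCT ...) skips NULL shipping methods.
SELECT customer_city,
       order_status,
       COUNT(DISTINCT shipping_method) AS methods_used,
       COUNT(*)                        AS order_count
FROM orders
GROUP BY customer_city, order_status
ORDER BY customer_city, order_status;
```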

Example 4: Customer Purchase History Summary

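A per-customer summary might be sketched as follows. The selected metrics and aliases are illustrative, and the orders table name is assumed.

```sql
-- One row per customer summarizing their purchase history.
SELECT customer_name,
       COUNT(*)                         AS total_orders,
       COUNT(DISTINCT product_category) AS categories_bought,
       COUNT(DISTINCT product_name)     AS products_bought,
       MIN(order_date)                  AS first_order,
       MAX(order_date)                  AS latest_order
FROM orders
GROUP BY customer_name
ORDER BY customer_name;
```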

Part 9: DISTINCT with Window Functions

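Advanced users can combine DISTINCT with window functions for more sophisticated analytics. Keep in mind that PostgreSQL does not support DISTINCT inside a window function call: COUNT(DISTINCT ...) OVER (...) raises an error.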

Running Count of Distinct Customers

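One way to get a running distinct count is to flag each customer's first order in a subquery, sum the flags with a window function, and let SELECT DISTINCT collapse the rows that share a date. This is a sketch of that idea, not necessarily the original query; the orders table and the aliases are assumptions.

```sql
-- Running number of distinct customers seen up to each order date.
SELECT DISTINCT
       order_date,
       SUM(is_first_order) OVER (ORDER BY order_date) AS customers_so_far
FROM (
    -- Mark exactly one row per customer: their earliest order.
    SELECT order_date,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer_name
                                        ORDER BY order_date) = 1
                THEN 1 ELSE 0
           END AS is_first_order
    FROM orders
) AS flagged
ORDER BY order_date;
```

Rows with the same order_date are peers in the default window frame, so they receive the same running total and DISTINCT reduces them to one row per date.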

Distinct Products Per Customer Over Time

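The same flag-and-sum idea can be applied per customer. The sketch below counts how many different products each customer has bought up to each order date; the orders table and the aliases are assumptions.

```sql
-- Running number of distinct products per customer over time.
SELECT DISTINCT
       customer_name,
       order_date,
       SUM(is_first_purchase) OVER (PARTITION BY customer_name
                                    ORDER BY order_date) AS products_so_far
FROM (
    -- Mark the first time each customer buys each product.
    SELECT customer_name,
           order_date,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer_name, product_name
                                        ORDER BY order_date) = 1
                THEN 1 ELSE 0
           END AS is_first_purchase
    FROM orders
) AS flagged
ORDER BY customer_name, order_date;
```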

Part 10: Common Pitfalls and Solutions

Pitfall 1: DISTINCT Doesn’t Work as Expected with ORDER BY

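A typical way to run into this, sketched against an assumed orders table, is sorting by a column that is not in the select list:

```sql
-- Intent: distinct customers, most recent buyers first.
SELECT DISTINCT customer_name
FROM orders
ORDER BY order_date DESC;
-- ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
```

PostgreSQL rejects the query because a single customer_name can have several order_date values, so the sort key for the collapsed row is ambiguous.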

Solution: Use DISTINCT ON with proper ordering:

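A sketch of that approach, assuming the table is named orders. The inner query picks each customer's latest order, and the outer query is free to sort the result however it likes.

```sql
SELECT customer_name, order_date
FROM (
    -- One row per customer: their most recent order.
    SELECT DISTINCT ON (customer_name) customer_name, order_date
    FROM orders
    ORDER BY customer_name, order_date DESC
) AS latest
ORDER BY order_date DESC;   -- most recent buyers first
```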

Pitfall 2: Forgetting NULLs in DISTINCT

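The queries below illustrate the mismatch that tends to surprise people, assuming an orders table whose shipping_method column contains NULLs:

```sql
-- SELECT DISTINCT keeps NULL as one of the distinct rows...
SELECT DISTINCT shipping_method
FROM orders;

-- ...but COUNT(DISTINCT ...) ignores NULLs, so this can be one lower
-- than the number of rows returned above.
SELECT COUNT(DISTINCT shipping_method) AS shipping_methods
FROM orders;

-- Filter NULLs explicitly when they should not appear in the list.
SELECT DISTINCT shipping_method
FROM orders
WHERE shipping_method IS NOT NULL;
```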

Pitfall 3: DISTINCT with Large Text Fields

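Deduplicating on wide text values means sorting or hashing the full strings, which can be slow and memory-hungry. One possible mitigation, sketched here with a hypothetical order_notes column that is not part of the example data, is to deduplicate on a fixed-size hash instead. Whether this helps depends on the data, so it is worth measuring.

```sql
-- Potentially expensive: compares entire text values.
SELECT DISTINCT order_notes
FROM orders;

-- Alternative: compare 32-character MD5 digests instead of the full text.
SELECT DISTINCT md5(order_notes) AS notes_hash
FROM orders;
```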

Pitfall 4: Assuming DISTINCT Orders Results

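DISTINCT makes no promise about row order; any ordering you happen to see is a side effect of the plan PostgreSQL chose. A sketch, assuming the table is named orders:

```sql
-- Row order is not guaranteed and may change between runs or versions.
SELECT DISTINCT customer_city
FROM orders;

-- Ask for the order explicitly when it matters.
SELECT DISTINCT customer_city
FROM orders
ORDER BY customer_city;
```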

Part 11: Performance Optimization

Creating Indexes for DISTINCT Queries

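Indexes that match the DISTINCT or DISTINCT ON columns give the planner the option of reading rows already sorted. The index names below are made up for illustration, the table is assumed to be orders, and whether the planner uses them depends on table size and statistics.

```sql
-- May help SELECT DISTINCT customer_city, customer_state.
CREATE INDEX idx_orders_city_state
    ON orders (customer_city, customer_state);

-- Matches DISTINCT ON (customer_name) ... ORDER BY customer_name, order_date DESC.
CREATE INDEX idx_orders_customer_date
    ON orders (customer_name, order_date DESC);
```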

Analyzing Query Performance

Use EXPLAIN ANALYZE to see how PostgreSQL executes your DISTINCT queries:

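For example, assuming the table is named orders. Note that EXPLAIN ANALYZE actually executes the statement, so use it with care on data-modifying queries.

```sql
-- Show the chosen plan together with actual row counts and timings.
EXPLAIN ANALYZE
SELECT DISTINCT customer_city, customer_state
FROM orders;
```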

This shows whether PostgreSQL is using indexes, how many rows it’s processing, and where the time is spent.

Summary: The DISTINCT Philosophy

SELECT DISTINCT is your tool for finding the unique “fingerprints” in your data:

  1. Single column DISTINCT – “What are all the unique values in this column?”

  2. Multi-column DISTINCT – “What are all the unique combinations of these columns?”

  3. DISTINCT ON (PostgreSQL special) – “Give me the first row for each unique key”

  4. COUNT(DISTINCT) – “How many unique values exist?”

Remember:

  • DISTINCT removes duplicate rows, not duplicate values in a single column (unless it’s the only column)

  • NULLs are all considered equal for DISTINCT

  • Always use ORDER BY with DISTINCT ON; without it, the row kept for each key is unpredictable

  • Consider GROUP BY when you need aggregations

  • Test performance on large datasets

The beauty of DISTINCT is that it helps you understand the variety and uniqueness in your data. It answers questions like “How many different cities do our customers come from?” and “What product categories has each customer purchased?”

Master DISTINCT, and you’ll have a powerful lens for viewing the unique patterns hidden in your database!
