Chapter 38: PostgreSQL GROUP BY Clause
The GROUP BY clause βπ
You already know:
- SELECT, WHERE, ORDER BY, LIMIT
- Aggregates like COUNT, SUM, AVG, MIN, MAX
- JOIN, UNION
Now we reach GROUP BY β the command that says:
βI donβt want one big total β I want totals per group.β
Without GROUP BY β aggregates give you one single number for the whole table. With GROUP BY β you get one number per category (per city, per year, per course, per customer, etc.).
This is how almost every real dashboard, report, and analytics query is built.
1. What GROUP BY actually does (teacher drawing on imaginary board)
Imagine your students table has 100 rows.
You want:
- Not the overall average GPA
- But average GPA per enrollment_year
- Or number of students per city
- Or total fees paid per status
GROUP BY says:
- Split the rows into groups based on the value(s) in the GROUP BY column(s)
- For each group β calculate the aggregates (COUNT, AVG, SUMβ¦) separately
- Return one row per group with the aggregate results
2. Basic syntax skeleton (memorize this shape)
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SELECT grouping_column(s), aggregate_function(column), aggregate_function_2(column), ... FROM table [WHERE filter_before_grouping] GROUP BY grouping_column(s) [HAVING filter_after_grouping] [ORDER BY β¦] [LIMIT β¦]; |
Golden rule #1 (the one most beginners break):
Every column in the SELECT list that is not inside an aggregate function must appear in the GROUP BY clause (or be functionally dependent on it).
3. Real, detailed examples β using our students table
Current students data (simplified + added city & fees_paid columns):
| id | first_name | last_name | city | enrollment_year | gpa | fees_paid |
|---|---|---|---|---|---|---|
| 1 | Aarav | Patel | Hyderabad | 2025 | 8.95 | TRUE |
| 2 | Priya | Reddy | Secunderabad | 2024 | 9.60 | FALSE |
| 3 | Rahul | Sharma | Hyderabad | 2025 | 7.85 | FALSE |
| 4 | Sneha | Kumar | Hyderabad | 2025 | 9.20 | TRUE |
| 5 | Vikram | Naik | Hyderabad | 2025 | 8.40 | TRUE |
| 6 | Meera | Iyer | Secunderabad | 2025 | 10.00 | TRUE |
Example 1: Average GPA per enrollment year (classic GROUP BY)
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT enrollment_year, COUNT(*) AS student_count, ROUND(AVG(gpa), 2) AS avg_gpa, MIN(gpa) AS min_gpa, MAX(gpa) AS max_gpa FROM students GROUP BY enrollment_year ORDER BY enrollment_year; |
Result:
|
0 1 2 3 4 5 6 7 8 9 |
enrollment_year | student_count | avg_gpa | min_gpa | max_gpa βββββββββββββββββΌββββββββββββββββΌββββββββββΌββββββββββΌββββββββ 2024 β 1 β 9.60 β 9.60 β 9.60 2025 β 5 β 8.88 β 7.85 β 10.00 |
β One row per year β aggregates calculated separately for each year
Example 2: Students & fees status per city
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT city, fees_paid, COUNT(*) AS student_count, ROUND(AVG(gpa), 2) AS avg_gpa, SUM(CASE WHEN fees_paid THEN 1 ELSE 0 END) AS paid_count FROM students GROUP BY city, fees_paid ORDER BY city, fees_paid; |
Result:
|
0 1 2 3 4 5 6 7 8 9 10 11 |
city | fees_paid | student_count | avg_gpa | paid_count -------------βΌββββββββββββΌββββββββββββββββΌββββββββββΌββββββββββββ Hyderabad | FALSE | 1 | 7.85 | 0 Hyderabad | TRUE | 3 | 8.85 | 3 Secunderabad | FALSE | 1 | 9.60 | 0 Secunderabad | TRUE | 1 | 10.00 | 1 |
β Groups by two columns β 4 groups total
Example 3: Wrong query (the famous error beginners make)
|
0 1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT city, fees_paid, COUNT(*) FROM students; -- ERROR: column "students.city" must appear in the GROUP BY clause -- or be used in an aggregate function |
Fix β add GROUP BY:
|
0 1 2 3 4 5 6 |
GROUP BY city, fees_paid |
Or β if you really want one row β remove non-aggregated columns or wrap them:
|
0 1 2 3 4 5 6 7 8 9 |
SELECT STRING_AGG(DISTINCT city, ', ') AS all_cities, COUNT(*) FROM students; |
Example 4: GROUP BY + HAVING (filter after grouping)
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SELECT city, COUNT(*) AS student_count, ROUND(AVG(gpa), 2) AS avg_gpa FROM students GROUP BY city HAVING COUNT(*) >= 2 -- only cities with 2+ students AND AVG(gpa) >= 9.0 -- and average GPA β₯ 9.0 ORDER BY avg_gpa DESC; |
Result (example):
|
0 1 2 3 4 5 6 7 8 |
city | student_count | avg_gpa -------------βΌββββββββββββββββΌββββββββ Secunderabad | 2 | 9.80 |
β HAVING filters groups (after aggregation) β WHERE would filter individual rows (before grouping)
4. Very important rules & gotchas (2026 PostgreSQL reality)
| Rule / Mistake | What happens | Correct way / Explanation |
|---|---|---|
| Non-aggregated column not in GROUP BY | ERROR: must appear in GROUP BY or aggregate | Add to GROUP BY or wrap in aggregate (MIN, MAX, STRING_AGGβ¦) |
| GROUP BY with no aggregate functions | Works β but just removes duplicates (like DISTINCT) | Use DISTINCT instead β clearer |
| NULL values in GROUP BY column | All NULLs go into one group | Normal & correct behavior |
| GROUP BY ordinal position (GROUP BY 1, 2) | Works but very bad practice | Always use column names β safer & readable |
| Performance on large tables | Slow without index on GROUP BY columns | Index the grouping columns + frequently filtered columns |
| GROUP BY + JOIN | Aggregates after join β careful with duplication | Use subquery / CTE if join causes row multiplication |
5. Quick cheat-sheet (keep in your notes)
| Goal | Typical GROUP BY pattern | HAVING example |
|---|---|---|
| Count per category | GROUP BY city | HAVING COUNT(*) > 10 |
| Average / total per group | GROUP BY enrollment_year | HAVING AVG(gpa) >= 8.5 |
| Multiple grouping levels | GROUP BY city, fees_paid | HAVING COUNT(*) >= 2 |
| Filter groups after aggregation | GROUP BY β¦ HAVING β¦ | β |
| Get list of values per group | GROUP BY city, STRING_AGG(name, ‘, ‘) | β |
Your mini homework right now
- Calculate number of students + average GPA per city
- Same query but only show cities with average GPA β₯ 9.0 (use HAVING)
- Group by enrollment_year and fees_paid β show count & avg GPA per combination
- Try the wrong query (SELECT city, COUNT(*) without GROUP BY) β see the error
Next class?
Tell me:
- Want HAVING vs WHERE deep comparison?
- GROUP BY with JOIN (common real-world pattern)?
- Advanced grouping: GROUPING SETS, ROLLUP, CUBE?
- Aggregates inside window functions (no GROUP BY needed)?
Your guru is ready β what’s the next topic? π
