Chapter 20: Pandas Exercises
Pandas exercises — written as if we are sitting together in a classroom or doing a live coding session.
I will give you exercises with increasing difficulty, clear instructions, small realistic datasets, hints when needed, and detailed solutions + explanations after each block so you can check yourself.
The idea is simple:
- Try to solve each exercise yourself first (even if you only get part of it right — that’s how you learn)
- Then compare with the solution
- Note which parts felt easy / confusing / surprising
- Come back and tell me what was difficult so we can focus more on those topics
Let’s start.
Warm-up / Level 1 – Core basics everyone should be comfortable with
Exercise 1 – Filtering & sorting
Task:
From the following DataFrame:
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
import pandas as pd data = { 'name': ['Priya', 'Rahul', 'Ananya', 'Sneha', 'Vikram', 'Meera', 'Arjun', 'Neha', 'Karan'], 'age': [24, 31, 19, 28, 45, 22, 33, 27, 29], 'city': ['Pune', 'Hyderabad', 'Bangalore', 'Mumbai', 'Pune', 'Chennai', 'Mumbai', 'Pune', 'Hyderabad'], 'salary': [72000, 145000, 88000, 112000, 210000, 68000, 95000, 105000, 128000], 'department': ['HR', 'IT', 'Sales', 'IT', 'Finance', 'Sales', 'HR', 'Marketing', 'IT'] } df = pd.DataFrame(data) |
Select only people who:
- are older than 26
- live in Pune, Mumbai or Hyderabad
- earn at least ₹90,000
Show only the columns: name, city, salary, department Sort the result by salary descending
Try it yourself first.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 11 12 |
result = df[ (df['age'] > 26) & (df['city'].isin(['Pune', 'Mumbai', 'Hyderabad'])) & (df['salary'] >= 90000) ][['name', 'city', 'salary', 'department']].sort_values('salary', ascending=False) print(result) |
Expected output:
|
0 1 2 3 4 5 6 7 8 9 10 11 |
name city salary department 1 Rahul Hyderabad 145000 IT 8 Karan Hyderabad 128000 IT 3 Sneha Mumbai 112000 IT 7 Neha Pune 105000 Marketing 6 Arjun Mumbai 95000 HR |
Common mistakes:
- Forgetting parentheses around each condition → precedence error
- Using & without parentheses → wrong logic
- Using .isin() with a list but forgetting the square brackets
- Sorting column name typo (Salary instead of salary)
Exercise 2 – Creating conditional columns
Task:
Add two new columns to the same DataFrame:
- bonus_pct
- 12% if salary > 120,000
- 10% if salary > 90,000
- 7% otherwise
- bonus_amount = salary × bonus_pct (round to nearest whole number)
Show the original columns + the two new ones, sorted by bonus_amount descending.
Try it.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
df['bonus_pct'] = pd.cut( df['salary'], bins=[0, 90000, 120000, np.inf], labels=[0.07, 0.10, 0.12], include_lowest=True ).astype(float) # Alternative (more explicit): df['bonus_pct'] = np.where(df['salary'] > 120000, 0.12, np.where(df['salary'] > 90000, 0.10, 0.07)) df['bonus_amount'] = (df['salary'] * df['bonus_pct']).round(0).astype(int) print(df[['name', 'salary', 'bonus_pct', 'bonus_amount']] .sort_values('bonus_amount', ascending=False)) |
Common mistakes:
- Forgetting to multiply by 0.12 / 0.10 (people often write 12 instead of 0.12)
- Using .apply() with a function when np.where or pd.cut is much faster
- Not rounding / converting to int → ugly decimal places
Level 2 – Typical daily tasks
Exercise 3 – Groupby with multiple aggregations
Task:
Group by department and calculate:
- number of employees
- average salary (rounded to 0 decimals)
- median salary
- highest salary
- percentage of people earning more than ₹100,000
Show the result sorted by average salary descending.
Try it.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
summary = df.groupby('department').agg( count=('name', 'count'), avg_salary=('salary', 'mean'), median_salary=('salary', 'median'), max_salary=('salary', 'max'), above_1L_pct=('salary', lambda x: (x > 100000).mean() * 100) ).round({'avg_salary': 0, 'median_salary': 0, 'above_1L_pct': 1}) summary = summary.sort_values('avg_salary', ascending=False) print(summary) |
Common mistakes:
- Forgetting to name the aggregated columns → ugly column names
- Using mean() on boolean → wrong percentage
- Sorting by wrong column
Exercise 4 – Rank within group
Task:
Add column salary_rank_in_dept — rank within each department (1 = highest salary in that department)
Then show:
- name
- department
- salary
- salary_rank_in_dept
Sorted first by department (A→Z), then by rank (1 first)
Try it.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 11 12 |
df['salary_rank_in_dept'] = df.groupby('department')['salary'].rank( ascending=False, method='min' # important: ties get same rank ).astype(int) print(df[['name', 'department', 'salary', 'salary_rank_in_dept']] .sort_values(['department', 'salary_rank_in_dept'])) |
Important notes:
- ascending=False → 1 = highest
- method=’min’ → if two people have same salary → both get rank 1, next gets 3
- Very common mistake: forgetting groupby() → ranks globally instead of per group
Level 3 – More realistic & slightly harder
Exercise 5 – Missing values + group-based imputation
Task:
Make some salaries missing:
|
0 1 2 3 4 5 6 |
df.loc[[1, 4, 7], 'salary'] = np.nan |
Now fill the missing salaries with the median salary of their own department. If the department has no valid salaries → use the overall median.
Show the DataFrame after filling (only name, department, salary columns).
Try it.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
# Step 1: department median dept_median = df.groupby('department')['salary'].median() # Step 2: fill using map df['salary'] = df['salary'].fillna(df['department'].map(dept_median)) # Step 3: if still any NaN (department had all missing) → overall median overall_median = df['salary'].median() df['salary'] = df['salary'].fillna(overall_median) print(df[['name', 'department', 'salary']].sort_values('salary', ascending=False)) |
Alternative (one-liner style):
|
0 1 2 3 4 5 6 7 |
df['salary'] = df.groupby('department')['salary'].transform(lambda x: x.fillna(x.median())) df['salary'] = df['salary'].fillna(df['salary'].median()) |
Exercise 6 – Top N per group (advanced version)
Task:
Show the top 2 highest-paid employees per department (If a department has fewer than 2 people → show all)
Columns to show: name, department, salary, rank_in_dept
Try it.
.
.
.
Solution & explanation
|
0 1 2 3 4 5 6 7 8 9 10 |
# Method 1 - using rank (cleanest) df['rank'] = df.groupby('department')['salary'].rank(ascending=False, method='min') top2 = df[df['rank'] <= 2][['name', 'department', 'salary', 'rank']].sort_values(['department', 'rank']) print(top2) |
Alternative method (using nlargest)
|
0 1 2 3 4 5 6 |
top2_alt = df.groupby('department').apply(lambda x: x.nlargest(2, 'salary')).reset_index(drop=True) |
What next?
Try at least 4–5 exercises before looking at all solutions.
Then come back and tell me:
- Which exercises felt easy?
- Which ones were difficult / gave you errors?
- Did any solution surprise you?
- Do you want more exercises on a specific topic (groupby, merging, missing values, plotting, time series, etc.)?
I’m here — we can go deeper on whatever you find challenging. 😊
