Chapter 20: Pandas Exercises

These Pandas exercises are written as if we were sitting together in a classroom or doing a live coding session.

The exercises increase in difficulty. Each one comes with clear instructions, a small realistic dataset, hints where needed, and a detailed solution with explanation after each block so you can check yourself.

The idea is simple:

  • Try to solve each exercise yourself first (even if you only get part of it right — that’s how you learn)
  • Then compare with the solution
  • Note which parts felt easy / confusing / surprising
  • Come back and tell me what was difficult so we can focus more on those topics

Let’s start.

Warm-up / Level 1 – Core basics everyone should be comfortable with

Exercise 1 – Filtering & sorting

Task:

From the following DataFrame:

Python
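Here is a small stand-in dataset to practice on. Every name and number in it is made up for illustration; only the column names follow the tasks in this chapter (the age column is an assumption based on the "older than 26" condition).

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "age": [29, 24, 35, 31, 27, 26, 41, 30],
    "city": ["Pune", "Mumbai", "Hyderabad", "Delhi",
             "Pune", "Mumbai", "Hyderabad", "Bengaluru"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

print(df)
```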

Select only people who:

  • are older than 26
  • live in Pune, Mumbai or Hyderabad
  • earn at least ₹90,000

Show only the columns: name, city, salary, department

Sort the result by salary descending.

Try it yourself first.

.

.

.

Solution & explanation

Python
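One possible solution, sketched against a hypothetical dataset (all names and values are invented, so your own output will differ). Each condition gets its own parentheses, the three conditions are combined with &, and the column selection and sorting happen in one chain.

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "age": [29, 24, 35, 31, 27, 26, 41, 30],
    "city": ["Pune", "Mumbai", "Hyderabad", "Delhi",
             "Pune", "Mumbai", "Hyderabad", "Bengaluru"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# Each condition is wrapped in parentheses before combining with &.
mask = (
    (df["age"] > 26)
    & (df["city"].isin(["Pune", "Mumbai", "Hyderabad"]))
    & (df["salary"] >= 90000)
)

# Keep only the requested columns, then sort by salary (highest first).
result = (
    df.loc[mask, ["name", "city", "salary", "department"]]
      .sort_values("salary", ascending=False)
)

print(result)
```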

Expected output:

text

Common mistakes:

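  • Forgetting parentheses around each condition: & binds more tightly than comparisons such as > and >=, so an unparenthesized expression is evaluated in the wrong order and usually raises an error instead of filtering
  • Passing cities to .isin() without the square brackets: .isin() expects a list-like such as ['Pune', 'Mumbai', 'Hyderabad'], not separate string arguments
  • A typo in the sort column name (Salary instead of salary), which raises a KeyError because column names are case-sensitive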
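To see the parentheses problem for yourself, here is a tiny sketch on made-up numbers. It runs the correct, parenthesized filter and then the unparenthesized one inside a try block, so you can watch the second version fail.

```python
import pandas as pd

# Hypothetical mini-dataset, just enough to show the precedence issue.
df = pd.DataFrame({"age": [29, 24, 35], "salary": [95000, 120000, 135000]})

# Correct: each comparison is evaluated first, then combined with &.
correct = df[(df["age"] > 26) & (df["salary"] >= 90000)]

# Without parentheses Python evaluates `26 & df["salary"]` first, which
# turns the line into a chained comparison and raises an error.
try:
    df[df["age"] > 26 & df["salary"] >= 90000]
    failed = False
except (ValueError, TypeError) as exc:
    failed = True
    print("Unparenthesized filter failed:", type(exc).__name__)

print(correct)
```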

Exercise 2 – Creating conditional columns

Task:

Add two new columns to the same DataFrame:

  1. bonus_pct
    • 12% if salary > 120,000
    • 10% if salary > 90,000
    • 7% otherwise
  2. bonus_amount = salary × bonus_pct (round to nearest whole number)

Show the original columns + the two new ones, sorted by bonus_amount descending.

Try it.

.

.

.

Solution & explanation

Python
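One way to do it, sketched on a hypothetical dataset (invented names and values). I use np.select here because it checks the conditions in order and takes the first match, which maps nicely onto the three bonus tiers; np.where or pd.cut would work as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "age": [29, 24, 35, 31, 27, 26, 41, 30],
    "city": ["Pune", "Mumbai", "Hyderabad", "Delhi",
             "Pune", "Mumbai", "Hyderabad", "Bengaluru"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# Conditions are checked in order; the first one that is True wins.
conditions = [df["salary"] > 120000, df["salary"] > 90000]
choices = [0.12, 0.10]
df["bonus_pct"] = np.select(conditions, choices, default=0.07)

# Round to the nearest whole number and store as an integer.
df["bonus_amount"] = (df["salary"] * df["bonus_pct"]).round().astype(int)

result = df.sort_values("bonus_amount", ascending=False)
print(result)
```

Note that the comparisons are strict: a salary of exactly 120,000 lands in the 10% tier and exactly 90,000 lands in the 7% tier.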

Common mistakes:

  • Forgetting to multiply by 0.12 / 0.10 (people often write 12 instead of 0.12)
  • Using .apply() with a function when np.where or pd.cut is much faster
  • Not rounding / converting to int → ugly decimal places

Level 2 – Typical daily tasks

Exercise 3 – Groupby with multiple aggregations

Task:

Group by department and calculate:

  • number of employees
  • average salary (rounded to 0 decimals)
  • median salary
  • highest salary
  • percentage of people earning more than ₹100,000

Show the result sorted by average salary descending.

Try it.

.

.

.

Solution & explanation

Python
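A possible solution using named aggregation, again on a hypothetical dataset (invented values). The helper column high_earner is my own name for the boolean "earns more than ₹100,000" flag; its mean per group is the fraction of high earners, which is then multiplied by 100.

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

summary = (
    df.assign(high_earner=df["salary"] > 100_000)  # boolean helper column
      .groupby("department")
      .agg(
          employees=("name", "count"),
          avg_salary=("salary", "mean"),
          median_salary=("salary", "median"),
          max_salary=("salary", "max"),
          pct_above_100k=("high_earner", "mean"),  # fraction between 0 and 1
      )
)

summary["avg_salary"] = summary["avg_salary"].round(0)
summary["pct_above_100k"] = (summary["pct_above_100k"] * 100).round(1)
summary = summary.sort_values("avg_salary", ascending=False)

print(summary)
```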

Common mistakes:

  • Forgetting to name the aggregated columns → ugly column names
  • Forgetting that mean() on a boolean column returns a fraction between 0 and 1, so it must be multiplied by 100 to get a percentage
  • Sorting by wrong column

Exercise 4 – Rank within group

Task:

Add a column salary_rank_in_dept: the rank within each department (1 = highest salary in that department).

Then show:

  • name
  • department
  • salary
  • salary_rank_in_dept

Sorted first by department (A→Z), then by rank (1 first)

Try it.

.

.

.

Solution & explanation

Python
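A sketch of one solution on a hypothetical dataset (invented values). The ranking is done on the grouped salary column, so each department gets its own 1, 2, 3, and so on.

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# Rank salaries inside each department; ascending=False makes 1 the highest.
df["salary_rank_in_dept"] = (
    df.groupby("department")["salary"]
      .rank(method="min", ascending=False)
      .astype(int)
)

result = (
    df[["name", "department", "salary", "salary_rank_in_dept"]]
      .sort_values(["department", "salary_rank_in_dept"])
)

print(result)
```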

Important notes:

  • ascending=False → 1 = highest
  • method='min' → if two people have the same salary, both get rank 1 and the next person gets rank 3
  • Very common mistake: forgetting groupby() → ranks globally instead of per group
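To make the tie behaviour concrete, here is a tiny sketch with three made-up salaries where the top two are equal. It compares method='min' with two other rank methods that pandas offers.

```python
import pandas as pd

# Hypothetical salaries: A and B are tied for the highest value.
salaries = pd.Series([100000, 100000, 90000], index=["A", "B", "C"])

rank_min = salaries.rank(method="min", ascending=False).astype(int)
rank_dense = salaries.rank(method="dense", ascending=False).astype(int)
rank_first = salaries.rank(method="first", ascending=False).astype(int)

print(rank_min.tolist())    # [1, 1, 3]  ties share the lowest rank, then a gap
print(rank_dense.tolist())  # [1, 1, 2]  ties share a rank, no gap afterwards
print(rank_first.tolist())  # [1, 2, 3]  ties are broken by order of appearance
```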

Level 3 – More realistic & slightly harder

Exercise 5 – Missing values + group-based imputation

Task:

Make some salaries missing:

Python
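As a stand-in for this step, the sketch below blanks out three salaries in a hypothetical dataset. Which rows go missing is my own choice: one Engineering salary, plus both HR salaries so that HR has no valid value left and the fallback rule gets exercised.

```python
import numpy as np
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# NaN is a float, so convert the column first to avoid dtype surprises.
df["salary"] = df["salary"].astype("float64")

# Assumed choice of rows to blank out (not from the original exercise).
df.loc[df["name"].isin(["Kabir", "Meera", "Sana"]), "salary"] = np.nan

print(df)
```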

Now fill the missing salaries with the median salary of their own department. If the department has no valid salaries → use the overall median.

Show the DataFrame after filling (only name, department, salary columns).

Try it.

.

.

.

Solution & explanation

Python
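One way to solve it, sketched on a hypothetical dataset in which I assume three salaries are missing (one in Engineering and both in HR). The idea: transform("median") gives every row its department median, and a second fillna covers departments where that median is itself missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})
df["salary"] = df["salary"].astype("float64")
# Assumed missing rows: HR ends up with no valid salary at all.
df.loc[df["name"].isin(["Kabir", "Meera", "Sana"]), "salary"] = np.nan

# Compute the fallback before filling anything.
overall_median = df["salary"].median()

# One median per row, aligned with df; NaN where a department has no data.
dept_median = df.groupby("department")["salary"].transform("median")

# First try the department median, then fall back to the overall median.
df["salary"] = df["salary"].fillna(dept_median).fillna(overall_median)

result = df[["name", "department", "salary"]]
print(result)
```

With this made-up data, the Engineering gap is filled from the two remaining Engineering salaries, while both HR rows fall back to the overall median.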

Alternative (one-liner style):

Python
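A more compact variant of the same idea, again on a hypothetical dataset with assumed missing rows. Here the department median is applied inside transform with a lambda, and the overall median is chained on as the fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})
df["salary"] = df["salary"].astype("float64")
# Assumed missing rows: HR ends up with no valid salary at all.
df.loc[df["name"].isin(["Kabir", "Meera", "Sana"]), "salary"] = np.nan

# Fill within each department, then fall back to the overall median.
# The right-hand side is evaluated before the assignment, so the overall
# median is still computed from the unfilled column.
df["salary"] = (
    df.groupby("department")["salary"]
      .transform(lambda s: s.fillna(s.median()))
      .fillna(df["salary"].median())
)

print(df[["name", "department", "salary"]])
```

The lambda version is shorter to write, but on large data the built-in transform("median") is usually the faster choice.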

Exercise 6 – Top N per group (advanced version)

Task:

Show the top 2 highest-paid employees per department. (If a department has fewer than 2 people, show all of them.)

Columns to show: name, department, salary, rank_in_dept

Try it.

.

.

.

Solution & explanation

Python
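A possible solution built on ranking, sketched on a hypothetical dataset (invented values). I use method="first" here so that ties are broken by row order and no department can return more than 2 rows; that tie rule is my assumption, since the task does not say how ties should be handled.

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# Rank within each department; method="first" gives unique ranks 1, 2, 3, ...
df["rank_in_dept"] = (
    df.groupby("department")["salary"]
      .rank(method="first", ascending=False)
      .astype(int)
)

# Keep ranks 1 and 2; smaller departments simply keep all their rows.
top2 = (
    df.loc[df["rank_in_dept"] <= 2, ["name", "department", "salary", "rank_in_dept"]]
      .sort_values(["department", "rank_in_dept"])
)

print(top2)
```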

Alternative method (using nlargest)

Python
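A sketch of the nlargest idea on a hypothetical dataset (invented values). It calls nlargest on the grouped salary column, which returns the top values per department together with their original row labels, and then uses those labels to pull the full rows. This is one of several ways to use nlargest, not necessarily the only or the intended one.

```python
import pandas as pd

# Hypothetical data: all names and values are invented for illustration.
df = pd.DataFrame({
    "name": ["Aarav", "Diya", "Kabir", "Meera", "Rohan", "Sana", "Vikram", "Zoya"],
    "salary": [95000, 120000, 135000, 110000, 85000, 99000, 150000, 90000],
    "department": ["Engineering", "Sales", "Engineering", "HR",
                   "Sales", "HR", "Engineering", "Sales"],
})

# Top 2 salaries per department; the last index level holds the row labels.
top_idx = (
    df.groupby("department")["salary"]
      .nlargest(2)
      .index.get_level_values(-1)
)

top2 = df.loc[top_idx, ["name", "department", "salary"]].copy()

# Rows arrive highest-first inside each department, so a running count
# per department gives the rank.
top2["rank_in_dept"] = top2.groupby("department").cumcount() + 1

print(top2)
```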

What next?

Try at least 4–5 exercises before looking at all solutions.

Then come back and tell me:

  • Which exercises felt easy?
  • Which ones were difficult / gave you errors?
  • Did any solution surprise you?
  • Do you want more exercises on a specific topic (groupby, merging, missing values, plotting, time series, etc.)?

I’m here — we can go deeper on whatever you find challenging. 😊
