Chapter 7: Analyzing Data with Pandas

Analyzing DataFrames in pandas

(Picture us sitting together: I’m explaining slowly, showing realistic examples, running small pieces of code, and telling you why we do things this way and what analysts actually do every day.)

Let’s assume we already have a DataFrame loaded and cleaned. Now the real fun begins: understanding what the data is telling us.

Starting point – our example DataFrame

Let’s work with a realistic employee / sales-like dataset:

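Here is a minimal sketch of such a dataset. The column names (name, dept, city, salary, rating, join_date, active) and every value in it are illustrative assumptions, picked only so that they agree with the facts quoted in this chapter (one missing rating, salaries from 62k to 210k).

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "city": ["Pune", "Mumbai", "Bangalore", "Pune",
             "Mumbai", "Bangalore", "Pune", "Bangalore"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
    "rating": [4.5, 3.8, 4.9, None, 4.1, 4.4, 3.6, 4.0],  # one missing rating
    "join_date": pd.to_datetime([
        "2019-03-15", "2021-07-01", "2017-11-20", "2022-01-10",
        "2020-05-05", "2018-09-12", "2023-02-28", "2021-10-18"]),
    "active": [True, True, True, False, True, True, True, False],
})

print(df)
```

Parsing join_date with pd.to_datetime up front is a deliberate choice: it makes the time-based questions later in the chapter one-liners.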

This is our playground for the whole session.

Phase 1 – Quick first look (things you should do in < 30 seconds)

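A first look could go roughly like this. The snippet rebuilds a small stand-in frame (assumed column names, made-up values) so that it runs on its own; with your real data you would skip straight to the calls at the bottom.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "city": ["Pune", "Mumbai", "Bangalore", "Pune",
             "Mumbai", "Bangalore", "Pune", "Bangalore"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
    "rating": [4.5, 3.8, 4.9, None, 4.1, 4.4, 3.6, 4.0],
    "join_date": pd.to_datetime([
        "2019-03-15", "2021-07-01", "2017-11-20", "2022-01-10",
        "2020-05-05", "2018-09-12", "2023-02-28", "2021-10-18"]),
    "active": [True, True, True, False, True, True, True, False],
})

print(df.shape)        # (rows, columns)
print(df.head())       # first few rows
df.info()              # dtypes and non-null counts per column

# Summary statistics for the numeric columns
desc = df[["salary", "rating"]].describe()
print(desc)

# Missing values per column
missing = df.isna().sum()
print(missing)

# Frequency of each category
dept_counts = df["dept"].value_counts()
city_counts = df["city"].value_counts()
print(dept_counts)
print(city_counts)
```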

What we learn quickly from these:

  • How many people, how many columns
  • Any missing values? (rating has 1 missing)
  • Salary range: 62k – 210k
  • Most common department, city, etc.

Phase 2 – Asking simple but powerful questions

Question 1: Who earns the most / least?

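One way to answer this is with nlargest, nsmallest and idxmax. The sketch below uses a small stand-in frame with assumed column names and made-up values.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
})

# Top 3 and bottom 3 earners, keeping only the columns we care about
top3 = df.nlargest(3, "salary")[["name", "dept", "salary"]]
bottom3 = df.nsmallest(3, "salary")[["name", "dept", "salary"]]
print(top3)
print(bottom3)

# The single best-paid person: idxmax returns the row label of the maximum
best_paid = df.loc[df["salary"].idxmax(), "name"]
print(best_paid)
```

nlargest and nsmallest read more clearly than sort_values(...).head(n) and avoid sorting the whole frame when you only need a few rows.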

Question 2: Average, median, min, max per group

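A sketch of per-group statistics with groupby and agg, grouping by an assumed dept column on a small stand-in frame (made-up values):

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
})

# One row per department, one column per statistic
salary_stats = df.groupby("dept")["salary"].agg(["mean", "median", "min", "max"])
print(salary_stats)
```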

Very useful version with sorting:

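The same per-group statistics, this time sorted so the highest-paying group comes first. Again a sketch on a stand-in frame with assumed column names and made-up values:

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
})

salary_stats_sorted = (
    df.groupby("dept")["salary"]
      .agg(["mean", "median", "min", "max"])
      .sort_values("mean", ascending=False)  # highest average first
      .round(0)                              # drop noisy decimals
)
print(salary_stats_sorted)
```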

Question 3: How many people above / below certain thresholds?

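Threshold questions usually come down to summing or averaging a boolean mask. In this sketch the cut-offs (100,000 and 70,000) are arbitrary choices for illustration, and the frame is a stand-in with made-up values.

```python
import pandas as pd

# Stand-in data: names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
})

# True counts as 1, so sum() of a mask is a count and mean() is a share
n_above = (df["salary"] > 100_000).sum()
share_above = (df["salary"] > 100_000).mean()
n_below = (df["salary"] < 70_000).sum()

# between() is inclusive on both ends by default
n_middle = df["salary"].between(70_000, 100_000).sum()

print(f"above 100k: {n_above} ({share_above:.1%})")
print(f"below 70k: {n_below}, in between: {n_middle}")
```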

Question 4: When did people join? (time-based analysis)

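With a proper datetime column, the .dt accessor does most of the work. A sketch, assuming a join_date column and using made-up dates:

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "join_date": pd.to_datetime([
        "2019-03-15", "2021-07-01", "2017-11-20", "2022-01-10",
        "2020-05-05", "2018-09-12", "2023-02-28", "2021-10-18"]),
})

# Hires per calendar year, in chronological order
joins_per_year = df["join_date"].dt.year.value_counts().sort_index()
print(joins_per_year)

# Earliest and latest join dates
first_join = df["join_date"].min()
last_join = df["join_date"].max()
print(first_join.date(), last_join.date())

# Everyone who joined on or after a cut-off date (the date is an arbitrary example)
recent = df[df["join_date"] >= "2021-01-01"]
print(recent)
```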

Phase 3 – Creating useful analysis columns

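Here is a sketch of the kind of helper columns that tend to pay off. The new column names (join_year, tenure_years, salary_band, high_rating, dept_salary_rank), the band boundaries, the 4.4 rating cut-off and the fixed reference date are all assumptions for illustration, as is the stand-in data.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
    "rating": [4.5, 3.8, 4.9, None, 4.1, 4.4, 3.6, 4.0],
    "join_date": pd.to_datetime([
        "2019-03-15", "2021-07-01", "2017-11-20", "2022-01-10",
        "2020-05-05", "2018-09-12", "2023-02-28", "2021-10-18"]),
})

# Time parts
df["join_year"] = df["join_date"].dt.year

# Tenure relative to a fixed reference date (hypothetical; use today's date in practice)
as_of = pd.Timestamp("2024-01-01")
df["tenure_years"] = ((as_of - df["join_date"]).dt.days / 365.25).round(1)

# Categories: bins are right-inclusive, e.g. (80000, 130000]
df["salary_band"] = pd.cut(
    df["salary"],
    bins=[0, 80_000, 130_000, float("inf")],
    labels=["low", "mid", "high"],
)

# Flags: comparisons against a missing rating evaluate to False
df["high_rating"] = df["rating"] >= 4.4

# Ranks: 1 = best paid within each department
df["dept_salary_rank"] = df.groupby("dept")["salary"].rank(ascending=False, method="dense")

print(df)
```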

Now we can ask better questions:

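For example, once a salary band exists we can ask how ratings differ across bands and how the bands are spread over departments. This sketch recreates a hypothetical salary_band column (assumed bin edges) on stand-in data so that it runs on its own.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
    "rating": [4.5, 3.8, 4.9, None, 4.1, 4.4, 3.6, 4.0],
})

# Hypothetical derived column (bin edges are arbitrary)
df["salary_band"] = pd.cut(
    df["salary"],
    bins=[0, 80_000, 130_000, float("inf")],
    labels=["low", "mid", "high"],
)

# Do higher salary bands come with higher ratings? (missing ratings are skipped)
rating_by_band = df.groupby("salary_band", observed=True)["rating"].mean()
print(rating_by_band)

# How many people sit in each band, per department?
band_counts = df.groupby(["dept", "salary_band"], observed=True).size()
print(band_counts)
```

Passing observed=True keeps only the band and department combinations that actually occur, which is usually what you want when grouping by a categorical column.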

Phase 4 – Most powerful analysis patterns analysts use daily

Pattern 1: Group + multiple aggregations + sorting

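A sketch of this pattern using named aggregations, which let you pick the output column names directly. The output names (headcount, avg_salary and so on) and the stand-in data are assumptions for illustration.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
})

summary = (
    df.groupby("dept")
      .agg(
          headcount=("name", "count"),        # output_name=(source_column, function)
          avg_salary=("salary", "mean"),
          median_salary=("salary", "median"),
          max_salary=("salary", "max"),
      )
      .sort_values("avg_salary", ascending=False)
      .round(0)
)
print(summary)
```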

Pattern 2: Crosstab (like a pivot table for counting)

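A sketch with pd.crosstab, counting people by department and city on a stand-in frame (assumed column names, made-up values):

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "city": ["Pune", "Mumbai", "Bangalore", "Pune",
             "Mumbai", "Bangalore", "Pune", "Bangalore"],
})

# Plain counts: rows are departments, columns are cities
ct = pd.crosstab(df["dept"], df["city"])
print(ct)

# Same table with row and column totals (labelled "All")
ct_totals = pd.crosstab(df["dept"], df["city"], margins=True)
print(ct_totals)

# Row percentages: where does each department sit?
ct_pct = pd.crosstab(df["dept"], df["city"], normalize="index").mul(100).round(1)
print(ct_pct)
```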

Pattern 3: Value counts with percentage & sorting

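One way to get counts and percentages side by side is to call value_counts twice, once with normalize=True. A sketch on a stand-in city column with made-up values:

```python
import pandas as pd

# Stand-in data: column name and values are assumptions for illustration
df = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Bangalore", "Pune",
             "Mumbai", "Bangalore", "Pune", "Bangalore"],
})

counts = df["city"].value_counts()                                # sorted, largest first
pct = df["city"].value_counts(normalize=True).mul(100).round(1)   # share in percent

# Combine into one table; the two Series align on the city labels
city_summary = (
    pd.DataFrame({"count": counts, "pct": pct})
      .sort_values("count", ascending=False)
)
print(city_summary)
```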

Pattern 4: Conditional aggregation (very powerful)

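Conditional aggregation means computing a statistic over only the rows that meet a condition, within each group. One common way to sketch it is to turn the condition into a helper column first. The helper names (high_earner, active_salary), the 100,000 cut-off and the stand-in data are assumptions for illustration.

```python
import pandas as pd

# Stand-in data: column names and values are assumptions for illustration
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Carla", "Dev", "Elena", "Farid", "Grace", "Hiro"],
    "dept": ["Engineering", "Sales", "Engineering", "Marketing",
             "Sales", "Engineering", "Marketing", "Sales"],
    "salary": [120000, 62000, 210000, 75000, 88000, 145000, 68000, 95000],
    "active": [True, True, True, False, True, True, True, False],
})

flags = df.assign(
    # 1 where the condition holds, 0 otherwise, so sum() counts matches
    high_earner=(df["salary"] > 100_000).astype(int),
    # salary kept only for active people, NaN elsewhere, so mean() skips the rest
    active_salary=df["salary"].where(df["active"]),
)

cond = flags.groupby("dept").agg(
    headcount=("name", "count"),
    n_high_earners=("high_earner", "sum"),
    avg_salary_active=("active_salary", "mean"),
)
print(cond)
```

Filtering first (df[df["active"]].groupby(...)) also works, but departments with no matching rows would then disappear from the result, whereas the helper-column approach keeps every group in the table.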

Phase 5 – Quick checklist – what good analysis almost always includes

  1. Shape, info, head/tail
  2. Missing values → df.isna().sum()
  3. Value counts for categories
  4. describe() for numeric columns
  5. Groupby + agg for at least 3–4 metrics
  6. Sorting — highest/lowest usually tell the story
  7. New columns that make interpretation easier (categories, flags, ranks, time parts)
  8. Crosstab or pivot_table when comparing two categories
  9. Percentages, not just counts

Your turn – small analysis exercise

Using the DataFrame above (or create a similar one):

  1. Find the average salary per city
  2. Show the department with the highest average rating
  3. Count how many people have a salary above their department’s average
  4. Show the top 3 most recently joined employees with name, dept, and join_date
  5. Create a table showing count + avg salary + % active per department

Try to write the code, then come back and we can compare and improve together.

Where do you want to go next?

  • Deeper into pivot_table vs crosstab vs groupby
  • Time-based analysis (monthly, quarterly, year-over-year)
  • Visualization basics with pandas + matplotlib/seaborn
  • Finding outliers and extreme values
  • Correlation between numeric columns
  • Conditional formatting in Jupyter (pretty tables)

Tell me which topic you want to explore next — I’ll go deep with examples. 😊
