Chapter 22: Pandas Study Plan

This study plan is written in the style of a patient teacher who has taught pandas to many beginners and intermediate learners.

I assume you already know basic Python (lists, dictionaries, loops, functions, if-statements). If any of these feel shaky, tell me — we’ll do a quick Python warm-up first.

Pandas Study Plan – From Beginner to Comfortably Productive

(Realistic timeline: 8–16 weeks depending on your weekly time)

Overall philosophy

  • First priority: be able to read, clean, filter, group and understand real data
  • Second priority: combine tables and handle dates
  • Third priority: produce good-looking tables & visualizations
  • Last priority: advanced / niche topics (multi-index, styling, optimization, window functions…)

You don’t need to learn everything — you need to become dangerously good at the 20% that solves 80% of real problems.

Phase 0 – Preparation (1–4 days)

Must-have before serious pandas

  1. Comfortable with Jupyter / Colab / VS Code notebooks
  2. Know how to:
    • import pandas as pd
    • import numpy as np
    • pd.set_option('display.max_columns', 100)
    • pd.set_option('display.float_format', '{:,.2f}'.format)
  3. Understand these concepts (quick mental model):
    • Series = one column
    • DataFrame = table (many Series with shared index)
    • Index = row labels (numbers, dates, names…)
    • .loc[] = label-based
    • .iloc[] = position-based
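
The mental model above can be sketched in a few lines. This is only an illustration: the fruit names, prices and stock numbers are made up, and the variable names are my own.

```python
import pandas as pd

# A Series is one labelled column of values.
prices = pd.Series([3.5, 2.0, 4.25], index=["apple", "banana", "cherry"], name="price")

# A DataFrame is a table: several Series sharing the same index.
fruit = pd.DataFrame(
    {"price": prices, "stock": pd.Series([10, 0, 7], index=prices.index)}
)

# .loc selects by label, .iloc selects by integer position.
by_label = fruit.loc["banana", "price"]   # row labelled "banana", column "price"
by_position = fruit.iloc[1, 0]            # second row, first column

print(fruit)
print(by_label, by_position)
```

Both lookups reach the same cell; the difference only starts to matter once the index is something other than 0, 1, 2…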

Quick readiness test (run in a notebook):

Python
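
A minimal sketch of what such a readiness check might look like, assuming only that pandas and NumPy are installed. The names and scores are made up; any tiny table would do.

```python
import numpy as np
import pandas as pd

# Display options from the checklist above.
pd.set_option("display.max_columns", 100)
pd.set_option("display.float_format", "{:,.2f}".format)

# A tiny made-up table, just to confirm everything is wired up.
check = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen"],
        "score": [82.5, np.nan, 91.0],
    }
)

print(pd.__version__)
print(check)
print(check.loc[0, "name"], check.iloc[2, 1])        # label-based vs position-based
print("Missing scores:", check["score"].isna().sum())
```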

If this works → you’re ready.

Phase 1 – Survival & Daily Basics (Weeks 1–3)

Goal: Be able to read CSV/Excel → clean a little → filter → group → make simple plots

Core topics & recommended time

# | Topic | Time estimate | Key things to master | Practice goal after this topic
1 | Creating & reading DataFrames | 4–7 h | dict → DataFrame, list of dicts, read_csv, read_excel, read_json, head/tail/info/dtypes | Read any CSV and understand its structure in < 1 minute
2 | Selecting & filtering | 8–12 h | df[col], df[[…]], loc/iloc, boolean masks, & | ~, isin, str.contains, query() | Filter complex conditions without getting errors
3 | Creating & transforming columns | 8–12 h | =, np.where, np.select, .map, .apply, str methods, pd.to_datetime + .dt accessor | Create 5–6 useful columns in < 5 minutes
4 | Sorting & basic ranking | 3–5 h | sort_values, rank (global & groupby rank) | Sort any way you want instantly
5 | Handling missing values | 6–10 h | isna().sum(), dropna, fillna, ffill/bfill, groupby + transform fillna | Decide intelligently what to do with NaN
6 | Basic plotting | 5–8 h | .plot(), .plot.bar(), .plot.hist(), seaborn line/scatter/boxplot/heatmap | Make readable plots quickly
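
To give a feel for topics 2, 3 and 5, here is a small sketch on made-up data. The column names, cities and age bands are assumptions chosen for illustration, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
people = pd.DataFrame(
    {
        "name": ["Ana", "Bo", "Cy", "Di", "Ed"],
        "city": ["Pune", "Delhi", "Pune", "Delhi", "Goa"],
        "age": [23, 41, np.nan, 35, 67],
        "income": [30_000, 82_000, 45_000, np.nan, 51_000],
    }
)

# Topic 2: boolean masks. Each comparison needs its own parentheses,
# and & / | / ~ are used instead of the keywords and / or / not.
mask = (people["age"] > 30) & people["city"].isin(["Delhi", "Goa"])
selected = people[mask]

# The same filter written with query().
same_rows = people.query("age > 30 and city in ['Delhi', 'Goa']")

# Topic 3: np.select for a column with more than two outcomes.
# The first matching condition wins, so missing ages are handled first.
conditions = [people["age"].isna(), people["age"] < 30, people["age"] < 60]
labels = ["unknown", "young", "middle"]
people["age_band"] = np.select(conditions, labels, default="senior")

# Topic 5: fill a missing income with the median income of the same city.
city_median = people.groupby("city")["income"].transform("median")
people["income_filled"] = people["income"].fillna(city_median)

print(selected)
print(people)
```

Note the missing-age check at the front of the conditions list: comparisons against NaN are always False, so without it a missing age would silently fall through to "senior".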

Mini-project after Phase 1 (very important)

  • Find a simple CSV (Kaggle: “Titanic”, “Indian startups”, “IPL matches”, “Coffee quality”, etc.)
  • Read it
  • Check missing values
  • Clean obvious problems (missing names → ‘Unknown’, negative ages → NaN, etc.)
  • Create 2–3 useful columns (age group, total score, profit margin…)
  • Group by 1–2 categories and show count/mean
  • Make 2–3 plots
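
The steps above can be sketched roughly as follows. The small DataFrame stands in for whatever pd.read_csv would return for your file; its columns, the age bins and the variable names are all hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("your_file.csv"); the columns are hypothetical.
raw = pd.DataFrame(
    {
        "name": ["Ira", None, "Leo", "Mia", "Ned", "Oz"],
        "team": ["A", "A", "B", "B", "B", "A"],
        "age": [34, 29, -5, 52, 41, 18],
        "score": [71.0, 64.5, 80.0, np.nan, 58.0, 90.5],
    }
)

# 1. Check missing values per column.
missing = raw.isna().sum()

# 2. Clean obvious problems.
clean = raw.copy()
clean["name"] = clean["name"].fillna("Unknown")
clean["age"] = clean["age"].where(clean["age"] >= 0)   # negative ages become NaN

# 3. Create a useful column: age group from fixed bins [0, 30), [30, 50), [50, 120).
clean["age_group"] = pd.cut(
    clean["age"], bins=[0, 30, 50, 120], labels=["<30", "30-49", "50+"], right=False
)

# 4. Group by a category and show count / mean.
summary = clean.groupby("team").agg(
    people=("name", "count"), mean_score=("score", "mean")
)

print(missing, summary, sep="\n\n")
```

From here, something like summary["mean_score"].plot.bar() would cover the plotting step, provided matplotlib is installed.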

Success criterion: You can clean & explore a new CSV in 30–60 minutes

Phase 2 – The Real Power Tools (Weeks 4–7)

Goal: Handle 90% of daily data wrangling tasks comfortably

# | Topic | Time estimate | Must master | Practice goal after this topic
7 | GroupBy – basic & advanced | 12–18 h | .groupby().agg(), named aggregation, transform, apply, cumsum/cummax/rank/shift/diff | Replace almost all Excel pivot tables
8 | Pivot table & crosstab | 6–9 h | pivot_table (aggfunc, margins, fill_value), pd.crosstab(normalize, margins) | Create Excel-style summary tables
9 | Merging & joining | 8–12 h | pd.merge (how, on/left_on/right_on, suffixes, indicator), pd.concat (axis=0/1) | Combine 2–4 tables correctly
10 | Duplicates & deduplication | 4–7 h | duplicated(), drop_duplicates (keep, subset), keep='last' vs 'first' vs False | Clean duplicate rows intelligently
11 | String & categorical cleaning | 6–10 h | str.extract, str.split, str.replace, pd.Categorical, value_counts().head(20) | Standardize messy text columns fast
12 | Basic time series | 8–12 h | pd.to_datetime (format, errors), .dt, resample, rolling/expanding, shift/diff/pct_change | Handle date-based data naturally
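
Here is a short sketch that touches topics 7, 9 and 10 together. The two tables and their column names are hypothetical, chosen only to show the shape of the calls.

```python
import pandas as pd

# Hypothetical tables for illustration.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "customer_id": [10, 11, 11, 10, 99],
        "amount": [250.0, 40.0, 40.0, 120.0, 75.0],
    }
)
customers = pd.DataFrame(
    {"customer_id": [10, 11, 12], "region": ["North", "South", "South"]}
)

# Topic 10: drop repeated orders, keeping the first occurrence of each order_id.
orders = orders.drop_duplicates(subset="order_id", keep="first")

# Topic 9: a left merge with indicator=True records which rows found a match.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

# Topic 7: named aggregation gives readable output column names.
# Rows whose region is missing are left out of the groups by default.
per_region = merged.groupby("region").agg(
    n_orders=("order_id", "count"), revenue=("amount", "sum")
)

# transform returns one value per original row, aligned to the original index.
customer_total = merged.groupby("customer_id")["amount"].transform("sum")
merged["share_of_customer"] = merged["amount"] / customer_total

print(unmatched)
print(per_region)
```

Checking the unmatched rows right after a merge is a cheap habit that catches most join mistakes early.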

Mini-project Phase 2

  • Load multiple related CSVs (example: customers + orders + products)
  • Merge them
  • Clean missing values & duplicates
  • Create monthly revenue per category/region
  • Make line + bar combination plot
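
One possible shape for the "monthly revenue per category" step, assuming an orders table with a text date column and a products lookup table. All names and values here are made up.

```python
import pandas as pd

# Hypothetical stand-ins for the orders and products CSVs.
orders = pd.DataFrame(
    {
        "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-17", "not a date"],
        "product_id": [1, 2, 1, 2, 1],
        "amount": [100.0, 50.0, 80.0, 70.0, 999.0],
    }
)
products = pd.DataFrame({"product_id": [1, 2], "category": ["Books", "Toys"]})

# errors="coerce" turns unparseable dates into NaT instead of raising.
orders["order_date"] = pd.to_datetime(
    orders["order_date"], format="%Y-%m-%d", errors="coerce"
)
orders = orders.dropna(subset=["order_date"])

# Attach the category to every order.
sales = orders.merge(products, on="product_id", how="left")

# Monthly revenue per category: group on the month period plus the category,
# then move the categories into columns.
monthly = (
    sales.groupby([sales["order_date"].dt.to_period("M"), "category"])["amount"]
    .sum()
    .unstack("category")
)

print(monthly)
```

With matplotlib installed, monthly.plot() would draw one line per category, which is a reasonable starting point for the combination plot.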

Success criterion: You can take 2–4 messy tables → produce one clean analysis table + 3–4 insights

Phase 3 – Analysis & Communication Level (Weeks 8–12+)

Goal: Turn data into understandable stories

# | Topic | Time estimate | Must master | Practice goal
13 | Conditional filtering & top-N per group | 5–8 h | nlargest/nsmallest, idxmax/idxmin, groupby + head/tail, query() | Find top 5 customers per region quickly
14 | Correlations & basic statistics | 5–8 h | .corr(), .corr(method='spearman'), .cov(), quantile, skew, kurtosis | Understand relationships in numeric data
15 | Visualization – polished & effective | 12–20 h | seaborn (relplot, catplot, pairplot, heatmap), subplots, annotations, styling | Create publication-quality plots
16 | Export & reporting | 4–7 h | to_csv/excel/json, df.style (bar, background_gradient, highlight), to_clipboard | Produce clean tables for reports
17 | Performance & memory basics (optional) | 5–10 h | category dtype, read_csv (dtype, usecols, nrows), .memory_usage(), chunking | Handle larger files without crashing
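
Two of these topics are easy to try right away. The sketch below shows one common way to get the top N rows per group (topic 13) and a quick memory comparison for the category dtype (topic 17). The data is made up, and the sizes you see will depend on your pandas version.

```python
import pandas as pd

# Made-up sales data.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "North", "South", "South", "South"],
        "customer": ["a", "b", "c", "d", "e", "f"],
        "revenue": [500, 120, 300, 90, 410, 260],
    }
)

# Topic 13: top 2 customers per region.
# Sort once by revenue, then take the first 2 rows of each group.
top2 = (
    sales.sort_values("revenue", ascending=False)
    .groupby("region")
    .head(2)
    .sort_values(["region", "revenue"], ascending=[True, False])
)

# idxmax gives the row label of the best customer in each region.
best_rows = sales.loc[sales.groupby("region")["revenue"].idxmax()]

# Topic 17: a text column with few distinct values is usually smaller as a category.
big = pd.DataFrame({"region": ["North", "South"] * 50_000})
text_bytes = big["region"].memory_usage(deep=True)
category_bytes = big["region"].astype("category").memory_usage(deep=True)

print(top2)
print(best_rows)
print(text_bytes, category_bytes)
```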

Capstone projects (choose 1–2)

  1. Sales dashboard style: multiple CSVs → clean → monthly trends → top products/regions → heatmap + line plots
  2. Student/employee analysis: clean messy data → group by department/city → correlations → boxplots + bar charts
  3. Time series story: stock prices / COVID data / weather → resample → rolling average → anomaly detection
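
For the time series capstone, the resample → rolling average → anomaly detection chain might look something like this. It is a deliberately simple sketch: the series is synthetic with one injected spike, and the 7-day window and threshold of 5 are arbitrary assumptions you would tune for real data.

```python
import numpy as np
import pandas as pd

# Synthetic daily readings: a smooth made-up signal over 8 weeks.
days = pd.date_range("2024-01-01", periods=56, freq="D")
values = 20 + np.sin(np.arange(56) / 5.0)
daily = pd.Series(values, index=days, name="temp")
daily.iloc[30] += 15                      # injected spike on 2024-01-31

# Resample: weekly averages for the big picture.
weekly_mean = daily.resample("W").mean()

# Rolling average: centred 7-day mean as the "expected" value for each day.
rolling_mean = daily.rolling(window=7, center=True, min_periods=1).mean()

# Anomaly detection (very basic): flag days far from their rolling average.
threshold = 5                             # arbitrary, chosen for this toy data
deviation = (daily - rolling_mean).abs()
anomalies = daily[deviation > threshold]

print(weekly_mean)
print(anomalies)
```

A fixed threshold is the crudest possible rule. On real data you might compare against a rolling standard deviation instead, but the overall structure stays the same.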

Suggested weekly rhythm

  • 3–5 hours theory + exercises (videos, articles, my explanations)
  • 4–8 hours hands-on coding (exercises + mini-projects)
  • 1 hour explaining (write what you learned or teach someone)

Recommended resources (2025–2026)

  • Free & excellent:
    • “10 Minutes to pandas” official guide
    • Kaggle Pandas micro-course (free)
    • DataCamp / Codecademy pandas tracks (some free parts)
    • Corey Schafer pandas playlist (YouTube)
    • “Python for Data Analysis” book – Wes McKinney (chapters 5–10)
  • Practice datasets:
    • Kaggle: Titanic, IPL, Superstore Sales, Indian Startups, Zomato, Airbnb
    • Our World in Data, Google Dataset Search

Success checkpoints

  • After Phase 1 → clean & explore any small CSV in ~45 min
  • After Phase 2 → combine 2–4 tables and summarize intelligently
  • After Phase 3 → produce a 1-page report with a table + 3–4 clear visualizations

Where would you like to start?

  • Detailed Phase 1 exercises right now
  • A full mini-project with step-by-step guidance
  • Focus on one weak area (missing values, groupby, merging, plotting…)
  • A personalized plan based on how many hours/week you can spend

Just tell me what feels most useful — we’ll build from there together. 😊
