Chapter 22: Pandas Study Plan
Pandas Study Plan written in the style of a patient human teacher who has taught pandas to many beginners and intermediate learners.
I assume you already know basic Python (lists, dictionaries, loops, functions, if-statements). If any of these feel shaky, tell me — we’ll do a quick Python warm-up first.
Pandas Study Plan – From Beginner to Comfortably Productive
(Realistic timeline: 8–16 weeks depending on your weekly time)
Overall philosophy
- First priority: be able to read, clean, filter, group and understand real data
- Second priority: combine tables and handle dates
- Third priority: produce good-looking tables & visualizations
- Last priority: advanced / niche topics (multi-index, styling, optimization, window functions…)
You don’t need to learn everything — you need to become dangerously good at the 20% that solves 80% of real problems.
Phase 0 – Preparation (1–4 days)
Must-have before serious pandas
- Comfortable with Jupyter / Colab / VS Code notebooks
- Know how to:
- import pandas as pd
- import numpy as np
- pd.set_option('display.max_columns', 100)
- pd.set_option('display.float_format', '{:,.2f}'.format)
- Understand these concepts (quick mental model):
- Series = one column
- DataFrame = table (many Series with shared index)
- Index = row labels (numbers, dates, names…)
- .loc[] = label-based
- .iloc[] = position-based
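The mental model above in five lines of runnable code (toy data, invented names):

```python
import pandas as pd

# A Series is one column; a DataFrame is a table of Series sharing an Index
df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Caro"], "age": [31, 25, 40]},
    index=["a", "b", "c"],  # the Index: row labels
)

print(df.loc["b", "age"])   # .loc[]  = label-based    -> 25
print(df.iloc[1]["age"])    # .iloc[] = position-based -> 25
```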
Quick readiness test (run in a notebook):
```python
import pandas as pd
import numpy as np

print(pd.__version__)  # should be 2.0+ in 2025–2026
print(np.__version__)
```
If this works → you’re ready.
Phase 1 – Survival & Daily Basics (Weeks 1–3)
Goal: Be able to read CSV/Excel → clean a little → filter → group → make simple plots
Core topics & recommended time
| # | Topic | Time estimate | Key things to master | Practice goal after this topic |
|---|---|---|---|---|
| 1 | Creating & reading DataFrames | 4–7 h | dict → DataFrame, list of dicts, read_csv, read_excel, read_json, head/tail/info/dtypes | Read any CSV and understand its structure in < 1 minute |
| 2 | Selecting & filtering | 8–12 h | df[col], df[[…]], loc/iloc, boolean masks, & \| ~, isin, str.contains, query() | Filter complex conditions without getting errors |
| 3 | Creating & transforming columns | 8–12 h | =, np.where, np.select, .map, .apply, str methods, pd.to_datetime + .dt accessor | Create 5–6 useful columns in < 5 minutes |
| 4 | Sorting & basic ranking | 3–5 h | sort_values, rank (global & groupby rank) | Sort any way you want instantly |
| 5 | Handling missing values | 6–10 h | isna().sum(), dropna, fillna, ffill/bfill, groupby + transform fillna | Decide intelligently what to do with NaN |
| 6 | Basic plotting | 5–8 h | .plot(), .plot.bar(), .plot.hist(), seaborn line/scatter/boxplot/heatmap | Make readable plots quickly |
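To make topics 2–3 concrete, here is a tiny sketch on invented data (column names are made up): combining boolean masks with parentheses, and creating a column with np.where.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
    "sales": [120, 300, 80, 150],
})

# Topic 2: boolean masks — combine conditions with & / | and parentheses
big_delhi = df[(df["city"] == "Delhi") & (df["sales"] > 100)]

# Topic 3: derive a column with np.where
df["size"] = np.where(df["sales"] >= 150, "big", "small")
print(df["size"].tolist())  # ['small', 'big', 'small', 'big']
```

Forgetting the parentheses around each condition is the single most common beginner error here — & binds tighter than ==.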
Mini-project after Phase 1 (very important)
- Find a simple CSV (Kaggle: “Titanic”, “Indian startups”, “IPL matches”, “Coffee quality”, etc.)
- Read it
- Check missing values
- Clean obvious problems (missing names → ‘Unknown’, negative ages → NaN, etc.)
- Create 2–3 useful columns (age group, total score, profit margin…)
- Group by 1–2 categories and show count/mean
- Make 2–3 plots
Success criterion: You can clean & explore a new CSV in 30–60 minutes
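The cleaning steps from the mini-project, sketched on invented toy data (the column names stand in for whatever your real CSV has):

```python
import pandas as pd
import numpy as np

# Toy data standing in for a real CSV
df = pd.DataFrame({
    "name": ["Asha", None, "Caro"],
    "age": [34, -2, 51],
})

# Missing names -> 'Unknown'; impossible negative ages -> NaN
df["name"] = df["name"].fillna("Unknown")
df.loc[df["age"] < 0, "age"] = np.nan

# A derived column: coarse age group
df["age_group"] = pd.cut(df["age"], bins=[0, 40, 120], labels=["<40", "40+"])
print(df)
```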
Phase 2 – The Real Power Tools (Weeks 4–7)
Goal: Handle 90% of daily data wrangling tasks comfortably
| # | Topic | Time estimate | Must master | Practice goal after this topic |
|---|---|---|---|---|
| 7 | GroupBy – basic & advanced | 12–18 h | .groupby().agg(), named aggregation, transform, apply, cumsum/cummax/rank/shift/diff | Replace almost all Excel pivot tables |
| 8 | Pivot table & crosstab | 6–9 h | pivot_table (aggfunc, margins, fill_value), pd.crosstab(normalize, margins) | Create Excel-style summary tables |
| 9 | Merging & joining | 8–12 h | pd.merge (how, on/left_on/right_on, suffixes, indicator), pd.concat (axis=0/1) | Combine 2–4 tables correctly |
| 10 | Duplicates & deduplication | 4–7 h | duplicated(), drop_duplicates (keep, subset), keep='last' vs 'first' vs False | Clean duplicate rows intelligently |
| 11 | String & categorical cleaning | 6–10 h | str.extract, str.split, str.replace, pd.Categorical, value_counts().head(20) | Standardize messy text columns fast |
| 12 | Basic time series | 8–12 h | pd.to_datetime (format, errors), .dt, resample, rolling/expanding, shift/diff/pct_change | Handle date-based data naturally |
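Topic 7 is the heart of this phase, so here is a minimal sketch (invented data) of named aggregation and of transform, which keeps the original row count — the trick behind per-group shares and group-wise fillna:

```python
import pandas as pd

orders = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "B"],
    "amount": [10, 30, 5, 5, 20],
})

# Named aggregation: one row per group, columns named by you
summary = orders.groupby("customer").agg(
    total=("amount", "sum"),
    n_orders=("amount", "size"),
)

# transform returns a result with the ORIGINAL shape — ideal for shares
orders["share"] = orders["amount"] / orders.groupby("customer")["amount"].transform("sum")
print(summary)
```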
Mini-project Phase 2
- Load multiple related CSVs (example: customers + orders + products)
- Merge them
- Clean missing values & duplicates
- Create monthly revenue per category/region
- Make line + bar combination plot
Success criterion: You can take 2–4 messy tables → produce one clean analysis table + 3–4 insights
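The merge-then-monthly-revenue pipeline from the mini-project, sketched with two invented tables (customers + orders; a products table would join the same way):

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2], "region": ["North", "South"]})
orders = pd.DataFrame({
    "cust_id": [1, 1, 2],
    "date": pd.to_datetime(["2025-01-05", "2025-02-10", "2025-01-20"]),
    "revenue": [100.0, 150.0, 80.0],
})

# Merge, then monthly revenue per region (pivot-table style)
merged = orders.merge(customers, on="cust_id", how="left")
monthly = (
    merged.groupby([pd.Grouper(key="date", freq="MS"), "region"])["revenue"]
    .sum()
    .unstack(fill_value=0)
)
print(monthly)
```

how="left" keeps every order even if a customer is missing from the lookup table — usually what you want when orders are the facts.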
Phase 3 – Analysis & Communication Level (Weeks 8–12+)
Goal: Turn data into understandable stories
| # | Topic | Time estimate | Must master | Practice goal |
|---|---|---|---|---|
| 13 | Conditional filtering & top-N per group | 5–8 h | nlargest/nsmallest, idxmax/idxmin, groupby + head/tail, query() | Find top 5 customers per region quickly |
| 14 | Correlations & basic statistics | 5–8 h | .corr(), .corr(method='spearman'), .cov(), quantile, skew, kurtosis | Understand relationships in numeric data |
| 15 | Visualization – polished & effective | 12–20 h | seaborn (relplot, catplot, pairplot, heatmap), subplots, annotations, styling | Create publication-quality plots |
| 16 | Export & reporting | 4–7 h | to_csv/excel/json, df.style (bar, background_gradient, highlight), to_clipboard | Produce clean tables for reports |
| 17 | Performance & memory basics (optional) | 5–10 h | category dtype, read_csv (dtype, usecols, nrows), .memory_usage(), chunking | Handle larger files without crashing |
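Topic 13's "top N per region" pattern, sketched on invented data — sort first, then groupby + head keeps the first N rows of each group:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["N", "N", "N", "S", "S"],
    "customer": ["a", "b", "c", "d", "e"],
    "total": [50, 90, 70, 40, 60],
})

# Top 2 customers per region: sort descending, then take 2 per group
top2 = (
    sales.sort_values("total", ascending=False)
    .groupby("region")
    .head(2)
    .sort_values(["region", "total"], ascending=[True, False])
)
print(top2["customer"].tolist())  # ['b', 'c', 'e', 'd']
```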
Capstone projects (choose 1–2)
- Sales dashboard style Multiple CSVs → clean → monthly trends → top products/regions → heatmap + line plots
- Student/employee analysis Clean messy data → group by department/city → correlations → boxplots + bar charts
- Time series story Stock prices / COVID data / weather → resample → rolling average → anomaly detection
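The rolling-average + anomaly-detection idea from the time-series capstone, in miniature (invented daily series; the "twice the rolling mean" threshold is just one simple rule, not the definitive one):

```python
import pandas as pd

# Daily series with one obvious spike (invented data)
idx = pd.date_range("2025-01-01", periods=10, freq="D")
s = pd.Series([10, 11, 10, 12, 50, 11, 10, 12, 11, 10], index=idx)

# 3-day centered rolling mean smooths the series;
# flag points far above it as anomalies
roll = s.rolling(3, center=True).mean()
anomalies = s[s > roll * 2]
print(anomalies)
```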
Suggested weekly rhythm
- 3–5 hours theory + exercises (videos, articles, my explanations)
- 4–8 hours hands-on coding (exercises + mini-projects)
- 1 hour explaining (write what you learned or teach someone)
Recommended resources (2025–2026)
- Free & excellent:
- “10 Minutes to pandas” official guide
- Kaggle Pandas micro-course (free)
- DataCamp / Codecademy pandas tracks (some free parts)
- Corey Schafer pandas playlist (YouTube)
- “Python for Data Analysis” book – Wes McKinney (chapters 5–10)
- Practice datasets:
- Kaggle: Titanic, IPL, Superstore Sales, Indian Startups, Zomato, Airbnb
- Our World in Data, Google Dataset Search
Success checkpoints
- After Phase 1 → clean & explore any small CSV in ~45 min
- After Phase 2 → combine 2–4 tables and summarize intelligently
- After Phase 3 → produce a 1-page report with a table + 3–4 clear visualizations
Where would you like to start?
- Detailed Phase 1 exercises right now
- A full mini-project with step-by-step guidance
- Focus on one weak area (missing values, groupby, merging, plotting…)
- A personalized plan based on how many hours/week you can spend
Just tell me what feels most useful — we’ll build from there together. 😊
