Chapter 21: Pandas Syllabus
Pandas Syllabus — written exactly as I would explain it to a student or junior colleague who wants to go from “I know almost nothing about pandas” to “I can confidently use pandas in real data work”.
This is not a list of functions. This is a learning path — realistic order, realistic priorities, realistic time estimates (assuming you spend 5–15 hours per week), and why each topic comes at that moment.
Pandas Learning Syllabus – From Zero to Confidently Useful (2025–2026)
Total realistic time: 8–16 weeks (depending on your previous Python knowledge and weekly time investment)
Phase 0 – Before pandas (1–3 days if needed)
Must-have prerequisites (skip if already comfortable)
- Python basics: lists, dictionaries, loops, functions, if/else
- What is a Series vs DataFrame (mental model)
- What is an index (row labels)
- Difference between .loc[] and .iloc[]
- Basic NumPy understanding (arrays, vectorized operations)
- How to use Jupyter / Colab / VS Code notebooks
Goal: Understand that pandas is basically “Excel + SQL + Python arrays” combined.
Phase 1 – First 10–20 hours – Core survival kit
(You should be able to do ~70–80% of daily tasks after this)
Week 1–2
- Creating & inspecting DataFrames (2–4 h)
- pd.DataFrame() from dict, list of dicts, list of lists
- pd.read_csv(), pd.read_excel(), pd.read_json()
- df.head(), df.tail(), df.shape, df.columns, df.dtypes, df.info()
- df.describe(), df.describe(include='object')
- df.sample(10), df.nunique(), df.value_counts()
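A minimal sketch of the constructor-plus-inspection workflow above, using made-up names and salaries:

```python
import pandas as pd

# Build a small DataFrame from a dict of columns (one common constructor).
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "city": ["Pune", "Delhi", "Pune"],
    "salary": [50000, 62000, 58000],
})

print(df.shape)                   # (3, 3) — rows, columns
print(df.dtypes)                  # name/city are object, salary is int64
print(df["city"].value_counts())  # Pune appears twice, Delhi once
```

In practice you would replace the dict with `pd.read_csv(...)` and run the same inspection calls first thing.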
- Selecting data – the 4 most important patterns (4–6 h)
- df['column'], df[['col1', 'col2']]
- df.loc[], df.iloc[]
- Boolean indexing: df[df['age'] > 30]
- Combining conditions: & | ~ .isin() .str.contains() .between()
- Chaining: df[df['city']=='Pune'][['name','salary']].sort_values(…)
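All four patterns in one place, on an invented four-row table:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Iqbal"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "age": [34, 28, 41, 25],
    "salary": [50000, 62000, 58000, 45000],
})

# Column selection: one column -> Series, list of columns -> DataFrame
ages = df["age"]
subset = df[["name", "salary"]]

# Label vs position: .loc uses index labels, .iloc uses integer positions
first_row = df.loc[0]       # row whose index label is 0
last_row = df.iloc[-1]      # last row by position

# Boolean indexing; combined conditions need parentheses around each clause
over_30_in_pune = df[(df["age"] > 30) & (df["city"] == "Pune")]

# Chaining: filter, project, then sort
result = df[df["city"].isin(["Pune", "Mumbai"])][["name", "salary"]].sort_values("salary")
```

Forgetting the parentheses around each boolean clause is the single most common beginner error here, because `&` binds tighter than `>` in Python.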
- Creating & modifying columns (4–6 h)
- Simple: df['new'] = df['old'] * 1.1
- np.where() / np.select()
- .map() + dictionary
- .apply() + lambda or custom function
- String methods: .str.lower(), .str.strip(), .str.replace(), .str.split()
- Date methods: pd.to_datetime(), .dt.year, .dt.month, .dt.day_name()
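One sketch covering each column-creation technique above; the column names and values are invented:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["  Asha ", "RAVI", "Meena"],
    "salary": [50000, 62000, 58000],
    "grade": ["B", "A", "B"],
    "joined": ["2021-03-15", "2019-11-02", "2022-07-09"],
})

# Simple arithmetic column
df["salary_plus_10pct"] = df["salary"] * 1.1

# np.where for a two-way condition, .map for dictionary lookups
df["band"] = np.where(df["salary"] >= 60000, "high", "standard")
df["grade_points"] = df["grade"].map({"A": 4, "B": 3})

# String cleanup and dates via the .str / .dt accessors
df["name"] = df["name"].str.strip().str.title()
df["joined"] = pd.to_datetime(df["joined"])
df["join_year"] = df["joined"].dt.year
```

Prefer `np.where`/`.map` over `.apply` with a lambda when you can: they are vectorized and noticeably faster on large frames.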
- Sorting & basic ranking (2–3 h)
- df.sort_values(), multiple columns
- df.rank() – global & within group
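The global-vs-within-group rank distinction in a short example (synthetic data):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "name": ["Asha", "Meena", "Ravi", "Iqbal"],
    "salary": [50000, 58000, 62000, 45000],
})

# Sort by several columns, mixing ascending and descending order
df_sorted = df.sort_values(["city", "salary"], ascending=[True, False])

# Global rank across all rows vs rank recomputed inside each city
df["rank_overall"] = df["salary"].rank(ascending=False)
df["rank_in_city"] = df.groupby("city")["salary"].rank(ascending=False)
```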
Mini-project at end of Phase 1: clean a small messy CSV → filter → create a bonus column → group by city → bar plot of average salary
Phase 2 – Weeks 3–5 – The heart of pandas (where real power lives)
Core intermediate skills – you will use these every day
- Missing values (NaN / None / pd.NA) (5–8 h)
- Detection: isna(), notna(), isna().sum(), isna().mean()*100
- Dropping: dropna(), dropna(subset=…), how='all', thresh=
- Filling: fillna(value), .ffill() / .bfill() (the older fillna(method='ffill'/'bfill') is deprecated), fillna with a dict
- Group-aware: groupby + transform + fillna
- Mark missing as category: fillna('Unknown')
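The detection, group-aware fill, and mark-as-category ideas together, on a toy frame with deliberate gaps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "salary": [50000, np.nan, 62000, np.nan],
    "notes": [None, "promoted", None, None],
})

# Detect: percentage of missing values per column
missing_pct = df.isna().mean() * 100

# Group-aware fill: replace NaN with each city's mean salary
df["salary"] = df["salary"].fillna(
    df.groupby("city")["salary"].transform("mean")
)

# Mark missing text as an explicit category instead of dropping the rows
df["notes"] = df["notes"].fillna("Unknown")
```

The `groupby + transform + fillna` pattern is worth memorizing: it fills each gap with a statistic from that row's own group rather than a single global value.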
- GroupBy – the most important skill after selection (8–12 h)
- df.groupby('col')['num'].mean() / sum() / count() / nunique()
- Multiple aggregations: .agg({'col1': 'mean', 'col2': ['min', 'max']})
- Named aggregations: .agg(avg_salary=('salary', 'mean'), …)
- groupby + transform (broadcast group stat to every row)
- groupby + apply (flexible but slower)
- groupby + cumsum(), cummax(), pct_change()
- groupby + rank(), shift(), diff()
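A compact sketch of the two groupby modes above (aggregate vs transform), with invented salaries:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
    "salary": [50000, 58000, 62000, 45000, 70000],
})

# Named aggregations: one row per group, readable column names
summary = df.groupby("city").agg(
    avg_salary=("salary", "mean"),
    max_salary=("salary", "max"),
    headcount=("salary", "count"),
)

# transform broadcasts the group statistic back onto every row,
# which makes per-row comparisons against the group trivial
df["city_avg"] = df.groupby("city")["salary"].transform("mean")
df["above_city_avg"] = df["salary"] > df["city_avg"]
```

Rule of thumb: use `.agg` when you want fewer rows (one per group), `.transform` when you want the same number of rows back.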
- Crosstab & pivot_table (3–5 h)
- pd.crosstab() – count / sum / mean / normalize
- df.pivot_table() – values, index, columns, aggfunc, margins
- When to use pivot_table vs groupby
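The same synthetic sales data shaped both ways:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90, 110],
})

# pivot_table: rows = region, columns = quarter, cells = summed revenue;
# margins=True adds an "All" row/column of totals
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum", margins=True)

# crosstab without a values argument simply counts occurrences
counts = pd.crosstab(sales["region"], sales["quarter"])
```

`pivot_table` is essentially `groupby` plus a reshape: reach for it when you want a two-dimensional table for humans, and plain `groupby` when the result feeds further computation.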
Mini-project, Phase 2: load real-ish sales data → clean missing values → group by month & region → pivot table → heatmap of revenue
Phase 3 – Weeks 6–9 – Connecting & transforming data
- Merging & joining tables (6–10 h)
- pd.merge(how='left'/'inner'/'right'/'outer')
- on vs left_on / right_on
- suffixes, indicator=True
- merge_asof (nearest match – very useful)
- pd.concat() – axis=0 (rows) vs axis=1 (columns)
- Combining multiple files: glob + loop + concat
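A left join plus a row-wise concat, on two made-up lookup tables:

```python
import pandas as pd

employees = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meena"],
    "dept_id": [10, 20, 30],
})
departments = pd.DataFrame({
    "dept_id": [10, 20],
    "dept_name": ["Sales", "Engineering"],
})

# Left join keeps every employee; indicator=True adds a _merge column
# showing where each row matched (dept_id 30 has no department row)
merged = employees.merge(departments, on="dept_id", how="left", indicator=True)

# Stacking frames of identical shape row-wise with concat
jan = pd.DataFrame({"emp_id": [1, 2], "revenue": [100, 200]})
feb = pd.DataFrame({"emp_id": [1, 2], "revenue": [150, 250]})
all_months = pd.concat([jan, feb], axis=0, ignore_index=True)
```

Checking `_merge` (and the row count before vs after) after every merge catches accidental row duplication early, which is the classic merge bug.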
- Duplicates & deduplication (3–5 h)
- df.duplicated(), keep='first'/'last'/False
- duplicated(subset=…)
- drop_duplicates()
- drop_duplicates + keep most recent / most complete row
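The "keep the most recent row" pattern spelled out, with invented customer records:

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["A", "A", "B", "B"],
    "updated": pd.to_datetime(["2024-01-01", "2024-03-01",
                               "2024-02-01", "2024-01-15"]),
    "email": ["a@old.com", "a@new.com", "b@new.com", "b@old.com"],
})

# Flag every row that shares a customer with another row
dupe_mask = df.duplicated(subset="customer", keep=False)

# Keep the most recent row per customer: sort by timestamp,
# then drop all but the last duplicate
latest = (df.sort_values("updated")
            .drop_duplicates(subset="customer", keep="last")
            .sort_index())
```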
- String & categorical cleaning (4–7 h)
- .str methods (contains, startswith, replace, split, extract)
- pd.Categorical – memory & order
- Fuzzy matching basics (optional: fuzzywuzzy / rapidfuzz)
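A short sketch of `.str.extract` plus an ordered categorical; the product-code format here is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"product": ["shirt-S", "shirt-M", "pants-L", "shirt-L"]})

# .str.extract pulls regex groups out into new columns
df[["item", "size"]] = df["product"].str.extract(r"(\w+)-(\w)")

# Ordered categorical: correct sorting (S < M < L) and lower memory use
df["size"] = pd.Categorical(df["size"], categories=["S", "M", "L"],
                            ordered=True)
by_size = df.sort_values("size")
```

Without the categorical, sorting on `size` would be alphabetical (L, M, S), which is wrong for garment sizes.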
- Time series basics (5–8 h)
- pd.to_datetime(), format=, errors='coerce'
- .dt accessor
- resample('ME') (monthly; the bare 'M' alias is deprecated), .asfreq(), .interpolate()
- rolling() & expanding() windows
- shift(), diff(), pct_change()
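The core time-series moves on a synthetic daily series (using the 'MS' month-start alias, which works across pandas versions):

```python
import pandas as pd

# Parsing with errors='coerce' turns unparseable strings into NaT
cleaned = pd.to_datetime(["2024-01-01", "not a date"], errors="coerce")

# Sixty days of synthetic daily readings
idx = pd.date_range("2024-01-01", periods=60, freq="D")
ts = pd.Series(range(60), index=idx, dtype=float)

# Downsample to monthly totals
monthly = ts.resample("MS").sum()

# Rolling 7-day mean and day-over-day change
rolling_mean = ts.rolling(7).mean()
daily_diff = ts.diff()
```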
Mini-project, Phase 3: merge employees + departments + sales → clean → create a monthly revenue trend per department → line plot
Phase 4 – Weeks 10–14 – Analysis & communication level
- Advanced filtering & conditions (3–5 h)
- idxmax / idxmin
- nlargest / nsmallest
- query() method
- isin with list or Series
- between_time / at_time
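Three of these patterns side by side, on an invented salary table:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Iqbal"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "salary": [50000, 62000, 58000, 45000],
})

# Row of the maximum: idxmax gives the index label, .loc fetches the row
top_earner = df.loc[df["salary"].idxmax()]

# nlargest avoids a full sort when you only need the top k rows
top_two = df.nlargest(2, "salary")

# query() reads like SQL; @ references Python variables in scope
threshold = 50000
well_paid = df.query("salary >= @threshold and city == 'Pune'")
```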
- Statistical summaries & correlations (4–7 h)
- corr(), corr(method='spearman')
- cov()
- value_counts(normalize=True)
- quantile, describe(percentiles=…)
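A quick sketch contrasting Pearson and Spearman on a tiny synthetic dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 68, 74],
    "city": ["Pune", "Pune", "Delhi", "Delhi", "Delhi"],
})

# Pearson measures linear association; Spearman works on ranks,
# so any monotonically increasing relationship scores exactly 1.0
pearson = df[["hours", "score"]].corr().loc["hours", "score"]
spearman = df[["hours", "score"]].corr(method="spearman").loc["hours", "score"]

# Proportions instead of raw counts
city_share = df["city"].value_counts(normalize=True)

# describe with custom percentiles
summary = df["score"].describe(percentiles=[0.1, 0.9])
```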
- Visualization with pandas + seaborn/matplotlib (8–12 h)
- .plot() shortcuts: line, bar, hist, scatter, box
- seaborn: heatmap, pairplot, catplot, relplot
- Multiple subplots (plt.subplots)
- Styling: titles, labels, legends, colors, annotations
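A minimal two-subplot sketch combining the pandas `.plot` shortcut with manual matplotlib styling (the data is made up; the Agg backend keeps it runnable headless):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Mumbai"],
    "avg_salary": [54000, 59000, 45000],
})

# Two subplots on one figure: pandas shortcut on the left,
# raw matplotlib call on the right
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
df.plot(kind="bar", x="city", y="avg_salary", ax=ax1, legend=False)
ax1.set_title("Average salary by city")
ax1.set_ylabel("INR")

ax2.scatter(df["city"], df["avg_salary"])
ax2.set_title("Same data as scatter")
fig.tight_layout()
```

Passing `ax=` to `.plot()` is the bridge between the pandas shortcuts and full matplotlib control over titles, labels, and layout.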
- Export & reporting (3–5 h)
- to_csv, to_excel, to_json
- to_clipboard
- df.style (background_gradient, bar, highlight_max)
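A small export round trip; writing to an in-memory buffer here, though a file path works identically:

```python
import io
import pandas as pd

df = pd.DataFrame({"city": ["Pune", "Delhi"], "avg_salary": [54000, 59000]})

# CSV round trip; index=False keeps the row labels out of the file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# JSON export as a list of records, one dict per row
records = df.to_json(orient="records")
```

Forgetting `index=False` is the usual gotcha: the next `read_csv` then picks up a stray `Unnamed: 0` column.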
Capstone project, Phase 4: take a real public dataset (Kaggle/ourworldindata/etc.) → clean → explore → groupby/pivot → create 4–6 insightful visualizations → write a short summary
Suggested learning order & time allocation (realistic)
| Phase | Duration | Focus | Goal after phase |
|---|---|---|---|
| 0 | 1–3 days | Python + notebook basics | Comfortable starting notebooks |
| 1 | 1–3 weeks | Create, select, filter, new cols | Read CSV → clean a bit → filter → simple plot |
| 2 | 2–4 weeks | GroupBy + missing values | Group, aggregate, fill missing intelligently |
| 3 | 3–5 weeks | Merge, concat, time series | Combine tables, handle dates, rolling stats |
| 4 | 3–6 weeks | Analysis + visualization | Tell stories with data, nice-looking reports |
Final advice from a teacher perspective
- Do not try to memorize every function
- Do memorize the 4–5 most common patterns you will write 1000×
- Practice on real-ish data — not just perfect toy examples
- After every 2–3 exercises → explain what you did to someone (even to yourself out loud)
- When you get stuck → search error message + pandas (Stack Overflow is still king)
Which part of the syllabus do you want to start with right now? Or would you like me to give you:
- detailed exercises for Phase 1
- a full mini-project with guided steps
- a “cheat sheet” of the 20 most important patterns
Just tell me where you want to begin or what feels most urgent — we’ll go deep together. 😊
