Chapter 21: Pandas Syllabus

Pandas Syllabus — written exactly as I would explain it to a student or junior colleague who wants to go from “I know almost nothing about pandas” to “I can confidently use pandas in real data work”.

This is not a list of functions. This is a learning path — realistic order, realistic priorities, realistic time estimates (assuming you spend 5–15 hours per week), and why each topic comes at that moment.

Pandas Learning Syllabus – From Zero to Confidently Useful (2025–2026)

Total realistic time: 8–16 weeks (depending on your previous Python knowledge and weekly time investment)

Phase 0 – Before pandas (1–3 days if needed)

Must-have prerequisites (skip if already comfortable)

  • Python basics: lists, dictionaries, loops, functions, if/else
  • What is a Series vs DataFrame (mental model)
  • What is an index (row labels)
  • Difference between .loc[] and .iloc[]
  • Basic NumPy understanding (arrays, vectorized operations)
  • How to use Jupyter / Colab / VS Code notebooks

Goal: Understand that pandas is basically “Excel + SQL + Python arrays” combined.
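A minimal sketch of that mental model (the data and labels here are invented for illustration):

```python
import pandas as pd

# A Series is a single labeled column; a DataFrame is a table of
# Series sharing one index (the row labels).
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"age": [25, 32, 47]}, index=["ann", "bob", "cho"])

# .loc[] selects by label; .iloc[] selects by integer position.
by_label = df.loc["bob", "age"]   # label-based lookup -> 32
by_position = df.iloc[1, 0]       # same cell, addressed by position -> 32
```

The two lookups hit the same cell; keeping label-based and position-based access mentally separate is what makes the rest of pandas predictable.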

Phase 1 – First 10–20 hours – Core survival kit

(You should be able to do ~70–80% of daily tasks after this)

Week 1–2

  1. Creating & inspecting DataFrames (2–4 h)
    • pd.DataFrame() from dict, list of dicts, list of lists
    • pd.read_csv(), pd.read_excel(), pd.read_json()
    • df.head(), df.tail(), df.shape, df.columns, df.dtypes, df.info()
    • df.describe(), df.describe(include='object')
    • df.sample(10), df.nunique(), df.value_counts()
  2. Selecting data – the 4 most important patterns (4–6 h)
    • df['column'], df[['col1','col2']]
    • df.loc[], df.iloc[]
    • Boolean indexing: df[df['age'] > 30]
    • Combining conditions: & | ~ .isin() .str.contains() .between()
    • Chaining: df[df['city']=='Pune'][['name','salary']].sort_values(…)
  3. Creating & modifying columns (4–6 h)
    • Simple: df[‘new’] = df[‘old’] * 1.1
    • np.where() / np.select()
    • .map() + dictionary
    • .apply() + lambda or custom function
    • String methods: .str.lower(), .str.strip(), .str.replace(), .str.split()
    • Date methods: pd.to_datetime(), .dt.year, .dt.month, .dt.day_name()
  4. Sorting & basic ranking (2–3 h)
    • df.sort_values(), multiple columns
    • df.rank() – global & within group
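The four Phase 1 patterns fit in one short script. This is a sketch on a made-up inline CSV (standing in for a real `pd.read_csv("data.csv")` call); all names and numbers are invented:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline CSV standing in for a file on disk.
raw = io.StringIO(
    "name,city,age,salary\n"
    "Asha,Pune,34,50000\n"
    "Ravi,Mumbai,28,42000\n"
    "Meera,Pune,41,61000\n"
)
df = pd.read_csv(raw)

# 1. Inspect
rows, cols = df.shape                      # (3, 4)

# 2. Select: single column, boolean mask, combined conditions
pune = df[df["city"] == "Pune"]
over_30_in_pune = df[(df["age"] > 30) & df["city"].isin(["Pune"])]

# 3. New columns: arithmetic, then a conditional with np.where
df["bonus"] = df["salary"] * 0.10
df["band"] = np.where(df["salary"] >= 50000, "high", "standard")

# 4. Sort by several columns at once
ranked = df.sort_values(["city", "salary"], ascending=[True, False])
```

Note the parentheses around each condition in the combined mask: `&` binds tighter than `>`, so `(df["age"] > 30) & …` is required.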

Mini-project at end of Phase 1: Clean a small messy CSV → filter → create bonus column → group by city → bar plot of average salary

Phase 2 – Weeks 3–5 – The heart of pandas (where real power lives)

Core intermediate skills – you will use these every day

  1. Missing values (NaN / None / pd.NA) (5–8 h)
    • Detection: isna(), notna(), isna().sum(), isna().mean()*100
    • Dropping: dropna(), dropna(subset=…), how='all', thresh=
    • Filling: fillna(value), .ffill() / .bfill() (fillna(method='ffill') is deprecated), fillna with dict
    • Group-aware: groupby + transform + fillna
    • Mark missing as category: fillna('Unknown')
  2. GroupBy – the most important skill after selection (8–12 h)
    • df.groupby('col')['num'].mean() / sum() / count() / nunique()
    • Multiple aggregations: .agg({'col1':'mean', 'col2':['min','max']})
    • Named aggregations: .agg(avg_salary=('salary','mean'), …)
    • groupby + transform (broadcast group stat to every row)
    • groupby + apply (flexible but slower)
    • groupby + cumsum(), cummax(), pct_change()
    • groupby + rank(), shift(), diff()
  3. Crosstab & pivot_table (3–5 h)
    • pd.crosstab() – count / sum / mean / normalize
    • df.pivot_table() – values, index, columns, aggfunc, margins
    • When to use pivot_table vs groupby
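A sketch of the Phase 2 core loop on invented sales records (the region/month/revenue values are made up): detect missing values, fill them group-aware, aggregate, then pivot.

```python
import numpy as np
import pandas as pd

# Invented sales records with gaps, for illustration only.
df = pd.DataFrame({
    "region":  ["N", "N", "S", "S", "S"],
    "month":   ["Jan", "Feb", "Jan", "Feb", "Feb"],
    "revenue": [100.0, np.nan, 80.0, 120.0, np.nan],
})

# Detect: share of missing revenue values
missing_pct = df["revenue"].isna().mean() * 100        # 40.0

# Group-aware fill: replace each NaN with its region's mean
# (groupby + transform broadcasts the group statistic to every row).
df["revenue"] = df["revenue"].fillna(
    df.groupby("region")["revenue"].transform("mean")
)

# Named aggregation per region
totals = df.groupby("region").agg(total=("revenue", "sum"))

# Pivot into a region x month table of mean revenue
pivot = df.pivot_table(values="revenue", index="region",
                       columns="month", aggfunc="mean")
```

The `transform` step is the key trick: unlike `.agg()`, it returns a result aligned to the original rows, so it can be passed straight into `fillna`.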

Mini-project Phase 2: Load real-ish sales data → clean missing → group by month & region → pivot table → heatmap of revenue

Phase 3 – Weeks 6–9 – Connecting & transforming data

  1. Merging & joining tables (6–10 h)
    • pd.merge(how='left'/'inner'/'right'/'outer')
    • on vs left_on / right_on
    • suffixes, indicator=True
    • merge_asof (nearest match – very useful)
    • pd.concat() – axis=0 (rows) vs axis=1 (columns)
    • Combining multiple files: glob + loop + concat
  2. Duplicates & deduplication (3–5 h)
    • df.duplicated(), keep='first'/'last'/False
    • duplicated(subset=…)
    • drop_duplicates()
    • drop_duplicates + keep most recent / most complete row
  3. String & categorical cleaning (4–7 h)
    • .str methods (contains, startswith, replace, split, extract)
    • pd.Categorical – memory & order
    • Fuzzy matching basics (optional: rapidfuzz, the maintained successor to fuzzywuzzy)
  4. Time series basics (5–8 h)
    • pd.to_datetime(), format=, errors='coerce'
    • .dt accessor
    • resample('ME') ('M' is deprecated since pandas 2.2), .asfreq(), .interpolate()
    • rolling() & expanding() windows
    • shift(), diff(), pct_change()
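The Phase 3 pieces connect naturally in one flow. A sketch on invented tables (every name, id, and date below is made up); the resample uses the `'MS'` (month start) alias, which works across pandas versions:

```python
import pandas as pd

# Invented tables to demonstrate joining, dedup, and dates.
employees = pd.DataFrame({"emp_id": [1, 2, 3],
                          "name": ["Asha", "Ravi", "Meera"],
                          "dept_id": [10, 10, 20]})
departments = pd.DataFrame({"dept_id": [10, 20],
                            "dept": ["Sales", "Ops"]})

# Left join: keep every employee, attach the department name
merged = employees.merge(departments, on="dept_id", how="left")

# Dedup: keep only the most recent row per employee
log = pd.DataFrame({
    "emp_id": [1, 1, 2],
    "ts": pd.to_datetime(["2025-01-01", "2025-02-01", "2025-01-15"]),
    "score": [5, 8, 7],
})
latest = log.sort_values("ts").drop_duplicates("emp_id", keep="last")

# Time series: daily values resampled to monthly sums, 7-day rolling mean
ts = pd.DataFrame(
    {"value": range(60)},
    index=pd.date_range("2025-01-01", periods=60, freq="D"),
)
monthly = ts["value"].resample("MS").sum()    # "MS" = month start
smooth = ts["value"].rolling(window=7).mean()
```

Sorting before `drop_duplicates(keep="last")` is the standard "keep most recent row" idiom from item 2 above.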

Mini-project Phase 3: Merge employees + departments + sales → clean → create monthly revenue trend per department → line plot

Phase 4 – Weeks 10–14 – Analysis & communication level

  1. Advanced filtering & conditions (3–5 h)
    • idxmax / idxmin
    • nlargest / nsmallest
    • query() method
    • isin with list or Series
    • between_time / at_time
  2. Statistical summaries & correlations (4–7 h)
    • corr(), corr(method='spearman')
    • cov()
    • value_counts(normalize=True)
    • quantile, describe(percentiles=…)
  3. Visualization with pandas + seaborn/matplotlib (8–12 h)
    • .plot() shortcuts: line, bar, hist, scatter, box
    • seaborn: heatmap, pairplot, catplot, relplot
    • Multiple subplots (plt.subplots)
    • Styling: titles, labels, legends, colors, annotations
  4. Export & reporting (3–5 h)
    • to_csv, to_excel, to_json
    • to_clipboard
    • df.style (background_gradient, bar, highlight_max)
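The filtering, summary, and export steps of Phase 4 can be sketched together. The table is invented, and the in-memory buffer stands in for a real file path you would pass to `to_csv`:

```python
import io

import pandas as pd

# Invented scores table for illustration.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera", "Dev"],
                   "city": ["Pune", "Mumbai", "Pune", "Delhi"],
                   "score": [88, 75, 93, 60],
                   "hours": [10, 8, 12, 5]})

# Advanced filtering: top-n, row of the max, and query()
top2 = df.nlargest(2, "score")
best = df.loc[df["score"].idxmax()]
pune_high = df.query("city == 'Pune' and score > 80")

# Summaries: Pearson correlation (the default) and normalized counts
r = df["score"].corr(df["hours"])
city_share = df["city"].value_counts(normalize=True)

# Export: write CSV to an in-memory buffer instead of a file
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
```

`query()` reads almost like SQL, which is why many people reach for it once conditions get long; `df[(…) & (…)]` does the same job.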

Capstone project Phase 4: Take a real public dataset (Kaggle, Our World in Data, etc.) → clean → explore → groupby/pivot → create 4–6 insightful visualizations → write a short summary

Suggested learning order & time allocation (realistic)

Phase | Duration  | Focus                            | Goal after phase
------|-----------|----------------------------------|----------------------------------------------
0     | 1–3 days  | Python + notebook basics         | Comfortable starting notebooks
1     | 1–3 weeks | Create, select, filter, new cols | Read CSV → clean a bit → filter → simple plot
2     | 2–4 weeks | GroupBy + missing values         | Group, aggregate, fill missing intelligently
3     | 3–5 weeks | Merge, concat, time series       | Combine tables, handle dates, rolling stats
4     | 3–6 weeks | Analysis + visualization         | Tell stories with data, nice-looking reports

Final advice from a teacher perspective

  • Do not try to memorize every function
  • Do memorize the 4–5 most common patterns you will write 1000×
  • Practice on real-ish data — not just perfect toy examples
  • After every 2–3 exercises → explain what you did to someone (even to yourself out loud)
  • When you get stuck → search error message + pandas (Stack Overflow is still king)

Which part of the syllabus do you want to start with right now? Or would you like me to give you:

  • detailed exercises for Phase 1
  • a full mini-project with guided steps
  • a “cheat sheet” of the 20 most important patterns

Just tell me where you want to begin or what feels most urgent — we’ll go deep together. 😊
