Chapter 16: Pandas Plotting
This pandas plotting tutorial is written as if we are sitting together: I show you the code line by line and explain why we do things a certain way, what is realistic in real projects, and what beginners usually get wrong.
We will go step by step from very simple plots to more useful, polished ones.
0. Mindset before we start
Good plots in data analysis are communication tools, not just decoration.
Before writing any .plot() code, always ask yourself these three questions:
- What exact question should this plot answer?
- Who is going to look at it? (myself, colleague, manager, presentation, report)
- What should be the most obvious thing someone sees in 3 seconds?
Common realistic goals:
- Show trend over time → line plot
- Compare categories → bar / column plot
- Show relationship between two numbers → scatter
- Show distribution / spread → histogram, boxplot, violin
- Show composition / proportions → stacked bar, pie (use carefully)
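In pandas, most of these choices come down to the `kind` argument of `.plot()`. The sketch below draws the same small frame three ways. The frame `demo` and its columns (`month`, `sales`, `returns`) are made up purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative data; names and numbers are assumptions
demo = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "sales": [120, 135, 128, 150],
    "returns": [8, 11, 9, 14],
}).set_index("month")

fig, axes = plt.subplots(1, 3, figsize=(13, 4))

# Trend over time -> line
demo["sales"].plot(kind="line", ax=axes[0], marker="o", title="Trend: line")

# Compare categories -> bar
demo["sales"].plot(kind="bar", ax=axes[1], rot=0, title="Comparison: bar")

# Relationship between two numbers -> scatter (needs a DataFrame plus x/y column names)
demo.plot(kind="scatter", x="sales", y="returns", ax=axes[2], title="Relationship: scatter")

fig.tight_layout()
plt.show()
```

Passing `ax=` puts each plot on a specific subplot, which is handy when you want to compare plot types side by side before deciding.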
1. Very first simple plot (what everyone tries first)
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Small example dataset
df = pd.DataFrame({
    'day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'temperature': [24.5, 26.1, 23.8, 27.4, 25.9, 28.2, 22.7],
    'rain_mm': [0, 1.2, 8.7, 0, 0.3, 0, 12.4],
    'visitors': [120, 145, 80, 190, 165, 210, 95]
})

# Simplest possible plot
df['temperature'].plot(title="Temperature this week")
plt.show()
```
What happens? → You get a line plot, but the x-axis only shows the default integer index (0, 1, 2, 3, …) instead of the day names → not very readable.
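If you would rather leave the index alone, another option is to call `.plot()` on the DataFrame and name the x and y columns explicitly. A minimal sketch, using a trimmed copy of the data called `week` (a name made up here so the block runs on its own and does not overwrite `df`):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Trimmed, stand-alone copy of the example data
week = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "temperature": [24.5, 26.1, 23.8, 27.4, 25.9, 28.2, 22.7],
})

# The default index is just row positions; this is what landed on the x-axis
print(week.index)  # RangeIndex(start=0, stop=7, step=1)

# Tell pandas which column goes on which axis
ax = week.plot(x="day", y="temperature", title="Temperature this week", legend=False)
ax.set_ylabel("Temperature (°C)")
plt.show()
```

Both approaches give readable day labels. Setting the index is usually more convenient when you plan to make several plots from the same frame.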
2. Better: use index meaningfully
```python
# Set meaningful index
df = df.set_index('day')

# Now it looks better
df['temperature'].plot(
    title="Daily Temperature This Week",
    ylabel="Temperature (°C)",
    color="darkred",
    marker="o",
    linestyle="--",
    linewidth=2
)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
3. Most common realistic plots – one by one
A. Line plot – evolution over time (most frequent use case)
```python
# Create time-based data
dates = pd.date_range(start='2025-01-01', periods=60, freq='D')
sales = pd.DataFrame({
    'date': dates,
    'north': np.random.randint(40, 140, 60).cumsum(),
    'south': np.random.randint(30, 120, 60).cumsum(),
    'east': np.random.randint(25, 110, 60).cumsum()
})
sales = sales.set_index('date')

# Plot multiple lines
# DataFrame.plot() creates its own figure, so pass figsize here;
# calling plt.figure() first would leave an empty extra figure behind
sales.plot(
    figsize=(12, 6),
    linewidth=2.3,
    marker='o',
    markersize=5,
    alpha=0.9
)

plt.title('Cumulative Sales by Region – Jan-Feb 2025', fontsize=14, pad=15)
plt.ylabel('Cumulative Revenue (₹ thousands)', fontsize=12)
plt.xlabel('Date', fontsize=12)
plt.legend(title='Region', fontsize=10, title_fontsize=11)
plt.grid(True, alpha=0.25, linestyle='--')
plt.tight_layout()
plt.show()
```
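The examples in sections B to E use columns such as `product`, `revenue`, `units_sold`, `discount_%`, `customer_rating`, and `region`, which the date-indexed frame above does not have. The transaction-level data behind those examples is not shown in this chapter, so the block below is only a stand-in. Every product name, price, value range, and the random seed are assumptions, chosen so the later snippets have something to run on. Note that it reuses the name `sales`, which replaces the cumulative frame from the line plot.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the made-up data is reproducible
n = 300                          # number of synthetic transactions (arbitrary)

sales = pd.DataFrame({
    # random day within the first 90 days of 2025
    "date": pd.Timestamp("2025-01-01") + pd.to_timedelta(rng.integers(0, 90, size=n), unit="D"),
    "region": rng.choice(["north", "south", "east"], size=n),
    "product": rng.choice(["Laptop", "Phone", "Tablet", "Monitor"], size=n),
    "units_sold": rng.integers(1, 40, size=n),
    "discount_%": rng.choice([0, 5, 10, 15, 20], size=n),
    "customer_rating": rng.uniform(1, 5, size=n).round(1),
})

# Hypothetical pricing rule: fixed unit price per product, reduced by the discount
unit_price = {"Laptop": 55_000, "Phone": 30_000, "Tablet": 20_000, "Monitor": 12_000}
sales["revenue"] = (
    sales["units_sold"]
    * sales["product"].map(unit_price)
    * (1 - sales["discount_%"] / 100)
)

print(sales.head())
```

Because the values are random, the plots will not show any meaningful pattern. Swap in your own data as soon as you have it.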
B. Bar plot – comparing categories
```python
# Total sales per product
# Assumes `sales` is a transaction-level DataFrame with 'product' and 'revenue'
# columns (the date-indexed frame from section A has neither)
product_sales = sales.groupby('product')['revenue'].sum().sort_values(ascending=False)

plt.figure(figsize=(10, 6))
product_sales.plot.bar(
    color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'],  # one color per bar
    edgecolor='black',
    linewidth=1.1,
    width=0.72
)

plt.title('Total Revenue by Product – Q1 2025', fontsize=14)
plt.ylabel('Revenue (₹)', fontsize=12)
plt.xlabel('Product Category', fontsize=12)
plt.xticks(rotation=0, fontsize=11)
plt.grid(axis='y', alpha=0.25, linestyle=':')

# Add value labels on top of bars
offset = product_sales.max() * 0.015  # small gap between bar top and label
for i, v in enumerate(product_sales):
    plt.text(i, v + offset, f'₹{v:,.0f}',
             ha='center', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()
```
C. Scatter plot – relationship between two variables
```python
# Assumes `sales` has one row per transaction with the columns
# 'customer_rating', 'discount_%', 'units_sold' and 'revenue'
plt.figure(figsize=(10, 7))

plt.scatter(
    x=sales['customer_rating'],
    y=sales['discount_%'],
    s=sales['units_sold'] * 4,   # bubble size
    c=sales['revenue'] / 1000,   # color by revenue
    cmap='viridis',
    alpha=0.65,
    edgecolors='black',
    linewidth=0.6
)

plt.colorbar(label='Revenue (thousands ₹)')
plt.title('Customer Rating vs Discount %\n(size = units sold, color = revenue)', fontsize=13)
plt.xlabel('Average Customer Rating', fontsize=12)
plt.ylabel('Discount Percentage', fontsize=12)
plt.grid(True, alpha=0.3, linestyle='--')
plt.tight_layout()
plt.show()
```
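Since this chapter is about pandas plotting, it may help to see roughly the same bubble chart built through the `.plot.scatter()` accessor instead of calling matplotlib directly. This is a sketch on a small synthetic frame named `demo`; the data and the flat price used for `revenue` are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "customer_rating": rng.uniform(1, 5, 50).round(1),
    "discount_%": rng.choice([0, 5, 10, 15, 20], 50),
    "units_sold": rng.integers(1, 40, 50),
})
demo["revenue"] = demo["units_sold"] * 1_000  # made-up flat price per unit

ax = demo.plot.scatter(
    x="customer_rating",
    y="discount_%",
    s=demo["units_sold"] * 4,  # bubble size
    c="revenue",               # a column name: pandas colors by it and adds a colorbar
    colormap="viridis",
    alpha=0.65,
    figsize=(10, 7),
)
ax.set_title("Customer Rating vs Discount %")
plt.show()
```

The accessor saves a few lines because axis labels and the colorbar come from the column names. The plain `plt.scatter` call gives you finer control over each element.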
D. Histogram – distribution of one variable
```python
import seaborn as sns  # needed for sns.histplot below

plt.figure(figsize=(11, 5))

# Two styles side by side
# Left: plain pandas histogram
plt.subplot(1, 2, 1)
sales['customer_rating'].plot.hist(
    bins=18,
    color='cornflowerblue',
    edgecolor='black',
    title='Histogram – Customer Ratings'
)

# Right: seaborn histogram with a KDE curve on top
plt.subplot(1, 2, 2)
sns.histplot(
    data=sales,
    x='customer_rating',
    bins=18,
    kde=True,
    color='teal',
    stat='density'
)
plt.title('Histogram + KDE – Customer Ratings')

plt.tight_layout()
plt.show()
```
E. Boxplot – compare distributions across groups
```python
import seaborn as sns

# Assumes `sales` has 'region', 'product' and 'customer_rating' columns
plt.figure(figsize=(11, 6))

sns.boxplot(
    data=sales,
    x='region',
    y='customer_rating',
    hue='product',
    palette='Set2',
    width=0.75,
    fliersize=5,
    linewidth=1.4
)

plt.title('Customer Rating Distribution by Region & Product', fontsize=14)
plt.ylabel('Customer Rating (1–5)', fontsize=12)
plt.xlabel('Sales Region', fontsize=12)
plt.legend(title='Product', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.grid(True, axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
```
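If seaborn is not available, pandas has a simpler grouped boxplot of its own via `DataFrame.boxplot(column=..., by=...)`. It does not support the `hue` split used above, so this sketch groups by region only. The frame `demo` and its values are synthetic and assumed for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "region": rng.choice(["north", "south", "east"], 150),
    "customer_rating": rng.uniform(1, 5, 150).round(1),
})

fig, ax = plt.subplots(figsize=(8, 5))

# One box per region, drawn on the axes we created
demo.boxplot(column="customer_rating", by="region", ax=ax, grid=False)

fig.suptitle("")  # clear the automatic super-title pandas adds when `by` is used
ax.set_title("Customer Rating by Region")
ax.set_xlabel("Sales Region")
ax.set_ylabel("Customer Rating (1–5)")
plt.show()
```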
Summary – Which plot when? (quick cheat sheet)
| Goal | Recommended plot type | pandas / seaborn command example |
|---|---|---|
| Trend over time | Line | df.plot() or sns.lineplot() |
| Compare categories | Bar / Column | .plot.bar() or sns.barplot() |
| Relationship 2 variables | Scatter / Bubble | .plot.scatter() or sns.scatterplot() |
| Distribution / shape | Histogram + KDE | .plot.hist() + sns.histplot(kde=True) |
| Compare distributions | Boxplot / Violin | sns.boxplot() / sns.violinplot() |
| Correlation overview | Heatmap | sns.heatmap(df.corr(numeric_only=True), annot=True) |
| Composition / proportions | Stacked bar / Pie (use carefully) | .plot.bar(stacked=True) or .plot.pie() |
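The last row of the table has no example in this chapter, so here is a minimal sketch of a stacked bar. The usual pattern is to reshape long data into a wide table (one row per category, one column per component) and then call `.plot.bar(stacked=True)`. The frame `demo` and its numbers are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per (region, product) pair
demo = pd.DataFrame({
    "region":  ["north", "north", "south", "south", "east", "east"],
    "product": ["Phone", "Laptop", "Phone", "Laptop", "Phone", "Laptop"],
    "revenue": [120, 200, 90, 150, 60, 110],
})

# Reshape to wide format: one row per region, one column per product
wide = demo.pivot_table(index="region", columns="product",
                        values="revenue", aggfunc="sum")

# Each bar is a region; the colored segments are the products
ax = wide.plot.bar(stacked=True, rot=0, figsize=(8, 5))
ax.set_ylabel("Revenue")
ax.set_title("Revenue composition by region")
plt.show()
```

Stacked bars work best with a handful of components. With many thin segments the middle ones become hard to compare.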
Your turn – small practice tasks
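Try these on a transaction-level sales DataFrame (one row per sale, with date, region, product, units_sold, revenue and customer_rating columns):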
- Plot average daily revenue per region as a line plot
- Make a bar plot showing total units sold per product
- Create a scatter plot of units_sold vs revenue, colored by customer_rating
- Show boxplots of customer_rating for each region
Which one would you like to try first? Or tell me what kind of data/story you want to visualize — I’ll help you choose and build the right plot step by step.
Where do you want to go next?
- How to make plots look much more professional (titles, legends, colors, fonts…)
- Creating multiple plots in one figure (subplots)
- Saving plots for reports (png, pdf, high resolution)
- Using Seaborn more deeply (themes, palettes, easier statistics)
- Interactive plots with Plotly (zoom, hover, export)
Just tell me what interests you most — we’ll continue slowly and practically. 😊
