Chapter 2: Pandas Getting Started
Pandas – Getting Started (realistic first day)
Step 0 – What should you expect today?
Today we want to reach this point:
- Know how to install & import pandas
- Understand what a DataFrame and Series are
- Create your first table
- Look at the data in the most useful ways
- Select columns and rows
- Make very simple new columns
- Feel comfortable running the 10–15 most common beginner commands
That’s already enough to start playing with small real datasets.
Step 1 – Make sure pandas is available
In Jupyter Notebook / JupyterLab / VS Code / Google Colab
```python
# Run this once; usually already installed in Colab and many data environments
!pip install pandas
```
On your own computer (terminal / command prompt / PowerShell)
```bash
pip install pandas
# or
pip3 install pandas
# or (if you use conda)
conda install pandas
```
Then in Python:
```python
import pandas as pd

print(pd.__version__)  # good habit to check version
# Example output: 2.2.2 or 2.1.4 or newer in 2026
```
Almost everyone uses the short name pd — just follow this convention.
Step 2 – Our very first DataFrame (the most common way to start)
```python
# Small table of students, a very typical first example
students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'age': [20, 19, 21, 18, 22],
    'city': ['Delhi', 'Mumbai', 'Bangalore', 'Chennai', 'Pune'],
    'marks': [78, 92, 65, 88, 71],
    'grade': ['B', 'A', 'C', 'A', 'B']
})

# Show it!
students
```
You will see something like this:
```
     name  age       city  marks grade
0   Aarav   20      Delhi     78     B
1    Diya   19     Mumbai     92     A
2   Rohan   21  Bangalore     65     C
3    Isha   18    Chennai     88     A
4  Vihaan   22       Pune     71     B
```
This is a pandas DataFrame — your main working object.
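To make the DataFrame/Series distinction concrete, here is a minimal sketch using a trimmed version of the table above (only two columns, for brevity). The variable name `marks` is just an illustration. It shows that pulling out a single column gives a Series that shares the table's row labels.

```python
import pandas as pd

# Trimmed copy of the students table (two columns only, for brevity)
students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'marks': [78, 92, 65, 88, 71],
})

marks = students['marks']  # a single column is a Series

print(type(students).__name__)  # DataFrame
print(type(marks).__name__)     # Series

# The Series keeps the same row labels (index) as the table it came from
print(marks.index.equals(students.index))  # True
```

A useful mental model: a DataFrame is a set of Series (the columns) that all share one index.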
Step 3 – The 8 most important things to check first (do this every time!)
```python
# 1. How many rows and columns?
students.shape
# → (5, 5)

# 2. Just the column names
students.columns
# → Index(['name', 'age', 'city', 'marks', 'grade'], dtype='object')

# 3. What type is each column?
students.dtypes

# 4. The most useful single command for beginners
students.info()

# Typical output:
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   name    5 non-null      object
 1   age     5 non-null      int64
 2   city    5 non-null      object
 3   marks   5 non-null      int64
 4   grade   5 non-null      object
dtypes: int64(2), object(3)
memory usage: 328.0+ bytes
"""
```
```python
# 5. Quick statistics for numeric columns
students.describe()

# 6. For text/categorical columns
students.describe(include='object')

# 7. How many times does each value appear in a column?
students['city'].value_counts()

# 8. How many unique values?
students['grade'].nunique()
# → 3
```
Tip: students.info() and students.head() are the two commands you will run most often in your entire pandas life.
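The results of these checks are ordinary pandas objects, so you can pull individual numbers out of them. The sketch below uses a trimmed copy of the students table; the variable names (`summary`, `mean_marks`, and so on) are only illustrative.

```python
import pandas as pd

# Trimmed copy of the students table (numeric columns plus grade)
students = pd.DataFrame({
    'age': [20, 19, 21, 18, 22],
    'marks': [78, 92, 65, 88, 71],
    'grade': ['B', 'A', 'C', 'A', 'B'],
})

summary = students.describe()        # a DataFrame of statistics for numeric columns
mean_marks = summary.loc['mean', 'marks']
mean_age = summary.loc['mean', 'age']

grade_counts = students['grade'].value_counts()  # a Series: value -> count

print(mean_marks)         # 78.8
print(mean_age)           # 20.0
print(grade_counts['A'])  # 2
```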
Step 4 – Selecting data – the four most common patterns
```python
# Pattern 1: One column → gives you a Series
students['marks']

# Pattern 2: Multiple columns → gives you a DataFrame
students[['name', 'marks', 'city']]

# Pattern 3: First few rows
students.head(3)  # first 3 rows
students.tail(2)  # last 2 rows

# Pattern 4: Filter rows with condition (very important!)
students[students['marks'] >= 80]

# More realistic filters you will write many times:
students[students['age'] <= 20]
students[students['grade'] == 'A']
students[students['city'].isin(['Delhi', 'Mumbai', 'Pune'])]
```
Very common combination:
```python
# Only names and marks of students who scored 80+
students[students['marks'] >= 80][['name', 'marks']]
```
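The same "filter rows, then pick columns" selection can also be written in a single step with `.loc`, and conditions can be combined with `&` (and) or `|` (or). This is a sketch of that alternative style, not something the steps above require; `.loc` and the variable names below are additions for illustration. Note that each condition needs its own parentheses, because `&` binds more tightly than `>=`.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'age': [20, 19, 21, 18, 22],
    'marks': [78, 92, 65, 88, 71],
})

# Chained form: filter rows first, then pick columns
chained = students[students['marks'] >= 80][['name', 'marks']]

# Same result in one step: .loc[row_condition, column_list]
one_step = students.loc[students['marks'] >= 80, ['name', 'marks']]
print(chained.equals(one_step))  # True

# Two conditions at once: scored 70+ AND aged 20 or younger
mask = (students['marks'] >= 70) & (students['age'] <= 20)
print(students.loc[mask, 'name'].tolist())  # ['Aarav', 'Diya', 'Isha']
```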
Step 5 – Creating new columns (first magic moment)
```python
# 1. Very simple calculation
students['marks_plus_5'] = students['marks'] + 5

# 2. Boolean column (True/False)
students['passed'] = students['marks'] >= 70

# 3. Using conditions (most useful way for beginners)
import numpy as np

students['result'] = np.where(students['marks'] >= 80, 'Excellent',
                     np.where(students['marks'] >= 70, 'Good', 'Needs work'))

# 4. Simple percentage (assumes marks are out of 100)
students['percent'] = (students['marks'] / 100 * 100).round(1)
```
Now your table has more columns — this is how you grow your data step by step.
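Nested `np.where` calls get hard to read once there are more than two or three bands. One possible alternative, shown here only as a sketch, is NumPy's `np.select`, which takes a list of conditions and a matching list of labels and uses the first condition that is true for each row. The names `conditions` and `labels` are illustrative.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'marks': [78, 92, 65, 88, 71],
})

# Conditions are checked in order; the first one that is True wins
conditions = [
    students['marks'] >= 80,
    students['marks'] >= 70,
]
labels = ['Excellent', 'Good']

# Rows that match no condition get the default label
students['result'] = np.select(conditions, labels, default='Needs work')

print(students['result'].tolist())
# ['Good', 'Excellent', 'Needs work', 'Excellent', 'Good']
```

Adding another band then means adding one condition and one label, rather than nesting another call.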
Step 6 – Sorting (very satisfying)
```python
# Highest marks first
students.sort_values('marks', ascending=False)

# Sort by city, then inside each city sort by marks descending
students.sort_values(['city', 'marks'], ascending=[True, False])
```
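One thing that often surprises beginners: by default `sort_values` returns a new, sorted DataFrame and leaves the original untouched. To keep the sorted table you assign it to a name. The sketch below shows this; `ranked` is an illustrative name, and `reset_index(drop=True)` is an optional extra that renumbers the rows 0, 1, 2, … after sorting.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'marks': [78, 92, 65, 88, 71],
})

# Keep the sorted result by assigning it; renumber rows from 0
ranked = students.sort_values('marks', ascending=False).reset_index(drop=True)

print(ranked['name'].tolist())
# ['Diya', 'Isha', 'Aarav', 'Vihaan', 'Rohan']

# The original table is still in its original order
print(students['name'].tolist())
# ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan']
```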
Step 7 – Tiny first summary with groupby
```python
# Average marks per city
students.groupby('city')['marks'].mean()

# Count of students per grade
students['grade'].value_counts()

# A bit nicer version
students.groupby('grade').agg(
    count_students=('name', 'count'),
    avg_marks=('marks', 'mean')
).round(1)
```
Even this simple groupby already gives you real insight.
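To see what the grouped summary actually contains, here is a small sketch on the original five students. The result is indexed by the group key (`grade`); calling `reset_index()` is one optional way to turn that key back into a normal column. The name `by_grade` is illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'marks': [78, 92, 65, 88, 71],
    'grade': ['B', 'A', 'C', 'A', 'B'],
})

by_grade = students.groupby('grade').agg(
    count_students=('name', 'count'),
    avg_marks=('marks', 'mean'),
).round(1)

# The group key is the index, so rows are looked up by grade
print(by_grade.loc['A', 'avg_marks'])       # 90.0
print(by_grade.loc['B', 'avg_marks'])       # 74.5
print(by_grade.loc['C', 'count_students'])  # 1

# Optional: make 'grade' a regular column again
print(by_grade.reset_index().columns.tolist())
# ['grade', 'count_students', 'avg_marks']
```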
Your Day-1 Cheat Sheet – Commands you should be able to write from memory
```python
import pandas as pd

# Create
df = pd.DataFrame({...})

# Look
df.head(6)
df.tail()
df.shape
df.columns
df.dtypes
df.info()

# Select
df['marks']
df[['name', 'marks']]
df[df['marks'] > 75]
df[df['city'].isin(['Delhi', 'Mumbai'])]

# New column
df['new'] = df['marks'] * 1.1
df['good'] = df['marks'] >= 80

# Sort
df.sort_values('marks', ascending=False)

# Simple group
df.groupby('city')['marks'].mean()
```
Small practice task you can do right now (5–10 minutes)
- Copy the students table code
- Add 2–3 more students (add your friends or family names)
- Add a new column called attendance (values between 65–100)
- Create a column final_score = (marks + attendance) / 2
- Show only students with final_score ≥ 80
- Sort them by final_score descending
Try it — then come back and tell me what you got, or ask what went wrong.
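If you get stuck, here is one possible way to approach the task, as a sketch only. The two extra students (`Meera`, `Kabir`) and all attendance values are made up for illustration, and the table is trimmed to two columns to keep it short. `pd.concat` is used to add rows; it is not covered above, so treat it as an assumption of this sketch.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Rohan', 'Isha', 'Vihaan'],
    'marks': [78, 92, 65, 88, 71],
})

# Two hypothetical extra students (made-up names and marks)
more = pd.DataFrame({
    'name': ['Meera', 'Kabir'],
    'marks': [84, 59],
})
students = pd.concat([students, more], ignore_index=True)

# Made-up attendance values between 65 and 100, one per student
students['attendance'] = [90, 85, 70, 95, 80, 76, 88]

# Average of marks and attendance
students['final_score'] = (students['marks'] + students['attendance']) / 2

# Keep scores of 80 or more, highest first
top = students[students['final_score'] >= 80].sort_values(
    'final_score', ascending=False
)

print(top[['name', 'final_score']])
```

With these made-up numbers, four students pass the filter: Isha (91.5), Diya (88.5), Aarav (84.0), and Meera (80.0).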
Where do you want to go next?
- Learn how to read CSV files (most common next step)
- Practice more filtering with many conditions
- Understand Series vs DataFrame more clearly
- Start playing with missing values (NaN)
- Try your first real small dataset together
- Go one step deeper into groupby
Just tell me which direction feels right for you right now. I’ll keep explaining slowly and with examples. 😊
