Chapter 47: R Median
1. What the Median Actually Is (Intuitive Explanation)
The median is:
the middle value in a list of numbers after you sort them from smallest to largest.
- If you have an odd number of values → the median is the one right in the center.
- If you have an even number → the median is the average of the two middle values.
Real-life feeling: imagine 7 friends standing in a line, ordered by how much they spent on food delivery last month:
₹1,800 – ₹2,900 – ₹3,200 – ₹3,800 – ₹4,100 – ₹4,500 – ₹12,500
→ The median is the 4th person: ₹3,800. Even if one friend ordered a huge party catering for ₹12,500, the median does not move much; it still represents what the typical person spent.
Compare that to the mean: (1800 + 2900 + 3200 + 3800 + 4100 + 4500 + 12500) / 7 ≈ ₹4,686. → The one big order pulled the mean up by about ₹1,300 (the mean of the other six values is about ₹3,383), so it no longer feels like a “typical” value.
This is why the median is called robust / resistant to outliers.
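To see this robustness in numbers, here is a small sketch using the seven spending values from the example above. The variable names `spend` and `spend_no_outlier` are only illustrative, and the 10,000 cut-off is an arbitrary assumption used to drop the one large order.

```r
# Spending values from the example above (in rupees)
spend <- c(1800, 2900, 3200, 3800, 4100, 4500, 12500)

# Same data without the one large catering order
# (the 10000 threshold is an arbitrary choice for this illustration)
spend_no_outlier <- spend[spend < 10000]

# The mean shifts a lot when the outlier is included...
mean(spend_no_outlier)     # about 3383
mean(spend)                # about 4686

# ...while the median moves only a little
median(spend_no_outlier)   # 3500
median(spend)              # 3800
```

One extreme value changes the mean by roughly ₹1,300 but the median by only ₹300.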
2. Exact Definition & How to Calculate It by Hand
Step-by-step:
- Sort the numbers from smallest to largest
- Find the position of the middle value:
- If n is odd → position = (n + 1) / 2
- If n is even → average of the values at positions n/2 and (n/2 + 1)
Examples:
Odd count (7 values)
Sorted: 1800, 2900, 3200, 3800, 4100, 4500, 12500
Middle position: (7 + 1)/2 = 4th value → 3800
Even count (6 values)
Sorted: 2900, 3200, 3800, 4100, 4500, 12500
Middle positions: 3rd and 4th → (3800 + 4100)/2 = 3950
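The by-hand steps above can be sketched as a small R function. `manual_median` is a hypothetical helper written for this illustration, not a base R function; in real code you would simply call `median()`. Note that `sort()` drops `NA` values by default, so this sketch behaves like `median(x, na.rm = TRUE)`.

```r
# Hypothetical helper (not part of base R): the median, step by step
manual_median <- function(x) {
  x <- sort(x)                 # step 1: sort ascending (sort() drops NA by default)
  n <- length(x)
  if (n == 0) return(NA_real_) # nothing left to take a median of
  if (n %% 2 == 1) {
    x[(n + 1) / 2]             # odd n: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])  # even n: average of the two middle values
  }
}

odd_values  <- c(1800, 2900, 3200, 3800, 4100, 4500, 12500)
even_values <- c(2900, 3200, 3800, 4100, 4500, 12500)

manual_median(odd_values)    # 3800
manual_median(even_values)   # 3950

# Both should agree with the built-in function
manual_median(odd_values)  == median(odd_values)
manual_median(even_values) == median(even_values)
```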
3. How R Calculates the Median (Hands-on)
```r
# Simple vector
expenses <- c(1800, 2900, 3200, 3800, 4100, 4500, 12500)
median(expenses)                     # 3800

# With missing values (common real case)
expenses_na <- c(expenses, NA, NA, 3400)
median(expenses_na)                  # NA ← default behavior
median(expenses_na, na.rm = TRUE)    # 3600 (8 values remain: average of 3400 and 3800)
```
2026 golden rule (same as mean): Always write median(x, na.rm = TRUE) unless you intentionally want to know that there are missing values.
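If you do drop missing values, it can help to also check how many were dropped, so the NAs are not silently forgotten. A minimal sketch, reusing the expense values from above (the names `n_missing` and `med` are illustrative):

```r
expenses_na <- c(1800, 2900, 3200, 3800, 4100, 4500, 12500, NA, NA, 3400)

# Count the missing values before ignoring them
n_missing <- sum(is.na(expenses_na))
n_missing                                  # 2

# Median of the values that are actually present
med <- median(expenses_na, na.rm = TRUE)
med                                        # 3600

cat("Median:", med, "| missing values ignored:", n_missing, "\n")
```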
4. Real Hyderabad Examples – When Median Wins Big
Example A – Apartment rent in Hyderabad (2026)
```r
rents_1bhk <- c(16500, 17800, 19200, 18500, 45000, 21000, 19800, 62000, 17500, 18800)

mean(rents_1bhk)     # 25610 (₹25,610) ← looks expensive
median(rents_1bhk)   # 19000 (₹19,000) ← much closer to what most people actually pay
```
→ For exactly this reason, real estate portals, government reports, and news articles often report median rent or house price rather than the mean.
Example B – Delivery time from Swiggy/Zomato (minutes)
```r
delivery_times <- c(28, 35, 42, 31, 29, 38, 120, 33, 40, 37)

mean(delivery_times)     # 43.3 min ← pulled up by one very late order
median(delivery_times)   # 36 min   ← much more representative
```
→ This is why a median delivery time is often a better headline figure than the mean when a few orders arrive very late.
Example C – Exam marks in a large class (left-skewed)
```r
marks <- c(92, 88, 85, 78, 95, 68, 82, 91, 45, 89, 76, 84, 55)

mean(marks)     # 79.08
median(marks)   # 84
```
→ Here the median is higher because a few low outliers pull the mean down.
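A quick way to compare the examples is to look at the sign of mean minus median. Treat this as a rough hint about skew direction, not a strict rule; the names `diff_rents` and `diff_marks` are illustrative.

```r
rents_1bhk <- c(16500, 17800, 19200, 18500, 45000, 21000, 19800, 62000, 17500, 18800)
marks      <- c(92, 88, 85, 78, 95, 68, 82, 91, 45, 89, 76, 84, 55)

# Positive difference: a few high values pull the mean above the median
diff_rents <- mean(rents_1bhk) - median(rents_1bhk)
diff_rents            # 6610

# Negative difference: a few low values pull the mean below the median
diff_marks <- mean(marks) - median(marks)
round(diff_marks, 2)  # -4.92
```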
5. When to Choose Median Over Mean (Practical 2026 Guide)
Use median when:
- Data is skewed (right-skewed: income, house price, time-to-delivery, time-to-failure)
- There are outliers or extreme values that are real but not typical
- You want to report what is typical / representative for most people
- Data is ordinal (rankings, Likert scales: 1–5)
- You are reporting to a non-technical audience (journalists, managers, general public)
Use mean when:
- Data is symmetric / bell-shaped (no strong skew)
- No extreme outliers (or you already removed/winsorized them)
- You need the value for further calculations (variance, standard deviation, many statistical formulas require the mean)
- You are doing physics / engineering averages (temperature, speed, voltage)
Both together is often best. A report might say: “Mean = ₹25,610 (skewed by luxury apartments), Median = ₹19,000 (more typical rent)”
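If you report both numbers regularly, a tiny helper keeps them side by side. `centre_summary` is a hypothetical function written for this sketch, not part of base R; base R's `summary()` also prints the mean and median together, along with the quartiles.

```r
# Hypothetical helper: count, mean and median in one named vector
centre_summary <- function(x) {
  c(n      = sum(!is.na(x)),            # how many non-missing values
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE))
}

rents_1bhk <- c(16500, 17800, 19200, 18500, 45000, 21000, 19800, 62000, 17500, 18800)

s <- centre_summary(rents_1bhk)
s[["mean"]]     # 25610
s[["median"]]   # 19000

# Built-in alternative that also shows min, max and quartiles
summary(rents_1bhk)
```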
6. Your Mini Practice Right Now (Copy → Run & Experiment)
```r
# Your own realistic dataset – monthly Ola/Uber rides cost
rides <- c(380, 420, 350, 480, 410, 5200, 390, 450, 370, 460)

cat("Mean ride cost: ₹", round(mean(rides, na.rm = TRUE), 0), "\n")
cat("Median ride cost: ₹", median(rides, na.rm = TRUE), "\n")

# Now remove the one very expensive ride (maybe airport trip)
rides_clean <- rides[rides < 1000]

cat("\nAfter removing outlier:\n")
cat("Mean: ₹", round(mean(rides_clean), 0), "\n")
cat("Median: ₹", median(rides_clean), "\n")
```
Now try these changes and observe:
- Add three more very high rides (₹4,800, ₹5,200, ₹6,100) → see mean explode
- Add ten rides all costing exactly ₹420 → see how median and mode behave
- Make data left-skewed (many high values, few low) → see mean < median
You just witnessed the core difference with your own numbers.
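If you want a starting point, here is one possible sketch of the first experiment (adding three very high rides). The name `rides_more` is illustrative; the other two experiments follow the same pattern.

```r
rides <- c(380, 420, 350, 480, 410, 5200, 390, 450, 370, 460)

# Append three more very expensive rides
rides_more <- c(rides, 4800, 5200, 6100)

mean(rides)          # 891
mean(rides_more)     # about 1924 (the mean more than doubles)

median(rides)        # 415
median(rides_more)   # 450 (the median barely moves)
```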
Clearer now?
Next logical questions?
- Want to learn quartiles, percentiles, IQR next (they are all based on median thinking)?
- See variance & standard deviation (which depend on mean)?
- Compare mean/median in real built-in data sets (iris, mtcars, diamonds)?
- Or jump to first real statistical test involving means (t-test)?
Just tell me — whiteboard is ready! 📊🧮🚀
