Chapter 45: Mean, Median, and Mode
Mean, median, and mode.
These three words appear in almost every statistics textbook, data analysis course, job interview, and data science resume. Yet many people use them without really understanding when each one is appropriate and, more importantly, when one of them is completely misleading.
I’m going to explain them like your patient offline teacher who actually analyzes real data every day: slowly, with intuition first, then formulas, then real examples (including Hyderabad-flavored ones), and finally a clear decision guide you can keep forever.
1. The Big Picture – Why do we need three different “averages”?
All three try to answer the same question:
“What is a typical / central / representative value in this data?”
But they answer it in three completely different ways — and each is best in different situations.
| Measure | What it actually is | Sensitive to extreme values? | Best when data is… | Common nickname |
|---|---|---|---|---|
| Mean | Arithmetic average (sum ÷ count) | Very sensitive (outliers pull it) | Roughly symmetric, no crazy outliers | “Average” |
| Median | Middle value when data is ordered | Largely insensitive (robust to outliers) | Skewed, has outliers, ordinal data | “Middle value” |
| Mode | Most frequent value(s) | Not sensitive | Categorical data, multimodal patterns | “Most common value” |
2. Mean (Arithmetic Mean / Average)
Definition: Mean = (sum of all values) ÷ (number of values)
Intuition: “Take all the values, add them up, and divide by how many there are.”
Formula: For numbers x₁, x₂, …, xₙ, the mean (written μ for a population, x̄ for a sample) is (x₁ + x₂ + … + xₙ) / n
Real example – Hyderabad monthly expenses (2026)
```r
expenses <- c(18500, 22000, 19800, 45000, 21000, 17500, 20200)

# Calculate mean
mean(expenses)
# 23428.57

# In nice format
cat("Average monthly expense:", round(mean(expenses), 0), "₹\n")
```
Problem with mean: Look at the number 45000. That’s probably one month with a big medical bill or wedding expense. That single value pulls the mean up to about 23.4k, even though most months are around 18–22k.
→ Mean is very sensitive to outliers.
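To put a number on that pull, here is a minimal sketch that computes the mean with and without the largest month. Treating the 45000 month as "the outlier" is an assumption made for illustration, and `expenses_typical` is just a name introduced here.

```r
expenses <- c(18500, 22000, 19800, 45000, 21000, 17500, 20200)

# Drop the single largest month (assumed here to be the outlier)
expenses_typical <- expenses[expenses != max(expenses)]

mean_all     <- mean(expenses)          # 23428.57
mean_typical <- mean(expenses_typical)  # 19833.33

# How far that one value pulled the mean (roughly 3595)
shift <- mean_all - mean_typical

cat("Mean with outlier:   ", round(mean_all), "\n")
cat("Mean without outlier:", round(mean_typical), "\n")
cat("Shift caused by one value:", round(shift), "\n")
```

One month out of seven moves the "average" by about ₹3,600, which is why the mean alone can mislead on small data sets.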
3. Median (Middle Value)
Definition: The value right in the middle when you sort the data from smallest to largest.
How to find it
- Odd number of observations → middle one
- Even number → average of two middle ones
Intuition: “Line up everyone by height. The median height is the height of the person in the middle of the line.”
Same expenses example
```r
# Sort the expenses
sort(expenses)
# 17500 18500 19800 20200 21000 22000 45000

median(expenses)
# 20200

cat("Median monthly expense:", median(expenses), "₹\n")
```
→ Even with that crazy 45,000 outlier, the median stays very close to what most months actually look like.
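The odd/even rule from "How to find it" can also be written out by hand. This is only a sketch to make the rule concrete: `manual_median` is a hypothetical helper, and in practice the built-in `median()` is the better choice.

```r
# Hypothetical helper that follows the odd/even rule step by step
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                # odd count: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])  # even count: average of the two middle values
  }
}

expenses <- c(18500, 22000, 19800, 45000, 21000, 17500, 20200)

manual_median(expenses)       # 20200 (7 values, so the 4th sorted value)
manual_median(expenses[-4])   # 20000 (6 values left: average of 19800 and 20200)
```

Dropping the 45000 month (the 4th element) leaves an even count, so the second call exercises the "average of two middle ones" branch.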
When to prefer median over mean (very common rule)
- Income / salary data (few billionaires pull mean way up)
- House prices
- Time to complete tasks (few very slow people)
- Skewed distributions
- When there are obvious outliers
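A quick simulation shows why skewed data favors the median. The sketch below draws made-up "incomes" from a log-normal distribution. The distribution, its parameters, and the seed are all assumptions chosen for illustration, not real salary data.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

# Simulated right-skewed "incomes"; parameters are illustrative, not real data
incomes <- rlnorm(10000, meanlog = 10, sdlog = 1)

income_mean   <- mean(incomes)    # pulled upward by the long right tail
income_median <- median(incomes)  # stays near the bulk of the values

# Share of observations that fall below the mean
share_below_mean <- mean(incomes < income_mean)

cat("Mean:  ", round(income_mean), "\n")
cat("Median:", round(income_median), "\n")
cat("Share below the mean:", round(share_below_mean, 2), "\n")
```

In a right-skewed sample like this, clearly more than half of the values sit below the mean, so quoting the mean as "typical" overstates what most people earn.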
4. Mode (Most Frequent Value)
Definition: The value that appears most often in the data set.
Important facts
- Can have no mode (all values appear once)
- Can have one mode (unimodal)
- Can have two modes (bimodal)
- Can have many modes (multimodal)
Example – Favorite biryani type in Hyderabad survey
```r
biryani_type <- c("Hyderabadi", "Lucknowi", "Ambur", "Hyderabadi", "Kolkata",
                  "Hyderabadi", "Ambur", "Hyderabadi", "Lucknowi", "Hyderabadi")

# Most frequent
table(biryani_type)
#      Ambur Hyderabadi    Kolkata   Lucknowi
#          2          5          1          2

# Mode = Hyderabadi (appears 5 times)
```
In R – base way
```r
# Simple way (not perfect: which.max() returns only the first value when there is a tie)
names(which.max(table(biryani_type)))
# "Hyderabadi"

# Better: using DescTools or a custom function
# install.packages("DescTools")
library(DescTools)
Mode(biryani_type)
# Hyderabadi
```
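If you would rather not install a package, a custom function can cover the tie and no-mode cases listed under "Important facts". This is a minimal sketch: `all_modes` is a hypothetical helper, it assumes non-empty input, and it returns the values as character strings because it reads them from the names of the frequency table.

```r
# Hypothetical helper: returns every value tied for the highest count,
# or an empty vector when no value repeats
all_modes <- function(x) {
  freq <- table(x)
  if (max(freq) == 1) {
    return(character(0))  # every value appears once: no mode
  }
  names(freq)[freq == max(freq)]
}

# Two values tied at 2 appearances each: both are reported
all_modes(c("Hyderabadi", "Ambur", "Hyderabadi", "Ambur", "Kolkata"))
# "Ambur" "Hyderabadi"

# All values unique: no mode
all_modes(c("a", "b", "c"))
# character(0)
```

For numeric data you could wrap the result in `as.numeric()` to get numbers back.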
When mode is most useful
- Categorical / nominal data (colors, brands, cities, food preferences)
- Finding most common answer in surveys
- Detecting multiple clusters (bimodal / multimodal data)
5. Side-by-Side Comparison – Same Data, Three Answers
Let’s take realistic Hyderabad apartment rent data (monthly, 1BHK, 2026)
```r
rents <- c(18500, 19200, 17800, 45000, 19500, 18200, 20100, 18800, 19000, 62000)

mean(rents)    # 25810  ← pulled up by two luxury apartments
median(rents)  # 19100  ← much more representative of typical rent
Mode(rents)    # no mode (all unique); Mode() comes from library(DescTools)
```
Conclusion from this data
- If you tell a friend “average rent is ~26k”, they will think you’re living in a palace
- If you say “median rent is ~19k”, they understand the typical experience
- There is no “most common” rent — every apartment is a bit different
6. Quick Cheat-Sheet You Can Keep Forever
| Situation / Data Type | Best Choice | Why? | Alternative(s) |
|---|---|---|---|
| Roughly symmetric, no strong outliers | Mean | Uses all data points | Median is also fine |
| Skewed distribution / outliers present | Median | Robust to extremes | Mean is misleading |
| Income, house prices, time-to-event | Median | Standard in economics & survival analysis | — |
| Categorical / nominal data | Mode | Only measure that makes sense | — |
| Bimodal data (two peaks) | Mode(s) | Shows the two common clusters | Median & mean confusing |
| Report to non-technical audience | Median | Easier to understand “half are below, half above” | — |
| Need mathematically nice properties | Mean | Used in variance, covariance, many formulas | — |
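The bimodal row is the least intuitive one, so here is a small sketch with made-up commute times (in minutes) that cluster around two values. The data are invented for illustration.

```r
# Made-up commute times in minutes: one group near 15, another near 45
commute <- c(15, 15, 15, 16, 14, 15, 45, 45, 45, 44, 46, 45)

mean(commute)    # 30
median(commute)  # 30

# Values tied for the highest frequency
freq  <- table(commute)
modes <- as.numeric(names(freq)[freq == max(freq)])
modes            # 15 45
```

Both the mean and the median land on 30, a commute that nobody in this data actually has, while the two modes point at the two real clusters.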
Your Mini Practice Right Now
Copy this and run — then change the data and see how the three measures behave:
```r
# Your monthly pocket money / expenses (fake)
values <- c(12000, 14500, 11000, 38000, 13000, 12500, 14000, 11800, 13500, 92000)

cat("Mean:  ", round(mean(values), 0), "\n")
cat("Median:", median(values), "\n")

# Mode (if any)
freq <- table(values)
if (max(freq) > 1) {
  cat("Mode(s):", names(freq[freq == max(freq)]), "\n")
} else {
  cat("No mode – all values unique\n")
}
```
Now try these experiments:
- Remove the two big outliers (38000 and 92000) → see how the mean drops from 23230 to about 12788, right next to the median
- Add a few more values of exactly 14000 → watch a mode appear
- Make data strongly right-skewed → see mean >> median
You just saw with your own eyes why we need all three!
Feeling clearer now?
Next logical questions?
- Want to calculate quartiles, percentiles, IQR next?
- Learn variance / standard deviation (why mean is needed there)?
- See how mean/median/mode behave in real data frames (iris, mtcars, diamonds)?
- Or jump to first hypothesis test (t-test on means)?
Just tell me — whiteboard is still clean! 📊🧮🚀
