Chapter 46: R Mean
1. What the Mean Actually Is (Very Simple Intuition)
The mean is what most people think of when they hear the word “average”:
Take all the values, add them up, and divide by how many values there are.
That’s it.
Real-life feeling: imagine five friends go to Paradise Biryani in Hyderabad and spend:
- ₹420
- ₹380
- ₹450
- ₹410
- ₹340
The mean bill = (420 + 380 + 450 + 410 + 340) ÷ 5 = 2000 ÷ 5 = ₹400
→ “On average, each person spent 400 rupees.”
That feels fair and representative in this case.
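Here is the same arithmetic in R. It uses sum() and length() to mirror "add them up, divide by how many", then checks the result against the built-in mean(). The vector name `bills` is chosen only for illustration.

```r
# Bills from the five friends (₹)
bills <- c(420, 380, 450, 410, 340)

# "Add them up" and "divide by how many values there are"
total <- sum(bills)      # 2000
n     <- length(bills)   # 5
total / n                # 400

# The built-in function gives the same answer
mean(bills)              # 400
```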
2. The Mathematical Formula (Write This Down Somewhere)
For a set of numbers x₁, x₂, …, xₙ:
Mean (μ for a whole population, x̄ for a sample) = (x₁ + x₂ + … + xₙ) / n
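To see the formula term by term, here is a minimal sketch that adds x₁ through xₙ one at a time in a loop and then divides by n. `manual_mean` is a hypothetical helper written for this example and is not part of R. R's built-in mean() also runs an extra correction pass for floating-point accuracy, so the sketch compares the two with all.equal(), which allows a tiny tolerance.

```r
# Hypothetical helper: the mean formula written out step by step
manual_mean <- function(x) {
  running_sum <- 0
  for (value in x) {
    running_sum <- running_sum + value   # x1 + x2 + ... + xn
  }
  running_sum / length(x)                # divide by n
}

x <- c(420, 380, 450, 410, 340)
manual_mean(x)                           # 400
all.equal(manual_mean(x), mean(x))       # TRUE
```

In real scripts you would just call mean(). The loop only makes each part of the formula visible.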
In R you always call mean(). The only difference is whether you type the numbers in directly or pass a vector you already have:
```r
# Way 1 – type the numbers directly
mean(c(420, 380, 450, 410, 340))   # 400

# Way 2 – most common in real scripts: a stored vector or a data frame column
bills <- data.frame(bill_amount = c(420, 380, 450, 410, 340))
mean(bills$bill_amount)            # 400
```
3. Real Hyderabad Examples – Different Situations
Example A – Nice symmetric data (mean is perfect)
Monthly pocket money of 8 college friends in 2026:
```r
pocket <- c(12000, 14500, 11000, 13000, 12500, 14000, 11800, 13500)
mean(pocket)     # 12787.5 ₹
median(pocket)   # 12750 ₹  ← very close
```
→ Mean and median almost the same → safe to say “average pocket money ≈ 13k”
Example B – One big outlier (mean becomes misleading)
Now add one friend whose parents send ₹92,000 once (big gift):
```r
pocket2 <- c(pocket, 92000)
mean(pocket2)     # 21588.89 ₹  ← jumps a lot!
median(pocket2)   # 13000 ₹  ← barely moves
```
→ If you report “average pocket money is 22k”, everyone thinks you’re rich → Truth: most friends are around 12–14k → median tells the real story
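Why does one gift move the mean so much? The new mean is the old total plus the new value, divided by n + 1: x̄_new = (n·x̄ + x_new) / (n + 1). As a result, a single value far from x̄ pulls the mean by (x_new − x̄) / (n + 1). The sketch below checks this with the pocket-money numbers from the example.

```r
pocket  <- c(12000, 14500, 11000, 13000, 12500, 14000, 11800, 13500)
pocket2 <- c(pocket, 92000)

n        <- length(pocket)                     # 8
old_mean <- mean(pocket)                       # 12787.5
# Rebuild the old total, add the gift, divide by the new count (9)
new_mean <- (n * old_mean + 92000) / (n + 1)
new_mean                                       # about 21588.89
all.equal(new_mean, mean(pocket2))             # TRUE

# Size of the jump: (92000 - 12787.5) / 9, about 8801.39
new_mean - old_mean

# The median only shifts by one position in the sorted list
median(pocket)    # 12750
median(pocket2)   # 13000
```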
Example C – Income example (classic case where mean fails)
Monthly net income of 10 people in a small Hyderabad startup team:
```r
income <- c(48000, 52000, 55000, 62000, 68000, 75000,
            82000, 95000, 145000, 1200000)
mean(income)     # 188200 ₹ (₹1,88,200)  ← dominated by the CEO/owner
median(income)   # 71500 ₹  ← much more realistic for a "typical" employee
```
→ This is why income and salary statistics, in India and elsewhere, are usually summarised with the median, often reported next to the mean. One very large salary can drag the mean far above what a typical person earns.
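R's mean() also has a `trim` argument (see ?mean) that offers a middle ground between the mean and the median. It sorts the values, drops the given fraction from each end, and averages the rest. With trim = 0.5 or more it simply returns the median. This is a sketch on the income numbers above. Whether a trimmed mean is appropriate to report depends on your audience.

```r
income <- c(48000, 52000, 55000, 62000, 68000, 75000,
            82000, 95000, 145000, 1200000)

mean(income)               # 188200  (pulled up by the top salary)
median(income)             # 71500

# trim = 0.1 drops 10% of the sorted values from EACH end
# (here: the 1 lowest and the 1 highest) before averaging
mean(income, trim = 0.1)   # 79250

# trim = 0.5 or more returns the median
mean(income, trim = 0.5)   # 71500
```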
4. How R Actually Calculates mean() – Important Details
```r
# Missing values – very common trap
scores <- c(85, 92, NA, 78, 88, NA)
mean(scores)                 # NA     ← by default, any NA makes the whole result NA
mean(scores, na.rm = TRUE)   # 85.75  ← almost always what you want

# Infinite values
weird <- c(1, 2, Inf, 4)
mean(weird, na.rm = TRUE)    # Inf    ← na.rm removes NA only; infinity poisons the mean
```
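Rule of thumb: write mean(x, na.rm = TRUE) whenever missing values should simply be ignored. First check how many values are missing, though, because silently dropping a large share of the data can bias the result.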
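One way to follow that rule is sketched below. It counts the missing values before dropping them, filters out non-finite values with is.finite() when they are data errors, and shows that +Inf and -Inf together produce NaN. It uses the same scores and weird vectors as above. Treating Inf as an error is an assumption; in some data it is meaningful.

```r
scores <- c(85, 92, NA, 78, 88, NA)

# How much data would na.rm = TRUE silently drop?
sum(is.na(scores))             # 2 missing out of 6
mean(is.na(scores))            # 0.3333333 -> one third missing (mean of TRUE/FALSE)

mean(scores, na.rm = TRUE)     # 85.75

# na.rm does not touch Inf; is.finite() is FALSE for Inf, -Inf, NA and NaN
weird <- c(1, 2, Inf, 4)
mean(weird[is.finite(weird)])  # 2.333333

# +Inf and -Inf together cancel into NaN ("not a number")
mean(c(1, Inf, -Inf))          # NaN
```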
5. When to Trust the Mean (Quick Decision Guide)
Use mean when:
- Data is roughly symmetric (bell-shaped histogram)
- No extreme outliers (or you removed them already)
- You need a value that uses every data point (important for variance, standard deviation, many formulas)
- You need a value tied to the total (mean × n = sum), such as average revenue per order or average daily temperature, where the extreme values are real and should count
Do NOT trust mean alone when:
- Data is skewed (income, house prices, time-to-failure, time-to-complete tasks)
- There are obvious outliers (one ₹92,000 gift, one ₹12 lakh salary)
- The data has a long right tail (most values small, a few very large)
- You want to report what is typical / representative for most people
→ In these cases → prefer median
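The decision guide can be turned into a quick check: print the mean and median side by side and flag a large gap between them. `center_check` is a hypothetical helper, and its 10% threshold is an arbitrary assumption, not a statistical rule. A histogram is still the better way to judge the shape of the data.

```r
# Hypothetical helper: mean vs median, flagging a big relative gap.
# The 10% threshold is an arbitrary assumption, not a statistical rule.
center_check <- function(x, threshold = 0.10) {
  m   <- mean(x, na.rm = TRUE)
  med <- median(x, na.rm = TRUE)
  gap <- abs(m - med) / abs(med)   # relative gap (assumes median is not 0)
  data.frame(
    mean   = m,
    median = med,
    gap    = round(gap, 3),
    advice = if (gap > threshold) "skewed or outliers: report the median" else "mean is fine"
  )
}

pocket <- c(12000, 14500, 11000, 13000, 12500, 14000, 11800, 13500)
income <- c(48000, 52000, 55000, 62000, 68000, 75000,
            82000, 95000, 145000, 1200000)

center_check(pocket)   # gap about 0.003 -> "mean is fine"
center_check(income)   # gap about 1.632 -> "skewed or outliers: report the median"
```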
6. Your Mini Practice Right Now (Copy → Run & Play)
```r
# Your own small dataset – monthly food delivery expenses
delivery <- c(3200, 2800, 4500, 1800, 3600, 5200, 2900, 4100, 3800, 12500)

cat("Mean delivery spend: ₹", round(mean(delivery), 0), "\n")
cat("Median delivery spend: ₹", median(delivery), "\n")

# Now remove the big outlier (maybe one month with a party order)
delivery_clean <- delivery[delivery < 6000]

cat("\nAfter removing outlier:\n")
cat("Mean: ₹", round(mean(delivery_clean), 0), "\n")
cat("Median: ₹", median(delivery_clean), "\n")
```
Now try these changes and watch:
- Add another very high value (₹18,000) → see mean jump again
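- Add five people who all spend exactly ₹3,500 → ₹3,500 becomes the mode, the most frequent value (count it with table(), because R's mode() reports the storage type instead)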
- Make data left-skewed (many high, few low) → see mean < median
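The three experiments above can be sketched as follows. The left-skewed vector uses made-up numbers for illustration. table() counts how often each value occurs, and which.max() picks the most frequent one.

```r
delivery <- c(3200, 2800, 4500, 1800, 3600, 5200, 2900, 4100, 3800, 12500)

# 1. One more very high month: the mean jumps, the median barely moves
delivery2 <- c(delivery, 18000)
c(mean = mean(delivery2), median = median(delivery2))   # about 5672.73 and 3800

# 2. Five people at exactly ₹3,500: the most frequent value is the mode
delivery3 <- c(delivery, rep(3500, 5))
counts <- table(delivery3)
names(counts)[which.max(counts)]   # "3500"

# 3. Left-skewed data (many high values, a few low ones): mean < median
left_skewed <- c(90, 92, 95, 93, 91, 40, 20)   # made-up numbers
mean(left_skewed) < median(left_skewed)        # TRUE
```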
You just discovered with your own eyes why statistics teachers keep repeating: “Mean is sensitive, median is robust”
Feeling clearer?
Next logical steps?
- Want to calculate quartiles, percentiles, IQR next?
- Learn variance & standard deviation (they need mean)?
- See mean/median in real data frames (iris, mtcars, diamonds)?
- Or jump to first real statistical test that compares means (t-test)?
Just tell me — whiteboard is ready! 📊🧮🚀
