Chapter 49: R Percentiles
1. What is a Percentile? (Intuitive Explanation First)
A percentile tells you:
“What value is higher than X% of the other values in the data?”
Examples in plain English:
- The 50th percentile = the median (50% of values are below it, 50% are above)
- The 25th percentile (Q1) = 25% of values are below it
- The 75th percentile (Q3) = 75% of values are below it
- The 90th percentile = only 10% of values are higher than this
Real-life feeling: imagine 100 students take an exam:
- The 90th percentile mark = 92 → Only 10 students scored 92 or higher → 90 students scored less than 92
- The 10th percentile mark = 48 → Only 10 students scored 48 or lower
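A quick way to convince yourself that the 50th percentile really is the median is to compute both. This is only a small sketch: the `scores` vector is made-up data, and `quantile()` is the function covered in section 3.

```r
# Hypothetical exam scores (illustrative data only)
scores <- c(62, 48, 75, 92, 81, 55, 70, 88)

med <- median(scores)                       # the median
p50 <- quantile(scores, probs = 0.5)        # the 50th percentile

print(med)
print(p50)

# Both describe the same middle point of the data
same_value <- isTRUE(all.equal(med, unname(p50)))
print(same_value)
```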
2. How Percentiles Are Calculated (Step by Step)
There are several mathematical methods to calculate percentiles (R uses one of the most common ones by default).
Basic idea (simplified):
- Sort the data from smallest to largest
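- Find the position: position = 1 + (percentile / 100) × (n − 1), where n = number of observations. This is the rule behind R’s default (type = 7); the (percentile / 100) × (n + 1) rule found in many textbooks corresponds to type = 6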
- If position is integer → take that value
- If position is not integer → interpolate between two closest values
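The sort, locate, interpolate steps can be sketched by hand. The helper below is a hypothetical function written for illustration (it is not part of R), and it assumes the position rule of R's default type 7, position = 1 + p × (n − 1); the (n + 1) rule would correspond to type 6 instead. The vector `x` is made-up data.

```r
# Hypothetical helper: percentile by hand using the type 7 position rule
percentile_type7 <- function(x, p) {
  x_sorted <- sort(x)               # step 1: sort smallest to largest
  n <- length(x_sorted)
  pos <- 1 + p * (n - 1)            # step 2: (possibly fractional) position
  lo <- floor(pos)                  # neighbour below the position
  hi <- ceiling(pos)                # neighbour above the position
  frac <- pos - lo                  # how far between the two neighbours
  # steps 3 and 4: exact value if frac is 0, otherwise linear interpolation
  x_sorted[lo] + frac * (x_sorted[hi] - x_sorted[lo])
}

x <- c(12, 7, 3, 18, 25, 9, 15)     # illustrative data
p <- c(0.25, 0.5, 0.9)

by_hand  <- percentile_type7(x, p)
built_in <- unname(quantile(x, probs = p))

print(by_hand)
print(built_in)
```

If the two printed vectors agree, the hand calculation matches what `quantile()` does with its default settings.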
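R’s default method (type = 7) gives the same results as Excel’s PERCENTILE.INC and NumPy’s default. SPSS and Minitab default to a slightly different rule (type = 6 in R), so small differences between tools are normal, especially on small datasets.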
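To see how much the choice of method can matter on a tiny dataset, here is a minimal sketch comparing type 7 with type 6 through the `type` argument of `quantile()`. The five-value vector is made up for illustration.

```r
# Illustrative data: five evenly spaced values
x <- c(10, 20, 30, 40, 50)

q_type7 <- quantile(x, probs = 0.25)             # default, type = 7
q_type6 <- quantile(x, probs = 0.25, type = 6)   # the (n + 1) style rule

print(q_type7)
print(q_type6)
```

With only five values the two answers differ noticeably; on large datasets the gap usually becomes negligible.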
3. How R Calculates Percentiles (Hands-on)
The main function is quantile()
```r
# Simple vector – monthly food delivery spend (Hyderabad 2026)
spend <- c(1800, 2200, 2900, 3200, 3800, 4100, 4500, 5200, 12500, 3400)

# All common percentiles at once
quantile(spend, probs = c(0, 0.25, 0.5, 0.75, 0.9, 0.95, 1))

# Output:
#    0%   25%   50%   75%   90%   95%  100%
#  1800  2975  3600  4400  5930  9215 12500
```
Interpretation:
- 0% (minimum) = ₹1,800
- 25th percentile (Q1) = ₹2,975 → about 25% spend less than this
- 50th percentile = median = ₹3,600
- 75th percentile (Q3) = ₹4,400 → about 75% spend less than this
- 90th percentile = ₹5,930 → only about 10% spend more than this
- 95th percentile = ₹9,215 → only about 5% spend more
- 100% (maximum) = ₹12,500
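One way to check this interpretation is to count what share of the values actually falls at or below each cutoff. This is a rough sketch, not a standard recipe; the variable names are chosen here for illustration. With only 10 values the observed shares can only move in steps of 10%, so they approximate the nominal percentages rather than match them exactly.

```r
spend <- c(1800, 2200, 2900, 3200, 3800, 4100, 4500, 5200, 12500, 3400)

probs <- c(0.25, 0.5, 0.75, 0.9, 0.95)
cutoffs <- quantile(spend, probs = probs)

# Share of observations at or below each percentile cutoff
share_at_or_below <- sapply(cutoffs, function(q) mean(spend <= q))

print(data.frame(cutoff = cutoffs,
                 nominal = probs,
                 observed = share_at_or_below))
```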
4. Very Common Real-Life Uses of Percentiles (Hyderabad Examples)
A – Salary benchmarking
```r
salaries <- c(48000, 52000, 62000, 75000, 82000, 95000, 145000, 280000, 1200000, 68000)

quantile(salaries, probs = c(0.1, 0.25, 0.5, 0.75, 0.9, 0.95))

# 10th percentile = ₹51,600 → about 10% earn below this
# 50th percentile (median) = ₹78,500 → typical salary
# 90th percentile = ₹372,000 → about 10% earn above this
```
→ Naukri.com, Glassdoor, and AmbitionBox typically report the 25th, 50th, and 75th percentiles rather than only the mean, because a few very high salaries can pull the mean far above what most people earn.
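To see why the mean alone can mislead on salary data, here is a small sketch comparing it with the median on the same `salaries` vector. The variable names are chosen for illustration.

```r
salaries <- c(48000, 52000, 62000, 75000, 82000, 95000, 145000, 280000, 1200000, 68000)

mean_salary   <- mean(salaries)
median_salary <- median(salaries)

# How many people actually earn more than the mean?
n_above_mean <- sum(salaries > mean_salary)

print(mean_salary)
print(median_salary)
print(n_above_mean)
```

The single ₹12,00,000 salary drags the mean well above the median, so only a couple of the ten people earn more than the "average".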
B – Delivery time (Swiggy/Zomato style)
```r
delivery_min <- c(18, 22, 25, 28, 31, 35, 38, 42, 48, 120)

quantile(delivery_min, probs = c(0.5, 0.9, 0.95))

# 50th = 33 min (median)
# 90th = 55.2 min → about 90% of orders delivered in ≤55.2 min
# 95th = 87.6 min → about 95% in ≤87.6 min
```
→ Companies advertise “90% orders delivered in 45 minutes” — that’s the 90th percentile.
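A claim like that can be checked in two equivalent ways: compare the 90th percentile with the target, or count the share of orders inside the target. The sketch below does both on the sample data; `target_min` is a hypothetical advertised target, not a figure from any real company.

```r
delivery_min <- c(18, 22, 25, 28, 31, 35, 38, 42, 48, 120)

target_min <- 45   # hypothetical advertised target, in minutes

# Way 1: is the 90th percentile within the target?
p90 <- unname(quantile(delivery_min, probs = 0.9))
claim_holds <- p90 <= target_min

# Way 2: what share of orders arrived within the target?
share_within_target <- mean(delivery_min <= target_min)

print(p90)
print(claim_holds)
print(share_within_target)
```

On this sample the claim would not hold, since fewer than 90% of the orders arrive within 45 minutes.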
C – Exam percentile ranks (competitive exams style)
```r
marks_out_of_100 <- c(92, 88, 85, 78, 95, 68, 82, 91, 45, 89, 76, 84, 55, 97)

quantile(marks_out_of_100, probs = seq(0, 1, 0.1))

# A score of 92 falls between the 80th (91.4) and 90th (94.1) percentiles,
# so it sits at roughly the 85th percentile (about the top 15%)
```
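Going the other way, from a score to its percentile rank, is not something `quantile()` does directly. One common approach, sketched here as an assumption rather than the official method of any exam board, is the empirical cumulative distribution function from base R's `ecdf()`, which gives the share of scores at or below a value.

```r
marks_out_of_100 <- c(92, 88, 85, 78, 95, 68, 82, 91, 45, 89, 76, 84, 55, 97)

my_score <- 92   # the score whose rank we want

# Share of students scoring at or below my_score
rank_at_or_below <- ecdf(marks_out_of_100)(my_score)

# Stricter variant: share of students scoring strictly below my_score
rank_below <- mean(marks_out_of_100 < my_score)

print(round(100 * rank_at_or_below, 1))
print(round(100 * rank_below, 1))
```

Exam bodies differ on whether ties count as "below", so the two variants are shown side by side.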
5. Important Arguments in R’s quantile()
| Argument | What it does | Default | Most common choice |
|---|---|---|---|
| probs | Which percentiles (as fractions from 0 to 1) | seq(0, 1, 0.25) | c(0.25, 0.5, 0.75) or seq(0, 1, 0.05) |
| na.rm | Remove NA before calculation | FALSE | TRUE whenever the data may contain NA |
| names | Add names to output (0%, 25%…) | TRUE | Keep TRUE |
| type | Calculation method (1 to 9) | 7 | 7 (matches Excel’s PERCENTILE.INC) |
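Unless you are sure there are no missing values, write quantile(x, probs = …, na.rm = TRUE). With the default na.rm = FALSE, quantile() stops with an error if x contains NA.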
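Here is a small sketch of what happens with and without `na.rm = TRUE`. The vector is made-up data with two missing values, and `tryCatch()` is used only so the script keeps running after the error.

```r
# Illustrative data with two missing values
x_with_na <- c(3400, NA, 2900, 4100, NA, 5200)

# Without na.rm = TRUE, quantile() refuses to run; capture the error message
default_result <- tryCatch(
  quantile(x_with_na, probs = 0.5),
  error = function(e) conditionMessage(e)
)
print(default_result)

# With na.rm = TRUE, the NAs are dropped before the calculation
median_spend <- quantile(x_with_na, probs = 0.5, na.rm = TRUE)
print(median_spend)
```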
6. Quick Cheat-Sheet – When to Use Which Percentile
| Percentile | Also called | What it means | Very common use in India |
|---|---|---|---|
| 0% | Minimum | Smallest value | Baseline / floor |
| 25% | Q1 | Bottom 25% are below this | Lower quartile |
| 50% | Median | Half are below, half are above | Typical / representative |
| 75% | Q3 | Top 25% are above this | Upper quartile |
| 90% | — | Top 10% are above this | “Good” / “high” threshold |
| 95% | — | Top 5% are above this | Outlier threshold |
| 99% | — | Top 1% are above this | Extreme / elite |
| 100% | Maximum | Largest value | Ceiling |
Your Mini Practice Right Now (Copy → Run & Experiment)
```r
# Your own realistic dataset – monthly Ola/Uber ride costs
rides <- c(380, 420, 350, 480, 410, 5200, 390, 450, 370, 460, 3800, 620)

quantile(rides, probs = c(0, 0.25, 0.5, 0.75, 0.9, 0.95, 1), na.rm = TRUE)

# Now remove the two big airport rides
rides_typical <- rides[rides < 1000]

quantile(rides_typical, probs = c(0, 0.25, 0.5, 0.75, 0.9, 1))
```
Now try:
- Add five more rides at exactly ₹420 → see how percentiles behave
- Make data strongly right-skewed → compare 90th percentile vs mean
- Calculate IQR = Q3 – Q1 (interquartile range)
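If you get stuck, here is one possible starting point for these exercises, offered as a sketch rather than the only right answer. It uses the same `rides` data; the other variable names are chosen for illustration. `IQR()` is base R's built-in shortcut for Q3 minus Q1.

```r
rides <- c(380, 420, 350, 480, 410, 5200, 390, 450, 370, 460, 3800, 620)

# IQR by hand (Q3 - Q1) and with the built-in function
q <- quantile(rides, probs = c(0.25, 0.75))
iqr_by_hand  <- q[["75%"]] - q[["25%"]]
iqr_built_in <- IQR(rides)
print(c(by_hand = iqr_by_hand, built_in = iqr_built_in))

# Right-skewed data: compare the mean with the 90th percentile
mean_rides <- mean(rides)
p90_rides  <- unname(quantile(rides, probs = 0.9))
print(c(mean = mean_rides, p90 = p90_rides))

# Add five more rides at exactly 420 and look at the median again
rides_more <- c(rides, rep(420, 5))
median_more <- unname(quantile(rides_more, probs = 0.5))
print(median_more)
```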
You just saw percentiles in action!
Clearer now?
Next logical questions?
- Want to learn IQR and outlier detection using percentiles?
- See boxplot (which shows min, Q1, median, Q3, max)?
- Compare percentiles across groups (male vs female salary percentiles)?
- Or jump to variance & standard deviation?
Just tell me — whiteboard is ready! 📊🧮🚀
