Chapter 38: R Scatter Plot
Scatter plots are the bread and butter of data exploration. They show the relationship between two continuous variables and reveal patterns, clusters, outliers, correlations, and trends: most of what you need to "see" in your data before doing any serious statistics.
We’ll go through this step by step like a real classroom session:
- What it is and when to use it
- Base R version (quick & dirty)
- ggplot2 version (beautiful & modern)
- Customization tricks people actually use in 2026
- common mistakes
- your own mini practice
1. What is a Scatter Plot? (Simple & Honest Definition)
A scatter plot is a graph where:
- Each observation (row) becomes a single point
- One variable → x-axis
- Another variable → y-axis
Goal: See if there is a relationship (linear, curved, none, clusters, outliers) between the two variables.
Classic real-life examples:
- Height vs Weight
- House size vs Price
- Study hours vs Exam marks
- Temperature vs Ice cream sales
- Car weight vs Fuel efficiency
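To make the last example concrete, here is a minimal sketch using R's built-in mtcars dataset, where wt is car weight (in 1000 lbs) and mpg is miles per gallon. Picking mtcars is an assumption made purely for illustration; any two numeric columns would work the same way.

```r
# Car weight vs fuel efficiency with the built-in mtcars data
# (mtcars is an illustrative choice, not a dataset used elsewhere in this chapter)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch  = 19)

# One number to go with the picture: the Pearson correlation
r <- cor(mtcars$wt, mtcars$mpg)
print(round(r, 2))   # negative: heavier cars tend to get fewer miles per gallon
```

The plot shows the shape of the relationship; the correlation only summarizes how linear it is, so it is worth looking at both.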
2. Base R Scatter Plot – Fastest Way (No Package Needed)
```r
# Classic iris dataset: everyone's first scatter plot
plot(iris$Sepal.Length, iris$Petal.Length,
     main = "Sepal Length vs Petal Length (Base R)",
     xlab = "Sepal Length (cm)",
     ylab = "Petal Length (cm)",
     pch  = 19,             # filled circle (most common)
     cex  = 1.3,            # point size
     col  = iris$Species)   # color by species (factor codes 1, 2, 3)

# Add legend manually (this is the annoying part in base R)
legend("topleft",
       legend = levels(iris$Species),
       col    = 1:3,        # same codes as the factor levels above
       pch    = 19,
       bty    = "n",        # no box
       cex    = 1.1)
```
Add trend line (very common)
```r
plot(iris$Sepal.Length, iris$Petal.Length,
     pch = 16,
     col = "steelblue",
     cex = 1.4)

# Add linear regression line
abline(lm(Petal.Length ~ Sepal.Length, data = iris),
       col = "darkred",
       lwd = 2.5,
       lty = 2)   # dashed line
```
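The trend line comes from an ordinary least-squares fit, and the same lm() object can also tell you what was drawn. A small sketch of that idea follows; the variable names fit, intercept, slope, and r2 are illustrative only.

```r
# Fit once, then reuse the model for both the line and the numbers
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["Sepal.Length"]]
r2        <- summary(fit)$r.squared

plot(iris$Sepal.Length, iris$Petal.Length, pch = 16, col = "steelblue")
abline(fit, col = "darkred", lwd = 2.5, lty = 2)

# Write the fitted equation just above the plot area
mtext(sprintf("y = %.2f + %.2f x,  R^2 = %.2f", intercept, slope, r2),
      side = 3, line = 0.3)
```

Storing the fit in a variable avoids refitting the model and keeps the line and the reported numbers in sync.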
3. ggplot2 Scatter Plot – The Modern Professional Choice
This is what almost everyone uses when the plot needs to look good or go into a report/paper/presentation.
```r
library(ggplot2)

# Basic version
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(size = 3, alpha = 0.85, shape = 16) +
  labs(
    title = "Iris Dataset – Sepal vs Petal Length",
    x = "Sepal Length (cm)",
    y = "Petal Length (cm)"
  ) +
  theme_minimal(base_size = 14)
```
Color by category + smooth line + confidence band
```r
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(size = 3.2, alpha = 0.9, shape = 16) +   # shape is the ggplot2 name for base R's pch
  geom_smooth(method = "loess", se = TRUE, alpha = 0.12, linewidth = 1.2) +
  scale_color_brewer(palette = "Dark2") +             # nice color set
  labs(
    title = "Sepal vs Petal Length by Species",
    subtitle = "With LOESS smoother and 95% confidence interval",
    x = "Sepal Length (cm)",
    y = "Petal Length (cm)",
    color = "Species"
  ) +
  theme_light(base_size = 14) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))
```
Add size by another variable (very powerful)
```r
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                 color = Species, size = Sepal.Width)) +
  geom_point(alpha = 0.8, shape = 16) +
  scale_size_continuous(range = c(2, 8)) +   # control size range
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Iris – Size Encodes Sepal Width",
    size = "Sepal Width (cm)"
  ) +
  theme_minimal() +
  theme(legend.box = "vertical")
```
4. Very Common Customizations People Actually Use
| Want to do this… | Base R way | ggplot2 way (recommended) |
|---|---|---|
| Change point shape | pch = 17 (triangle), 18 (diamond)… | shape = 17, or aes(shape = Species) |
| Change point size | cex = 1.5 | size = 3.5 or aes(size = variable) |
| Add regression line | abline(lm(y ~ x)) | geom_smooth(method = "lm") |
| Add smooth curve | lines(lowess(x, y)) | geom_smooth(method = "loess") |
| Add confidence band | Manual calculation | geom_smooth(se = TRUE) |
| Color by category | col = factor + legend() | aes(color = category) → auto legend |
| Facet by group | Manual multiple plots | facet_wrap(~ Species) or facet_grid() |
| Add text labels | text() | geom_text() or ggrepel::geom_text_repel() |
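As a rough sketch, here are a few rows of the table combined into one plot: shape mapped to species, a regression line with its confidence band, and one panel per species. It assumes ggplot2 3.4.0 or later is installed (needed for the linewidth argument); the object name p is illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(aes(shape = Species), size = 2.5, alpha = 0.8) +               # shape by category
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, linewidth = 0.8) + # line + band
  facet_wrap(~ Species) +                                                   # one panel per species
  theme_minimal()

print(p)
```

Because the data are split by facet_wrap(), the regression line is fitted separately inside each panel.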
5. Quick Save Examples
```r
# ggplot2 – high quality PNG (saves the most recently displayed plot)
ggsave("iris_scatter.png",
       width = 8, height = 6,
       dpi = 300,
       bg = "white")

# PDF (best for publications)
ggsave("iris_scatter.pdf", width = 7, height = 5)
```
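ggsave() is built for ggplot2 plots. For base R plots like the ones in section 2, the usual pattern is to open a graphics device, draw, then close it. A minimal sketch, writing to a temporary folder (the file name and the out_file variable are just for illustration):

```r
out_file <- file.path(tempdir(), "iris_scatter_base.pdf")

pdf(out_file, width = 7, height = 5)   # open the device (size in inches)
plot(iris$Sepal.Length, iris$Petal.Length,
     pch = 19, col = iris$Species,
     xlab = "Sepal Length (cm)", ylab = "Petal Length (cm)")
dev.off()                              # close the device, which writes the file

file.exists(out_file)
```

png() follows the same open, draw, close pattern; there you would typically pass units = "in" and res = 300 alongside width and height.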
6. Your Mini Practice Right Now (Copy → Run!)
```r
library(ggplot2)

# Fake Hyderabad student data
students <- data.frame(
  study_hours = c(2, 4, 6, 8, 10, 3, 5, 7, 9, 1),
  marks       = c(55, 68, 78, 85, 92, 62, 75, 82, 88, 48),
  sleep_hours = c(7, 6, 5.5, 6, 7.5, 8, 6.5, 5, 7, 4.5),
  area        = c("Hyd", "Bng", "Hyd", "Del", "Mum", "Hyd", "Bng", "Del", "Mum", "Hyd")
)

ggplot(students, aes(x = study_hours, y = marks, color = area)) +
  # size is mapped on the points only, so the trend lines keep a fixed width
  geom_point(aes(size = sleep_hours), alpha = 0.85, shape = 16) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.1, linetype = "dashed") +
  scale_color_brewer(palette = "Set2") +
  scale_size_continuous(range = c(3, 9), name = "Sleep (hours)") +
  labs(
    title = "Study Hours vs Exam Marks – Hyderabad Students 2026",
    subtitle = "Point size = average sleep hours",
    x = "Daily Study Hours",
    y = "Exam Marks (%)",
    color = "City"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")
```
Now try these experiments:
- Change the smoother to geom_smooth(method = "loess")
- Add facet_wrap(~ area)
- Map shape = area inside aes() instead of color
- Add geom_text(aes(label = marks), vjust = -1)
Which version looks clearest to you?
Ready for more?
- Add a correlation coefficient or regression equation to the plot
- Learn marginal plots (scatter + histograms on the sides)
- Practice saving publication-ready scatter plots
- Move on to the next plot type (boxplot, histogram, bar, heatmap)
Just tell me — whiteboard is still clean! 📊✨🚀
