Chapter 2: R Introduction
1. What actually is R? (The short, honest version)
R is a free, open-source programming language + software environment made especially for:
- Statistics
- Data analysis
- Data visualization (beautiful charts & graphs)
- Reporting results in a reproducible way
It is not mainly a general-purpose language like Python, Java, or C++. R was born for numbers, tables, experiments, surveys, and models, and it still shines brightest there in 2026.
Official one-liner from the R Project website (r-project.org):
“R is a language and environment for statistical computing and graphics.”
Think of it as Excel + SPSS + MATLAB + graphing calculator, but free, programmable, and built for data analysis from the ground up.
2. Very quick history (just so you know where it came from)
- Late 1970s–1980s → Bell Labs creates S language (for statistics)
- Early 1990s → Ross Ihaka + Robert Gentleman (University of Auckland, NZ) want a better, free, S-like language → they create R (the name nods to S and to both authors' first names, Ross and Robert 😄)
- 1993 → first version appears
- 2000 → version 1.0.0 released → becomes popular in universities
- Today (2026) → R is at around version 4.5.x and is used by millions in academia, pharma, finance, government, marketing, AI research, etc.
R is part of the GNU Project and released under the GNU GPL → free to use, share, and modify, with no license fees.
3. Why do people still love R in 2026? (Real advantages)
| Reason | Why it matters (especially for beginners in India) | Compared to Python (most common question) |
|---|---|---|
| Statistics built-in | t-test, ANOVA, regression, time-series, survival analysis… already there — no extra install needed | Python needs statsmodels / scipy / pingouin |
| Publication-quality plots | ggplot2 package (one-time install from CRAN) → widely regarded as one of the best tools for scientific graphs | matplotlib/seaborn are good too, but ggplot2 is the favourite in much of academia |
| Data frames are native | Tables feel like Excel but programmable | Python → pandas (very good, but R did it first) |
| Reproducible reports | R Markdown / Quarto → mix code + text + plots → PDF/Word/HTML reports in one click | Jupyter notebooks are similar, and Quarto also works with Python |
| Huge package ecosystem (CRAN) | > 22,000 packages (2026): bioinformatics, finance, GIS, Shiny apps, machine learning… | PyPI has far more packages overall, but CRAN runs automated checks on every package it hosts |
| Academia & research dominant | If you’re doing an MSc/PhD/MBA thesis, R is very common in stats, economics, and psychology departments | Python is more common in tech companies & ML engineering |
| Free & works offline | Download once → no cloud needed (great for Hyderabad power cuts 😅) | Same for both |
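The "Statistics built-in" row is easy to check for yourself. Here is a small sketch that uses only what ships with a standard R install: the stats package (attached automatically) and the example datasets mtcars and PlantGrowth. The particular datasets and questions are just our picks for illustration.

```r
# Everything here is base R: the stats package is attached by default,
# and mtcars / PlantGrowth ship with R in the datasets package.

# Welch two-sample t-test: does fuel economy (mpg) differ by transmission?
# In mtcars, am = 0 means automatic and am = 1 means manual.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# One-way ANOVA: does plant weight differ across the three treatment groups?
fit_aov <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(fit_aov)
print(aov_table)

# Linear regression: fuel economy as a function of car weight (1000 lbs)
fit_lm <- lm(mpg ~ wt, data = mtcars)
coef(fit_lm)                 # intercept and slope
summary(fit_lm)$r.squared    # share of variance in mpg explained by wt
```

Each result is an ordinary R object, so you can pull out pieces like tt$p.value or coef(fit_lm) and reuse them in later code.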
4. Who uses R today? (2026 reality check)
- Universities & research (IITs, IIMs, AIIMS, ISI Kolkata, almost every stats dept)
- Pharma & clinical trials (SAS has long dominated here, but R use is growing fast, including for CDISC-standard datasets)
- Banks & insurance (risk modeling, credit scoring)
- Government (NSSO, RBI reports, election analysis)
- Marketing analytics (customer segmentation, A/B testing)
- Data journalists (The Hindu, Scroll.in use R sometimes for infographics)
5. Let’s do your very first R session right now (copy-paste ready)
Install these two, R first (RStudio needs it), if not done yet:
- R → https://cran.r-project.org/bin/windows/base/ on Windows (macOS and Linux downloads are linked from https://cran.r-project.org/); choose the latest 4.5.x
- RStudio Desktop (free) → https://posit.co/download/rstudio-desktop/
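Once both are installed, a quick way to confirm R itself works is to type a few commands into the RStudio Console. This is a minimal check using base R functions; the 4.5.0 threshold simply mirrors the version suggested above.

```r
# Which R am I running? Prints a string such as "R version 4.x.y (release date)"
R.version.string

# TRUE if this is a 4.5-series (or newer) release
getRversion() >= "4.5.0"

# Platform, locale, and attached packages: handy when asking for help online
sessionInfo()
```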
Open RStudio → new script (File → New File → R Script)
Paste and run line-by-line (Ctrl+Enter on Windows/Linux, Cmd+Enter on macOS):
```r
# 1. Say hello (classic first program)
print("Namaste! I just started R in Hyderabad 2026 🌶️")

# 2. Simple math – R is a big calculator
5 + 3 * 2
(5 + 3) * 2
sqrt(144)    # square root
log(100)     # natural log
exp(4.605)   # e^4.605 ≈ 100

# 3. Store values (variables)
temperature_today <- 28.5
humidity <- 65
city <- "Hyderabad"
print(temperature_today + 2)   # what if +2°C tomorrow?

# 4. Create your first table (data frame – heart of R)
students <- data.frame(
  Name = c("Aarav", "Priya", "Rahul", "Sneha"),
  Marks_Maths = c(92, 85, 68, 78),
  Marks_Stats = c(88, 94, 72, 81),
  City = c("Hyd", "Bng", "Del", "Mum")
)

# See it nicely
print(students)

# Average stats marks
mean(students$Marks_Stats)

# Who scored > 80 in both?
students[students$Marks_Maths > 80 & students$Marks_Stats > 80, ]
```
You should see something like:
```
   Name Marks_Maths Marks_Stats City
1 Aarav          92          88  Hyd
2 Priya          85          94  Bng
3 Rahul          68          72  Del
4 Sneha          78          81  Mum
```
…and mean() returns exactly 83.75, i.e. (88 + 94 + 72 + 81) / 4; the last line keeps only Aarav's and Priya's rows.
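Once the table exists, the next natural steps are adding a computed column, sorting, and summarising. A small sketch with base R only; the Average column is our own addition for illustration, and the table is recreated so the block runs on its own.

```r
# Recreate the students table from the session above
students <- data.frame(
  Name        = c("Aarav", "Priya", "Rahul", "Sneha"),
  Marks_Maths = c(92, 85, 68, 78),
  Marks_Stats = c(88, 94, 72, 81),
  City        = c("Hyd", "Bng", "Del", "Mum")
)

# New column: arithmetic works on whole columns at once (vectorised)
students$Average <- (students$Marks_Maths + students$Marks_Stats) / 2

# Sort rows from highest to lowest average
ranked <- students[order(students$Average, decreasing = TRUE), ]
print(ranked)   # Aarav, Priya, Sneha, Rahul

# Column means for the numeric columns
colMeans(students[, c("Marks_Maths", "Marks_Stats", "Average")])  # 80.75, 83.75, 82.25

# Five-number summary (plus mean) for one column
summary(students$Marks_Maths)

# Just the names of students above 80 in both subjects
both_80 <- students$Name[students$Marks_Maths > 80 & students$Marks_Stats > 80]
print(both_80)  # "Aarav" "Priya"
```

Using logical conditions inside [ ] is the same subsetting trick as in the session above; here it picks from a single column instead of whole rows.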
6. One famous built-in dataset – iris (play with this!)
R ships with this famous flower-measurement dataset (150 flowers, 50 from each of 3 species, 4 measurements each), used in almost every intro:
```r
data(iris)   # load it (optional: iris is always available in base R)
head(iris)   # first 6 rows

# Simple plot (base R – quick & dirty)
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = iris$Species, pch = 19,
     main = "Iris Flowers – Sepal Size by Species")
# Species is a factor, so its codes 1-3 become the colours; label them
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)

# Much prettier version (install once: install.packages("ggplot2"))
library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3) +
  theme_minimal() +
  labs(title = "Classic Iris Dataset Visualization")
```
This dataset and plot are classics: you’ll see them everywhere in data science books.
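Plots are half the story; the other half is numbers per group. Here is a short sketch using base R functions (tapply, aggregate, table, cor) on the same iris data. Which summaries to show is our choice for illustration.

```r
# Mean sepal length per species (tapply = apply a function within groups)
means_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(means_by_species)   # setosa 5.006, versicolor 5.936, virginica 6.588

# Several columns at once, grouped by species, with the formula interface
aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris, FUN = mean)

# How many flowers per species?
table(iris$Species)

# Correlation of sepal length and width: all flowers vs. setosa only
cor_all <- cor(iris$Sepal.Length, iris$Sepal.Width)
cor_setosa <- with(subset(iris, Species == "setosa"),
                   cor(Sepal.Length, Sepal.Width))
c(all = cor_all, setosa = cor_setosa)
```

The correlation is slightly negative across all 150 flowers but strongly positive within setosa alone, which is exactly why colouring the plot by Species matters.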
7. Where to go from here? (My personal recommended path for 2026 beginners)
- Week 1–2 → Basics (vectors, data.frames, $, mean/median/sd, subsetting)
- Week 3–4 → Import data (read.csv, readxl), clean with dplyr
- Week 5+ → ggplot2 plots, basic stats (t.test, lm), R Markdown reports
- Later → Shiny apps, tidymodels (machine learning), Quarto dashboards
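To preview the Week 3–5 loop (import, clean, basic stats), here is a minimal sketch in base R. Assumptions: read.csv's text = argument stands in for a real file, na.omit does the cleaning instead of dplyr so nothing needs installing, and the Hours_Studied numbers plus the extra student Kiran are made up for the example.

```r
# Made-up CSV content; with a real file you would write read.csv("your_file.csv")
csv_text <- "Name,Hours_Studied,Marks
Aarav,10,92
Priya,9,85
Rahul,4,68
Sneha,6,78
Kiran,,74"

marks <- read.csv(text = csv_text)
str(marks)    # check column types; the blank hours field becomes NA

# Basic cleaning: drop rows with a missing value in any column
marks_clean <- na.omit(marks)
nrow(marks_clean)   # 4 rows left (Kiran is dropped)

# Basic stats: simple linear regression of Marks on Hours_Studied
fit <- lm(Marks ~ Hours_Studied, data = marks_clean)
coef(fit)     # intercept and marks gained per extra study hour
```

In the tidyverse style from Week 3–4, the cleaning step would typically be done with dplyr or tidyr instead; the idea (import, inspect types, handle missing values, model) stays the same.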
Best free resources right now:
- Official: An Introduction to R (cran.r-project.org/doc/manuals/r-release/R-intro.pdf)
- Free interactive: DataCamp “Introduction to R” (first chapter free)
- YouTube: “R programming for ABSOLUTE beginners” by R Programming 101
- Book (free online): R for Data Science (r4ds.hadley.nz) — modern tidyverse way
So… how are you feeling?
- Want to install R + RStudio together step-by-step?
- Try a mini-project with your own data (marks, expenses, IPL scores…)?
- Jump straight to dplyr filtering & summarizing?
Tell me what’s next — I’m here all evening! 🚀
