Chapter 41: R Statistics
R was created for statistics: the language was designed by statisticians, for statisticians. In 2026, R remains one of the two dominant languages for serious statistical analysis (the other being Python), especially in academia, pharma, clinical trials, government research, econometrics, psychometrics, biostatistics, and many industrial research labs.
Today I’m going to explain R Statistics like your offline teacher who actually does real statistical work — slowly, honestly, with practical examples you can copy-paste right now, and with the 2026 reality (what people actually use, not just textbook theory).
1. What Does “R Statistics” Actually Mean?
When people say “R Statistics”, they usually mean three overlapping layers:
- Base R statistical functions — already built-in, very fast, very reliable
- Core statistics packages — stats (included), MASS, boot, survival, etc.
- Modern tidyverse-style ecosystem — tidymodels, easystats, rstatix, infer, broom, modelsummary, etc.
Most people in 2026 use a mix:
- Base + stats for quick t-tests, ANOVA, regression
- tidymodels or easystats for tidy modeling workflows
- ggplot2 + ggstatsplot / see / performance for beautiful statistical visualization
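To make the "Base + stats" layer concrete, here is a minimal sketch that uses only functions shipped with R, so there is nothing to install. The object names (`two_species`, `tt`, `fit_aov`, `fit_lm`) are my own choices for illustration.

```r
# Base R + stats only: nothing to install
data(iris)

# Quick two-sample t-test (Welch by default) on two of the three species
two_species <- droplevels(subset(iris, Species != "virginica"))
tt <- t.test(Sepal.Width ~ Species, data = two_species)

# One-way ANOVA across all three species
fit_aov   <- aov(Sepal.Width ~ Species, data = iris)
aov_table <- summary(fit_aov)[[1]]   # the ANOVA table as a data frame

# Simple linear regression
fit_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)

tt$p.value      # p-value of the t-test
aov_table       # Df, Sum Sq, Mean Sq, F value, Pr(>F)
coef(fit_lm)    # intercept and slope
```

Everything returned here is an ordinary R object, so you can pull out individual numbers with `$` instead of reading them off printed output.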
2. Most Commonly Used Statistical Tools in R (2026 Reality)
| Task / Analysis Type | Base R function(s) | Modern preferred way (2026) | When to use which |
|---|---|---|---|
| Summary statistics | mean(), median(), sd(), summary() | skimr::skim(), janitor::tabyl(), summarytools | Quick look → skimr |
| One-sample t-test | t.test() | t.test() or rstatix::t_test() | Both fine |
| Two-sample t-test | t.test(x ~ group) | rstatix::t_test() or infer | rstatix nicer output |
| Paired t-test | t.test(x, y, paired = TRUE) | same | — |
| Wilcoxon / Mann-Whitney | wilcox.test() | rstatix::wilcox_test() | — |
| ANOVA (one-way) | aov(), anova() | rstatix::anova_test() or car::Anova() | rstatix or afex |
| Repeated-measures ANOVA | aov() with an Error() term | afex::aov_ez() or ez::ezANOVA() | afex recommended |
| Linear regression | lm() | lm() + broom::tidy() + performance::check_model() | Always use broom + performance |
| Logistic regression | glm(family = binomial) | glm() + broom + performance | — |
| Correlation (Pearson/Spearman) | cor.test() | correlation::correlation() or rstatix | correlation package |
| Chi-square test | chisq.test() | rstatix::chisq_test() | — |
| Survival analysis | survival::survfit() | survival + survminer for plots | — |
| Power analysis | power.t.test(), power.prop.test() | pwr, pwrss, or WebPower | Base R for simple designs |
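Several rows of the table get no hands-on example later, so here is a short base R sketch covering them. The datasets (`sleep`, `mtcars`, `HairEyeColor`) ship with R; picking them, and the effect size used in the power calculation, are my assumptions for illustration only.

```r
# Paired t-test: the same 10 subjects measured under two drugs
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]
paired_res <- t.test(drug1, drug2, paired = TRUE)

# Wilcoxon rank-sum (Mann-Whitney) test: mpg by transmission type
# exact = FALSE uses the normal approximation because mpg has ties
wilcox_res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# Chi-square test of independence: hair colour vs eye colour
hair_eye  <- margin.table(HairEyeColor, c(1, 2))
chisq_res <- chisq.test(hair_eye)

# Logistic regression: probability of a manual gearbox given weight
logit_fit   <- glm(am ~ wt, data = mtcars, family = binomial)
odds_ratios <- exp(coef(logit_fit))

# Power analysis: sample size per group for a two-sample t-test
# (assumed effect: difference of 0.5 SD, 80% power, alpha = 0.05)
power_res <- power.t.test(delta = 0.5, sd = 1, power = 0.80, sig.level = 0.05)

paired_res$p.value
wilcox_res$p.value
chisq_res$parameter          # degrees of freedom
odds_ratios
ceiling(power_res$n)         # round up to whole subjects per group
```

Each call returns an object you can inspect with `str()`; the rstatix and pwr versions from the table wrap the same ideas in tidier output.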
3. Hands-on Examples – Real Statistical Work in R
Example 1 – Descriptive statistics & quick exploration
```r
# Load the famous iris dataset
data(iris)

# Classic base summary
summary(iris)

# Modern tidy summary (very popular in 2026)
library(skimr)
skim(iris)

# Or even nicer
library(summarytools)
dfSummary(iris, plain.ascii = FALSE, style = "grid")
```
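If you want the same kind of summary split by group without installing anything, a base R sketch could look like this. The table layout and the name `species_summary` are my own choices, not something the packages above produce.

```r
# Per-species n, mean and standard deviation of sepal length, base R only
data(iris)

species_mean <- tapply(iris$Sepal.Length, iris$Species, mean)
species_sd   <- tapply(iris$Sepal.Length, iris$Species, sd)

# Combine the pieces into one small summary table
species_summary <- data.frame(
  species = names(species_mean),
  n       = as.vector(table(iris$Species)),
  mean    = as.vector(species_mean),
  sd      = as.vector(species_sd)
)
species_summary
```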
Example 2 – Two-sample t-test (classic)
```r
# Compare petal length between setosa and versicolor
t.test(Petal.Length ~ Species,
       data = subset(iris, Species %in% c("setosa", "versicolor")),
       var.equal = FALSE)   # Welch's t-test (safer default)

# Modern tidy output (rstatix)
library(dplyr)     # provides %>% and filter()
library(rstatix)

iris %>%
  filter(Species %in% c("setosa", "versicolor")) %>%
  droplevels() %>%                  # drop the unused "virginica" level
  t_test(Petal.Length ~ Species)
```
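To see what the Welch test is doing under the hood, here is a sketch that computes the statistic by hand and compares it with `t.test()`. The statistic is the difference in means divided by the square root of the summed squared standard errors, and the degrees of freedom come from the Welch-Satterthwaite approximation. Variable names are mine.

```r
# Welch's t-test by hand, then compared with the built-in result
x <- iris$Petal.Length[iris$Species == "setosa"]
y <- iris$Petal.Length[iris$Species == "versicolor"]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Test statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

builtin <- t.test(x, y, var.equal = FALSE)

c(by_hand = t_stat,   builtin = unname(builtin$statistic))
c(by_hand = df_welch, builtin = unname(builtin$parameter))
```

The two rows should agree, which is a handy sanity check that you understand the output you are reporting.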
Example 3 – Linear regression + modern diagnostics (2026 style)
```r
# Simple linear model
model <- lm(Petal.Length ~ Sepal.Length + Species, data = iris)

# Classic summary
summary(model)

# Modern tidy results table
library(broom)
tidy(model, conf.int = TRUE)   # coefficients table
glance(model)                  # model fit statistics

# Check assumptions visually (very important!)
library(performance)
check_model(model)             # beautiful diagnostic plots
```
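If broom is not installed, roughly the same information can be pulled out with base R. This is a sketch of the idea, not a reimplementation of `tidy()` or `glance()`; the names `coef_table` and `fit_stats` are mine.

```r
# Base R counterparts to a tidy coefficient table and a fit summary
model <- lm(Petal.Length ~ Sepal.Length + Species, data = iris)
model_summary <- summary(model)

# Estimates, standard errors, t values, p-values plus 95% confidence limits
coef_table <- cbind(coef(model_summary), confint(model))

# A few one-number fit statistics
fit_stats <- c(
  r.squared     = model_summary$r.squared,
  adj.r.squared = model_summary$adj.r.squared,
  sigma         = model_summary$sigma,
  AIC           = AIC(model)
)

round(coef_table, 3)
round(fit_stats, 3)
```

For the diagnostic side, calling `plot(model)` in an interactive session draws the classic base R residual panels.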
Example 4 – Correlation matrix + visualization
```r
# Correlation between all numeric variables
library(dplyr)         # provides %>%, select() and where()
library(correlation)

correlation(iris %>% select(where(is.numeric)), method = "pearson")

# Beautiful correlation plot
library(ggcorrplot)

ggcorrplot(cor(iris[, 1:4]),
           type = "lower",
           lab = TRUE,
           title = "Correlation Matrix – Iris Dataset")
```
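When you only care about one pair of variables, base `cor.test()` gives the estimate, a test, and (for Pearson) a confidence interval. A small sketch, with the variable pair chosen by me for illustration:

```r
# Pearson and Spearman correlation tests for a single pair of variables
pearson <- cor.test(iris$Sepal.Length, iris$Petal.Length,
                    method = "pearson")

# exact = FALSE avoids the exact-p-value warning caused by ties
spearman <- cor.test(iris$Sepal.Length, iris$Petal.Length,
                     method = "spearman", exact = FALSE)

pearson$estimate     # r
pearson$conf.int     # 95% confidence interval for r
spearman$estimate    # rho
```

Spearman works on ranks, so it is the safer choice when the relationship is monotonic but not linear.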
4. Modern 2026 Workflow Summary
Most people today follow this pattern:
- Explore & describe → skimr, summarytools, DataExplorer
- Visualize before modeling → GGally::ggpairs(), ggstatsplot
- Model → lm(), glm(), or tidymodels workflow
- Tidy results → broom::tidy(), broom::glance()
- Check assumptions → performance::check_model()
- Report nicely → modelsummary, gtsummary, flextable
- Visualize results → ggstatsplot, see, sjPlot
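The steps above can be sketched end to end with base R alone, here using a one-way ANOVA on the built-in `ToothGrowth` data. The dataset, the choice of model, and the specific checks are my assumptions for illustration; in practice you would swap in the packages listed for each step.

```r
# A base R walk through the workflow, using a one-way ANOVA
data(ToothGrowth)
tg <- transform(ToothGrowth, dose = factor(dose))   # treat dose as a group

# 1. Explore & describe: mean tooth length per dose
group_means <- aggregate(len ~ dose, data = tg, FUN = mean)

# 2. Visualize before modeling (only when a plot window is available)
if (interactive()) boxplot(len ~ dose, data = tg)

# 3. Model
fit <- aov(len ~ dose, data = tg)

# 4. Tidy results: the ANOVA table as a data frame
anova_table <- summary(fit)[[1]]

# 5. Check assumptions: normal residuals, equal variances
resid_normality <- shapiro.test(residuals(fit))
variance_check  <- bartlett.test(len ~ dose, data = tg)

# 6. Report: which doses differ? Tukey's post-hoc comparisons
posthoc <- TukeyHSD(fit)

group_means
anova_table
posthoc$dose    # diff, lwr, upr, p adj for each pair of doses
```

The point is the order of the steps, not the specific functions: describe, look, model, tidy, check, then report.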
Your Mini Practice Right Now
Copy this block — run it and see how modern output looks:
```r
library(tidyverse)
library(rstatix)
library(ggstatsplot)
library(performance)

# Quick analysis – does sepal width differ by species?
iris %>%
  filter(Species != "virginica") %>%   # only two groups
  t_test(Sepal.Width ~ Species) %>%
  print()

# Beautiful plot with stats
ggbetweenstats(
  data = iris %>% filter(Species != "virginica"),
  x = Species,
  y = Sepal.Width,
  type = "nonparametric",              # Wilcoxon test
  title = "Sepal Width by Species (Wilcoxon test)"
)

# Linear model diagnostics
model <- lm(Sepal.Width ~ Sepal.Length + Species, data = iris)
check_model(model)
```
You just ran a real statistical analysis workflow in under 20 lines of code!
Feeling excited?
Next steps?
- Want to do ANOVA + post-hoc tests?
- Logistic regression example?
- Power analysis for t-test?
- Or jump to reporting tables with gtsummary / modelsummary?
Just tell me — whiteboard is ready! 📈🧮🚀
