Chapter 4: R Syntax
This chapter is about R's syntax.
Think of syntax as the grammar rules of the R language. If you break them, R gets confused and throws errors (those red messages in the console). But once you get comfortable, R syntax feels very natural, especially if you're coming from a stats/math background rather than heavy programming.
I’m going to teach this like we’re sitting together at a table in Hi-Tech City, writing code line by line on my laptop, explaining why each rule exists, with tons of examples you can copy-paste right now into RStudio.
1. Core Philosophy of R Syntax (Important Mindset First)
- R is case-sensitive → mean() ≠ Mean() ≠ MEAN()
- Almost everything in R is a function — even operators like + are secretly functions
- You mostly work with objects (variables) — you create them, name them, manipulate them
- No semicolons needed at end of lines (unlike Java/C++) — new line = new command
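- Use spaces freely for readability: R ignores extra spaces around operators and arguments, but not inside names or inside operators (x <- 1 assigns, while x < -1 compares)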
- Comments start with # — everything after # on that line is ignored
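Here is a small sketch of the first few points in action. The variable names (score, Score, a, b) are made up for illustration only.

```r
# Case matters: these are two different objects
score <- 10
Score <- 20
score == Score          # FALSE

# Operators are functions: backticks let you call + by its name
`+`(2, 3)               # 5
sapply(1:3, `*`, 10)    # 10 20 30

# Semicolons are optional; they only separate commands written on one line
a <- 1; b <- 2
a + b                   # 3
```

You will rarely write `+`(2, 3) in real code, but knowing that operators are ordinary functions explains why you can pass them to helpers like sapply().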
2. The Most Important Symbol: Assignment Operator (<-)
This is the single most characteristic thing in R syntax.
```r
# Three ways to assign (but use <- almost always)
x <- 42      # leftward arrow: preferred style in almost all modern R code
y = 3.14     # equal sign: works, but can cause confusion in function arguments
100 -> z     # rightward arrow: rare, avoid

print(x)     # 42
print(y)     # 3.14
print(z)     # 100
```
Why <- instead of = ? Because = is also used for naming arguments inside functions. Using <- everywhere avoids confusion.
Inside a function call, = names the arguments; it does not assign anything:

```r
lm(formula = y ~ x, data = mydata)   # = here names arguments, it is not assignment
```
Clear: <- stores the result, while = still names the argument:

```r
result <- lm(y ~ x, data = mydata)
```
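To see the difference for yourself, here is a minimal sketch using mean(). The names avg1, avg2 and vals are hypothetical, chosen just for this demo.

```r
# = inside a call only names the argument x of mean(); it assigns nothing
avg1 <- mean(x = c(2, 4, 6))

# <- inside a call assigns first, then passes the value along
# (legal R, but rarely what you want)
avg2 <- mean(vals <- c(2, 4, 6))

avg1    # 4
avg2    # 4
vals    # 2 4 6  (vals now exists as a variable in your workspace)
```

Both calls return the same mean, but only the second one leaves a new variable behind. That side effect is the confusion the <- versus = convention is meant to avoid.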
3. Comments – Your Best Friend
```r
# This is a full-line comment
temperature <- 29.5   # Hyderabad afternoon in Feb 2026, inline comment

# Multi-line comments? Just put # on every line
# Like this
# No /* */ style in base R
```
Pro tip: While you're learning, comment generously. As a rough guide, aim for about one comment line for every two or three lines of code.
4. Printing / Showing Results
In RStudio console:
- Just type the name → it prints (implicit print)
```r
marks <- c(85, 92, 78)
marks   # prints automatically
```
- Explicit print (useful in scripts/functions)
```r
print("Hello Hyderabad!")
print(marks)
```
- cat() for nicer output (no [1] prefix)

```r
cat("Average =", mean(marks), "\n")   # \n = new line
```
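One reason explicit print() matters: automatic printing only happens at the top level of the console, not inside loops or functions. A small sketch, reusing the marks vector from above:

```r
marks <- c(85, 92, 78)

for (m in marks) {
  m            # nothing is shown: no auto-printing inside a loop
}

for (m in marks) {
  print(m)     # shown, one value per line, each with a [1] prefix
}

cat("Marks:", marks, "\n")   # Marks: 85 92 78
```

So in scripts and functions, reach for print() or cat() whenever you actually want to see a value.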
5. Basic Building Block: Vectors (Everything Starts Here)
Vectors are the atoms of R — almost all data lives in vectors.
Syntax to create:
```r
# c() = combine / concatenate
numbers <- c(1, 3, 5, 7, 9)               # numeric vector
names   <- c("Aarav", "Priya", "Rahul")   # character vector, use " " or ' '
logic   <- c(TRUE, FALSE, TRUE)           # logical, TRUE/FALSE (must be uppercase)

# Mix types? R coerces to most flexible type
mixed <- c(42, "hello", TRUE)   # becomes all character: "42" "hello" "TRUE"
```
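The coercion rule follows a fixed order: logical, then integer, then numeric, then character. A quick sketch you can run to check each step with class():

```r
class(c(TRUE, FALSE))       # "logical"
class(c(TRUE, 2L))          # "integer"   (TRUE becomes 1L)
class(c(2L, 3.5))           # "numeric"
class(c(3.5, "a"))          # "character"

# A handy consequence: TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))   # 2
```

That last line is why sum() on a logical vector is a common way to count how many elements meet a condition.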
Vector operations happen element-wise — huge time-saver!
```r
marks <- c(78, 92, 65, 88)

marks + 5           # 83 97 70 93
marks > 80          # FALSE TRUE FALSE TRUE
marks[marks > 80]   # 92 88 (logical subsetting, very powerful!)
```
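Element-wise also works between two vectors of the same length, and square brackets accept positions as well as conditions. A short sketch; the bonus vector is invented for this example:

```r
marks <- c(78, 92, 65, 88)
bonus <- c(2, 0, 5, 1)      # hypothetical bonus points, one per student

marks + bonus               # 80 92 70 89  (pairs are added position by position)
marks[c(1, 3)]              # 78 65        (pick positions 1 and 3)
marks[-1]                   # 92 65 88     (everything except position 1)
which(marks > 80)           # 2 4          (positions where the condition holds)
```

Remember that R counts positions from 1, not from 0.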
6. Functions – How You Do Almost Everything
Syntax pattern:
```r
function_name(argument1, argument2 = default_value, ...)
```
Examples:
```r
# No arguments
getwd()                          # current working directory

# One argument
mean(marks)
length(names)

# Multiple arguments, some named, some positional
seq(from = 1, to = 10, by = 2)   # named arguments, very readable
seq(1, 10, by = 2)               # positional, shorter but less clear

# Common: install & load packages
install.packages("ggplot2")      # only once
library(ggplot2)                 # every session
```
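The same pattern applies when you write your own function. Below is a minimal sketch; add_bonus() and its arguments are hypothetical, made up only to show how default values and named arguments behave.

```r
# Hypothetical helper: add bonus points to marks, never exceeding a cap
add_bonus <- function(marks, bonus = 5, cap = 100) {
  pmin(marks + bonus, cap)   # pmin() takes the smaller value element by element
}

add_bonus(c(78, 92, 97))               # 83 97 100  (defaults: bonus = 5, cap = 100)
add_bonus(c(78, 92, 97), bonus = 10)   # 88 100 100 (override one default by name)
add_bonus(c(78, 92, 97), 10, 95)       # 88 95 95   (positional: bonus, then cap)
```

Arguments with a default can be left out of the call; arguments without one (marks here) must always be supplied.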
7. Data Frames – The Excel-like Table (Most Important Structure)
Syntax to create manually:
```r
df <- data.frame(
  Name   = c("Amit", "Priya", "Rahul"),
  Age    = c(24, 22, 25),
  Score  = c(88.5, 92.0, 79.5),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # the default since R 4.0.0; keeps text columns as character
)

print(df)
```
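Once a data frame exists, a few base R functions give you a quick overview of it. This sketch rebuilds the same df so it runs on its own:

```r
df <- data.frame(
  Name   = c("Amit", "Priya", "Rahul"),
  Age    = c(24, 22, 25),
  Score  = c(88.5, 92.0, 79.5),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

str(df)             # compact overview: one line per column with its type
dim(df)             # 3 4  (rows, then columns)
names(df)           # "Name" "Age" "Score" "Passed"
sapply(df, class)   # the type of each column
```

Each column of a data frame is just a vector, which is why everything from the vectors section still applies.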
Access syntax – very flexible:
```r
df$Score                   # column as vector
df$Score[2]                # 92.0
df[ , "Age"]               # all rows, Age column
df[2, ]                    # row 2, all columns
df[df$Passed == TRUE, ]    # filter rows where Passed is TRUE
```
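One detail that trips up beginners is the difference between single and double brackets. A small sketch, using a trimmed-down copy of df so it runs on its own:

```r
df <- data.frame(
  Name   = c("Amit", "Priya", "Rahul"),
  Score  = c(88.5, 92.0, 79.5),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

class(df["Score"])      # "data.frame"  (single brackets keep the table shape)
class(df[["Score"]])    # "numeric"     (double brackets pull out the vector)

df[df$Passed, "Name"]   # "Amit" "Priya"

# Rows and columns in one step: condition before the comma, columns after
df[df$Score > 80 & df$Passed, c("Name", "Score")]
```

Since Passed is already logical, df$Passed works as a filter by itself; writing == TRUE is optional.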
Modern tidyverse style (learn soon):
```r
library(dplyr)

# |> is the native pipe (needs R 4.1 or later)
df |>
  filter(Passed == TRUE) |>
  select(Name, Score)
```
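If dplyr is not installed yet, base R's subset() gives a similar result. This is only a sketch of an equivalent, not a replacement for learning dplyr:

```r
df <- data.frame(
  Name   = c("Amit", "Priya", "Rahul"),
  Score  = c(88.5, 92.0, 79.5),
  Passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep rows where Passed is TRUE, then keep only the Name and Score columns
passed <- subset(df, Passed, select = c(Name, Score))
print(passed)
```

Like filter() and select(), subset() lets you write column names without the df$ prefix.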
8. Control Structures – If, For, While (Less Used in Modern R)
```r
# If-else
grade <- "A"

if (grade == "A") {
  print("Excellent!")
} else if (grade == "B") {
  print("Good")
} else {
  print("Try harder")
}

# For loop (vectorized better, but sometimes needed)
for (i in 1:5) {
  cat("Square of", i, "is", i^2, "\n")
}
```
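The heading also mentions while, so here is a minimal sketch of one, followed by the vectorized way to get the same squares as the for loop above. The variable names are just for illustration.

```r
# while: keep going until the condition becomes FALSE
n <- 1
while (n^2 < 50) {
  n <- n + 1
}
n                  # 8, the first whole number whose square is at least 50

# Vectorized alternative to the for loop: no loop needed
i <- 1:5
squares <- i^2
squares            # 1 4 9 16 25
```

This is why loops are less common in R: most operations already work on the whole vector at once.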
9. Quick Reference Table – Most Common Syntax Patterns
| What you want to do | Syntax Example | Notes |
|---|---|---|
| Assign value | x <- 10 or x = 10 | Prefer <- |
| Create vector | c(1, 2, 3) or 1:10 | : is fast for integers |
| Comment | # explanation | Everywhere! |
| Function call | mean(x, na.rm = TRUE) | na.rm = named argument |
| Access column | df$column or df[["column"]] | $ is friendly |
| Logical AND / OR | & and \| | Single symbols work element-wise on vectors |
| String combine | paste("Hi", "Hyd", sep = " ") | or paste0() for no separator |
| Install package | install.packages("tidyverse") | Internet needed |
| Load package | library(tidyverse) | Every new session |
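A few of these rows are easier to remember once you've run them. Here is a short sketch; the marks values are reused from earlier, and the vector with NA is made up:

```r
marks <- c(78, 92, 65, 88)

marks > 70 & marks < 90             # TRUE FALSE FALSE TRUE   (both conditions)
marks < 70 | marks > 90             # FALSE TRUE TRUE FALSE   (either condition)

paste("Hi", "Hyd", sep = " ")       # "Hi Hyd"
paste0("Hi", "Hyd")                 # "HiHyd"

mean(c(80, NA, 90))                 # NA  (a missing value spoils the result)
mean(c(80, NA, 90), na.rm = TRUE)   # 85  (na.rm = TRUE drops the NA first)
```

For a single TRUE/FALSE test inside if (...), R also has the double forms && and ||, which look at one value only.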
10. Your Mini Practice Right Now (Copy → Run)
```r
# Create student data
students <- data.frame(
  id    = 1:4,
  name  = c("Sneha", "Rahul", "Priya", "Vikram"),
  marks = c(92, 68, 85, 79),
  city  = c("Hyd", "Bng", "Del", "Mum")
)

# Print the data frame
print(students)

# Average marks
avg <- mean(students$marks)
cat("Average marks =", round(avg, 1), "\n")   # 81

# Who scored above average?
above_avg <- students[students$marks > avg, "name"]
print(above_avg)                              # Sneha and Priya

# Add new column
students$grade <- ifelse(students$marks >= 80, "A", "B")
print(students)
```
You just used vectors, data frames, logical subsetting, functions, ifelse(), assignment — core R syntax!
Feeling good?
Tell me:
- Want more on subsetting tricks?
- Control flow (loops/if) in detail?
- Or jump to dplyr / ggplot2 syntax next?
- Is any line giving an error right now?
I'm right here, so let's fix it or go deeper! 🚀
