Chapter 33: R Data Frames
If you’ve been following our lessons step by step, you already know:
- vectors (ordered collections of same-type values; not to be confused with R's list type)
- logical values
- strings
- numbers
- how to combine strings, do math, use if-else…
Now data frames are where everything comes together — they are R’s version of Excel tables, Google Sheets, or SQL tables, but much more powerful because you can program them.
Let’s go slowly and deeply — like I’m sitting next to you in RStudio, building everything live.
1. What is a Data Frame? (The Honest Definition)
A data frame is:
- A two-dimensional table (rows × columns)
- Every column is a vector (and all columns must have the same length, one value per row)
- Every column can have a different data type (numeric in one column, character in another, logical in another…)
- Columns have names (like variable names)
- Rows usually represent individual observations (people, days, transactions…)
- Columns represent variables / features (age, name, salary, city…)
In short: Data frame = rectangular table where each column is a properly named vector of the same length.
This structure is so central that most real-world data analysis in R happens inside data frames.
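To make the definition above concrete, here is a minimal sketch that checks each property on a tiny table. The data frame `people` and the helper variable names are made up for illustration, and the snippet assumes R 4.0 or newer.

```r
# A small data frame with three columns of different types
people <- data.frame(
  name   = c("Aarav", "Priya", "Rahul"),
  age    = c(22, 21, 24),
  passed = c(TRUE, TRUE, FALSE)
)

# Under the hood, a data frame is a list of equal-length column vectors
is_list     <- is.list(people)
col_types   <- sapply(people, class)   # one type per column
col_lengths <- sapply(people, length)  # every column has the same length
shape       <- dim(people)             # number of rows, then number of columns

print(col_types)
print(col_lengths)
print(shape)
```

Because a data frame is a list of columns, functions such as `sapply()` walk over it column by column, which is why the type check above works without any loop.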
2. How to Create a Data Frame (Manual Way – Good for Learning)
The most common way to create one from scratch:
```r
# Classic way: most readable
students <- data.frame(
  name        = c("Aarav", "Priya", "Rahul", "Sneha", "Vikram"),
  age         = c(22, 21, 24, 23, 20),
  marks_math  = c(92, 85, 68, 78, 88),
  marks_stats = c(88, 94, 72, 81, 90),
  city        = c("Hyd", "Bng", "Del", "Mum", "Hyd"),
  passed      = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE  # the default since R 4.0.0; needed on older versions to prevent auto-factors
)

# Look at it nicely
print(students)
View(students)  # opens the interactive viewer in RStudio, very useful!
```
Output looks like this:
```
    name age marks_math marks_stats city passed
1  Aarav  22         92          88  Hyd   TRUE
2  Priya  21         85          94  Bng   TRUE
3  Rahul  24         68          72  Del  FALSE
4  Sneha  23         78          81  Mum   TRUE
5 Vikram  20         88          90  Hyd   TRUE
```
3. Basic Things You Do with Data Frames Every Day
a) See structure & summary
```r
str(students)             # shows types & first few values
summary(students)         # min/max/mean/median for numbers, counts for factors and logicals,
                          # only length/class/mode for character columns
dplyr::glimpse(students)  # tidyverse version, very clean (needs dplyr installed)
```
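One detail worth seeing for yourself: `summary()` on a character column reports only its length, class, and mode. If you want frequencies, a rough sketch of two base R options follows (the `city` vector is copied from the students example; the other names are made up here).

```r
city <- c("Hyd", "Bng", "Del", "Mum", "Hyd")

# On a character vector, summary() reports length, class and mode only
print(summary(city))

# Option 1: table() counts each distinct value
city_counts <- table(city)
print(city_counts)

# Option 2: convert to a factor first, then summary() gives counts per level
factor_counts <- summary(factor(city))
print(factor_counts)
```

Both options count "Hyd" twice and every other city once, so pick whichever reads better in your script.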
b) Access columns (3 main ways)
```r
# Dollar notation: most common & readable
students$marks_math
students$city

# By position
students[[3]]               # 3rd column, returned as a vector

# By name
students[, "marks_stats"]   # single named column

# Multiple columns
students[, c("name", "age", "city")]
```
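The access styles above do not all return the same kind of object, which trips up many beginners. The sketch below uses a cut-down copy of the students table (rebuilt here so the snippet runs on its own) to show which forms give back a plain vector and which keep a data frame.

```r
students <- data.frame(
  name        = c("Aarav", "Priya", "Rahul"),
  marks_math  = c(92, 85, 68),
  marks_stats = c(88, 94, 72)
)

as_vector  <- students[["marks_math"]]                 # plain numeric vector
also_vec   <- students[, "marks_math"]                 # single column is simplified to a vector
as_frame   <- students["marks_math"]                   # one-column data frame
kept_frame <- students[, "marks_math", drop = FALSE]   # drop = FALSE keeps the data frame

print(class(as_vector))
print(class(as_frame))
print(class(kept_frame))
```

If a later step expects a data frame (for example a function that uses column names), `drop = FALSE` or single-bracket access is the safer choice.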
c) Access rows
```r
students[1, ]     # first row
students[3:5, ]   # rows 3 to 5
```
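Row indexing accepts more than simple positions. As a small sketch (on a shortened copy of the students table, with variable names chosen for this example), negative indices drop rows, and `order()` or `which.max()` produce positions you can feed straight into the row slot.

```r
students <- data.frame(
  name       = c("Aarav", "Priya", "Rahul", "Sneha", "Vikram"),
  marks_math = c(92, 85, 68, 78, 88)
)

without_first <- students[-1, ]                               # everything except row 1
top_scorer    <- students[which.max(students$marks_math), ]   # row with the highest mark
ranked        <- students[order(students$marks_math, decreasing = TRUE), ]  # sorted, best first
first_two     <- head(ranked, 2)                              # top two after sorting

print(ranked)
```

Note that the comma stays in place every time: the part before it selects rows, the part after it selects columns.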
d) Add a new column
```r
# Simple calculation
students$avg_marks <- (students$marks_math + students$marks_stats) / 2

# Conditional column
students$grade <- ifelse(students$avg_marks >= 80, "A",
                  ifelse(students$avg_marks >= 60, "B", "C"))

# Using dplyr (modern favorite); the native pipe |> needs R 4.1 or newer
library(dplyr)

students <- students |>
  mutate(
    from_south  = city %in% c("Hyd", "Bng", "Chn"),
    performance = case_when(
      avg_marks >= 90 ~ "Excellent",
      avg_marks >= 75 ~ "Good",
      TRUE            ~ "Needs improvement"
    )
  )
```
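Nested `ifelse()` calls get hard to read once you have more than two or three bands. One base R alternative, shown here only as a sketch, is `cut()`. The sample averages below are invented to include the boundary values 60 and 80, so you can check that both approaches agree.

```r
avg_marks <- c(90, 89.5, 70, 79.5, 55, 60, 80)

# Nested ifelse, as in the lesson
grade_ifelse <- ifelse(avg_marks >= 80, "A",
                ifelse(avg_marks >= 60, "B", "C"))

# cut() with right = FALSE builds the intervals [-Inf, 60), [60, 80), [80, Inf)
grade_cut <- cut(
  avg_marks,
  breaks = c(-Inf, 60, 80, Inf),
  labels = c("C", "B", "A"),
  right  = FALSE
)

print(data.frame(avg_marks, grade_ifelse, grade_cut))
```

`cut()` returns a factor, so wrap it in `as.character()` if you need plain strings; the factor form is handy when you later want counts per grade.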
4. Filtering Rows (The Real Power)
Base R way:
```r
# Students from Hyderabad who passed (passed is already logical, so no "== TRUE" is needed)
students[students$city == "Hyd" & students$passed, ]

# Same, but only name & average
students[students$city == "Hyd" & students$passed, c("name", "avg_marks")]
```
Modern tidyverse way (much more readable):
```r
library(dplyr)

students |>
  filter(city == "Hyd", passed == TRUE) |>
  select(name, avg_marks, grade)
```
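One difference between the two filtering styles shows up only when the data has missing values: bracket filtering keeps an all-NA row wherever the condition is NA, while `dplyr::filter()` drops those rows. The sketch below demonstrates the base R side with a made-up `scores` table and two common workarounds, `which()` and `subset()`.

```r
scores <- data.frame(
  name  = c("Aarav", "Priya", "Rahul"),
  marks = c(92, NA, 68)
)

# The condition evaluates to TRUE, NA, FALSE; the NA produces an extra all-NA row
bracket_rows <- scores[scores$marks >= 80, ]

# which() keeps only the positions where the condition is TRUE
which_rows <- scores[which(scores$marks >= 80), ]

# subset() also treats NA as "do not keep"
subset_rows <- subset(scores, marks >= 80)

print(bracket_rows)
print(which_rows)
```

If your real data may contain NA, checking `nrow()` after a bracket filter is a cheap way to catch this surprise early.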
5. Common Real-Life Creation Patterns (What You’ll Actually Do)
From CSV / Excel files
```r
# CSV (stringsAsFactors = FALSE is the default since R 4.0.0; only older versions need it)
sales <- read.csv("monthly_sales_hyd_2026.csv", stringsAsFactors = FALSE)

# Excel (via the readxl package)
library(readxl)
survey <- read_excel("hyd_customer_survey.xlsx", sheet = "Responses")
```
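If you do not have a CSV file at hand yet, you can still try the import step. This sketch writes a tiny invented `sales` table to a temporary file and reads it back, using only base R; the column names and values are placeholders, not real data.

```r
# A small table to export (values are invented for the example)
sales <- data.frame(
  month   = c("Jan", "Feb", "Mar"),
  revenue = c(1200.5, 980, 1430.25)
)

# Write to a temporary file; row.names = FALSE avoids an extra index column
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)

# Read it back and inspect the column types that read.csv() inferred
sales_back <- read.csv(csv_path)
str(sales_back)

unlink(csv_path)  # clean up the temporary file
```

Running `str()` right after any import is a good habit, because it shows immediately whether numbers arrived as numbers or as text.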
From vectors of equal length
```r
dates <- as.Date(c("2026-02-01", "2026-02-02", "2026-02-03"))
temps <- c(28.5, 29.8, 30.2)
rain  <- c(FALSE, TRUE, FALSE)

weather <- data.frame(date = dates, temp_c = temps, rained = rain)
```
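What happens when the vectors are not of equal length? As a rough illustration (all names below are made up), `data.frame()` recycles a shorter vector when it fits a whole number of times and stops with an error otherwise.

```r
# Lengths 4, 2 and 1: the shorter vectors are recycled to fill 4 rows
recycled <- data.frame(id = 1:4, group = c("a", "b"), source = "manual")
print(recycled)

# Lengths 3 and 2: 2 does not divide 3, so data.frame() raises an error.
# tryCatch() captures the message instead of stopping the script.
mismatch <- tryCatch(
  data.frame(id = 1:3, group = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

Silent recycling can hide mistakes, so it is worth comparing `length()` of your vectors before combining them.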
6. Important Properties & Tips (2026 Style)
- Column names should be snake_case → student_id, order_date, total_amount
- Never use spaces or special characters in column names
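- stringsAsFactors = FALSE prevents automatic conversion of strings to factors; it has been the default since R 4.0.0, so you only need to set it (per call or as a global option) on older versions
- tibble (from the tibble package) is a modern alternative to data.frame: nicer printing, no partial matching of column names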
```r
library(tibble)

students_tb <- as_tibble(students)
print(students_tb)  # cleaner output
```
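Two of the tips above are easier to believe once you see them in action. This base R sketch (with invented column names) shows what happens to a column name containing a space, and how `$` on a plain data frame quietly accepts an abbreviated name.

```r
# By default data.frame() repairs invalid names: the space becomes a dot
fixed <- data.frame(`total amount` = c(100, 250))
# check.names = FALSE keeps the name as typed, but then you need backticks everywhere
raw <- data.frame(`total amount` = c(100, 250), check.names = FALSE)

print(names(fixed))
print(names(raw))
print(raw$`total amount`)

# Partial matching: "na" is enough for $ to find the "name" column
students <- data.frame(name = c("Aarav", "Priya"), age = c(22, 21))
partial <- students$na       # returns the name column
exact   <- students[["na"]]  # [[ ]] matches exactly, so this is NULL
```

Sticking to snake_case names and using `[[ ]]` (or a tibble) when a typo would be costly avoids both surprises.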
Your Mini Practice Right Now
```r
library(dplyr)  # needed for mutate(), filter() and select()

# Create your own small Hyderabad data frame
df <- data.frame(
  day      = c("Mon", "Tue", "Wed", "Thu", "Fri"),
  temp_max = c(28.9, 31.4, 30.1, 32.5, 29.2),
  humidity = c(68, 55, 72, 45, 80),
  traffic  = c("Medium", "Heavy", "Light", "Very Heavy", "Medium")
)

# Add useful columns
df <- df |>
  mutate(
    is_hot        = temp_max >= 31,
    uncomfortable = is_hot & humidity >= 60,
    bad_day       = traffic %in% c("Heavy", "Very Heavy") | uncomfortable
  )

# See only bad days
df |>
  filter(bad_day) |>
  select(day, temp_max, traffic)
```
You just created, enriched, and filtered a data frame. These three steps make up a large share of everyday R work!
Feeling good?
Next logical steps?
- Want to go deeper into dplyr verbs (filter, select, mutate, group_by, summarise)?
- Learn joining data frames (like VLOOKUP in Excel)?
- Or practice importing your own CSV/Excel file?
Just tell me — we continue exactly where you want! ☕📊🚀
