Chapter 9: R Multiple Variables
This topic can mean slightly different things depending on context, so I’ll cover both interpretations that beginners usually have in mind:
- Having / managing many different variables in a real script (very common in data analysis)
- Assigning values to several variables at the same time (like unpacking or parallel/multiple assignment)
We’ll go slowly with lots of realistic examples — like we’re sitting together looking at the same RStudio screen.
1. Having Many Variables — The Normal Situation in Real R Work
In almost every real R script (especially data science / statistics work), you will have 10–50+ variables at the same time.
This is not a problem — it’s expected and encouraged.
Example of a realistic small analysis script:
```r
# ───────────────────────────────────────────────────────────────
# Hyderabad Retail Sales Analysis – February 2026
# ───────────────────────────────────────────────────────────────

# 0. Packages (the pipelines and plots below rely on these)
library(dplyr)
library(ggplot2)

# 1. Constants / settings
CITY               <- "Hyderabad"
START_DATE         <- as.Date("2026-01-01")
MIN_SALE_THRESHOLD <- 5000

# 2. File paths / locations
raw_file   <- "data/sales_raw_hyd_2026.csv"
clean_file <- "data/sales_clean.rds"

# 3. Raw & imported data
sales_raw     <- read.csv(raw_file, stringsAsFactors = FALSE)
customer_data <- readRDS("data/customers.rds")

# 4. Working / intermediate objects
sales_hyd <- sales_raw |>
  # read.csv() imports dates as text; this assumes they are stored as YYYY-MM-DD
  mutate(order_date = as.Date(order_date)) |>
  filter(city == CITY, order_date >= START_DATE)

sales_clean <- sales_hyd |>
  mutate(
    profit        = revenue - cogs - shipping_cost,
    profit_margin = profit / revenue * 100,
    is_high_value = revenue >= MIN_SALE_THRESHOLD
  ) |>
  filter(!is.na(customer_id))

# 5. Summary statistics & results
monthly_sales <- sales_clean |>
  group_by(year_month = format(order_date, "%Y-%m")) |>
  summarise(
    total_revenue   = sum(revenue),
    total_profit    = sum(profit),
    n_transactions  = n(),
    avg_order_value = mean(revenue)
  )

overall_avg_profit <- mean(sales_clean$profit, na.rm = TRUE)

top_customers <- sales_clean |>
  group_by(customer_id) |>
  summarise(total_spent = sum(revenue)) |>
  slice_max(total_spent, n = 10)

best_selling_product <- sales_clean |>
  count(product_name, wt = revenue, name = "total_revenue") |>
  slice_max(total_revenue, n = 1)

# 6. Models
profit_model <- lm(profit ~ revenue + shipping_cost + is_high_value,
                   data = sales_clean)

# 7. Visualizations
p1 <- ggplot(monthly_sales, aes(year_month, total_revenue)) +
  geom_col(fill = "#00A087") +
  theme_minimal() +
  labs(title = paste("Monthly Revenue –", CITY, "2026"))

p2 <- ggplot(sales_clean, aes(profit_margin)) +
  geom_histogram(bins = 30, fill = "#E69F00") +
  theme_minimal()

# ───────────────────────────────────────────────────────────────
# End of analysis – many variables alive at this point
# ───────────────────────────────────────────────────────────────
```
Important lessons from this example:
- It’s normal to have many variables (16 in this short script)
- Group them logically with comments / sections
- Use descriptive names (monthly_sales is better than df3, overall_avg_profit is better than x)
- Many variables are temporary / intermediate — that’s fine
- You can remove variables you no longer need:
```r
rm(sales_raw, sales_hyd)   # free memory

# or remove everything except final results:
keep <- c("monthly_sales", "profit_model", "p1", "p2")
rm(list = setdiff(ls(), keep))
```
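If you want to see what the "keep only the results" cleanup would delete before running it on your real workspace, here is a minimal sketch. It uses a throwaway environment (`ws`, a name made up for this illustration) as a stand-in for the global workspace, and the objects inside it are placeholders, not real data.

```r
# A throwaway environment stands in for the global workspace (illustration only)
ws <- new.env()
ws$sales_raw     <- data.frame(revenue = c(100, 250))
ws$sales_hyd     <- ws$sales_raw
ws$monthly_sales <- data.frame(total_revenue = 350)
ws$profit_model  <- "model placeholder"

keep <- c("monthly_sales", "profit_model")

# Step 1: list what would be deleted, and look at it before deleting
to_remove <- setdiff(ls(ws), keep)
print(to_remove)

# Step 2: delete only those names
rm(list = to_remove, envir = ws)

remaining <- ls(ws)
print(remaining)
```

In a normal script you would drop `ws` and the `envir` argument, so `ls()` and `rm()` act on your own workspace.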
2. Assigning to Multiple Variables at Once (Parallel / Multiple Assignment)
R does not have clean built-in syntax like Python’s
```python
a, b, c = 10, 20, "hello"   # Python – easy
```
But R users use several practical patterns instead.
Pattern A – Most common & clearest (recommended for beginners)
Return a named list and extract with $ or [[ ]]
```r
calculate_stats <- function(x) {
  list(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    n      = sum(!is.na(x))   # count non-missing values, consistent with na.rm = TRUE above
  )
}

result <- calculate_stats(c(78, 92, 85, 67, 91, NA))

# Classic & very readable way
avg    <- result$mean
middle <- result$median
spread <- result$sd
count  <- result$n
```
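The `[[ ]]` form is mentioned above but not shown, so here is a small sketch of the case where it is handy: when the element name is stored in a variable. The function `summarise_scores` and the scores are hypothetical, made up for this illustration.

```r
# Hypothetical helper for illustration: returns several values as a named list
summarise_scores <- function(x) {
  list(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE)
  )
}

result <- summarise_scores(c(78, 92, 85, 67, 91, NA))

avg <- result[["mean"]]       # same value as result$mean

wanted <- "median"            # element name stored in a variable
middle <- result[[wanted]]    # result$wanted would look for an element called "wanted"

# Pull several elements at once into a named numeric vector
both <- unlist(result[c("mean", "median")])

print(avg)
print(middle)
print(both)
```

As a rule of thumb, `$` is nicer to type interactively, while `[[ ]]` is the safer choice inside functions and loops.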
Pattern B – zeallot package → very Python-like (%<-% operator)
```r
# install.packages("zeallot")   # only once
library(zeallot)

c(avg, med, stdev, count) %<-% calculate_stats(c(78, 92, 85, 67, 91, NA))

# Now you have four separate variables
print(avg)     # 82.6
print(med)     # 85
print(stdev)   # ≈ 10.36
```
Many people who come from Python really like this one.
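Pattern C – wrapr package → unpacking by name (unpack[ ])
```r
# install.packages("wrapr")
library(wrapr)

# unpack[] matches on the names of the returned list (new_name = list_name).
# Note: wrapr's %:=% builds named vectors; it does not assign several variables.
unpack[mean_val = mean, med_val = median, sd_val = sd, n_val = n] <-
  calculate_stats(c(78, 92, 85, 67, 91))

print(mean_val)   # 82.6
```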
Pattern D – Base R quick & dirty (2–3 values)
```r
marks <- c(67, 78, 85, 91, 92)

# Very common pattern
range_vals <- range(marks)
lowest     <- range_vals[1]
highest    <- range_vals[2]

# Or with names
quantile_vals <- quantile(marks, probs = c(0.25, 0.75))
names(quantile_vals) <- c("Q1", "Q3")
q1 <- quantile_vals["Q1"]
q3 <- quantile_vals["Q3"]
```
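One detail that often surprises beginners with this pattern: single brackets keep the element's name attached, while double brackets return the bare number. A minimal sketch, using made-up marks and the default names that `quantile()` produces:

```r
marks <- c(67, 78, 85, 91, 92)   # illustrative data

q <- quantile(marks, probs = c(0.25, 0.75))   # named vector: "25%", "75%"

q1_named <- q["25%"]     # still carries the name "25%"
q1_plain <- q[["25%"]]   # just the number, no name

print(q1_named)
print(q1_plain)
```

If you plan to paste the value into a report or compare it with another number, the plain version is usually what you want.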
Quick Decision Table – Which Style to Use?
| Situation | Recommended Approach | Why / When to choose it |
|---|---|---|
| Beginner / teaching / simple scripts | Named list + $ extraction | Very clear, no extra packages |
| Want Python-like feeling | zeallot → %<-% | Cleanest syntax, readable |
| Already using wrapr / advanced tidy | wrapr → unpack[ ] | Unpacks by name, so order does not matter |
| Only 2 values (min/max, range, etc.) | range(), quantile(), then [1],[2] | Fast, no extra code |
| Inside a pipeline | Keep as list or use list2env() | Rare – usually better to keep as list |
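The table mentions `list2env()`, which has not appeared yet, so here is a minimal sketch of what it does: it turns each element of a named list into a variable. The list values and the `scratch` environment are assumptions made up for this illustration.

```r
stats <- list(avg = 82.6, med = 85, n = 5)   # illustrative values

# A separate environment keeps the demo from touching your workspace;
# use envir = environment() to create the variables where the call runs.
scratch <- new.env()
list2env(stats, envir = scratch)

created <- ls(scratch)
print(created)
print(scratch$avg)
```

Because `list2env()` silently overwrites any existing variables with the same names, keeping the list as a list is usually the calmer choice.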
Your Turn – Small Practice (copy → run)
```r
# library(zeallot)   # uncomment if you installed it

quick_stats <- function(values) {
  list(
    avg    = mean(values),
    med    = median(values),
    minval = min(values),
    maxval = max(values)
  )
}

marks <- c(68, 72, 79, 85, 92, 94)

# Classic way
s <- quick_stats(marks)
print(s$avg)
print(s$maxval)

# zeallot way (uncomment if you have the package)
# c(average, middle, lowest, highest) %<-% quick_stats(marks)
# print(average)   # 81.666...
# print(highest)   # 94
```
So — which part feels most useful / confusing right now?
- Want to try writing a function that returns multiple values?
- Practice cleaning up a script with too many similar variables?
- Move forward to vectors, data frames, or dplyr?
Just tell me — we continue exactly where you want! 🚀
