Chapter 16: ML Linear Regression

This chapter explains Linear Regression in Machine Learning in a very detailed, step-by-step way, just like your favorite teacher explaining it over a long lunch break.

No scary formulas right at the start — we’ll build intuition first with stories and real examples (Hyderabad flats, exam marks, everyday things), then slowly add the math and code so it all makes perfect sense. By the end you’ll be able to explain Linear Regression to anyone and even build a simple one yourself.

Step 1: What is Linear Regression? (The Simple Big Idea)

Linear Regression is the simplest and most important supervised ML algorithm for predicting continuous numbers (not categories like cat/dog).

It assumes that the relationship between your input(s) and the output is roughly a straight line (or a flat plane/hyperplane when there are two or more inputs).

In one sentence: Linear Regression finds the best straight line that fits your data points so you can use that line to predict new values.

Real-life analogy everyone gets: Imagine you’re trying to guess the price of a 2BHK flat in Hyderabad just by knowing its size in square feet.

You look at many past sales:

  • 800 sq ft → ₹48 lakh
  • 1200 sq ft → ₹72 lakh
  • 1500 sq ft → ₹90 lakh
  • 1800 sq ft → ₹108 lakh

If you plot these on graph paper (size on x-axis, price on y-axis), the dots roughly form an upward-sloping straight line trend.

Linear Regression's job = draw the best possible straight line through those dots → then for any new flat size (say 1400 sq ft), go up to the line and read off the predicted price.

That’s it — the whole magic of Linear Regression!

Step 2: Why “Linear”?

Because the model assumes a linear relationship:

  • When input increases by 1 unit, output increases (or decreases) by a fixed amount (the slope).
  • No curves, no sudden jumps — just steady straight change.

Examples of linear-ish relationships in real life:

  • Study hours vs exam marks (more hours → higher marks, roughly straight)
  • Advertising spend vs sales (more ads → more sales, often linear in small range)
  • Car mileage vs engine size (bigger engine → lower mileage, negative linear)
  • Temperature vs ice cream sales (hotter → more sales, positive linear)

Step 3: The Famous Equation (Keep This Forever)

The model is just this simple line equation:

y = mx + c (or in ML language: ŷ = w₁x₁ + w₀)

  • ŷ (y-hat) = predicted value (what the model guesses)
  • x = input feature (e.g., size in sq ft)
  • m or w₁ = slope (how much price changes per extra sq ft)
  • c or w₀ = intercept (base price when size = 0, e.g., land value or fixed costs)

Goal of Linear Regression = find the best m and c so that the line is as close as possible to all real data points.
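To make the equation concrete, here is a tiny sketch that plugs numbers into ŷ = m·x + c. The slope 0.06 and intercept 0 are the values this chapter's flat example ends up with (Step 5), not a real market fit:

```python
# Sketch: predicting a flat price with y = m*x + c.
# m = 0.06 (₹ lakh per sq ft) and c = 0 are the example values from this chapter.
m = 0.06   # slope: how much price changes per extra sq ft
c = 0.0    # intercept: base price when size = 0

def predict_price(size_sqft):
    """Predicted price in ₹ lakh for a flat of the given size."""
    return m * size_sqft + c

print(predict_price(1400))  # 84.0 → ₹84 lakh
```

Change m or c and the whole line tilts or shifts; that is all the model ever learns.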

Step 4: How Does It Find the “Best” Line? (Least Squares)

It uses the Least Squares method:

For each data point:

  • Calculate error = actual price – predicted price (vertical distance to the line)
  • Square every error (so negative/positive don’t cancel, and big errors hurt more)
  • Sum all squared errors → get total error (called Sum of Squared Errors or SSE)

The algorithm tries different slopes & intercepts until it finds the pair that gives the smallest possible SSE.

In 2026 we almost never do this manually — libraries like scikit-learn or statsmodels solve it instantly (using closed-form math or gradient descent).
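The recipe above can be done by hand once, just to see it work. This sketch uses the closed-form least-squares formulas m = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)² and c = ȳ − m·x̄ on the toy flat data from Step 5 (no library needed):

```python
# Minimal closed-form least squares for one feature (toy data from Step 5).
sizes  = [800, 1000, 1200, 1400, 1600, 1800]   # sq ft
prices = [48, 60, 72, 84, 96, 108]             # ₹ lakh

x_mean = sum(sizes) / len(sizes)
y_mean = sum(prices) / len(prices)

# m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²,  c = ȳ - m·x̄
m = sum((x - x_mean) * (y - y_mean) for x, y in zip(sizes, prices)) \
    / sum((x - x_mean) ** 2 for x in sizes)
c = y_mean - m * x_mean

# SSE for the fitted line (≈ 0 here because the toy data is perfectly linear)
sse = sum((y - (m * x + c)) ** 2 for x, y in zip(sizes, prices))
print(m, c, sse)  # ≈ 0.06, ≈ 0.0, ≈ 0.0
```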

Step 5: Real Example 1 – Hyderabad Flat Prices (One Feature)

Data (small pretend dataset):

Size (sq ft) | Price (₹ lakh)
------------ | --------------
800          | 48
1000         | 60
1200         | 72
1400         | 84
1600         | 96
1800         | 108

After Linear Regression (in real life you’d run code):

Best fit line ≈ Price = 0.06 × Size + 0

  • Slope 0.06 → every extra 100 sq ft adds ≈ ₹6 lakh
  • Intercept ≈ 0 (in this simple data — in real data there’s usually a base)

Predict for new flat: 1500 sq ft → 0.06 × 1500 = ₹90 lakh (model prediction)
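This fit can be checked in one line with NumPy's `polyfit` (one common way to fit a line; a degree-1 polynomial is just a straight line):

```python
# Quick check of the Step 5 fit with NumPy (degree-1 polynomial = straight line).
import numpy as np

sizes  = np.array([800, 1000, 1200, 1400, 1600, 1800])  # sq ft
prices = np.array([48, 60, 72, 84, 96, 108])            # ₹ lakh

slope, intercept = np.polyfit(sizes, prices, deg=1)
print(slope, intercept)            # ≈ 0.06 and ≈ 0.0
print(slope * 1500 + intercept)    # ≈ 90.0 → ₹90 lakh for a 1500 sq ft flat
```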

Step 6: Multiple Linear Regression (Realistic – Many Features)

In real life you never use just one feature!

Real Hyderabad flat data might have:

  • Size_sqft
  • Bedrooms
  • Location_score (e.g., 9 for Gachibowli, 6 for Uppal)
  • Age_years
  • Floor_number

Equation becomes:

Price = w₁×Size + w₂×Bedrooms + w₃×Location_score + w₄×Age + w₅×Floor + w₀

Each w tells the impact:

  • w₁ = +0.055 → +₹5.5 lakh per 100 sq ft
  • w₂ = +8 → +₹8 lakh per extra bedroom
  • w₄ = -2 → -₹2 lakh per extra year of age (depreciation)

This is Multiple Linear Regression — still a flat plane/hyperplane, but in higher dimensions.
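The multi-feature equation can be sketched directly in code. Note: the text above only gives w₁, w₂, and w₄, so w₃ (location), w₅ (floor), and w₀ (base price) below are hypothetical fill-ins for illustration:

```python
# Sketch of the multiple-regression equation with the chapter's example weights.
# w3, w5, and w0 are HYPOTHETICAL values, only for illustration.
w = {
    "Size_sqft":      0.055,  # +₹5.5 lakh per 100 sq ft (from the text)
    "Bedrooms":       8.0,    # +₹8 lakh per extra bedroom (from the text)
    "Location_score": 3.0,    # hypothetical
    "Age_years":     -2.0,    # -₹2 lakh per year of age (from the text)
    "Floor_number":   0.5,    # hypothetical
}
w0 = 10.0  # hypothetical base price (₹ lakh)

def predict_price(flat):
    """Price in ₹ lakh = w0 + Σ wᵢ·xᵢ over the flat's features."""
    return w0 + sum(w[name] * value for name, value in flat.items())

flat = {"Size_sqft": 1200, "Bedrooms": 2, "Location_score": 9,
        "Age_years": 5, "Floor_number": 3}
print(predict_price(flat))  # ≈ 110.5 → ₹110.5 lakh
```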

Step 7: How Good Is the Fit? (Very Important Metrics)

  • R² (R-squared) → ranges from 0 to 1; tells what % of the variance in price the model explains. R² = 0.92 → the model explains 92% of price changes (very good!). R² = 0.3 → poor fit; you need better features or a non-linear model.
  • Mean Absolute Error (MAE) → average |actual – predicted|, in ₹ lakh. Example: MAE = 4.2 → on average, predictions are off by ₹4.2 lakh.
  • Root Mean Squared Error (RMSE) → square root of the average squared error (bigger errors are penalized more).
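All three metrics are one-liners with scikit-learn. The actual and predicted prices below are made-up numbers, chosen just to show a slightly imperfect model:

```python
# Evaluating predictions with scikit-learn's standard regression metrics.
# The actual/predicted prices are made-up numbers for illustration.
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

actual    = np.array([48, 60, 72, 84, 96, 108])   # ₹ lakh
predicted = np.array([50, 58, 75, 83, 94, 110])   # ₹ lakh (imperfect model)

r2   = r2_score(actual, predicted)
mae  = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))

print(f"R²: {r2:.3f}  MAE: {mae:.2f} lakh  RMSE: {rmse:.2f} lakh")
```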

Step 8: Quick Python Example (What You Actually Do)

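A minimal sketch of the typical workflow with scikit-learn: make data, split into train/test, fit, predict, evaluate. The flat data here is made up for illustration; in real life you would load an actual dataset (e.g., from a CSV):

```python
# Minimal scikit-learn Linear Regression workflow (made-up flat data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

# Features: Size_sqft, Bedrooms, Age_years (toy values)
X = np.array([
    [800,  2,  5], [1000, 2, 10], [1200, 3,  2],
    [1400, 3,  8], [1600, 3,  1], [1800, 4,  4],
])
y = np.array([48, 58, 76, 82, 100, 112])  # price in ₹ lakh (made up)

# Hold out a third of the data to test on
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

print("Coefficients:", model.coef_)   # one learned weight per feature
print("Intercept:", model.intercept_)
print("R²:", r2_score(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))
```

With such a tiny toy dataset the test scores will jump around; the point is the workflow, not the numbers.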

Step 9: When Linear Regression Fails (Important Warnings)

  • Non-linear relationships → curve (use polynomial regression, trees, neural nets)
  • Outliers → one crazy expensive flat ruins the line
  • Multicollinearity → features too correlated (e.g., size & bedrooms)
  • Assumptions violated → errors not normal, heteroscedasticity (funnel shape in residuals plot)
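Two of these warnings are easy to check in code. This sketch (with made-up numbers) measures feature correlation to spot multicollinearity, and shows how one crazy-priced flat distorts the residuals:

```python
# Quick diagnostics for two common Linear Regression problems (made-up data).
import numpy as np

sizes    = np.array([800, 1000, 1200, 1400, 1600, 1800])
bedrooms = np.array([2, 2, 3, 3, 3, 4])

# 1) Multicollinearity: how correlated are two features?
corr = np.corrcoef(sizes, bedrooms)[0, 1]
print(f"size vs bedrooms correlation: {corr:.2f}")  # close to 1 → redundant features

# 2) Outliers: look at residuals = actual - predicted.
prices = np.array([48, 60, 72, 84, 96, 250])   # last flat is a crazy outlier
slope, intercept = np.polyfit(sizes, prices, deg=1)
residuals = prices - (slope * sizes + intercept)
print("residuals:", np.round(residuals, 1))  # the outlier pulls the whole line up
```

In a residuals plot you want a random cloud around zero; any pattern (curve, funnel, one huge point) is one of the warnings above showing itself.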

Final Teacher Summary (Repeat This!)

Linear Regression = supervised ML for predicting numbers by fitting the best straight line (or plane) to data.

  • One feature → simple line
  • Many features → multiple linear regression
  • Goal → minimize squared errors → predict new values
  • Always check R², MAE, residuals plot

In Hyderabad 2026: every property site (99acres, Magicbricks), car price estimator (CarDekho), and salary predictor uses Linear Regression (or its improved cousins) as a baseline.

Understood now? 🌟

Want next?

  • Full code for multiple features + visualization?
  • How to interpret coefficients in real estate?
  • Difference vs Logistic Regression (for classification)?

Just tell me — class is still going! 🚀
