Chapter 2: Programming Foundations
This chapter walks through programming foundations in full detail, as if we're sitting together in Airoli with your laptop open, chai in hand, and me guiding you line by line. Think of me as a patient teacher who has been coding in Python for data science since before Pandas was cool. We'll go slow, with plenty of real, copy-paste-ready examples, notes on why each topic matters for data science in 2026, and small exercises you can try right away.
By the end of this chapter, you should be comfortable writing clean Python scripts, organizing code, handling mistakes gracefully, and using Git/GitHub to track your work like a pro. This is the foundation—everything in later chapters (Pandas, ML models, deployment) builds on this.
Why Python is Still the King for Data Science in 2026
Python dominates because:
- Readable like English → you spend less time debugging syntax, more on thinking about data.
- Huge ecosystem: NumPy, Pandas, Scikit-learn, PyTorch, Hugging Face — all mature.
- Community + jobs: 90%+ of data science/ML roles in India ask for Python (LinkedIn/Naukri trends 2026).
- In 2026, Python 3.13 is stable and brings a better REPL (interactive shell), improved error messages, and an experimental free-threaded build (which can speed up some parallel workloads). We'll stick to basics that behave the same across 3.11–3.13 and are rock-solid for DS.
Setup tip (do this now if not done): Install Python 3.12 or 3.13 from python.org, or use Anaconda (popular for data science because it ships with Pandas, Jupyter, etc.) or Miniconda (a minimal installer; you add those packages yourself). For editing, use VS Code with the Python extension or JupyterLab; both are free and excellent.
1. Variables, Data Types, Control Flow, Functions
Variables — boxes to store stuff. No need to declare type.
    # Simple assignment
    name = "Webliance"     # string (str)
    age = 25               # integer (int)
    height = 5.9           # float
    is_learning = True     # boolean (bool)

    print(name, age, height, is_learning)
    # Output: Webliance 25 5.9 True
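Because Python works out types at run time, the same name can point to different kinds of values, and data read from files usually arrives as text that you convert yourself. A minimal sketch of that idea (the variable names and values are illustrative only):

```python
# type() tells you what kind of value a name currently refers to
age = 25
print(type(age))        # <class 'int'>

# The same name can be rebound to a different type (legal, but easy to confuse)
age = "25"
print(type(age))        # <class 'str'>

# Values read from CSV files or input() arrive as strings: convert before doing maths
age_number = int(age)
height_m = float("1.72")
print(age_number + 1)   # 26
print(height_m * 2)     # 3.44
```

If a conversion such as int("abc") fails, Python raises a ValueError, which is one reason error handling comes up later in this chapter.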
Common data types in data science:
- int, float — numbers
- str — text
- bool — True/False
- list — ordered, changeable collection [1, 2, 3]
- tuple — ordered, unchangeable (1, 2, 3)
- dict — key-value pairs {"city": "Airoli", "pin": 400708}
- set — unique items {1, 2, 3} (no duplicates)
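To make the four collection types above concrete, here is a small sketch of how each one might hold data-science-style values. The readings, coordinates, and labels are made up for illustration.

```python
# list: ordered and changeable, e.g. daily temperature readings
temps = [32, 34, 28]
temps.append(31)

# tuple: ordered and fixed, e.g. a (latitude, longitude) pair (illustrative values)
location = (19.15, 72.99)

# dict: look values up by key, e.g. one record of a dataset
record = {"city": "Airoli", "pin": 400708}
record["avg_temp_c"] = 32   # add a new key-value pair

# set: keeps only unique items, handy for de-duplicating labels
labels = set(["spam", "ham", "spam", "ham", "spam"])

print(temps)           # [32, 34, 28, 31]
print(location[0])     # 19.15
print(record["city"])  # Airoli
print(len(labels))     # 2
```

A rough rule of thumb: reach for a list when order matters, a dict when you look things up by name, a set when you only care about uniqueness, and a tuple when the values should not change.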
Control flow — decisions and loops
    # if-elif-else
    temperature = 32
    if temperature > 35:
        print("Too hot in Airoli today!")
    elif temperature > 28:
        print("AC on, please.")
    else:
        print("Pleasant weather.")
    # Output depends on value

    # for loop (very common in DS for iterating data)
    fruits = ["mango", "banana", "chikoo"]
    for fruit in fruits:
        print(f"I like {fruit.title()}")  # f-string (modern formatting)

    # while loop (use carefully: it can loop forever!)
    count = 0
    while count < 5:
        print(count)
        count += 1
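Loops and conditions are often combined when cleaning data. The sketch below, with made-up readings where None stands for a missing value, uses enumerate to get each position and continue to skip the gaps:

```python
readings = [31.5, None, 36.0, 29.5, None, 33.0]  # None marks a missing value

valid_total = 0.0
valid_count = 0
for position, value in enumerate(readings):
    if value is None:
        print(f"Reading {position} is missing, skipping")
        continue            # jump straight to the next item
    valid_total += value
    valid_count += 1

average = valid_total / valid_count
print(f"Average of {valid_count} valid readings: {average:.2f}")
# Average of 4 valid readings: 32.50
```

Later, Pandas will handle missing values for you, but the logic underneath is the same as this loop.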
Functions — reusable blocks of code. Crucial for clean DS scripts.
    def greet_user(name, city="Mumbai"):  # default argument
        """This greets the user."""  # docstring (good habit)
        return f"Hello {name}! How's weather in {city}?"

    print(greet_user("Webliance", "Airoli"))
    # Output: Hello Webliance! How's weather in Airoli?

    # Function with type hints (2026 best practice for DS code)
    def calculate_bmi(weight_kg: float, height_m: float) -> float:
        return weight_kg / (height_m ** 2)

    bmi = calculate_bmi(70.5, 1.72)
    print(f"Your BMI is {bmi:.2f}")  # :.2f → 2 decimal places
Quick exercise: Write a function is_hot_day(temp_c) that returns "Hot" if the temperature is above 30, "Warm" if it is between 25 and 30 (inclusive), and "Cool" otherwise. Test it with a few values.
2. OOP Basics, Modules, Error Handling
OOP (Object-Oriented Programming) — think of real-world things as objects with data + behavior.
    # Class → blueprint
    class Customer:
        def __init__(self, name, city):  # constructor
            self.name = name
            self.city = city
            self.orders = 0  # attribute

        def place_order(self, amount):
            self.orders += 1
            print(f"{self.name} placed order #{self.orders} worth ₹{amount}")

    # Object (instance)
    cust1 = Customer("Webliance", "Airoli")
    cust1.place_order(1200)
    cust1.place_order(850)
In data science: You’ll see classes in scikit-learn (e.g., model = RandomForestClassifier()), PyTorch models, custom pipelines.
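To see why model libraries are built from classes, here is a toy sketch that imitates the create, fit, predict rhythm. MeanBaseline is a hypothetical class written for this example; it is not part of scikit-learn, and its fit method is simplified to take only the target values.

```python
class MeanBaseline:
    """Hypothetical toy model: always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None  # learned during fit()

    def fit(self, y):
        # "Training" here is just computing and storing the average
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows chaining: MeanBaseline().fit(y)

    def predict(self, n_rows):
        # Return the stored mean once per row we are asked to predict
        if self.mean_ is None:
            raise ValueError("Call fit() before predict()")
        return [self.mean_] * n_rows


model = MeanBaseline()      # create the object, much like RandomForestClassifier()
model.fit([10, 20, 30])     # the object remembers what it learned
print(model.predict(2))     # [20.0, 20.0]
```

The point is that the object carries its learned state (here, mean_) between calls, which is hard to do cleanly with plain functions.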
Modules — files with code you can import (reuse across projects)
Create utils.py:
    # utils.py
    def celsius_to_fahrenheit(c):
        return (c * 9/5) + 32
Then in your main script:
    import utils

    print(utils.celsius_to_fahrenheit(32))  # → 89.6

    # Better: selective import
    from utils import celsius_to_fahrenheit
    print(celsius_to_fahrenheit(40))
Popular built-in modules: math, random, datetime, os.
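A quick taste of those four modules, as a sketch. The file name sales.csv and the dates are placeholders chosen for the example.

```python
import math
import os
import random
from datetime import date

# math: common numeric helpers
print(math.sqrt(16))               # 4.0

# random: seed it so a "random" sample is repeatable between runs
random.seed(42)
sample = random.sample(range(100), 3)
print(len(sample))                 # 3

# datetime: date arithmetic without manual calendar maths
start = date(2026, 1, 1)
end = date(2026, 1, 31)
print((end - start).days)          # 30

# os: build file paths that work on Windows, macOS and Linux
csv_path = os.path.join("data", "sales.csv")
print(os.path.basename(csv_path))  # sales.csv
```

Seeding the random generator matters in data science because it lets you and your teammates reproduce the same train/test split or sample.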
Error Handling — code crashes less (very important in DS pipelines)
    try:
        result = 10 / 0
    except ZeroDivisionError:
        print("Cannot divide by zero!")
    except Exception as e:  # catch-all
        print(f"Something went wrong: {e}")
    else:
        print("Success!")  # runs if no error
    finally:
        print("This always runs")  # cleanup
In DS: Handle missing files, bad data, API errors gracefully.
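As one possible illustration of handling a missing file and bad data together, the sketch below reads one number per line and keeps going when something is wrong. The function name read_numbers and the file name are hypothetical.

```python
def read_numbers(path):
    """Read one number per line; skip lines that are not valid numbers."""
    numbers = []
    try:
        with open(path, "r") as f:
            for line_no, line in enumerate(f, start=1):
                text = line.strip()
                if not text:
                    continue  # ignore blank lines
                try:
                    numbers.append(float(text))
                except ValueError:
                    print(f"Skipping bad value on line {line_no}: {text!r}")
    except FileNotFoundError:
        print(f"File not found: {path}")
    return numbers


# A missing file gives an empty list instead of a crash
print(read_numbers("no_such_file.txt"))
```

Catching the specific exceptions (FileNotFoundError, ValueError) rather than a bare catch-all keeps unexpected bugs visible instead of silently hiding them.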
3. File I/O, List/Dict Comprehensions, Lambda
File I/O — read/write files (CSV, JSON common in DS)
    # Write
    with open("notes.txt", "w") as f:  # auto-closes file
        f.write("Learning Python in Airoli 2026\nLine 2")

    # Read
    with open("notes.txt", "r") as f:
        content = f.read()
    print(content)
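Since CSV and JSON are the formats you will meet most often, here is a sketch using the standard csv and json modules. The file names, column names, and settings are invented for the example, and everything is written to a temporary folder.

```python
import csv
import json
import os
import tempfile

# Work in a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "temps.csv")
    json_path = os.path.join(folder, "config.json")

    # CSV: write a header row and two data rows
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["city", "temp_c"])
        writer.writerow(["Airoli", 32])
        writer.writerow(["Pune", 28])

    # CSV: read rows back as dictionaries keyed by the header
    with open(csv_path, "r", newline="") as f:
        rows = list(csv.DictReader(f))

    # JSON: save and reload a small settings dictionary
    with open(json_path, "w") as f:
        json.dump({"model": "baseline", "test_size": 0.2}, f)
    with open(json_path, "r") as f:
        settings = json.load(f)

print(rows[0]["city"], rows[0]["temp_c"])  # Airoli 32
print(settings["test_size"])               # 0.2
```

Note that the csv module returns every value as a string (so temp_c comes back as "32"), while JSON keeps numbers as numbers.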
List comprehensions — short, fast way to create lists (very Pythonic for DS)
    # Old way
    squares = []
    for i in range(10):
        squares.append(i**2)

    # Comprehension (faster + cleaner)
    squares = [i**2 for i in range(10)]
    print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    # With condition
    even_squares = [i**2 for i in range(20) if i % 2 == 0]
Dict comprehensions
    cities = ["Airoli", "Mumbai", "Pune"]
    temps = [32, 34, 28]

    temp_dict = {city: temp for city, temp in zip(cities, temps)}
    print(temp_dict)  # {'Airoli': 32, 'Mumbai': 34, 'Pune': 28}
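Dict comprehensions can also filter and transform, just like list comprehensions. A short sketch reusing the same city temperatures (the 30-degree threshold and the labels are arbitrary choices for the example):

```python
temp_dict = {"Airoli": 32, "Mumbai": 34, "Pune": 28}

# Filter: keep only the cities hotter than 30 degrees Celsius
hot_cities = {city: t for city, t in temp_dict.items() if t > 30}
print(hot_cities)   # {'Airoli': 32, 'Mumbai': 34}

# Transform: turn each temperature into a label
weather = {city: ("hot" if t > 30 else "pleasant") for city, t in temp_dict.items()}
print(weather)      # {'Airoli': 'hot', 'Mumbai': 'hot', 'Pune': 'pleasant'}
```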
Lambda — anonymous (one-line) functions. Super useful in sorting, Pandas apply, etc.
    # Normal function
    def double(x):
        return x*2

    # Lambda
    double = lambda x: x*2
    print(double(5))  # 10

    # Common use: sort list of tuples
    students = [("Rahul", 85), ("Priya", 92), ("Amit", 78)]
    sorted_students = sorted(students, key=lambda x: x[1], reverse=True)
    print(sorted_students)  # [('Priya', 92), ('Rahul', 85), ('Amit', 78)]
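The same key= trick works on lists of dictionaries, which is a shape you will often see for experiment results. In this sketch the model names and AUC scores are hypothetical.

```python
# Hypothetical model results: one dictionary per experiment
results = [
    {"model": "baseline", "auc": 0.71},
    {"model": "random_forest", "auc": 0.80},
    {"model": "xgboost", "auc": 0.82},
]

# key= tells max() and sorted() which field to compare
best = max(results, key=lambda r: r["auc"])
print(best["model"])                  # xgboost

ranked = sorted(results, key=lambda r: r["auc"], reverse=True)
print([r["model"] for r in ranked])   # ['xgboost', 'random_forest', 'baseline']

# map() applies a small function to every item
percentages = list(map(lambda r: round(r["auc"] * 100), results))
print(percentages)                    # [71, 80, 82]
```

When a lambda grows beyond one short expression, a named function with def is usually easier to read and test.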
4. Version Control — Git & GitHub (Branching, Pull Requests, Collaboration)
Why Git for data scientists in 2026? Notebooks change a lot → track experiments, revert bad changes, collaborate, show portfolio on GitHub.
Step-by-step setup (do this today):
- Install Git → https://git-scm.com
- Create free GitHub account
- Configure:
    git config --global user.name "Webliance"
    git config --global user.email "your@email.com"
Basic workflow (local → GitHub):
    # In your project folder
    git init  # start git repo

    # Add files
    git add my_notebook.ipynb requirements.txt
    git commit -m "Initial commit: first DS project"

    # Connect to GitHub (create empty repo first on github.com)
    git remote add origin https://github.com/Webliance/my-ds-project.git
    git branch -M main  # default branch usually main
    git push -u origin main
Branching (super important for experiments)
    git checkout -b feature/churn-model  # create + switch branch

    # ... work, commit changes ...
    git add .
    git commit -m "Add baseline churn model"

    git checkout main  # go back
    git merge feature/churn-model  # merge good work
    git push origin main
Pull Request (PR) on GitHub — for collaboration/review
- Push branch → git push origin feature/churn-model
- Go to GitHub → repo → Pull requests → New pull request
- Compare branches → create PR → add description (“Added XGBoost churn model, 0.82 AUC”)
- Someone reviews → approve → merge
Collaboration example: Friend clones your repo
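    git clone https://github.com/Webliance/my-ds-project.git
    cd my-ds-project
    git checkout -b fix/bug-in-eda

    # ... fix something, then stage and commit it ...
    git add .
    git commit -m "Fix bug in EDA"

    # Pushing needs write access to the repo; without it, fork the repo and push to the fork
    git push origin fix/bug-in-eda
    # Then open PR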
Pro tips for DS in 2026:
- Add a .gitignore (ignore large data files, __pycache__, .env)
- Use Git LFS for big models/datasets
- Commit often, in small steps, with short, descriptive messages
- Use GitHub Actions for auto-testing notebooks (advanced but cool)
Quick exercise: Create a folder python-practice, init git, make a branch experiment-1, write a small script, commit, push to GitHub, create PR from branch to main.
That's the full Chapter 2! You now have the tools to write structured, reusable, error-tolerant Python code and version it properly.
