Chapter 5: Pandas Read CSV

Pandas – Reading CSV Files (the most common first real task)

Almost every data project starts with a call to pd.read_csv().

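Here is what that first call looks like. To keep the sketch self-contained, it reads a tiny made-up table from memory with io.StringIO, which read_csv accepts in place of a file path. With a real file you would pass its name instead (your_file.csv in the comment is only a placeholder).

```python
import io

import pandas as pd

# Made-up data standing in for a real file on disk
csv_text = "order_id,amount\n1,250.0\n2,99.5\n"

# With a real file this is simply: df = pd.read_csv("your_file.csv")
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```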

But in real life, very few CSV files are clean and simple. That’s why we need to understand the most important read_csv options and how to troubleshoot the usual problems.

1. The most basic read (when everything is perfect)

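A minimal sketch of the perfect case, assuming a small clean file (the name sales.csv and its contents are made up) written to a temporary folder so the example runs anywhere. Printing the shape and dtypes right after reading is a quick way to confirm that pandas understood the file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create a small, perfectly clean CSV in a temporary folder (hypothetical file)
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "sales.csv"
csv_path.write_text(
    "date,product,amount\n2024-01-05,Tea,120\n2024-01-06,Coffee,180\n",
    encoding="utf-8",
)

# The most basic read: comma separator, header in the first row, UTF-8 text
df = pd.read_csv(csv_path)

print(df.shape)   # (rows, columns)
print(df.dtypes)  # note: the date column is still plain text at this point
print(df.head())
```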

This works when:

  • The file is in the current working directory, or you give the correct path
  • The first row contains the column names
  • The separator is a comma (,)
  • The file is UTF-8 encoded (the read_csv default)
  • There are no extra title or note lines above the header or below the data

But in reality, all of these are almost never true at the same time.

2. The realistic “good starting command” most people use

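As an assumption, a typical version of such a command names the separator and encoding explicitly instead of relying on defaults, and uses skipinitialspace=True to drop spaces that follow each comma. The sample data is made up and passed as bytes through io.BytesIO so that the encoding argument actually matters.

```python
import io

import pandas as pd

# Made-up file content with a space after every comma
raw = "id, name, city\n1, Asha, Pune\n2, Ravi, Chennai\n".encode("utf-8")

df = pd.read_csv(
    io.BytesIO(raw),        # replace with your file path
    sep=",",                # state the separator explicitly
    encoding="utf-8",       # pandas' default, but explicit is clearer
    skipinitialspace=True,  # strip spaces that follow each separator
)
print(df)
```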

Even better version (very common in 2025):

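A sketch of what a more defensive read might add, offered as an assumption rather than a fixed recipe: dtype to keep IDs as strings, extra na_values, parse_dates for unambiguous ISO dates, and on_bad_lines="warn" (available since pandas 1.3) to report malformed rows instead of stopping. The column names and values are invented.

```python
import io

import pandas as pd

# Invented sample: leading zeros in IDs and a "-" used for a missing amount
raw = (
    "customer_id,order_date,amount,city\n"
    "00123,2024-01-05,1500,Pune\n"
    "00456,2024-01-09,-,Chennai\n"
).encode("utf-8")

df = pd.read_csv(
    io.BytesIO(raw),
    sep=",",
    encoding="utf-8",
    dtype={"customer_id": str},  # keep leading zeros
    na_values=["-"],             # treat "-" as missing, on top of the defaults
    parse_dates=["order_date"],  # ISO dates parse unambiguously
    on_bad_lines="warn",         # warn about malformed rows instead of failing
)
print(df.dtypes)
```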

3. Most common real-world problems & how to fix them

Problem 1 – Wrong separator (very common!)

Many files use ; or a tab (or some other character) instead of ,

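The fix is the sep parameter. This sketch uses made-up data to show what happens with the default comma (everything lands in one column) and how sep=";" or sep="\t" repairs it.

```python
import io

import pandas as pd

semicolon_text = "id;name;amount\n1;Asha;1200\n2;Ravi;850\n"
tab_text = "id\tname\tamount\n1\tAsha\t1200\n2\tRavi\t850\n"

# Wrong: with the default sep="," each whole line becomes a single column
wrong = pd.read_csv(io.StringIO(semicolon_text))
print(wrong.columns.tolist())  # one long column name

# Right: tell pandas which character separates the fields
df_semi = pd.read_csv(io.StringIO(semicolon_text), sep=";")
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")
```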

A quick way to check the separator:

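One way to do this from Python, assuming a hypothetical file unknown_format.csv created on the spot: print the first few raw lines with repr() so tabs show up as \t. As an optional extra, the standard library's csv.Sniffer can guess the delimiter from that sample; its guess can be wrong on messy files, so confirm it by eye. pandas can also sniff by itself when given sep=None with engine="python".

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical file, written here only so the sketch runs on its own
csv_path = Path(tempfile.mkdtemp()) / "unknown_format.csv"
csv_path.write_text("id;name;amount\n1;Asha;1200\n2;Ravi;850\n", encoding="utf-8")

# Print the first few raw lines, like `head -n 5` in a terminal
with open(csv_path, encoding="utf-8") as f:
    first_lines = [next(f, "") for _ in range(5)]
for line in first_lines:
    print(repr(line))  # repr() makes tabs and other hidden characters visible

# Optional: let the standard library guess among a few likely delimiters
sample = "".join(first_lines)
guessed = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
print("guessed separator:", repr(guessed))
```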

Look at the first line: which character separates the fields?

Problem 2 – First few rows are not data (header, comments, title)

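A sketch with an invented file that has two title lines and a stray comment line: skiprows=2 drops the first two raw lines, and comment="#" tells pandas to ignore lines that start with #. Be aware that comment="#" also cuts off any text after a # in the middle of a line.

```python
import io

import pandas as pd

text = (
    "Monthly Sales Report\n"
    "Generated by a billing system\n"
    "order_id,amount\n"
    "1,250\n"
    "# this line is a comment\n"
    "2,99\n"
)

# skiprows=2 removes the two title lines; comment="#" skips the comment line
df = pd.read_csv(io.StringIO(text), skiprows=2, comment="#")
print(df)
```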

Problem 3 – Missing values written in strange ways

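pandas already treats markers such as an empty field, NA, N/A, NULL and nan as missing; na_values adds your own markers on top of those. If only your own markers should count, pass keep_default_na=False as well. The sample below is made up.

```python
import io

import pandas as pd

text = "city,sales,returns\nPune,1200,-\nChennai,NULL,3\nDelhi,not available,\n"

df = pd.read_csv(
    io.StringIO(text),
    na_values=["-", "not available"],  # extra markers; NULL and blanks are defaults
)
print(df.isna().sum())  # missing values per column
```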

Problem 4 – Dates are read as strings

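A sketch with invented data: parse_dates works well for unambiguous ISO dates, while for DD-MM-YYYY it is safer to state the format explicitly with pd.to_datetime. Passing dayfirst=True to read_csv, or date_format= in pandas 2.0 and later, are alternatives.

```python
import io

import pandas as pd

text = (
    "order_id,order_date,delivery_date\n"
    "1,2024-01-05,07-01-2024\n"
    "2,2024-01-13,15-01-2024\n"
)

# ISO dates (YYYY-MM-DD) are unambiguous, so parse them while reading
df = pd.read_csv(io.StringIO(text), parse_dates=["order_date"])

# DD-MM-YYYY: give the exact format instead of letting pandas guess
df["delivery_date"] = pd.to_datetime(df["delivery_date"], format="%d-%m-%Y")
print(df.dtypes)
```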

Problem 5 – Large files → memory issues or slow loading

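Three common tactics, sketched on a small invented table that stands in for a big file: peek with nrows, load only the needed columns with compact types via usecols and dtype, or stream the file with chunksize and aggregate each piece. The column names and the chunk size are illustrative assumptions.

```python
import io

import pandas as pd

# Small stand-in for a large file (invented data)
text = "order_id,city,amount,notes\n" + "".join(
    f"{i},{'Pune' if i % 2 else 'Delhi'},{i * 10},n/a\n" for i in range(1, 11)
)

# 1) Peek at a few rows before committing to a full load
preview = pd.read_csv(io.StringIO(text), nrows=3)

# 2) Load only the needed columns, with memory-friendly types
df = pd.read_csv(
    io.StringIO(text),
    usecols=["city", "amount"],
    dtype={"city": "category", "amount": "int32"},
)

# 3) Or stream in chunks, aggregate each chunk, then combine the partial results
partials = [
    chunk.groupby("city")["amount"].sum()
    for chunk in pd.read_csv(io.StringIO(text), usecols=["city", "amount"], chunksize=4)
]
total_by_city = pd.concat(partials).groupby(level=0).sum()
print(total_by_city)
```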

4. Real-life example – reading a messy Indian sales CSV

Let’s pretend we have a file sales_data.csv with these issues:

  • The separator is ;
  • The first line is a title
  • The second line is a column description
  • Dates are in DD-MM-YYYY format
  • Missing values are written as – or left blank
  • Customer ID and phone number must stay as strings (to keep leading zeros)
  • The file uses windows-1252 encoding (common in files from older Windows-based systems in India)
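A sketch of how the whole file could be read, assuming the columns are named order_id, order_date, customer_id, phone and amount; the real header isn't given, so these names and rows are made up. The bytes are encoded as windows-1252 so the encoding argument is actually exercised (the – character is byte 0x96 in that code page).

```python
import io

import pandas as pd

# Invented stand-in for sales_data.csv, encoded the way an old Windows export might be
raw = (
    "Monthly Sales Report\n"
    "Column description: order id, order date, customer id, phone, amount\n"
    "order_id;order_date;customer_id;phone;amount\n"
    "1001;05-01-2024;C0042;09876543210;1500.50\n"
    "1002;13-01-2024;C0107;–;\n"
    "1003;20-01-2024;C0042;09123456789;2300\n"
).encode("windows-1252")

df = pd.read_csv(
    io.BytesIO(raw),               # replace with "sales_data.csv"
    sep=";",                       # semicolon-separated
    skiprows=2,                    # skip the title and description lines
    encoding="windows-1252",
    na_values=["-", "–"],          # hyphen or en dash means missing; blanks are default
    dtype={"customer_id": str, "phone": str},  # keep IDs and leading zeros intact
)

# DD-MM-YYYY: an explicit format avoids day/month mix-ups
df["order_date"] = pd.to_datetime(df["order_date"], format="%d-%m-%Y")
print(df)
print(df.dtypes)
```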

5. Quick checklist when reading a CSV

  1. Look at the file first (open it in a text editor, or run head -n 5 file.csv in a terminal)
  2. What is the separator?
  3. How many header rows to skip?
  4. Are there dates? In which format?
  5. What do missing values look like?
  6. Which columns should not become float? (IDs, phone, pincode…)
  7. What encoding? (try utf-8 → utf-8-sig → windows-1252 → latin1; latin1 decodes any byte without error, so keep it as the last resort)
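Step 7 can be automated with a small loop. The helper name read_csv_any_encoding is hypothetical, not part of pandas: it tries each encoding in order and returns the DataFrame together with the encoding that worked. The sample bytes are made up.

```python
import io

import pandas as pd


def read_csv_any_encoding(raw_bytes, encodings=("utf-8", "utf-8-sig", "windows-1252", "latin1"), **kwargs):
    """Hypothetical helper: try each encoding until read_csv stops raising decode errors."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw_bytes), encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise ValueError(f"none of {encodings} could decode the file") from last_error


# Made-up bytes: "é" encoded as windows-1252 is not valid UTF-8
raw = "name,city\nJosé,Pune\n".encode("windows-1252")
df, used = read_csv_any_encoding(raw)
print(used)
```

Because latin1 accepts every byte, it always "succeeds", which is why it sits at the end of the list.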

6. Your turn – small practice task

Try to write the read_csv command for these situations:

A. The file uses commas, but the first 3 lines are comments starting with #

B. The file has no header row; the columns should be named id, name, score, dept

C. The file is tab-separated, dates are YYYY/MM/DD, and missing values are written as NULL

D. You only want the columns order_id, amount, customer_city, order_date

Write your versions, then we can check them together.
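For comparison afterwards, here is one possible set of answers, each run against a tiny invented sample so it can be checked; other combinations of options work too.

```python
import io

import pandas as pd

# A. Comma-separated, first 3 lines are # comments
text_a = "# exported report\n# date: 2024-01-31\n# do not edit\nid,score\n1,88\n2,92\n"
df_a = pd.read_csv(io.StringIO(text_a), comment="#")  # skiprows=3 also works

# B. No header row, so supply the column names yourself
text_b = "1,Asha,88,Sales\n2,Ravi,92,HR\n"
df_b = pd.read_csv(io.StringIO(text_b), header=None, names=["id", "name", "score", "dept"])

# C. Tab-separated, YYYY/MM/DD dates, NULL for missing (NULL is also a pandas default)
text_c = "id\tjoined\tscore\n1\t2024/01/05\tNULL\n2\t2024/02/10\t92\n"
df_c = pd.read_csv(io.StringIO(text_c), sep="\t", na_values=["NULL"])
df_c["joined"] = pd.to_datetime(df_c["joined"], format="%Y/%m/%d")

# D. Load only the columns you need
text_d = "order_id,amount,customer_city,order_date,notes\n101,500,Pune,2024-01-05,gift\n"
df_d = pd.read_csv(
    io.StringIO(text_d),
    usecols=["order_id", "amount", "customer_city", "order_date"],
)
```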

Where do you want to go next?

  • How to inspect & understand the data after reading
  • Dealing with messy column names (spaces, capitals, special chars)
  • Fixing wrong data types after reading
  • Handling very large CSVs (chunks, dask, filtering while reading)
  • Reading multiple CSV files at once (common pattern)
  • Common encoding nightmares & how to detect them

Tell me which one you want to dive into next, and I’ll explain it slowly with real examples. 😊
