Chapter 9: Cleaning Empty Cells

How to clean / handle empty cells (missing values) in pandas.

Imagine we are sitting together and I’m showing you real data on my screen. We will go slowly, understand why things are missing, see what options we really have, and learn the patterns most people actually use in real projects.

1. First — what do we actually mean by “empty cells”?

In pandas, “empty” usually means one of these:

What you see | Internal value in pandas | Name | isna() returns True?
(nothing) | NaN | Not a Number (float) | Yes
None | None | Python None | Yes
<NA> | pd.NA | Missing marker for nullable integer/string dtypes | Yes
empty string "" | "" | Empty string | No
"NA", "N/A", "-", "null", "missing" | string | Text that means missing | No

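Important takeaway: pandas treats NaN, None, and pd.NA as missing, so isna() / isnull() return True for them. Empty strings and words like "NA" or "-" are ordinary strings once they sit in a DataFrame, and isna() returns False for them. One caveat: pd.read_csv converts some of these markers (empty fields, "NA", "N/A", "null") to NaN by default while loading, but not "-" or "missing".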
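To see the difference on screen, here is a minimal sketch. The Series and its values are made up for illustration; only isna() itself is what the text above describes.

```python
import numpy as np
import pandas as pd

# One value of each kind, kept as plain Python objects (dtype="object").
values = pd.Series([np.nan, None, pd.NA, "", "N/A", "-"], dtype="object")

# isna() flags only the real missing markers, not the look-alike strings.
flags = values.isna().tolist()
print(flags)  # [True, True, True, False, False, False]
```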

2. Let’s create a realistic messy table with different kinds of “empties”

Python
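A stand-in table for the walkthrough. The people and values are invented for illustration; only the column names (name, age, city, salary, rating, joined, active) come from the column-by-column sections later in this chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: real missing values (NaN / None) mixed with
# strings that only look missing ("", "N/A", "-", "missing").
df = pd.DataFrame({
    "name":   ["Asha", "Ben", "", "Dana", None, "Farid"],
    "age":    [29, np.nan, 41, 35, np.nan, 52],
    "city":   ["Pune", "N/A", None, "Lyon", "", "Oslo"],
    "salary": [52000, np.nan, 61000, "-", 48000, np.nan],
    "rating": [4.5, np.nan, np.nan, 3.8, 4.1, np.nan],
    "joined": ["2021-03-01", None, "2020-11-15", "missing", "2022-07-09", ""],
    "active": [True, False, None, True, np.nan, False],
})

print(df)
```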

3. Step 1 – Always start by finding out where the missing values are

Python
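A minimal sketch of that first look, assuming a small hypothetical two-column table. It counts the real missing values and, separately, the look-alike strings that isna() does not catch.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "age":  [29, np.nan, 41, np.nan],
    "city": ["Pune", "N/A", None, ""],
})

missing_counts = df.isna().sum()          # age: 2, city: 1
missing_percent = df.isna().mean() * 100  # age: 50.0, city: 25.0

# Strings that mean "missing" but are not NaN; the list is an assumption,
# adjust it to whatever markers your own data uses.
hidden = int(df["city"].isin(["", "N/A", "-"]).sum())  # 2

print(missing_counts)
print(missing_percent)
print("hidden placeholders in city:", hidden)
```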

Common output for our table:

text

4. The 7 most realistic ways people handle missing values

# | Method | When people use it | Code example | Destructive?
1 | Drop rows | Very few missing, row is useless without data | df.dropna() | Yes
2 | Drop columns | Column is almost all missing | df.dropna(axis=1, thresh=…) | Yes
3 | Fill with fixed value | You know what missing should mean (0, 'Unknown') | df['city'] = df['city'].fillna('Unknown') | No
4 | Fill with mean / median | Numeric column, missing looks random | df['salary'] = df['salary'].fillna(df['salary'].median()) | No
5 | Fill with group average | Missing depends on category (dept, city, …) | Groupby + transform | No
6 | Forward / backward fill | Time series, last known value is reasonable | df['rating'] = df['rating'].ffill() (fillna(method='ffill') is deprecated in recent pandas) | No
7 | Leave as is / mark explicitly | You want to keep info that value was missing | df['salary_missing'] = df['salary'].isna() | No
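Method 5 is the only row without a one-liner, so here is a minimal sketch of the groupby + transform pattern, combined with the flag from method 7. The dept column and all the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two departments, one missing salary in each.
df = pd.DataFrame({
    "dept":   ["IT", "IT", "IT", "HR", "HR"],
    "salary": [60000.0, np.nan, 70000.0, 40000.0, np.nan],
})

# Method 7: remember which rows were missing before filling anything.
df["salary_missing"] = df["salary"].isna()

# Method 5: transform() returns one value per row (the median of that row's
# group), so it lines up with the original column and can be passed to fillna().
group_median = df.groupby("dept")["salary"].transform("median")
df["salary"] = df["salary"].fillna(group_median)

print(df)
```

The IT gap is filled with 65000 (the median of 60000 and 70000) and the HR gap with 40000, instead of one overall median for everybody.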

5. Realistic cleaning walkthrough — column by column

Column: name

Python
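One plausible treatment for a text column like name, sketched on hypothetical values: trim whitespace, turn empty strings into real missing values, then fill with an explicit label. Filling with 'Unknown' is an assumption; dropping rows without a name is just as common.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Asha", "  ", "", None, "Ben "]})

# Strip whitespace first, so "  " becomes "" and "Ben " becomes "Ben".
df["name"] = df["name"].str.strip()

# Empty strings are not missing for pandas, so convert them explicitly.
df["name"] = df["name"].replace("", np.nan)

# Fill what is left with a visible label.
df["name"] = df["name"].fillna("Unknown")

print(df["name"].tolist())  # ['Asha', 'Unknown', 'Unknown', 'Unknown', 'Ben']
```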

Column: age

Python
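A sketch for a numeric column like age, assuming hypothetical values where one entry is text. The idea is to force everything to numbers, keep whole numbers by using the nullable Int64 dtype, and fill with the median. Whether the median is a sensible fill for your data is a judgment call.

```python
import pandas as pd

# Hypothetical data: numbers, a stray text value, and a real missing value.
df = pd.DataFrame({"age": [29, "unknown", None, 41, 35]})

# errors="coerce" turns anything that is not a number into NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Nullable integer dtype: ages stay integers even while some are missing.
df["age"] = df["age"].astype("Int64")

# Median of the known ages (29, 35, 41) is 35.
median_age = int(df["age"].median())
df["age"] = df["age"].fillna(median_age)

print(df["age"].tolist())  # [29, 35, 35, 41, 35]
```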

Column: city

Python
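For a categorical text column like city, a common pattern is to map every placeholder to NaN and then fill with a fixed label. A minimal sketch on hypothetical values; the placeholder list is an assumption based on the markers listed in section 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"city": ["Pune", "N/A", None, "", "-", "Oslo"]})

# Strings that should count as missing in this column.
placeholders = ["", "N/A", "-", "null", "missing"]

# Step 1: placeholders become real NaN. Step 2: NaN becomes 'Unknown'.
df["city"] = df["city"].replace(placeholders, np.nan).fillna("Unknown")

print(df["city"].tolist())
# ['Pune', 'Unknown', 'Unknown', 'Unknown', 'Unknown', 'Oslo']
```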

Column: salary (very common situation!)

Python
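A sketch of the usual salary routine, assuming hypothetical values with a "-" placeholder: convert to numbers, keep a flag for what was missing, then fill with the median. The median is used here because it is less sensitive to a few very high salaries than the mean; that choice is an assumption, not a rule.

```python
import pandas as pd

# Hypothetical data: "-" and None both mean "no salary recorded".
df = pd.DataFrame({"salary": [52000, "-", 61000, None, 48000]})

# "-" cannot be parsed as a number, so coerce turns it into NaN.
df["salary"] = pd.to_numeric(df["salary"], errors="coerce")

# Keep the information that the value was missing before it is filled.
df["salary_missing"] = df["salary"].isna()

# Median of 48000, 52000, 61000 is 52000.
df["salary"] = df["salary"].fillna(df["salary"].median())

print(df)
```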

Column: rating

Python
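If rating is recorded over time, forward fill can make sense. A minimal sketch, assuming a hypothetical day column that defines the order; the backward fill at the end only covers a gap at the very start, where there is no earlier value to carry forward.

```python
import numpy as np
import pandas as pd

# Hypothetical time-ordered ratings.
df = pd.DataFrame({
    "day": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "rating": [np.nan, 4.0, np.nan, 3.5],
})

# Forward fill only makes sense when the rows are in time order.
df = df.sort_values("day")

# ffill() carries the last known value forward; bfill() fills the leading gap.
df["rating"] = df["rating"].ffill().bfill()

print(df["rating"].tolist())  # [4.0, 4.0, 4.0, 3.5]
```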

Column: joined (date)

Python
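For dates, the missing marker is NaT, and it is often safer to keep it than to invent a date. A sketch on hypothetical values, assuming the dates are written as YYYY-MM-DD.

```python
import pandas as pd

# Hypothetical data: valid dates mixed with None, text, and an empty string.
df = pd.DataFrame({"joined": ["2021-03-01", None, "missing", "", "2022-07-09"]})

# Anything that does not match the format becomes NaT (the datetime "NaN").
df["joined"] = pd.to_datetime(df["joined"], format="%Y-%m-%d", errors="coerce")

# Keep a flag instead of filling with a made-up date.
df["joined_missing"] = df["joined"].isna()

print(df)
```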

Column: active

Python
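A True/False column with gaps can keep its gaps by using the nullable boolean dtype, or be filled with a default. A sketch on hypothetical values; treating missing as False is an assumption that only fits if "unknown" really should count as "not active".

```python
import numpy as np
import pandas as pd

# Hypothetical data: None and NaN both mean "we do not know".
df = pd.DataFrame({"active": [True, False, None, True, np.nan]})

# Nullable boolean dtype: values are True, False, or <NA>.
df["active"] = df["active"].astype("boolean")

# Optional: a plain bool column where unknown is treated as False.
df["active_filled"] = df["active"].fillna(False).astype(bool)

print(df)
```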

6. Quick reference – the commands people use most

Python
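A compact cheat sheet, run against a tiny hypothetical table so every line can be tried directly. The selection of commands mirrors the methods from section 4.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city":   ["Pune", None, "", "Oslo"],
    "salary": [52000.0, np.nan, 61000.0, np.nan],
})

n_missing = df.isna().sum()                   # missing values per column
complete_rows = df.dropna()                   # rows with no missing value at all
has_salary = df.dropna(subset=["salary"])     # rows where salary is present
dense_cols = df.dropna(axis=1, thresh=3)      # columns with at least 3 non-missing values

city_clean = df["city"].replace("", np.nan).fillna("Unknown")  # placeholder -> NaN -> label
salary_median = df["salary"].fillna(df["salary"].median())     # fill with median
salary_ffill = df["salary"].ffill()                            # carry last value forward
salary_flag = df["salary"].isna()                              # mark what was missing
```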

7. Mini practice task for you

Take this small messy series:

Python
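A stand-in series to practice on. The values are hypothetical; they were chosen so that each placeholder from the task appears once next to a real missing value.

```python
import pandas as pd

# Hypothetical messy series: numbers mixed with "", "N/A", "-", and None.
s = pd.Series([10, "", 25, "N/A", 40, "-", None, 15], dtype="object")

print(s)
print(s.isna().sum())  # 1, only None counts as missing so far
```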

Clean it so that:

  • empty string '' → NaN
  • 'N/A' → NaN
  • '-' → NaN
  • All missing values filled with median of the non-missing numbers

Try writing the code — then come back and we’ll compare.
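For comparing afterwards, here is one possible solution. It is a sketch that assumes a hypothetical series containing the three placeholders plus one real missing value; other approaches are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical messy series for illustration.
s = pd.Series([10, "", 25, "N/A", 40, "-", None, 15], dtype="object")

# Step 1: turn the three placeholders into real missing values.
cleaned = s.replace(["", "N/A", "-"], np.nan)

# Step 2: make the column numeric (it was object dtype because of the strings).
cleaned = pd.to_numeric(cleaned)

# Step 3: fill with the median of the known numbers (10, 15, 25, 40 -> 20.0).
cleaned = cleaned.fillna(cleaned.median())

print(cleaned.tolist())  # [10.0, 20.0, 25.0, 20.0, 40.0, 20.0, 20.0, 15.0]
```

A shortcut for steps 1 and 2 is pd.to_numeric(s, errors="coerce"), with the trade-off that it silently turns every non-numeric value into NaN, not only the three placeholders.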

Where do you want to go next?

  • More advanced imputation (group-based, KNN, regression, etc.)
  • How to deal with very high percentage of missing values
  • Visualizing missing values (missingno library)
  • Cleaning mixed types in the same column
  • Realistic strategies when you don’t know what to fill

Just tell me which direction you want to continue — I’ll keep explaining slowly with examples. 😊
