Chapter 10: Cleaning Wrong Format

cleaning data that is in the wrong format in pandas.

Imagine we are sitting together with a laptop. I’m showing you a real messy dataset, explaining why these problems appear, how to recognize them quickly, what realistic choices we have, and how people actually clean “wrong format” data in everyday work.

What do we mean by “wrong format”?

“Wrong format” usually means one (or several) of these situations:

Numbers are stored as text → “₹1,499.00” or “1 250” or “N/A”
Dates are strings in many different formats → “15-03-2024”, “2024/04/22”, “March 5th 2024”, “05/03/24”
Percentages are text → “78.5%”, “82 %”, “0.785”
Phone numbers / IDs have inconsistent separators → “+91-98765 43210”, “9876543210”, “(987) 654-3210”
Booleans are written as text → “Yes”/”No”, “Y”/”N”, “True”/”true”/”TRUE”
Categories have typos, extra spaces, mixed case → “electronics “, “Electronics”, “ELECTRONICS”, “electro”
Mixed types in one column → some numbers, some text, some missing markers

These are very common — especially when data comes from Excel, CSV exports, web forms, APIs, or manual entry.

Realistic example — messy sales / customer data

Python

Let’s clean it step by step — exactly the way people do it in real projects.

Step-by-step cleaning of wrong formats

1. Quick inspection — always start here

Python

Typical suspects: amount, discount_pct, quantity, order_date, payment_mode, status

2. Clean numeric columns stored as text

amount

Python

discount_pct

Python

quantity — mixed words and numbers

Python

3. Fix dates — one of the hardest & most common problems

Python

4. Clean categorical / text columns

Python

5. Final quick validation

Python

Realistic cleaning order for wrong-format columns

Look — df.info(), df.head(), select_dtypes(‘object’)
String cleanup first — strip(), lower()/title(), replace known typos
Numeric columns — remove symbols, commas, currency, then pd.to_numeric(…, errors=’coerce’)
Dates — try pd.to_datetime(errors=’coerce’) → handle failures with custom parser or format list
Categories — standardize case, remove spaces, map synonyms
Percentages — remove % → divide by 100
Check again — info(), describe(), value_counts()

Small practice task for you

Clean these values into proper types:

Python

Try to turn them into:

float for amount & percentage
datetime for date

Show your code — then we can discuss the best approach together.

Where would you like to go next?

More advanced date parsing tricks & libraries
Dealing with very inconsistent categories (fuzzy matching)
Cleaning phone numbers & postal codes properly
Converting mixed-type columns safely
Automating format detection (when you have many columns)

Tell me what you want to focus on — I’ll explain with many examples. 😊

Languages

Database

Web Technologies

Wordpress Tutorial

Top Online Compilers

PHP Projects

CRUD Management
PHP Search
Blog/CMS
E-commerce Website
Event Management System
Online Learning Platform
Task Management System
Social Networking Site
Inventory Management System
Real Estate Listing Website
Job Portal
Discussion Forum
Online Quiz/Test Platform
File Sharing System
Travel Booking System
Expense Management System
Recipe Sharing Platform
Online Survey System
Library Management System
Health and Fitness Tracker
Online Marketplace

Home

About Us

Disclaimer

+91 9433 511 250

Email

info@bestwebteacher.com

Chapter 10: Cleaning Wrong Format

What do we mean by “wrong format”?

Realistic example — messy sales / customer data

Step-by-step cleaning of wrong formats

1. Quick inspection — always start here

2. Clean numeric columns stored as text

3. Fix dates — one of the hardest & most common problems

4. Clean categorical / text columns

5. Final quick validation

Realistic cleaning order for wrong-format columns

Small practice task for you

You may also like...

Leave a Reply Cancel reply

PANDAS Tutorial

Languages

Database

Web Technologies

Web Technologies

Wordpress Tutorial

Top Online Compilers

PHP Projects

WhatsApp

Email

Connect with us

Chapter 10: Cleaning Wrong Format

What do we mean by “wrong format”?

Realistic example — messy sales / customer data

Step-by-step cleaning of wrong formats

1. Quick inspection — always start here

2. Clean numeric columns stored as text

3. Fix dates — one of the hardest & most common problems

4. Clean categorical / text columns

5. Final quick validation

Realistic cleaning order for wrong-format columns

Small practice task for you

You may also like...

Chapter 22: Pandas Study Plan

Chapter 21: Pandas Syllabus

Chapter 20: Pandas Exercises

Leave a Reply Cancel reply

PANDAS Tutorial

Languages

Database

Web Technologies

Web Technologies

Wordpress Tutorial

Top Online Compilers

PHP Projects

WhatsApp

Email

Connect with us