Chapter 6: Pandas Read JSON

What is JSON and why do we read it with pandas?

JSON (JavaScript Object Notation) is one of the most common data formats today — especially when:

  • Data comes from APIs (REST APIs, web services)
  • Data is exported from NoSQL databases (MongoDB, Firebase, etc.)
  • Data comes from log systems, configuration files, web scraping, etc.

JSON can be:

  • One big object { … }
  • An array of objects [ {…}, {…}, {…} ] ← this is the most common case for pandas
  • Nested objects, arrays inside objects, etc.

pandas is excellent at turning the array-of-objects style into a clean DataFrame.

1. The most common & clean case – array of objects

File users.json:

JSON

Reading it:

Python

Result:

text

→ pandas automatically turns the array into rows, keys into columns
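
Here is a minimal sketch of this case. The contents of users.json are hypothetical (the field names and values are made up), and the file is written to a temporary folder so the example runs on its own:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data: names and values are invented for illustration.
users = [
    {"id": 1, "name": "Asha", "city": "Pune", "age": 28},
    {"id": 2, "name": "Ravi", "city": "Delhi", "age": 35},
    {"id": 3, "name": "Meera", "city": "Chennai", "age": 41},
]

# Save the array of objects as users.json in a temporary folder.
path = Path(tempfile.mkdtemp()) / "users.json"
path.write_text(json.dumps(users, indent=2), encoding="utf-8")

# Each object becomes one row; each key becomes one column.
df = pd.read_json(path)
print(df)
print(df.dtypes)
```

Printing df.dtypes right after reading is a cheap habit: it shows immediately whether pandas guessed the column types the way you expected.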

2. Most useful & realistic read_json() options

Python

Very common modern version (2025 style):

Python
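
As a rough sketch of the options that tend to matter most, assuming a made-up payload with an ID column, a date column and a missing value (the column names are hypothetical, and the second call assumes pandas 2.0 or newer for dtype_backend):

```python
import io

import pandas as pd

# Hypothetical payload: the "id" values have leading zeros and one score is missing.
raw = """[
  {"id": "0012", "name": "Asha", "signup": "2024-01-15", "score": 91.5},
  {"id": "0345", "name": "Ravi", "signup": "2024-03-02", "score": null}
]"""

# Classic style: force the ID column to text and name the date column explicitly.
df = pd.read_json(
    io.StringIO(raw),
    orient="records",           # the input is an array of objects
    dtype={"id": "string"},     # keep IDs as text so leading zeros survive
    convert_dates=["signup"],   # parse this column as datetime
)
print(df.dtypes)

# Newer style (assumes pandas 2.0+): request nullable dtypes for all columns.
df_nullable = pd.read_json(
    io.StringIO(raw),
    convert_dates=["signup"],
    dtype_backend="numpy_nullable",
)
print(df_nullable.dtypes)
```

When reading from a real file, pass the path instead of io.StringIO(...); the other arguments stay the same.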

3. Important variations of JSON structure

Case 1: JSON Lines format (very common in logs & big data)

File events.jsonl (one object per line):

JSON

Read it:

Python

→ lines=True is the key here: it tells pandas to parse each line as a separate JSON object
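
A small sketch of the JSON Lines case, assuming a hypothetical events.jsonl with three invented log records:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical events.jsonl: every line is a complete JSON object on its own.
lines = [
    '{"event": "login", "user_id": 1, "duration_ms": 120}',
    '{"event": "click", "user_id": 1, "duration_ms": 15}',
    '{"event": "logout", "user_id": 2, "duration_ms": 48}',
]
path = Path(tempfile.mkdtemp()) / "events.jsonl"
path.write_text("\n".join(lines) + "\n", encoding="utf-8")

# lines=True makes pandas parse the file line by line instead of as one document.
events = pd.read_json(path, lines=True)
print(events)
```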

Case 2: Single object with nested data (split / index / columns orient)

File stats.json:

JSON
Python

→ orient="split" is specifically for this structure: one object with "index", "columns" and "data" keys
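
To make the split layout concrete, here is a minimal sketch with a hypothetical stats.json payload (the labels and numbers are invented):

```python
import io

import pandas as pd

# Hypothetical stats.json: row labels, column names and values live under three keys.
raw = """{
  "index": ["north", "south"],
  "columns": ["visits", "signups"],
  "data": [[1200, 85], [1500, 110]]
}"""

# orient="split" tells pandas how to map those three keys onto a DataFrame.
stats = pd.read_json(io.StringIO(raw), orient="split")
print(stats)
```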

Case 3: Columns as top-level keys

File by_city.json:

JSON
Python
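
A sketch of the two "object of objects" layouts, using invented city names and numbers. When the top-level keys are column names, the default orient="columns" applies; when the top-level keys are row labels (one entry per city), orient="index" is the one to use:

```python
import io

import pandas as pd

# Hypothetical layout A: top-level keys are column names.
by_column = """{
  "population": {"CityA": 1.2, "CityB": 3.4},
  "area_km2":   {"CityA": 100, "CityB": 250}
}"""
df_cols = pd.read_json(io.StringIO(by_column))  # orient="columns" is the default

# Hypothetical layout B: top-level keys are row labels (one entry per city).
by_row = """{
  "CityA": {"population": 1.2, "area_km2": 100},
  "CityB": {"population": 3.4, "area_km2": 250}
}"""
df_rows = pd.read_json(io.StringIO(by_row), orient="index")

print(df_cols)
print(df_rows)
```

If the table comes out transposed (cities as columns), that is usually a sign the other orient was needed.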

4. Common real-world problems & fixes

Problem 1: Dates are strings

Python
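
A sketch of two ways to handle this, assuming a hypothetical order_date column. By default read_json only tries to parse columns whose names look date-like (for example "date", or names ending in "_at" or "_time"), so other columns have to be named explicitly or converted afterwards:

```python
import io

import pandas as pd

# Hypothetical orders; "order_date" is not a name pandas treats as date-like by default.
raw = """[
  {"order_id": 1, "order_date": "2024-02-10"},
  {"order_id": 2, "order_date": "2024-02-11"}
]"""

# Without help, the column stays as plain text.
plain = pd.read_json(io.StringIO(raw))

# Option A: name the date columns while reading.
parsed = pd.read_json(io.StringIO(raw), convert_dates=["order_date"])

# Option B: read first, then convert with an explicit format.
later = pd.read_json(io.StringIO(raw), convert_dates=False)
later["order_date"] = pd.to_datetime(later["order_date"], format="%Y-%m-%d")

print(parsed.dtypes)
```

Option B is a little more typing, but an explicit format makes malformed dates fail loudly instead of being guessed.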

Problem 2: Nested objects / lists inside cells

JSON

→ pandas keeps them as dict/list — you need to normalize/flatten later

Python
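
A minimal flattening sketch, assuming hypothetical records with a nested address object and a tags list. pd.json_normalize works on already-parsed Python objects, so the text is loaded with the standard json module first:

```python
import io
import json

import pandas as pd

# Hypothetical records with one nested object and one list per row.
raw = """[
  {"id": 1, "name": "Asha", "address": {"city": "Pune", "zip": "411001"}, "tags": ["a", "b"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Delhi", "zip": "110001"}, "tags": []}
]"""

# read_json leaves the nested values as dict / list objects inside the cells.
nested = pd.read_json(io.StringIO(raw))
print(type(nested.loc[0, "address"]))

# json_normalize expands nested objects into dotted column names.
records = json.loads(raw)
flat = pd.json_normalize(records)
print(flat.columns.tolist())

# sep changes the separator used in the flattened column names.
flat_underscore = pd.json_normalize(records, sep="_")
print(flat_underscore.columns.tolist())
```

Lists such as tags are not expanded by this call; they stay as Python lists in a single column.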

Problem 3: Very large JSON file

Python
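
One way to keep memory use bounded is to read in chunks. Note that chunksize only works together with lines=True, so the file has to be in JSON Lines format. The sketch below uses a tiny generated file and a chunk size of 4 purely for illustration:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Build a small hypothetical JSON Lines file (10 records) to stand in for a big one.
path = Path(tempfile.mkdtemp()) / "big.jsonl"
with path.open("w", encoding="utf-8") as fh:
    for i in range(10):
        fh.write(json.dumps({"id": i, "value": i * 2}) + "\n")

rows = 0
total = 0
chunks = 0

# With chunksize set, read_json returns an iterator of DataFrames instead of one DataFrame.
with pd.read_json(path, lines=True, chunksize=4) as reader:
    for chunk in reader:
        chunks += 1
        rows += len(chunk)
        total += int(chunk["value"].sum())

print(chunks, rows, total)
```

The idea is to aggregate or filter each chunk and keep only the small result, rather than concatenating every chunk back into one large DataFrame.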

5. Quick reference – most useful read_json() patterns

Situation                               Command / Option
Array of objects (most common)          pd.read_json("file.json")
One JSON object per line                pd.read_json("file.jsonl", lines=True)
Dates should be parsed                  convert_dates=["date_col1", "date_col2"]
Keep IDs as string                      dtype={"id": "string", "phone": "string"}
Nested data → flat table                pd.json_normalize(data)
JSON with "index", "columns", "data"    orient="split"
Top-level keys are row labels           orient="index"
Top-level keys are column names         orient="columns" (the default)
Large file (JSON Lines only)            lines=True, chunksize=100000

6. Small realistic practice task

Create a file products.json with this content (or copy-paste into a text file):

JSON

Try these:

  1. Read it normally
  2. Read it and convert price to float, stock to nullable int
  3. Add a column low_stock = stock < 20
  4. Sort by price descending

Then try the same file but saved as JSON Lines (one object per line).
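
If you want something to compare your attempt against, here is one possible sketch. The products below are a hypothetical stand-in (invented names, prices and stock levels), so adjust the column names to match your own file:

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for products.json; one stock value is missing on purpose.
products = [
    {"id": 101, "name": "Keyboard", "price": 50, "stock": 15},
    {"id": 102, "name": "Monitor", "price": 199, "stock": None},
    {"id": 103, "name": "Mouse", "price": 20, "stock": 120},
]
folder = Path(tempfile.mkdtemp())
json_path = folder / "products.json"
json_path.write_text(json.dumps(products), encoding="utf-8")

# 1. Read it normally.
df = pd.read_json(json_path)

# 2. Convert price to float and stock to the nullable integer dtype.
df = df.astype({"price": "float64", "stock": "Int64"})

# 3. Flag low stock; a missing stock value stays <NA> rather than becoming False.
df["low_stock"] = df["stock"] < 20

# 4. Sort by price, highest first.
by_price = df.sort_values("price", ascending=False)
print(by_price)

# Same data as JSON Lines: one object per line, read back with lines=True.
jsonl_path = folder / "products.jsonl"
jsonl_path.write_text(
    "\n".join(json.dumps(p) for p in products) + "\n", encoding="utf-8"
)
df_lines = pd.read_json(jsonl_path, lines=True)
print(df_lines)
```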

Where would you like to go next?

  • How to flatten nested JSON properly with json_normalize
  • Combining multiple JSON files
  • Dealing with deeply nested data (real API examples)
  • Converting JSON → DataFrame → clean & analysis
  • Common API → pandas workflow
  • Handling invalid / broken JSON files

Just tell me which direction feels most useful right now — I’ll continue with detailed examples and explanations. 😊
