Chapter 31: Example 2 Data

What is “Example 2 Data”?

Example 2 Data = the MNIST dataset — 70,000 grayscale images of handwritten digits (0 through 9), each 28×28 pixels.

  • 60,000 training images → used to teach the model
  • 10,000 test images → used to check how well it learned (never seen during training)

Each image has:

  • Input (xs) → a 28×28 grid of pixel values (0 = background, 255 = full-intensity ink)
  • Label (ys) → a number 0–9 telling “this is a 7” (or 3, 5, etc.)
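During training, most TensorFlow.js classifiers do not use the raw label "7" directly. They compare the model's 10 outputs against a one-hot vector that has a 1 in the slot for the true digit and 0 everywhere else. Below is a minimal plain-JavaScript sketch of that conversion, assuming 10 classes. The helper name oneHot is ours and is not a library function. For tensors, tf.js provides tf.oneHot for the same job.

```javascript
// Hypothetical helper: turn a digit label (0-9) into a one-hot vector.
// Assumes one output slot per digit, so 10 classes in total.
const NUM_CLASSES = 10;

function oneHot(label, numClasses = NUM_CLASSES) {
  if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
    throw new RangeError(`label must be an integer in [0, ${numClasses - 1}]`);
  }
  const vec = new Array(numClasses).fill(0); // every class starts "off"
  vec[label] = 1;                            // switch on the true digit
  return vec;
}

// Usage: the label 7 becomes a vector with a 1 in slot 7 (counting from 0).
const label7 = oneHot(7); // [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
console.log(label7);
```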

The goal: look at a new 28×28 image → correctly say which digit it is.
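"Say which digit it is" usually means this: the trained model outputs 10 scores, typically softmax probabilities with one per digit, and the prediction is the digit with the largest score. Below is a plain-JavaScript sketch of that final step. The function name and the score values are made up for illustration. On tf.js tensors, the same step is usually done with argMax.

```javascript
// Pick the predicted digit from a model's 10 output scores.
// Assumes scores[d] is the model's confidence that the image shows digit d.
function predictDigit(scores) {
  let best = 0;
  for (let d = 1; d < scores.length; d++) {
    if (scores[d] > scores[best]) best = d; // keep the highest-scoring digit
  }
  return best;
}

// Hypothetical output for one image: the model is most confident in "3".
const exampleScores = [0.01, 0.02, 0.05, 0.80, 0.01, 0.04, 0.01, 0.03, 0.02, 0.01];
console.log(predictDigit(exampleScores)); // 3
```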

Why This Dataset is Perfect for Example 2

  • Small enough to train in the browser in 1–5 minutes (even on a laptop or phone)
  • Images are simple (single digit, centered, grayscale) → good for learning convolutions
  • Real handwriting variation → teaches generalization (different styles of “3”)
  • Widely used → you can compare your accuracy (97–99%) with published results
  • Classic benchmark → if your model gets >95%, it’s working well
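Those accuracy figures are simply the share of test images whose predicted digit matches the true label. For example, 9,700 correct answers on the 10,000 test images is 97%. Below is a minimal sketch of that calculation. The function name and the five-item example are made up for illustration.

```javascript
// Accuracy = (number of correct predictions) / (number of examples).
// `predicted` and `actual` are equal-length arrays of digits 0-9.
function accuracy(predicted, actual) {
  if (predicted.length !== actual.length || actual.length === 0) {
    throw new Error("need two non-empty arrays of the same length");
  }
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === actual[i]) correct++; // count exact matches
  }
  return correct / actual.length;
}

// Made-up example: 4 of 5 predictions are right, so accuracy is 0.8 (80%).
console.log(accuracy([7, 2, 1, 0, 4], [7, 2, 1, 0, 9])); // 0.8
```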

Visual Picture of MNIST Data (Imagine This)

Each image is a 28×28 grid, like a tiny 2.8 cm × 2.8 cm square of graph paper with one pixel per 1 mm cell.

Examples of what you see:

  • A “0” → smooth oval loop
  • A “1” → thin vertical line
  • A “3” → two curves connected
  • An “8” → two stacked circles

Pixel values:

  • 0 = background (no ink), drawn as black in the usual rendering
  • 255 = full-intensity ink, drawn as bright white
  • Values in between = shades of gray (mostly at the soft edges of strokes)

In TensorFlow.js, we normalize them → divide by 255 → pixels become 0.0 to 1.0 (easier for neural nets).
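The divide-by-255 step is plain arithmetic applied to every pixel. Below is a minimal sketch in plain JavaScript. It assumes the raw pixels arrive as integers 0–255, for example in a Uint8Array. With tf.js tensors, the same step is usually written as imageTensor.div(255), where imageTensor is a placeholder name.

```javascript
// Scale raw MNIST pixel intensities (0-255) into the range 0.0-1.0.
// Assumes `raw` holds integers 0-255, e.g. a Uint8Array of 784 pixels.
function normalize(raw) {
  const out = new Float32Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    out[i] = raw[i] / 255; // 0 -> 0.0 (background), 255 -> 1.0 (full ink)
  }
  return out;
}

// A few sample intensities: background, light gray, mid gray, full ink.
console.log(Array.from(normalize(Uint8Array.from([0, 51, 128, 255]))));
```

The output uses Float32Array because 32-bit floats are the default dtype for tf.js tensors, and they take half the memory of regular JavaScript numbers.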

How the Data Looks in Code (TensorFlow.js Style)

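TensorFlow.js has no built-in MNIST loader (there is no tf.data.mnist() function), so browser tutorials bring a small helper of their own. The official TensorFlow.js handwritten-digit codelab, for example, uses a helper class called MnistData. It downloads the images once (packed into a single sprite image) together with a labels file, then hands out batches through nextTrainBatch(batchSize) and nextTestBatch(batchSize).

Each batch is an object {xs, labels}. Here, xs holds one flat row of 784 pixel values per image, and labels holds the matching one-hot label vectors. Serving the data batch by batch keeps memory use low, which matters inside a browser tab.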
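The idea of serving data in batches can be sketched in plain JavaScript with a generator. This is an illustration only, not the codelab's code. It assumes all normalized pixels sit in one flat Float32Array (784 values per image, back to back) and the labels sit in a parallel array. The name batches and the {xs, ys} shape are our own choices.

```javascript
const IMAGE_SIZE = 28 * 28; // 784 pixels per image

// Hypothetical batching helper: yields {xs, ys} chunks one at a time.
// `pixels` is a flat Float32Array holding numImages * 784 values;
// `labels` holds one digit per image, in the same order.
function* batches(pixels, labels, batchSize) {
  const numImages = labels.length;
  for (let start = 0; start < numImages; start += batchSize) {
    const end = Math.min(start + batchSize, numImages); // last batch may be short
    yield {
      xs: pixels.subarray(start * IMAGE_SIZE, end * IMAGE_SIZE), // a view, no copy
      ys: labels.slice(start, end),
    };
  }
}

// Usage: 5 fake all-zero images in batches of 2 -> batch sizes 2, 2, 1.
const fakePixels = new Float32Array(5 * IMAGE_SIZE);
const fakeLabels = [3, 1, 4, 1, 5];
for (const { xs, ys } of batches(fakePixels, fakeLabels, 2)) {
  console.log(xs.length / IMAGE_SIZE, ys);
}
```

Using subarray means each batch's pixels are a window onto the big array rather than a copy. This is the same memory-saving idea a real loader relies on.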

Real Numbers – A Few Sample Images (What They Represent)

Imagine these 28×28 grids (I’ll describe a few typical ones):

  1. Digit 0 (common example)
    • Mostly black (0), white loop in center
    • Label = 0
  2. Digit 7 (often tricky because some people write it with a crossbar through the middle)
    • Horizontal top stroke joined to a slanted downstroke
    • Label = 7
  3. Digit 4 (some write open-top, some closed)
    • Vertical line + horizontal cross + diagonal
    • Label = 4
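To actually look at one of these grids without a plotting library, you can print it as text. Below is a minimal sketch. It assumes a flat array of 784 normalized values (0.0–1.0) in row-major order, meaning row 0 first, then row 1, and so on. toAscii is a hypothetical helper, and the test image is a synthetic stroke rather than a real MNIST sample.

```javascript
const SIDE = 28; // MNIST images are 28 x 28

// Render a flat, row-major array of 784 values in [0, 1] as text:
// '#' for strong ink, '+' for faint ink, '.' for background.
function toAscii(pixels) {
  const rows = [];
  for (let r = 0; r < SIDE; r++) {
    let line = "";
    for (let c = 0; c < SIDE; c++) {
      const v = pixels[r * SIDE + c]; // pixel at row r, column c
      line += v > 0.5 ? "#" : v > 0.1 ? "+" : ".";
    }
    rows.push(line);
  }
  return rows.join("\n");
}

// Synthetic "1": a 2-pixel-wide vertical stroke in columns 13-14, rows 4-23.
const fakeOne = new Float32Array(SIDE * SIDE);
for (let r = 4; r < 24; r++) {
  fakeOne[r * SIDE + 13] = 1.0;
  fakeOne[r * SIDE + 14] = 1.0;
}
console.log(toAscii(fakeOne));
```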

The dataset has lots of variation:

  • Thick vs thin strokes
  • Slanted vs straight
  • Small vs large size inside the 28×28 box
  • Some noise / imperfect writing

That’s why the model must learn invariant patterns: the shape of a “3” is the same even if it is slightly rotated or drawn thicker.
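A quick way to see why raw pixel matching is not enough: moving the same stroke one pixel sideways changes many pixel values, even though a person still sees the same digit. Below is a small sketch that counts the changed pixels. It uses a synthetic one-pixel-wide stroke, and the helper names are hypothetical.

```javascript
const SIDE = 28;

// Shift a flat 28x28 image right by `dx` pixels (dx >= 0); vacated cells become 0.
function shiftRight(pixels, dx) {
  const out = new Float32Array(SIDE * SIDE);
  for (let r = 0; r < SIDE; r++) {
    for (let c = 0; c + dx < SIDE; c++) {
      out[r * SIDE + c + dx] = pixels[r * SIDE + c];
    }
  }
  return out;
}

// Count positions where two images differ.
function countChanged(a, b) {
  let n = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) n++;
  return n;
}

// Synthetic vertical stroke: column 14, rows 4-23 (20 ink pixels).
const stroke = new Float32Array(SIDE * SIDE);
for (let r = 4; r < 24; r++) stroke[r * SIDE + 14] = 1.0;

const moved = shiftRight(stroke, 1);
// 40 = 20 old ink positions now empty + 20 new positions now inked;
// the two versions share no ink pixels at all.
console.log(countChanged(stroke, moved)); // 40
```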

Why We Reshape & Normalize in Code

  • Original: flat vector of 784 numbers (28×28 = 784)
  • Reshape → [28, 28, 1] → height × width × channels (grayscale = 1 channel)
  • Divide by 255 → pixels 0–1 → neural nets train more stably on small, consistently scaled inputs (better-behaved gradients)
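Reshaping does not change any numbers. It only changes how they are indexed. In row-major order, pixel (row, col) of the [28, 28, 1] view is element row × 28 + col of the flat 784-value vector. Below is a plain-JavaScript sketch of that mapping, with hypothetical helper names. In tf.js the whole step is usually a single call such as xs.reshape([batchSize, 28, 28, 1]).

```javascript
const HEIGHT = 28;
const WIDTH = 28;
const CHANNELS = 1; // grayscale

// Flat index of pixel (row, col, channel) in a row-major [28, 28, 1] layout.
function flatIndex(row, col, channel = 0) {
  return (row * WIDTH + col) * CHANNELS + channel;
}

// Build a nested [28][28][1] array from a flat 784-value vector.
// This mirrors what reshape means; unlike a tensor reshape, it copies values.
function reshapeTo28x28x1(flat) {
  if (flat.length !== HEIGHT * WIDTH * CHANNELS) {
    throw new Error(`expected ${HEIGHT * WIDTH * CHANNELS} values, got ${flat.length}`);
  }
  const img = [];
  for (let r = 0; r < HEIGHT; r++) {
    const row = [];
    for (let c = 0; c < WIDTH; c++) {
      row.push([flat[flatIndex(r, c)]]); // innermost array = the single channel
    }
    img.push(row);
  }
  return img;
}

// Usage: fill the flat vector with 0..783 so each value equals its own index.
const flat = Array.from({ length: 784 }, (_, i) => i);
const img = reshapeTo28x28x1(flat);
console.log(img[2][5][0]); // 61 (= 2 * 28 + 5)
```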

Quick Summary Table (Keep This Handy!)

Part | Value / Description | Why It Matters
Total images | 70,000 (60k train + 10k test) | Enough to learn from, small enough for the browser
Image size | 28 × 28 pixels, grayscale (1 channel) | Tiny → fast training
Pixel range (raw) | 0 (background) to 255 (full ink) | Standard 8-bit grayscale
Pixel range (used) | 0.0 to 1.0 after dividing by 255 | Well-scaled inputs for neural nets
Labels | 0 to 9 (one per image) | 10-class classification
Format in tf.js | No built-in loader; a tutorial helper (e.g. MnistData) serves batches of {xs, labels} | Memory efficient, loads data in batches

Final Teacher Words

Example 2 Data = the MNIST handwritten digit dataset — 70,000 small 28×28 grayscale images of digits 0–9 with correct labels.

  • Training set → teaches the model
  • Test set → checks if it really learned
  • Pixels normalized 0–1 → ready for a CNN
  • Labels 0–9 → multi-class problem

This dataset is the “Hello World” of image classification: used since 1998, and still the first real image dataset most students see in 2026.

Once you train on MNIST and reach 97–99% accuracy in your browser, you have the foundation for real vision apps: photo tagging, reading handwritten digits on bank cheques, crop disease detection from drone photos, and more.

Got the full picture of Example 2 Data now? 🔥

Want next?

  • Full code to visualize some MNIST images in browser?
  • Show confusion matrix after training to see which digits confuse the model?
  • Switch to Telugu handwritten digits (if a dataset exists) or custom data?

Just tell me — next class is ready! 🚀
