Chapter 10: Training a Perceptron
Welcome to “Training a Perceptron” in Machine Learning. This is the exciting part where the perceptron actually learns from its mistakes, just like a student practicing sums until they get them right.
Last time we saw what a perceptron is (simple neuron: inputs × weights + bias → step function → 0 or 1 output). Today we zoom into how we train it — step-by-step, with numbers, the famous AND gate example, and why it works (when the data allows it).
Think of training like teaching a child “AND logic”:
- “Only say YES if both toys are present.”
- Child guesses wrong → you gently correct → child adjusts thinking.
The perceptron does exactly that — automatically!
Step 1: Quick Recap – What We Need to Train
We need the following (see the code sketch right after this list):
- Training data: Examples with correct labels (supervised)
- Initial weights (w₁, w₂, …) and bias (b) → usually start random or zeros
- Learning rate (α or η) → small number like 0.1 or 0.5 (controls how big each correction is)
- Activation: Classic perceptron uses hard step: 1 if z ≥ 0, else 0 (z = weighted sum + bias)
- Update rule (the heart of learning!)
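Here is what those ingredients look like in plain Python — a minimal sketch; the variable names are mine, chosen to match the symbols above, and not taken from any library:

```python
# The ingredients from the list above, in plain Python
# (illustrative names, not a library API)

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # training data: inputs...
y = [0, 0, 0, 1]                       # ...with correct labels (here: AND)

w = [0.0, 0.0]   # initial weights w1, w2 (zeros here; small random values also work)
b = 0.0          # initial bias
lr = 0.5         # learning rate (alpha)

def step(z):
    """Classic hard-step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0
```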
Step 2: The Perceptron Learning Rule (Update Formula)
For each training example (x, true y):
- Compute the prediction ŷ: z = w₁x₁ + w₂x₂ + … + b, then ŷ = 1 if z ≥ 0, else 0
- Compute error: error = y – ŷ (If correct → error = 0, no change)
- Only if wrong, update: new weight wᵢ = old wᵢ + α × error × xᵢ, new bias b = old b + α × error
This is the perceptron learning rule (a close relative of the delta rule used for linear units).
- If predicted 0 but should be 1 (error = +1) → add positive amount to weights (pull toward firing)
- If predicted 1 but should be 0 (error = -1) → subtract (push away from firing)
- Learning rate α keeps steps small → avoids overshooting
Repeat over all examples many times (these full passes are called epochs) until there are no more errors, or a maximum number of epochs is reached.
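Putting the whole loop together, a minimal from-scratch sketch could look like this (the function name train_perceptron and its arguments are my own illustrative choices, not a library API):

```python
def train_perceptron(X, y, lr=0.5, max_epochs=20):
    """Classic perceptron training loop.

    X: list of input vectors, y: list of 0/1 labels.
    Returns the learned weights and bias.
    """
    w = [0.0] * len(X[0])   # zero initialisation, as in the example below
    b = 0.0

    for _ in range(max_epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            # Forward pass: weighted sum + bias, then hard step
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            y_hat = 1 if z >= 0 else 0

            # Perceptron rule: update only when the prediction is wrong
            error = target - y_hat          # 0, +1 or -1
            if error != 0:
                mistakes += 1
                w = [wj + lr * error * xj for wj, xj in zip(w, xi)]
                b = b + lr * error

        if mistakes == 0:   # a full clean pass: training is done
            break

    return w, b
```

Notice the weights only change when a prediction is wrong, exactly as the rule above says, and a full pass with zero mistakes means training is finished.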
Beautiful guarantee (the perceptron convergence theorem, going back to Rosenblatt's 1958 perceptron): if the data is linearly separable (a straight line/plane can separate the classes perfectly), the learning rule converges in a finite number of steps, no matter the starting weights!
Step 3: Detailed Numerical Example – Training for AND Gate
AND gate truth table (our training data):
| x₁ (A) | x₂ (B) | True y (AND) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
Start with:
- Weights: w₁ = 0.0, w₂ = 0.0 (simple zero init)
- Bias: b = 0.0
- Learning rate α = 0.5 (common choice)
We’ll do epochs (full passes over 4 examples).
Epoch 1 – Round 1 (first example: x = [0,0], y=0)
z = 0×0 + 0×0 + 0 = 0 ≥ 0 → ŷ = 1. Error = 0 - 1 = -1 (wrong!)
Update: w₁ = 0 + 0.5 × (-1) × 0 = 0, w₂ = 0 + 0.5 × (-1) × 0 = 0, b = 0 + 0.5 × (-1) = -0.5
New: w₁=0, w₂=0, b=-0.5
Round 2 (x=[0,1], y=0)
z = 0×0 + 1×0 + (-0.5) = -0.5 < 0 → ŷ = 0. Error = 0 - 0 = 0 → no change
Round 3 (x=[1,0], y=0)
z = 1×0 + 0×0 - 0.5 = -0.5 < 0 → ŷ = 0. Error = 0 → no change
Round 4 (x=[1,1], y=1)
z = 1×0 + 1×0 - 0.5 = -0.5 < 0 → ŷ = 0. Error = 1 - 0 = +1 (wrong!)
Update: w₁ = 0 + 0.5 × 1 × 1 = 0.5, w₂ = 0 + 0.5 × 1 × 1 = 0.5, b = -0.5 + 0.5 × 1 = 0.0
End Epoch 1: w₁=0.5, w₂=0.5, b=0.0
Epoch 2 – Quick check (all 4 examples again)
- [0,0]: z = 0×0.5 + 0×0.5 + 0 = 0 ≥ 0 → ŷ = 1 (but y=0), error = -1. Update: w₁ = 0.5 + 0.5×(-1)×0 = 0.5, w₂ = 0.5 + 0.5×(-1)×0 = 0.5, b = 0 + 0.5×(-1) = -0.5
- [0,1]: z = 0×0.5 + 1×0.5 - 0.5 = 0 ≥ 0 → ŷ = 1 (wrong, y=0), error = -1. Update: w₁ = 0.5 + 0.5×(-1)×0 = 0.5, w₂ = 0.5 + 0.5×(-1)×1 = 0.0, b = -0.5 + 0.5×(-1) = -1.0
- [1,0]: z = 1×0.5 + 0×0.0 - 1.0 = -0.5 < 0 → ŷ = 0, correct → no change
- [1,1]: z = 1×0.5 + 1×0.0 - 1.0 = -0.5 < 0 → ŷ = 0 (wrong, y=1), error = +1. Update: w₁ = 0.5 + 0.5×1×1 = 1.0, w₂ = 0.0 + 0.5×1×1 = 0.5, b = -1.0 + 0.5×1 = -0.5
You see? The weights keep adjusting on mistakes. End Epoch 2: w₁=1.0, w₂=0.5, b=-0.5.
Continue for a few more epochs (in practice 5–20 passes is plenty for AND/OR) → eventually the weights settle on a valid separator, for example:
w₁ = 1.0, w₂ = 1.0, b = -1.5 (check: z for [1,1] = 1 + 1 - 1.5 = 0.5 ≥ 0 → 1; all other inputs give z < 0 → 0)
Perfect separation!
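If you want to double-check the hand arithmetic, a short script like this (same zero init, same α = 0.5, same example order) prints the weights after every epoch so you can compare with the rounds above. The exact final values depend on the update order, but any converged solution classifies all four rows correctly:

```python
# Re-run the AND example and watch the weights move epoch by epoch
# (plain Python; zero init, alpha = 0.5, same example order as above)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b, lr = [0.0, 0.0], 0.0, 0.5

for epoch in range(1, 21):
    mistakes = 0
    for (x1, x2), target in zip(X, y):
        y_hat = 1 if (x1 * w[0] + x2 * w[1] + b) >= 0 else 0
        error = target - y_hat
        if error != 0:
            mistakes += 1
            w = [w[0] + lr * error * x1, w[1] + lr * error * x2]
            b += lr * error
    print(f"epoch {epoch}: w1={w[0]}, w2={w[1]}, b={b}, mistakes={mistakes}")
    if mistakes == 0:
        break

# Sanity check: the trained perceptron should reproduce the AND column
print([1 if (x1 * w[0] + x2 * w[1] + b) >= 0 else 0 for x1, x2 in X])
```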
Step 4: Quick Summary Table (Memorize This!)
| Step in Training Loop | What Happens | Example Change (from wrong prediction) |
|---|---|---|
| 1. Forward pass | z = Σ(wᵢ xᵢ) + b → ŷ = step(z) | ŷ=0 but y=1 → error=+1 |
| 2. Error calc | error = y – ŷ | +1 or -1 |
| 3. Update (only if error ≠ 0) | wᵢ ← wᵢ + α × error × xᵢ; b ← b + α × error | Pulls decision boundary toward correct class |
| 4. Repeat epochs | Until 0 errors or max epochs | Converges if linearly separable |
Step 5: Teacher’s Final Words (2026 View)
Training a perceptron = online learning by mistake correction — very efficient for simple linear problems.
In 2026 we rarely train single perceptrons from scratch (sklearn does it instantly; see the short sklearn sketch below), but understanding this process helps you debug bigger neural nets (backprop is basically a fancier version of this rule!).
It fails on non-linearly separable problems like XOR, which is why we stack layers (MLP) and use richer activations (sigmoid, ReLU).
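Since sklearn came up: a rough equivalent of our hand-trained AND perceptron, using scikit-learn's built-in Perceptron class, might look like the sketch below (assuming scikit-learn is installed; eta0 is its name for the learning rate):

```python
# Rough scikit-learn equivalent of the hand-trained AND perceptron
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

clf = Perceptron(eta0=0.5, max_iter=20, tol=None, random_state=0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)   # learned weights and bias
print(clf.predict(X))              # should match the AND column: [0 0 0 1]
```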
Got it? 🌟
Want next?
- Python code to train perceptron for AND/OR from scratch?
- Show convergence plot (decision boundary moving)?
- How modern libraries (sklearn Perceptron) do it?
Just ask — class is open! 🚀
