Chapter 11: ML Testing a Perceptron
"ML Testing a Perceptron" is the super important step after training where we check if our little neuron actually learned something useful or if it's just memorizing the homework!
Last few classes we covered:
- What a perceptron is
- How to train it (adjust weights on mistakes until convergence)
Now: Testing = evaluating how well it works on new, unseen data. This is where real ML wisdom lives — training is like studying for exams, testing is like writing the actual board exam!
Step 1: Why Test a Perceptron? (The Big Why)
After training, the perceptron has final weights & bias. But:
- It might have overfit — perfect on training examples but useless on new ones (like rote learning without understanding)
- Or underfit — didn’t learn even the training patterns well
- Or just right — generalizes to new data
Goal of testing: Measure generalization (how it performs on data it never saw during training).
Key rule in 2026 ML: Never evaluate on training data alone — that’s cheating! Always use a separate test set.
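To make "generalization" concrete, here is a minimal sketch (it previews the train/test split from Step 2 and uses a synthetic dataset from sklearn's make_classification purely for illustration, not data from this lecture): the same trained perceptron is scored on the data it saw and on data it never saw.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for any real dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 30% that the perceptron will never see during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = Perceptron(max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# Generalization = how close these two numbers are
print("Accuracy on training data:", model.score(X_train, y_train))
print("Accuracy on unseen test data:", model.score(X_test, y_test))
```

If the first number is much higher than the second, the perceptron memorized rather than learned.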
Step 2: Standard Way to Prepare Data for Training + Testing
Split your data before training:
- Training set (70–80%): Used to adjust weights (fit)
- Test set (20–30%): Held out completely — only used after training to check performance
- Sometimes a validation set too (for tuning learning rate, epochs), but for a simple perceptron we often skip it and just use train/test (a sketch of the two-step split follows the code below)
In Python (sklearn way — most common):
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
- Train on X_train, y_train
- After done → test on X_test, y_test
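If you do want the optional validation set mentioned above, a common trick is simply to split twice. A minimal sketch, assuming X and y are already loaded as in the snippet above (the 60/20/20 proportions are just an example, not a rule):

```python
from sklearn.model_selection import train_test_split

# First split off the final test set (20%), never touched until the very end
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then carve a validation set out of the remainder (25% of 80% = 20% of the total)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

# Tune learning rate / epochs using (X_val, y_val); report final numbers on (X_test, y_test) only once
```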
Step 3: How to Actually Test (Evaluation Steps)
- Make predictions on test set using the trained perceptron
- Compare predictions vs true labels
- Calculate metrics
Common metrics for perceptron (binary classification):
- Accuracy — simplest: (correct predictions) / total test examples
- Confusion Matrix — table showing True Positives (TP), True Negatives (TN), False Positives (FP), False Negatives (FN)
- From confusion matrix → Precision, Recall, F1-score (especially if classes imbalanced)
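To see how these metrics relate to each other, here is a minimal sketch that computes all of them from a tiny hand-made set of labels and predictions (the two arrays are made up purely for illustration):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score

# Toy example: 10 true labels vs 10 predictions (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# Confusion matrix layout when the positive class is 1: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP =", tn, fp, fn, tp)              # 5 1 1 3

print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total = 8/10
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4
print("Recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4
print("F1-score :", f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```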
Step 4: Real Example – Testing on AND Gate (But Bigger & Realistic)
The AND gate is too small (only 4 points), so we trivially get 100% on training data and there is nothing left to hold out. Let's use a bigger, roughly linearly separable dataset instead.
Better real example: Breast Cancer Wisconsin dataset (classic binary task; in sklearn's version, class 0 = malignant, class 1 = benign)
- Features: 30 measurements (cell radius, texture…)
- Labels: 0 or 1
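Before training, it's worth a quick peek at what the dataset actually contains. A minimal inspection sketch (note again that in sklearn's encoding, class 0 is malignant and class 1 is benign):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
print(data.data.shape)           # (569, 30) -> 569 patients, 30 features each
print(data.target_names)         # ['malignant' 'benign'] -> class 0 = malignant, class 1 = benign
print(np.bincount(data.target))  # [212 357] -> 212 malignant, 357 benign (mildly imbalanced)
print(data.feature_names[:3])    # first few feature names: mean radius, mean texture, ...
```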
In practice (sklearn Perceptron):
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load data
data = load_breast_cancer()
X = data.data
y = data.target  # 0 = malignant, 1 = benign (sklearn's encoding)

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train
model = Perceptron(max_iter=1000, eta0=0.1, random_state=42)
model.fit(X_train, y_train)

# Now TEST!
y_pred = model.predict(X_test)  # make predictions on unseen test data

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy:", round(accuracy * 100, 2), "%")

# Confusion Matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# Full report (class 0 = malignant comes first)
print(classification_report(y_test, y_pred, target_names=['Malignant', 'Benign']))
```
Typical output (real run numbers vary slightly):
```
Test Accuracy: 92.4 %
Confusion Matrix:
[[ 56   8]     ← Malignant (class 0): 56 caught correctly (TP), 8 missed (FN)
 [  5 102]]    ← Benign (class 1): 102 correct (TN), 5 false alarms (FP)

              precision    recall  f1-score   support

   Malignant       0.92      0.88      0.90        64
      Benign       0.93      0.95      0.94       107

    accuracy                           0.92       171
```
Interpretation:
- Accuracy 92% → good for simple perceptron!
- Confusion matrix tells story:
- High TN & TP → most correct
- Some FN (missed cancers) → dangerous in real medicine → maybe tune or use better model
- Precision/Recall/F1 balance the trade-off
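As a sanity check, the per-class numbers in the report follow directly from the confusion matrix above. A quick sketch of the arithmetic for the Malignant class:

```python
# Reading off the matrix above: TP = 56, FN = 8, FP = 5 (malignant treated as the positive class)
tp, fn, fp = 56, 8, 5

precision = tp / (tp + fp)   # 56 / 61 ≈ 0.92 -> of the cases flagged malignant, how many really were
recall    = tp / (tp + fn)   # 56 / 64 ≈ 0.88 -> of the real malignant cases, how many we caught
f1        = 2 * precision * recall / (precision + recall)  # ≈ 0.90

print(round(precision, 2), round(recall, 2), round(f1, 2))
```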
Step 5: What If Accuracy is Low on Test?
- Much lower than train → Overfitting (rare in single perceptron, but happens)
- Low on both → Underfitting → increase epochs, adjust learning rate, scale features (very important for perceptron!)
- Good on train but bad on test → could also point to data issues (an unrepresentative split) or a problem that isn't linearly separable
Pro tip: Always scale features before perceptron (StandardScaler) — it hates different scales!
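Here is a minimal sketch of that pro tip applied to the breast cancer example: StandardScaler and the perceptron are wrapped in a pipeline so the scaler is fit on the training set only, and the train vs test accuracies are compared (exact numbers will vary from run to run):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Pipeline = scale first, then perceptron; fit() learns the scaling from the training set only
model = make_pipeline(StandardScaler(), Perceptron(max_iter=1000, eta0=0.1, random_state=42))
model.fit(X_train, y_train)

# Comparing the two scores reveals over/underfitting; scaling usually lifts test accuracy noticeably
print("Train accuracy:", round(model.score(X_train, y_train), 3))
print("Test accuracy :", round(model.score(X_test, y_test), 3))
```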
Step 6: Quick Summary Table (Copy in Notes!)
| Step | What You Do | Why It Matters | Example Metric (Breast Cancer) |
|---|---|---|---|
| Split data | train_test_split | Prevent cheating with seen data | 70% train, 30% test |
| Predict on test | model.predict(X_test) | Simulate real-world new patients | y_pred array |
| Accuracy | accuracy_score(y_test, y_pred) | Overall % correct | ~92% |
| Confusion Matrix | confusion_matrix(y_test, y_pred) | Shows exact mistakes (FP/FN costly?) | [[TP FN], [FP TN]] here, since malignant = class 0 |
| Classification Report | classification_report | Precision, Recall, F1 per class | F1 Malignant ~0.90 |
Step 7: Teacher’s Final Words (2026)
Training = learning from examples (adjusting weights). Testing = honest exam on unseen questions → tells the true intelligence!
In 2026: Even simple perceptrons get tested this way, but real apps use:
- Cross-validation (RepeatedStratifiedKFold) for small data
- More metrics (AUC-ROC for imbalance)
- Compare to baselines (always beat random guess!)
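As a taste of those practices, here is a minimal sketch of repeated stratified cross-validation plus a majority-class baseline on the same dataset (the fold counts and scoring choice are reasonable defaults, not the only option):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)

# Scaled perceptron evaluated 5 x 3 = 15 times on different train/test folds
model = make_pipeline(StandardScaler(), Perceptron(max_iter=1000, random_state=42))
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')  # try scoring='roc_auc' for imbalance
print("Perceptron CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Baseline: always predict the majority class; the perceptron must clearly beat this
baseline = DummyClassifier(strategy='most_frequent')
base_scores = cross_val_score(baseline, X, y, cv=cv, scoring='accuracy')
print("Majority-class baseline: %.3f" % base_scores.mean())
```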
Understood the full cycle now? 🌟
Questions?
- Want full Python code for breast cancer perceptron + plot confusion matrix?
- How to visualize decision boundary after training/testing?
- Difference testing perceptron vs modern neural nets?
Just say — next class ready! 🚀
