Understanding CNNs

How Computers Learn to See

A High School Guide to Neural Networks

What is Artificial Intelligence?

Teaching computers to learn from examples, just as you learned to recognize objects when you were young.

The Challenge

How do you teach a computer to recognize handwritten numbers?

The Traditional Way

Write millions of rules:
"If the pixels look like this... then it's a 7"

Nearly impossible!

The Better Way: Machine Learning

Let the computer learn the patterns itself!

Show it thousands of examples and let it figure out the rules.

Neural Networks

Inspired by your brain!

Your Brain

Billions of neurons connected together, passing signals

Artificial Neural Network

Simulated neurons (math functions) connected, passing numbers

Convolutional Neural Network (CNN)

A special type of neural network designed for images


Image → CNN → Prediction

How CNNs Work: Step 1

Input Layer

The image is converted to numbers (pixels)

In a grayscale image, each pixel has a value from 0 (black) to 255 (white)

MNIST images are 28×28 pixels = 784 numbers

Colour images need three numbers per pixel (red, green and blue), so a 28×28 colour image would be 28 × 28 × 3 = 2,352 numbers
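To make the "image as numbers" idea concrete, here is a minimal sketch in plain Python. The 3×3 image and the helper names `flatten_and_scale` and `count_values` are made up for illustration, and scaling pixels to the 0 to 1 range is a common convention assumed here, not something the app is confirmed to do.

```python
# A tiny 3x3 grayscale "image": 0 is black, 255 is white.
# (Hypothetical example data; MNIST images are 28x28.)
image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]

def flatten_and_scale(img):
    """Turn a 2D grid of 0-255 pixels into one flat list of 0.0-1.0 values."""
    return [pixel / 255 for row in img for pixel in row]

def count_values(width, height, channels=1):
    """How many numbers are needed to describe an image."""
    return width * height * channels

values = flatten_and_scale(image)
print(len(values))                         # 9 numbers for a 3x3 image
print(count_values(28, 28))                # 784 for a grayscale MNIST image
print(count_values(28, 28, channels=3))    # 2352 for the same size in RGB
```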

How CNNs Work: Step 2

Convolutional Layers

The network looks for patterns in the image

First layer: detects edges and lines
Second layer: detects curves and shapes
Deeper layers: detect complex patterns
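Here is a rough sketch of what "looking for an edge" means. A small grid of numbers (a filter) slides across the image, and the output is large wherever the image matches the filter. The filter below is hand-picked for illustration; in a real CNN the filter values are learned during training. The function name `convolve2d` is our own, not a library call.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, one pixel at a time)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Multiply each filter value by the pixel underneath it and add up.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Dark on the left, bright on the right: a vertical edge down the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds when brightness jumps from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]: the 2 marks where the edge is
```

The output is smaller than the input (3×3 instead of 4×4) because the filter cannot hang off the edge of the image.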

How CNNs Work: Step 3

Pooling Layers

Shrinks the output of the convolutional layers to focus on the most important features

Reduces computation while keeping the most important information
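One common way to do this is max pooling: split the grid into 2×2 blocks and keep only the largest number in each block. The sketch below assumes max pooling with 2×2 blocks; the slides do not say which kind of pooling the app's models use, and the numbers are invented.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value from each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            block = [
                feature_map[r][c],     feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 output from a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]: 16 numbers shrink to 4
```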

How CNNs Work: Step 4

Output Layer

Makes the final decision

Outputs 10 numbers (one for each digit 0-9)

The digit with the highest number is the prediction!
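A small sketch of this last step is shown below. It assumes the 10 raw scores are turned into percentages with a function called softmax, which is a very common choice but is not confirmed by the slides. The scores themselves are invented for the example.

```python
import math

def softmax(scores):
    """Convert raw scores into positive numbers that add up to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Return the most likely digit and how confident the model is."""
    probs = softmax(scores)
    digit = max(range(len(probs)), key=lambda i: probs[i])
    return digit, probs[digit]

# Hypothetical output scores, one per digit 0-9.
scores = [0.1, 0.3, 0.2, 0.0, 0.1, 0.4, 0.2, 3.5, 0.3, 0.6]

digit, confidence = predict(scores)
print(digit)  # 7, because position 7 holds the highest score
```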

Training the Network

1. Show the network an image

2. Let it make a guess

3. Tell it the correct answer so it can measure how wrong the guess was

4. The network adjusts itself to do better next time

5. Repeat thousands of times!
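The five steps above can be sketched with the smallest possible "network": a single adjustable number. This is a toy under stated assumptions, not the app's training code. A real CNN adjusts many thousands of numbers at once using the same basic idea (nudge each one in the direction that reduces the error). The data and the `learning_rate` value are made up.

```python
# Toy data: the hidden rule is "answer = 2 * x".
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # the one adjustable number in this toy "network"
learning_rate = 0.05  # how big each adjustment is (hypothetical value)

for epoch in range(200):                       # 5. repeat many times
    for x, target in examples:                 # 1. show an example
        guess = weight * x                     # 2. let it make a guess
        error = guess - target                 # 3. measure how wrong it was
        weight -= learning_rate * error * x    # 4. adjust to do better

print(round(weight, 3))  # 2.0: the model has discovered the rule
```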

Our Three Models

In the app, you'll test three different trained models:

Baseline Model

Training: 50,000 images for 10 epochs (one epoch is one full pass through the training images)

What it teaches: This is a well-balanced model with proper training

Expected performance: Good generalization to new handwriting

Augmented Model

Training: Same data + rotations, shifts, zoom variations

What it teaches: Data augmentation makes the model more robust

Expected performance: Better at handling different handwriting styles and angles

Overfitted Model

Training: Only 1,000 images for 50 epochs

What it teaches: Demonstrates overfitting, which is when a model memorizes its training examples instead of learning the general pattern

Expected performance: Poor generalization, may struggle with your handwriting

What is Overfitting?

Good Learning

Understanding the general pattern

Overfitting

Memorizing the specific examples without understanding
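The difference can be shown with a deliberately exaggerated cartoon. Neither function below is a neural network; `memorizer` and `generalizer` are hypothetical stand-ins for a model that stored its training examples and a model that learned the rule behind them.

```python
# Training examples all follow the hidden rule "answer = 2 * x".
training = {1: 2, 2: 4, 3: 6}

def memorizer(x):
    """Overfitted 'model': only knows the exact examples it has seen."""
    return training.get(x)  # returns None for anything new

def generalizer(x):
    """Well-trained 'model': learned the pattern behind the examples."""
    return 2 * x

def accuracy(model, data):
    """Fraction of examples the model gets right."""
    correct = sum(1 for x, answer in data.items() if model(x) == answer)
    return correct / len(data)

new_data = {4: 8, 5: 10}  # examples neither model saw during training

print(accuracy(memorizer, training), accuracy(memorizer, new_data))      # 1.0 0.0
print(accuracy(generalizer, training), accuracy(generalizer, new_data))  # 1.0 1.0
```

Both models look perfect on the training examples. Only the new data reveals which one actually learned something.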

Real World Analogy

Studying for a Test

Good learning: Understanding the concepts so you can solve any problem

Overfitting: Memorizing only the practice problems, failing when test questions are different

Teaching a Child to Read

Good learning: Reading 100 different books, so the child can handle new stories

Overfitting: Reading the same book 100 times, so the child struggles with new stories

Data Augmentation

Creating variations of training data to make models more robust

Rotation (±10°)
Shifting position (±10%)
Zooming in/out (±10%)

This helps the model learn to recognize digits in many different styles!
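As a minimal sketch of one of these variations, the code below shifts an image by a random amount of up to 10% of its size. Rotation and zoom follow the same idea but are usually done with an image library, so they are left out here. The function names and the single-pixel test image are assumptions for illustration, not the app's code.

```python
import random

def shift_image(image, dx, dy, fill=0):
    """Move the picture dx pixels right and dy pixels down.

    Pixels pushed off the edge are lost; uncovered pixels become `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            new_r, new_c = r + dy, c + dx
            if 0 <= new_r < height and 0 <= new_c < width:
                shifted[new_r][new_c] = image[r][c]
    return shifted

def random_shift(image, max_fraction=0.10, rng=random):
    """Shift by a random amount of up to max_fraction of the image size."""
    height, width = len(image), len(image[0])
    max_dx = int(width * max_fraction)
    max_dy = int(height * max_fraction)
    dx = rng.randint(-max_dx, max_dx)
    dy = rng.randint(-max_dy, max_dy)
    return shift_image(image, dx, dy)

# A 28x28 black image with one white pixel in the middle.
image = [[0] * 28 for _ in range(28)]
image[14][14] = 255

variation = random_shift(image)  # same "digit", slightly different position
```

Each call to `random_shift` produces a new training example from the same drawing, which is how a fixed set of images can be stretched into many more.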

AI Explainability

Understanding why AI makes decisions

Neural networks are often called "black boxes"

We know what goes in and what comes out, but not always why

Why Does Explainability Matter?

Healthcare: Doctors need to know why AI diagnosed a disease
Trust: We need to verify AI isn't using the wrong features
Fairness: Ensure AI isn't biased or discriminatory
Debugging: Find and fix errors in the model

Grad-CAM

Gradient-weighted Class Activation Mapping

A technique that shows which parts of an image the CNN focused on when making its prediction

Think of it as highlighting what the AI is "looking at"

How Grad-CAM Works

1. The CNN makes a prediction

2. We trace back through the network to the last convolutional layer to see which regions of the image contributed most

3. Create a heatmap showing importance

4. Overlay the heatmap on the original image

Blue/Cyan = Low importance
Green/Yellow = Medium importance
Red = High importance
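The arithmetic behind the heatmap can be sketched in a few lines. Grad-CAM scores each feature map from the last convolutional layer by how much it helped the predicted digit, adds the maps together using those scores, and keeps only the positive part. The `feature_maps` and `gradients` below are invented 2×2 numbers; in a real app they would come from the deep learning framework, and the function name `grad_cam` is our own.

```python
def grad_cam(feature_maps, gradients):
    """Combine feature maps into one heatmap with values from 0 to 1.

    feature_maps: list of 2D grids from the last convolutional layer.
    gradients: same shape; how much each value affects the predicted score.
    """
    # One importance weight per feature map: the average of its gradients.
    weights = []
    for grad in gradients:
        values = [g for row in grad for g in row]
        weights.append(sum(values) / len(values))

    # Weighted sum of the feature maps.
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, feature_maps):
        for r in range(height):
            for c in range(width):
                heatmap[r][c] += weight * fmap[r][c]

    # Keep only regions that support the prediction, then scale to 0-1.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap

# Hypothetical values for a network with two feature maps.
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],      # map 0 fires in the top-left
    [[0.0, 0.0], [0.0, 2.0]],      # map 1 fires in the bottom-right
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # map 0 supports the prediction
    [[-1.0, -1.0], [-1.0, -1.0]],  # map 1 argues against it
]

heatmap = grad_cam(feature_maps, gradients)
print(heatmap)  # [[1.0, 0.0], [0.0, 0.0]]: only the top-left is "red"
```

The small heatmap is then stretched to the size of the original image and coloured, which gives the blue-to-red overlay described above.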

Grad-CAM in the App

When you draw a digit, you'll see:

Your Drawing

The digit you drew on the canvas

Model Attention

Heatmap showing where the model focused

Compare how different models "look at" the same digit!

What Grad-CAM Reveals

Good models focus on the actual digit strokes
Overfitted models might focus on irrelevant background pixels
Compare models to see how data augmentation changes attention
Build trust by verifying the model is "looking" at the right things

Why This Matters

CNNs are used everywhere:

Face ID on your phone
Self-driving cars detecting objects
Medical imaging finding diseases
Social media filters and effects
Google Lens identifying objects

Questions to Explore

Which model performs best on your handwriting?
Can you trick the overfitted model?
Does the augmented model handle tilted digits better?
What happens if you draw sloppy vs. neat?

Thank You!

Now you understand the basics of CNNs!

Created by Cole Corbett & Chance Page