Neural Networks Fundamentals: Perceptrons to Backpropagation

Introduction: The Brain-Inspired Revolution

Traditional machine learning models like linear regression or decision trees struggle to capture the rich, layered structure in images, speech, and text. But what if we could build systems that learn such complex patterns, loosely inspired by our brains?

Neural networks are the foundation of modern AI. They power:

  • Computer Vision: Face recognition, self-driving cars
  • Natural Language: ChatGPT, translation systems
  • Recommendation: Netflix, Spotify, YouTube
  • Healthcare: Disease diagnosis, drug discovery

Key Insight: Neural networks are universal function approximators – with enough hidden neurons and training data, they can approximate almost any continuous function!

Learning Objectives

  • Understand biological inspiration for neural networks
  • Master forward propagation and backpropagation
  • Implement neural networks from scratch
  • Choose appropriate activation functions
  • Understand gradient descent optimization
  • Diagnose and fix training issues
  • Apply neural networks to real problems

1. Biological Inspiration

The Neuron

A biological neuron:

  1. Receives signals through dendrites
  2. Processes them in the cell body
  3. Fires output through the axon if threshold exceeded
  4. Connects to other neurons via synapses

The Artificial Neuron

An artificial neuron (perceptron) mimics this:

  1. Receives inputs $x_1, x_2, \ldots, x_n$
  2. Computes a weighted sum: $z = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + b$
  3. Applies an activation function: $a = f(z)$
  4. Outputs $a$ to the next layer

Mathematical Formula:

$$a = f\left(\sum_{i=1}^{n} w_i x_i + b\right) = f(w^T x + b)$$

Where:

  • $w$ = weights (synaptic strengths)
  • $b$ = bias (threshold)
  • $f$ = activation function (firing rule)
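
To make the formula concrete, here is a minimal NumPy sketch of a single artificial neuron. The input values, weights, and the choice of sigmoid as $f$ are illustrative assumptions, not part of the lesson's runtime.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs x_1, x_2, x_3
w = np.array([0.8, 0.1, -0.4])   # weights (synaptic strengths)
b = 0.2                          # bias (shifts the firing threshold)

z = w @ x + b                    # weighted sum: w^T x + b
a = sigmoid(z)                   # activation: a = f(z)
print(f"z = {z:.3f}, a = {a:.3f}")
```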

2. Neural Network Architecture

Layers

A neural network consists of layers:

  1. Input Layer: Receives raw features
  2. Hidden Layers: Learn intermediate representations
  3. Output Layer: Produces predictions

Deep Learning = neural networks with multiple hidden layers (2+).

Network Topology

Common notation: [3, 4, 4, 2] means:

  • 3 input neurons
  • 2 hidden layers with 4 neurons each
  • 2 output neurons
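
One way to read this notation is by the trainable parameters it implies: each layer owns a weight matrix and a bias vector, following the formulas used later in this lesson. A small sketch of the count for [3, 4, 4, 2]:

```python
layer_sizes = [3, 4, 4, 2]       # the [3, 4, 4, 2] topology above

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    n_params = n_out * n_in + n_out   # weight matrix (n_out x n_in) plus n_out biases
    print(f"{n_in} -> {n_out}: {n_params} parameters")
    total += n_params
print("Total trainable parameters:", total)   # 16 + 20 + 10 = 46
```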

Interactive Exploration


Try this:

  1. Click on neurons to see their activations
  2. Adjust the architecture – add/remove layers
  3. Watch forward propagation flow through the network
  4. See how weights affect the output

3. Forward Propagation

The Process

Forward propagation computes the network's output:

For each layer $l$:

  1. Compute the weighted sum: $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$
  2. Apply the activation: $a^{[l]} = f(z^{[l]})$

Example (2-layer network):

Layer 1 (Hidden):

$$z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = \sigma(z^{[1]})$$

Layer 2 (Output):

$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad \hat{y} = \sigma(z^{[2]})$$

Implementation from Scratch

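The lesson's embedded runtime is not reproduced here. In its place, a minimal sketch of forward propagation for the 2-layer network above, assuming sigmoid activations in both layers and small random initial weights (the helper names are ours):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_x, n_h, n_y, seed=0):
    """Small random weights, zero biases, for a 2-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_h, n_x)), "b1": np.zeros((n_h, 1)),
        "W2": rng.normal(scale=0.1, size=(n_y, n_h)), "b2": np.zeros((n_y, 1)),
    }

def forward(X, params):
    """Forward propagation: hidden layer, then output layer."""
    Z1 = params["W1"] @ X + params["b1"]    # z^[1] = W^[1] x + b^[1]
    A1 = sigmoid(Z1)                        # a^[1] = sigma(z^[1])
    Z2 = params["W2"] @ A1 + params["b2"]   # z^[2] = W^[2] a^[1] + b^[2]
    A2 = sigmoid(Z2)                        # y_hat = sigma(z^[2])
    return A2, (Z1, A1, Z2, A2)

# Tiny example: 3 features, 4 hidden units, 1 output, a batch of 5 samples
params = init_params(n_x=3, n_h=4, n_y=1)
X = np.random.default_rng(1).normal(size=(3, 5))
y_hat, _ = forward(X, params)
print(y_hat.shape)   # (1, 5): one prediction per sample
```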


4. Activation Functions

Why Non-linearity?

Without activation functions, stacking layers is pointless:

$$a^{[2]} = W^{[2]}(W^{[1]} x + b^{[1]}) + b^{[2]} = W' x + b'$$

This collapses into a single linear (affine) transformation, no matter how many layers we stack! Non-linear activations are what allow the network to learn complex patterns.
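
A quick numeric check of this collapse (shapes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
x = rng.normal(size=(3, 1))

# Two "layers" with no activation function...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer W'x + b'
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2
one_layer = W_prime @ x + b_prime

print(np.allclose(two_layer, one_layer))   # True
```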

Common Activation Functions

1. Sigmoid

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
  • Range: (0, 1)
  • Use: Binary classification output
  • Problem: Vanishing gradients

2. Tanh

$$\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$$
  • Range: (-1, 1)
  • Use: Hidden layers (zero-centered)
  • Problem: Vanishing gradients

3. ReLU (Rectified Linear Unit)

$$\text{ReLU}(z) = \max(0, z)$$
  • Range: [0, ∞)
  • Use: Hidden layers (default choice!)
  • Advantages: Fast to compute, gradient does not vanish for positive inputs
  • Problem: Dead neurons

4. Leaky ReLU

$$\text{Leaky ReLU}(z) = \max(0.01z, z)$$
  • Solves the dying-ReLU problem by allowing a small gradient when $z < 0$

Visualization

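In place of the embedded runtime, a short sketch that defines and plots these four functions (assumes NumPy and Matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-5, 5, 200)
for name, fn in [("sigmoid", sigmoid), ("tanh", np.tanh),
                 ("ReLU", relu), ("Leaky ReLU", leaky_relu)]:
    plt.plot(z, fn(z), label=name)

plt.axhline(0, color="gray", linewidth=0.5)
plt.axvline(0, color="gray", linewidth=0.5)
plt.legend()
plt.title("Common activation functions")
plt.show()
```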


5. Backpropagation: How Networks Learn

The Challenge

We want to minimize loss by adjusting weights:

$$\min_{W,b} \mathcal{L}(W, b)$$

But how do we compute $\frac{\partial \mathcal{L}}{\partial W^{[l]}}$ for each layer?

The Solution: Chain Rule

Backpropagation applies the chain rule to efficiently compute gradients:

$$\frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{\partial \mathcal{L}}{\partial a^{[l]}} \cdot \frac{\partial a^{[l]}}{\partial z^{[l]}} \cdot \frac{\partial z^{[l]}}{\partial W^{[l]}}$$

Backward pass:

  1. Compute the output error: $\delta^{[L]} = a^{[L]} - y$ (this simple form holds for cross-entropy loss with a sigmoid or softmax output)
  2. Propagate it backward: $\delta^{[l]} = (W^{[l+1]})^T \delta^{[l+1]} \odot f'(z^{[l]})$
  3. Compute the gradients: $\frac{\partial \mathcal{L}}{\partial W^{[l]}} = \delta^{[l]} (a^{[l-1]})^T$ and $\frac{\partial \mathcal{L}}{\partial b^{[l]}} = \delta^{[l]}$

Implementation

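Standing in for the embedded runtime again, a minimal sketch of the backward pass for the 2-layer sigmoid network, reusing init_params() and forward() from the forward-propagation sketch above. It assumes binary cross-entropy loss, so the output error is simply A2 - Y; the toy dataset and learning rate are arbitrary.

```python
import numpy as np

def backward(X, Y, params, cache):
    """Backward pass: compute gradients layer by layer via the chain rule."""
    Z1, A1, Z2, A2 = cache
    m = X.shape[1]                                  # number of training examples

    dZ2 = A2 - Y                                    # output error delta^[2]
    dW2 = (dZ2 @ A1.T) / m                          # dL/dW2 = delta^[2] (a^[1])^T
    db2 = dZ2.mean(axis=1, keepdims=True)

    dZ1 = (params["W2"].T @ dZ2) * A1 * (1 - A1)    # back through sigmoid: f'(z) = a(1 - a)
    dW1 = (dZ1 @ X.T) / m                           # dL/dW1 = delta^[1] x^T
    db1 = dZ1.mean(axis=1, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

# Tiny training loop on a toy binary problem (reuses forward/init_params from above)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 200))
Y = (X[0:1] + X[1:2] * X[2:3] > 0).astype(float)    # an arbitrary non-linear labeling rule
params = init_params(n_x=3, n_h=4, n_y=1)

for step in range(2000):
    A2, cache = forward(X, params)
    grads = backward(X, Y, params, cache)
    params = {k: params[k] - 0.5 * grads[k] for k in params}   # W := W - alpha * dL/dW

print("Training accuracy:", ((forward(X, params)[0] > 0.5) == Y).mean())
```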


6. Training Dynamics

Gradient Descent

Update rule:

$$W := W - \alpha \frac{\partial \mathcal{L}}{\partial W}$$

where $\alpha$ is the learning rate.
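
A toy illustration of this update on the one-dimensional loss $\mathcal{L}(w) = (w - 3)^2$, run with several arbitrarily chosen learning rates; it previews the behaviour described in the next subsection.

```python
def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2 * (w - 3)

for alpha in [0.01, 0.1, 0.9, 1.1]:
    w = 0.0
    for _ in range(50):
        w = w - alpha * grad(w)        # W := W - alpha * dL/dW
    print(f"alpha = {alpha:>4}: w after 50 steps = {w:.4f}")
```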

Learning Rate Selection

  • Too small: Slow training, may get stuck
  • Too large: Overshoots minimum, unstable
  • Just right: Smooth, fast convergence

Common Issues

1. Vanishing Gradients

  • Problem: Gradients become tiny in deep networks
  • Solution: Use ReLU, better initialization, batch normalization

2. Exploding Gradients

  • Problem: Gradients become huge
  • Solution: Gradient clipping (sketched below), careful initialization

3. Overfitting

  • Problem: Memorizes training data
  • Solution: Regularization (dropout, L2), more data
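
Of these fixes, gradient clipping (mentioned under exploding gradients above) is simple enough to show in a few lines. A minimal sketch, with an arbitrary max_norm threshold:

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """Rescale a gradient so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])    # an "exploding" gradient with norm 50
print(clip_by_norm(g))         # rescaled to norm 5: [ 3. -4.]
```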

7. Real-World Application: Classification

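The lesson's embedded exercise is not reproduced here. As a stand-in, a short sketch that trains a small neural network classifier on scikit-learn's two-moons dataset; the architecture and hyperparameters are illustrative choices, and scikit-learn is assumed to be installed.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Two interleaved half-moons: a classic non-linearly-separable problem
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Small network: two hidden layers of 16 ReLU units, trained with Adam
clf = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    solver="adam", max_iter=1000, random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```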


Key Takeaways

  • Neural networks are universal function approximators inspired by the brain
  • Architecture: Input → Hidden layers → Output
  • Forward propagation: Compute predictions layer by layer
  • Backpropagation: Efficiently compute gradients using the chain rule
  • Activation functions: Add non-linearity (prefer ReLU for hidden layers)
  • Training: Gradient descent optimizes weights to minimize loss
  • Challenges: Vanishing/exploding gradients, overfitting, hyperparameter tuning


What's Next?

Next lesson: Deep Learning Basics – building deeper networks, regularization techniques, and advanced optimizers!