Introduction: From Continuous to Categorical
Imagine you're building an email spam filter. Unlike predicting house prices (a number), you need to predict a category: spam or not spam. You can't use linear regression here, because it predicts continuous values like 127.3, while you need a probability between 0 and 1.
Logistic regression solves this by applying a special function (the sigmoid) that "squashes" any real number into the range [0, 1], converting it to a probability.
Key Insight: Despite its name, logistic regression is a classification algorithm! It is called "regression" because it fits a linear model to the log-odds, but its output is a probability that we convert into a class prediction.
Learning Objectives
- Understand binary and multi-class classification
- Derive the logistic regression model from first principles
- Master the sigmoid function and decision boundaries
- Implement binary cross-entropy loss
- Train logistic regression with gradient descent
- Visualize decision boundaries interactively
- Extend to multi-class classification
1. From Regression to Classification
The Classification Problem
In classification, we predict discrete categories (classes):
| Type | Classes | Examples |
|---|---|---|
| Binary | 2 classes | Spam/Not Spam, Disease/Healthy, Cat/Dog |
| Multi-class | (k > 2) classes | Digit Recognition (0-9), Image Classification (cat/dog/bird) |
| Multi-label | Multiple per sample | Movie genres, Medical diagnoses |
This lesson focuses on binary classification (2 classes: 0 and 1).
2. The Sigmoid Function: From Scores to Probabilities
Why Not Linear Regression?
If we try linear regression for classification: [ \hat{y} = \mathbf{w}^T \mathbf{x} ]
Problems:
- The output can be any real number (e.g., -10, 5, 127)
- We need probabilities in [0, 1]
- A prediction like (\hat{y} = 2.7) is hard to interpret as a class
The Sigmoid (Logistic) Function
[ \sigma(z) = \frac{1}{1 + e^{-z}} ]
Properties:
- Maps any real number to [0, 1]
- (\sigma(0) = 0.5) (decision boundary)
- (\sigma(z) \to 1) as (z \to +\infty)
- (\sigma(z) \to 0) as (z \to -\infty)
- Smooth and differentiable everywhere
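To make these properties concrete, here is a minimal NumPy sketch of the sigmoid (the function name and test values are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of them) into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Verify the properties listed above
print(sigmoid(0.0))    # 0.5 -> the decision boundary
print(sigmoid(10.0))   # ~0.99995, approaches 1 as z grows
print(sigmoid(-10.0))  # ~0.00005, approaches 0 as z shrinks
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # works elementwise on arrays
```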
The Logistic Regression Model
[ \hat{y} = P(y=1 | \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}} ]
Interpretation:
- (\hat{y}) = probability that sample belongs to class 1
- (1 - \hat{y}) = probability of class 0
- Decision rule: predict class 1 if (\hat{y} > 0.5), else class 0
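Here is a minimal sketch of the model and its decision rule, assuming the bias is folded into (\mathbf{w}) by appending a column of ones to the inputs (an assumption for this sketch, not something the formula above requires):

```python
import numpy as np

def predict_proba(X, w):
    """P(y=1 | x) for each row of X, where X already contains a bias column of ones."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def predict(X, w, threshold=0.5):
    """Decision rule: class 1 if P(y=1 | x) exceeds the threshold, else class 0."""
    return (predict_proba(X, w) > threshold).astype(int)
```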
3. Decision Boundaries
Visualizing Classification
The decision boundary is where the model is uncertain: (P(y=1) = 0.5)
For logistic regression: (\mathbf{w}^T \mathbf{x} = 0)
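In two dimensions this boundary is just a line, which you can solve for directly. A quick sketch, with illustrative weights (w_1, w_2) and bias (b):

```python
import numpy as np

w1, w2, b = 2.0, -1.0, 0.5            # illustrative weights and bias
x1 = np.linspace(-3, 3, 100)
# Solve w1*x1 + w2*x2 + b = 0 for x2 (assumes w2 != 0)
x2_boundary = -(w1 * x1 + b) / w2
# Points on one side of this line get P(y=1) > 0.5 (class 1); the other side gets class 0.
```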
🧪 Push the idea further: TensorFlow Playground starts exactly where logistic regression ends, with a single sigmoid neuron on 2-D data. Click "Run" on the linear dataset, then switch to the circle dataset to watch it fail. In the next lesson on decision trees (and later with kernels), you'll fix that failure in two very different ways.
4. Binary Cross-Entropy Loss
Why Not MSE?
MSE is non-convex when combined with the sigmoid: multiple local minima make optimization hard.
Instead, we use Binary Cross-Entropy (Log Loss):
[ J(\mathbf{w}) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right] ]
Where (\hat{y}_i = \sigma(\mathbf{w}^T \mathbf{x}_i))
Intuition:
- If (y_i = 1): loss = (-\log(\hat{y}_i)) → high loss if (\hat{y}_i) is small
- If (y_i = 0): loss = (-\log(1-\hat{y}_i)) → high loss if (\hat{y}_i) is large
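A minimal NumPy sketch of this loss; clipping the predictions away from 0 and 1 is an implementation detail added here to avoid log(0), not part of the formula:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average log loss over n samples; y_pred are sigmoid outputs in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident and correct -> small loss; confident and wrong -> large loss
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
print(binary_cross_entropy(np.array([1, 0]), np.array([0.1, 0.9])))  # ~2.303
```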
Gradient for Logistic Regression
The gradient of cross-entropy with respect to weights is:
[ \nabla_{\mathbf{w}} J = \frac{1}{n} \mathbf{X}^T (\hat{\mathbf{y}} - \mathbf{y}) ]
Amazing fact: Same form as linear regression! Just replace predictions with sigmoid outputs.
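A vectorized sketch of this gradient, where X is the n×d design matrix and y holds the 0/1 labels:

```python
import numpy as np

def gradient(X, y, w):
    """Gradient of binary cross-entropy w.r.t. w: (1/n) * X^T (y_hat - y)."""
    y_hat = 1.0 / (1.0 + np.exp(-X @ w))  # sigmoid predictions
    return X.T @ (y_hat - y) / len(y)
```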
5. Training Logistic Regression
Interactive Model Training
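If you're reading a static copy of this lesson without the interactive widget, here is a minimal gradient-descent loop you can run instead. The toy data, learning rate, and iteration count below are arbitrary choices, not prescribed values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: class 0 clustered around (-1, -1), class 1 around (+1, +1)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X = np.hstack([X, np.ones((100, 1))])     # append a bias column of ones

w = np.zeros(3)
lr = 0.1
for step in range(1000):
    y_hat = 1.0 / (1.0 + np.exp(-X @ w))  # forward pass: sigmoid(Xw)
    grad = X.T @ (y_hat - y) / len(y)     # cross-entropy gradient
    w -= lr * grad                        # gradient descent update

accuracy = np.mean((y_hat > 0.5).astype(int) == y)
print("weights:", w, "training accuracy:", accuracy)
```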
6. Multi-Class Classification
One-vs-Rest (OvR)
For (K) classes, train (K) binary classifiers:
- Classifier 1: Class 1 vs. {2, 3, ..., K}
- Classifier 2: Class 2 vs. {1, 3, ..., K}
- ...
Prediction: Choose class with highest probability.
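A sketch of the One-vs-Rest recipe; `fit_binary` and `predict_proba` stand in for any binary logistic regression trainer and scorer (such as the loop above), and their names are placeholders:

```python
import numpy as np

def ovr_train(X, y, num_classes, fit_binary):
    """Train one binary classifier per class: class k vs. everything else."""
    return [fit_binary(X, (y == k).astype(int)) for k in range(num_classes)]

def ovr_predict(X, models, predict_proba):
    """Score every sample with every binary model, then pick the most confident class."""
    scores = np.column_stack([predict_proba(X, m) for m in models])
    return np.argmax(scores, axis=1)
```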
Softmax Regression (Multinomial Logistic)
Direct extension to multi-class:
[ P(y = k | \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x}}}{\sum_{j=1}^{K} e^{\mathbf{w}_j^T \mathbf{x}}} ]
This is the softmax function, which generalizes the sigmoid to (K) classes.
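A small, numerically stable sketch of the softmax; subtracting the maximum score before exponentiating is an implementation detail that does not change the result:

```python
import numpy as np

def softmax(scores):
    """Turn a vector (or rows) of K class scores into probabilities that sum to 1."""
    scores = scores - np.max(scores, axis=-1, keepdims=True)  # stability shift
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```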
Key Takeaways
✅ Logistic Regression: Classification algorithm using the sigmoid function
✅ Sigmoid Function: (\sigma(z) = \frac{1}{1+e^{-z}}) maps real numbers to [0, 1]
✅ Model: (P(y=1|\mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}))
✅ Loss: Binary cross-entropy (convex, probabilistically motivated)
✅ Training: Gradient descent with the same update rule as linear regression
✅ Decision Boundary: Linear in feature space (where (\mathbf{w}^T \mathbf{x} = 0))
✅ Multi-Class: One-vs-Rest or softmax regression
Practice Problems
Problem 1: Implement Sigmoid
Problem 2: Compute Cross-Entropy Loss
Problem 3: Decision Boundary Interpretation
Given (\mathbf{w} = [2, -1, 3]) (including bias), what is the decision boundary equation?
Next Steps
You've mastered binary classification with logistic regression! Next:
- Lesson 5: Regularization – preventing overfitting with L1/L2 penalties
- Lesson 6: Decision Trees – non-linear decision boundaries
Logistic regression is used everywhere: web click prediction, medical diagnosis, credit scoring, and more!
Further Reading
Interactive Visualizations
- MLU-Explain: Logistic Regression – a scroll-story with an in-browser model that retrains as you drag points across the boundary.
- MLU-Explain: Double Descent – a classification-first tour of modern generalization theory; builds directly on this lesson's decision-boundary picture.
- TensorFlow Playground – the quintessential decision-boundary sandbox. Set activation to "Sigmoid" + 0 hidden layers for pure logistic regression, then add features to see why we'll need non-linear models next.
- Seeing Theory – Frequentist Inference – background on likelihood, the engine behind cross-entropy.
Video Tutorials
- StatQuest – Logistic Regression: Main Ideas and the maximum-likelihood follow-up (Josh Starmer) – the clearest audio-visual explanation of the sigmoid and its loss.
- Google ML Crash Course – Logistic Regression – short interactive exercises on sigmoids, log-loss, and thresholding.
Papers & Articles
- A Comparison of Numerical Optimizers for Logistic Regression – Tom Minka. Why L-BFGS and Newton methods converge faster than vanilla gradient descent for logistic loss.
- The Softmax Layer in Large Language Models – a modern reminder that multi-class logistic regression is still the output layer of every transformer.
Documentation & Books
- Book: An Introduction to Statistical Learning (2e) – James, Witten, Hastie, Tibshirani, Chapter 4 (free PDF).
- ML Cheatsheet – Cross-Entropy Derivation – a step-by-step derivation.
- scikit-learn: Logistic Regression – implementation reference with solver comparison.
Remember: Logistic regression is a linear model that learns linear decision boundaries. For non-linear problems, we'll need more powerful models (coming soon)!