Logistic Regression and Classification

Introduction: From Continuous to Categorical

Imagine you're building an email spam filter. Unlike predicting house prices (a number), you need to predict a category: spam or not spam. You can't use linear regression here – it predicts continuous values like 127.3, but you need a probability between 0 and 1.

Logistic regression solves this by applying a special function (the sigmoid) that "squashes" any real number into the range [0, 1], converting it to a probability.

Key Insight: Despite its name, logistic regression is a classification algorithm! It's called "regression" for historical reasons, but it outputs probabilities that we convert to class predictions.

Learning Objectives

  • Understand binary and multi-class classification
  • Derive the logistic regression model from first principles
  • Master the sigmoid function and decision boundaries
  • Implement binary cross-entropy loss
  • Train logistic regression with gradient descent
  • Visualize decision boundaries interactively
  • Extend to multi-class classification

1. From Regression to Classification

The Classification Problem

In classification, we predict discrete categories (classes):

| Type | Classes | Examples |
|---|---|---|
| Binary | 2 classes | Spam/Not Spam, Disease/Healthy, Cat/Dog |
| Multi-class | \(k > 2\) classes | Digit recognition (0-9), Image classification (cat/dog/bird) |
| Multi-label | Multiple labels per sample | Movie genres, Medical diagnoses |

This lesson focuses on binary classification (2 classes: 0 and 1).



2. The Sigmoid Function: From Scores to Probabilities

Why Not Linear Regression?

If we try to use linear regression for classification:

\[ \hat{y} = \mathbf{w}^T \mathbf{x} \]

Problems:

  • ❌ Output can be any value (e.g., -10, 5, 127)
  • ❌ We need probabilities in [0, 1]
  • ❌ Hard to interpret a value like \(\hat{y} = 2.7\) as a class

The Sigmoid (Logistic) Function

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

Properties:

  • Maps any real number to [0, 1]
  • \(\sigma(0) = 0.5\) (decision boundary)
  • \(\sigma(z) \to 1\) as \(z \to +\infty\)
  • \(\sigma(z) \to 0\) as \(z \to -\infty\)
  • Smooth and differentiable everywhere

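A minimal NumPy sketch of the sigmoid and its key values (the `sigmoid` helper name is my own, not from a particular library):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

print(sigmoid(0.0))                    # 0.5 -- the decision boundary
print(sigmoid(np.array([-5.0, 5.0])))  # ~[0.0067, 0.9933]
print(sigmoid(30.0), sigmoid(-30.0))   # ~1.0 and ~0.0
```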

The Logistic Regression Model

\[ \hat{y} = P(y=1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}} \]

Interpretation:

  • \(\hat{y}\) = probability that the sample belongs to class 1
  • \(1 - \hat{y}\) = probability of class 0
  • Decision rule: predict class 1 if \(\hat{y} > 0.5\), else class 0
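A short sketch of the model and decision rule (the helper names are my own, and the bias term is assumed to be folded into `X` as a column of ones):

```python
import numpy as np

def predict_proba(X, w):
    """P(y=1 | x) = sigmoid(w^T x) for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def predict(X, w, threshold=0.5):
    """Predict class 1 when P(y=1 | x) exceeds the threshold, else class 0."""
    return (predict_proba(X, w) > threshold).astype(int)
```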

3. Decision Boundaries

Visualizing Classification

The decision boundary is where the model is uncertain: \(P(y=1) = 0.5\)

For logistic regression: \(\mathbf{w}^T \mathbf{x} = 0\)


Linear Decision Boundaries

Logistic regression creates linear decision boundaries in feature space.

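For two features with weights \(w_1, w_2\) and bias \(b\), the boundary \(b + w_1 x_1 + w_2 x_2 = 0\) is a line we can solve for \(x_2\). A small sketch (the weight values here are made up for illustration):

```python
import numpy as np

# Hypothetical learned parameters for a 2-feature model.
b, w1, w2 = -1.0, 2.0, 3.0

# On the boundary: b + w1*x1 + w2*x2 = 0  =>  x2 = -(b + w1*x1) / w2
x1 = np.linspace(-3, 3, 5)
x2 = -(b + w1 * x1) / w2
for a, c in zip(x1, x2):
    print(f"x1 = {a:5.2f}  ->  boundary at x2 = {c:5.2f}")
```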


4. Binary Cross-Entropy Loss

Why Not MSE?

With the sigmoid plugged in, MSE becomes non-convex in \(\mathbf{w}\), so gradient descent can stall in flat regions or settle in poor local minima.

Instead, we use Binary Cross-Entropy (Log Loss):

\[ J(\mathbf{w}) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right] \]

where \(\hat{y}_i = \sigma(\mathbf{w}^T \mathbf{x}_i)\).

Intuition:

  • If \(y_i = 1\): loss = \(-\log(\hat{y}_i)\) → high loss when \(\hat{y}_i\) is small
  • If \(y_i = 0\): loss = \(-\log(1-\hat{y}_i)\) → high loss when \(\hat{y}_i\) is large

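A minimal NumPy sketch of this loss (the helper name and the clipping constant are my own; probabilities are clipped away from 0 and 1 so the logs stay finite):

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average log loss over n samples."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

y = np.array([1, 0, 1, 1])
confident_right = np.array([0.9, 0.1, 0.8, 0.95])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.05])
print(binary_cross_entropy(y, confident_right))  # small loss (~0.12)
print(binary_cross_entropy(y, confident_wrong))  # large loss (~2.3)
```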

Gradient for Logistic Regression

The gradient of cross-entropy with respect to weights is:

\[ \nabla_{\mathbf{w}} J = \frac{1}{n} \mathbf{X}^T (\hat{\mathbf{y}} - \mathbf{y}) \]

Amazing fact: Same form as linear regression! Just replace predictions with sigmoid outputs.
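A single gradient-descent update under this formula might look like the following sketch (the `gradient_step` name and learning rate are my own, and a bias column is assumed to be inside `X`):

```python
import numpy as np

def gradient_step(X, y, w, lr=0.1):
    """One update: w <- w - lr * (1/n) X^T (y_hat - y)."""
    n = X.shape[0]
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
    grad = X.T @ (y_hat - y) / n              # same form as linear regression
    return w - lr * grad
```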


5. Training Logistic Regression


Implementation from Scratch

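One possible from-scratch implementation, putting the pieces together. This is a sketch under my own naming and conventions (bias handled by prepending a column of ones), not the lesson's original code cell:

```python
import numpy as np

class LogisticRegressionGD:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, lr=0.1, n_iters=1000):
        self.lr = lr
        self.n_iters = n_iters

    def fit(self, X, y):
        X = np.column_stack([np.ones(len(X)), X])    # prepend bias column
        self.w = np.zeros(X.shape[1])
        for _ in range(self.n_iters):
            y_hat = 1.0 / (1.0 + np.exp(-(X @ self.w)))
            grad = X.T @ (y_hat - y) / len(y)        # (1/n) X^T (y_hat - y)
            self.w -= self.lr * grad
        return self

    def predict_proba(self, X):
        X = np.column_stack([np.ones(len(X)), X])
        return 1.0 / (1.0 + np.exp(-(X @ self.w)))

    def predict(self, X):
        return (self.predict_proba(X) > 0.5).astype(int)

# Tiny sanity check on a linearly separable toy problem.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = LogisticRegressionGD(lr=0.5, n_iters=2000).fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())  # ~1.0
```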


6. Multi-Class Classification

One-vs-Rest (OvR)

For \(K\) classes, train \(K\) binary classifiers:

  • Classifier 1: Class 1 vs. {2, 3, ..., K}
  • Classifier 2: Class 2 vs. {1, 3, ..., K}
  • ...

Prediction: Choose class with highest probability.
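A sketch of One-vs-Rest built on top of the binary model above (it reuses the `LogisticRegressionGD` class from my earlier sketch, not a library):

```python
import numpy as np

def fit_one_vs_rest(X, y, n_classes, **kwargs):
    """Train one binary classifier per class: class k vs. everything else."""
    return [LogisticRegressionGD(**kwargs).fit(X, (y == k).astype(int))
            for k in range(n_classes)]

def predict_one_vs_rest(models, X):
    """Pick the class whose classifier reports the highest P(y=1 | x)."""
    probs = np.column_stack([m.predict_proba(X) for m in models])
    return probs.argmax(axis=1)
```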

Softmax Regression (Multinomial Logistic)

Direct extension to multi-class:

\[ P(y = k \mid \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x}}}{\sum_{j=1}^{K} e^{\mathbf{w}_j^T \mathbf{x}}} \]

This is the softmax function – it generalizes the sigmoid to \(K\) classes.

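A minimal softmax sketch (shifting each row by its maximum for numerical stability; the helper name is my own):

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: scores has shape (n_samples, K)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1]])
print(softmax(scores))        # ~[[0.659, 0.242, 0.099]]
print(softmax(scores).sum())  # 1.0 -- a probability distribution over K classes
```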


Key Takeaways

Logistic Regression: Classification algorithm using sigmoid function

Sigmoid Function: \(\sigma(z) = \frac{1}{1+e^{-z}}\) maps real numbers to [0, 1]

Model: \(P(y=1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x})\)

Loss: Binary cross-entropy (convex, probabilistically motivated)

Training: Gradient descent with same update rule as linear regression

Decision Boundary: Linear in feature space (where \(\mathbf{w}^T \mathbf{x} = 0\))

Multi-Class: One-vs-Rest or Softmax regression


Practice Problems

Problem 1: Implement Sigmoid


Problem 2: Compute Cross-Entropy Loss


Problem 3: Decision Boundary Interpretation

Given \(\mathbf{w} = [2, -1, 3]\) (including bias), what is the decision boundary equation?



Next Steps

You've mastered binary classification with logistic regression! Next:

  • Lesson 5: Regularization – preventing overfitting with L1/L2 penalties
  • Lesson 6: Decision Trees – non-linear decision boundaries

Logistic regression is used everywhere: web click prediction, medical diagnosis, credit scoring, and more!


Remember: Logistic regression is a linear model that learns linear decision boundaries. For non-linear problems, we'll need more powerful models (coming soon)!