Introduction: Finding the Best Separator
Imagine you're designing a fence to separate two groups of animals in a zoo. There are many possible positions for the fence, but which is best?
The smartest choice is to place the fence as far as possible from both groups. This gives maximum safety margin – if animals wander slightly, they still won't cross!
Support Vector Machines (SVM) apply this principle to classification: find the decision boundary that's as far as possible from both classes!
Key Insight: Instead of just finding any separator, SVMs find the optimal separator with maximum margin – leading to better generalization!
Learning Objectives
- Understand the maximum margin principle
- Derive the SVM optimization problem
- Grasp support vectors and their role
- Implement linear SVM from scratch
- Handle soft margins with slack variables
- Tune the C hyperparameter
- Compare SVM with logistic regression
1. The Maximum Margin Principle
Logistic Regression vs SVM
Logistic Regression: Finds a separator by maximizing the likelihood of the training labels; nothing explicitly pushes the boundary away from the points.
SVM: Finds the separator with the maximum margin, i.e. the one as far as possible from both classes.
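A quick way to see the difference is to fit both models on the same data and compare the boundaries they choose. This is a minimal sketch, assuming scikit-learn is installed; the dataset and parameter values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Two well-separated 2D clusters
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

# Logistic regression: fits a separator by maximizing likelihood
log_reg = LogisticRegression().fit(X, y)

# Linear SVM: fits the maximum-margin separator (large C approximates a hard margin)
svm = SVC(kernel="linear", C=1e6).fit(X, y)

# The two models generally choose different orientations for the boundary
print("Logistic regression w, b:", log_reg.coef_[0], log_reg.intercept_[0])
print("SVM w, b:                ", svm.coef_[0], svm.intercept_[0])
```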
2. Mathematical Formulation
The Optimization Problem
Decision function: \(f(x) = w \cdot x + b\)
Classification:
- \(y_i = +1\) if \(f(x_i) \geq 0\)
- \(y_i = -1\) if \(f(x_i) < 0\)
Margin: Distance from a point to the hyperplane = \(\frac{|w \cdot x + b|}{\|w\|}\)
Goal: Maximize the minimum margin over all training points
Constraint: Every point is correctly classified, with the scale of \(w\) and \(b\) fixed so that the closest points satisfy equality: \[y_i(w \cdot x_i + b) \geq 1 \quad \forall i\]
With this normalization the margin equals \(\frac{1}{\|w\|}\), so maximizing the margin is equivalent to minimizing \(\|w\|^2\).
Optimization (Hard Margin SVM): \[\begin{aligned} \min_{w,b} \quad & \frac{1}{2}\|w\|^2 \\ \text{s.t.} \quad & y_i(w \cdot x_i + b) \geq 1 \quad \forall i \end{aligned}\]
This is a convex quadratic program – guaranteed global optimum!
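The margin formula and the constraint are easy to check numerically. The sketch below uses a hand-picked hyperplane and points; all values are illustrative.

```python
import numpy as np

# A hand-picked hyperplane: w . x + b = 0
w = np.array([3.0, 4.0])
b = -12.0

# A few labeled points (labels in {+1, -1})
X = np.array([[4.0, 1.0], [1.0, 1.0], [5.0, 3.0]])
y = np.array([+1, -1, +1])

# Distance from each point to the hyperplane: |w . x + b| / ||w||
distances = np.abs(X @ w + b) / np.linalg.norm(w)
print("distances:", distances)

# Hard-margin constraint: y_i (w . x_i + b) >= 1 for every i
margins = y * (X @ w + b)
print("functional margins:", margins)
print("all constraints satisfied:", np.all(margins >= 1))
```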
3. Support Vectors
Support Vectors: Training points that lie exactly on the margin boundaries.
These points determine the decision boundary! Other points don't matter.
Property: If you removed non-support vectors, the solution wouldn't change!
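scikit-learn exposes the support vectors of a fitted model directly. The sketch below (illustrative data) fits a linear SVM, then refits on the support vectors alone; the solution should be numerically unchanged.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=1)

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors per class:", svm.n_support_)

# Refit using only the support vectors: the boundary should not change
sv_idx = svm.support_
svm_sv = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("w (all points):          ", svm.coef_[0])
print("w (support vectors only):", svm_sv.coef_[0])
```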
4. Soft Margin SVM
The Problem: Non-Separable Data
Real data is rarely perfectly separable. Solution: Allow some misclassifications!
Slack variables \(\xi_i \geq 0\): How much point \(i\) violates the margin.
Soft Margin SVM: \[\begin{aligned} \min_{w,b,\xi} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \\ \text{s.t.} \quad & y_i(w \cdot x_i + b) \geq 1 - \xi_i \\ & \xi_i \geq 0 \quad \forall i \end{aligned}\]
Hyperparameter C: Tradeoff between margin and violations
- Large C: Few violations, small margin (risk overfitting)
- Small C: Many violations, large margin (more regularization)
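One way to see the tradeoff: for a linear SVM the margin width is \(\frac{2}{\|w\|}\), so larger C should shrink it and reduce the number of support vectors. This is a sketch assuming scikit-learn; the overlapping dataset and the C grid are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so a hard margin is impossible
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=2)

for C in [0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(svm.coef_[0])
    n_sv = svm.n_support_.sum()
    print(f"C={C:>6}: margin width = {margin_width:.3f}, support vectors = {n_sv}")
```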
5. Implementing SVM from Scratch
Let's implement a simple linear SVM using gradient descent on the primal formulation.
Hinge Loss: \(L(y, f(x)) = \max(0, 1 - y \cdot f(x))\)
Objective: \(\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w \cdot x_i + b))\)
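Here is a minimal from-scratch sketch that minimizes the objective above with batch subgradient descent. The learning rate, iteration count, and toy dataset are arbitrary choices for illustration.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_iters=2000):
    """Train a linear SVM by subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).
    Labels y must be in {+1, -1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0

    for _ in range(n_iters):
        # Points whose hinge loss is active (they violate the margin)
        margins = y * (X @ w + b)
        violating = margins < 1

        # Subgradient of the objective with respect to w and b
        grad_w = w - C * (X[violating] * y[violating][:, None]).sum(axis=0)
        grad_b = -C * y[violating].sum()

        w -= lr * grad_w
        b -= lr * grad_b

    return w, b

def predict(X, w, b):
    return np.sign(X @ w + b)

# Tiny separable toy problem
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

w, b = train_linear_svm(X, y)
print("w =", w, "b =", b)
print("predictions:", predict(X, w, b))
```

A production implementation would use the dual formulation or a solver such as LIBLINEAR, but the subgradient version makes the hinge-loss objective concrete.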
6. SVM vs Logistic Regression
Both are linear classifiers, but with different loss functions:
Logistic Regression: Log-loss (cross-entropy) \[L(y, f(x)) = \log(1 + e^{-y \cdot f(x)})\]
SVM: Hinge loss \[L(y, f(x)) = \max(0, 1 - y \cdot f(x))\]
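The two losses can be compared directly as functions of the margin \(y \cdot f(x)\). A minimal sketch, with illustrative plotting choices:

```python
import numpy as np
import matplotlib.pyplot as plt

# Margin m = y * f(x): large positive values mean confidently correct
m = np.linspace(-3, 3, 200)

hinge = np.maximum(0, 1 - m)       # SVM: exactly zero once the margin reaches 1
log_loss = np.log(1 + np.exp(-m))  # logistic regression: never exactly zero

plt.plot(m, hinge, label="hinge loss (SVM)")
plt.plot(m, log_loss, label="log loss (logistic regression)")
plt.xlabel("margin  y * f(x)")
plt.ylabel("loss")
plt.legend()
plt.show()
```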
Key Takeaways
✓ Maximum Margin: SVM finds separator with largest distance to nearest points
✓ Support Vectors: Only points on (or violating) the margin boundaries determine the solution; the rest are irrelevant
✓ Optimization: Convex quadratic program → global optimum guaranteed
✓ Hard Margin: Requires perfect separation (rarely possible)
✓ Soft Margin: Allows violations via slack variables; controlled by hyperparameter C
✓ C Hyperparameter: Large C = less regularization (risk overfit); Small C = more regularization
✓ Hinge Loss: \(\max(0, 1 - y \cdot f(x))\) – no loss if margin ≥ 1
✓ vs Logistic Regression: Similar but SVM uses hinge loss (sparse) vs log-loss (probabilities)
Practice Problems
Problem 1: Geometric Margin Calculation
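A possible starting point, using the margin formula from Section 2. The hyperplane and points below are placeholders; replace them with the problem's actual values.

```python
import numpy as np

# Placeholder hyperplane and points: substitute the problem's values
w = np.array([1.0, 2.0])
b = -3.0
points = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 1.0]])

# Geometric margin of each point: |w . x + b| / ||w||
print(np.abs(points @ w + b) / np.linalg.norm(w))
```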
Problem 2: Tune C Parameter
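One possible setup, assuming scikit-learn and cross-validation as the selection criterion. The dataset and C grid are placeholders to swap for the problem's own.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder dataset: substitute the problem's data
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=3)

for C in [0.01, 0.1, 1, 10, 100]:
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C:>6}: mean CV accuracy = {scores.mean():.3f}")
```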
Next Steps
Linear SVMs are powerful, but what if data isn't linearly separable?
Next lesson: Kernel Methods – the famous "kernel trick" that lets SVMs handle non-linear decision boundaries without explicitly transforming features!
This is one of the most elegant ideas in machine learning!
Further Reading
- Original Paper: Support-Vector Networks by Cortes & Vapnik (1995)
- Tutorial: Understanding Support Vector Machines
- Book: The Elements of Statistical Learning (Chapter 12)
- Visualization: SVM Interactive Demo
Remember: SVMs find the optimal separator by maximizing the margin. They're elegant, theoretically grounded, and still widely used in practice!