Support Vector Machines: Linear Case

Introduction: Finding the Best Separator

Imagine you're designing a fence to separate two groups of animals in a zoo. There are many possible positions for the fence, but which is best?

The smartest choice is to place the fence as far as possible from both groups. This gives maximum safety margin – if animals wander slightly, they still won't cross!

Support Vector Machines (SVM) apply this principle to classification: find the decision boundary that's as far as possible from both classes!

Key Insight: Instead of just finding any separator, SVMs find the optimal separator with maximum margin – leading to better generalization!

Learning Objectives

  • Understand the maximum margin principle
  • Derive the SVM optimization problem
  • Grasp support vectors and their role
  • Implement linear SVM from scratch
  • Handle soft margins with slack variables
  • Tune the C hyperparameter
  • Compare SVM with logistic regression

1. The Maximum Margin Principle

Logistic Regression vs SVM

Logistic Regression: Find any separator that classifies the training data well

SVM: Find the best separator with maximum margin



2. Mathematical Formulation

The Optimization Problem

Decision function: \(f(x) = w \cdot x + b\)

Classification:

  • Predict \(\hat{y}_i = +1\) if \(f(x_i) \geq 0\)
  • Predict \(\hat{y}_i = -1\) if \(f(x_i) < 0\)

Margin: The distance from a point \(x\) to the hyperplane is \(\frac{|w \cdot x + b|}{\|w\|}\)

Goal: Maximize the minimum margin over all training points

Constraint: Every point is correctly classified with functional margin at least 1: \[y_i(w \cdot x_i + b) \geq 1 \quad \forall i\]

The constraint fixes the functional margin at 1, so the geometric margin is \(1/\|w\|\) on each side; maximizing the margin is therefore equivalent to minimizing \(\|w\|\), or \(\frac{1}{2}\|w\|^2\) for convenience.

Optimization (Hard Margin SVM): \[ \begin{aligned} \min_{w,b} \quad & \frac{1}{2}\|w\|^2 \\ \text{s.t.} \quad & y_i(w \cdot x_i + b) \geq 1 \quad \forall i \end{aligned} \]

This is a convex quadratic program – guaranteed global optimum!
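
To make the formulation concrete, here is a minimal sketch that solves this exact QP on a tiny toy dataset using cvxpy (an assumption – any QP solver would do; the data below is illustrative):

```python
# Hard-margin SVM solved directly as a convex QP with cvxpy (assumed installed).
# The toy data is an illustrative, linearly separable 2D set.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [-2.0, -2.0], [-3.0, -1.0], [-1.5, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# min (1/2)||w||^2  s.t.  y_i (w . x_i + b) >= 1 for all i
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w.value))
```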



3. Support Vectors

Support Vectors: Training points that lie exactly on the margin boundaries.

These points determine the decision boundary! Other points don't matter.

Property: If you removed the non-support vectors and retrained, the solution wouldn't change!
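
A quick way to see this (using scikit-learn, assumed available; the data is a synthetic stand-in): fit a linear SVM, read off its support vectors, then refit on the support vectors alone and check that the boundary barely moves.

```python
# Fit a linear SVM, inspect the support vectors, and refit on them alone.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2, 2], size=(20, 2)),
               rng.normal(loc=[-2, -2], size=(20, 2))])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard margin
print("support vector indices:", clf.support_)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])

# Keep only the support vectors and retrain: w and b should barely change.
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])
print("refit w =", clf_sv.coef_[0], "refit b =", clf_sv.intercept_[0])
```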



4. Soft Margin SVM

The Problem: Non-Separable Data

Real data is rarely perfectly separable. Solution: Allow some misclassifications!

Slack variables \(\xi_i \geq 0\): how much point \(i\) violates the margin.

Soft Margin SVM: \[ \begin{aligned} \min_{w,b,\xi} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \\ \text{s.t.} \quad & y_i(w \cdot x_i + b) \geq 1 - \xi_i \quad \forall i \\ & \xi_i \geq 0 \quad \forall i \end{aligned} \]

Hyperparameter C: Tradeoff between margin size and violations (see the sketch after this list)

  • Large C: Few violations, small margin (risk overfitting)
  • Small C: Many violations, large margin (more regularization)
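
Here is a hedged sketch of that tradeoff (scikit-learn assumed available; the overlapping toy data is illustrative): as C shrinks, the margin widens and more points become support vectors.

```python
# How C trades margin width against violations on non-separable data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping classes, so no perfect separation exists
X = np.vstack([rng.normal(loc=[1, 1], size=(50, 2)),
               rng.normal(loc=[-1, -1], size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:>6}: margin width = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}, "
          f"train accuracy = {clf.score(X, y):.2f}")
```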



5. Implementing SVM from Scratch

Let's implement a simple linear SVM using gradient descent on the primal formulation.

Hinge Loss: \(L(y, f(x)) = \max(0, 1 - y \cdot f(x))\)

Objective: \(\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w \cdot x_i + b))\)
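
One possible sketch of such an implementation – full-batch subgradient descent on the objective above (the function name, learning rate, and toy data are illustrative choices):

```python
# Primal (sub)gradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)).
import numpy as np

def linear_svm_gd(X, y, C=1.0, lr=0.01, epochs=200):
    """Train a linear SVM by full-batch subgradient descent. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # points with nonzero hinge loss
        # Subgradient of the objective with respect to w and b
        grad_w = w - C * (X[viol] * y[viol][:, None]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Tiny usage example on a separable toy set
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = linear_svm_gd(X, y, C=1.0)
print("w =", w, "b =", b)
print("predictions:", np.sign(X @ w + b))
```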



6. SVM vs Logistic Regression

Both are linear classifiers, but with different loss functions:

Logistic Regression: Log-loss (cross-entropy) \[L(y, f(x)) = \log(1 + e^{-y \cdot f(x)})\]

SVM: Hinge loss \[L(y, f(x)) = \max(0, 1 - y \cdot f(x))\]
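
The hinge loss is exactly zero once the margin reaches 1, while the log-loss never reaches zero – a small illustrative comparison:

```python
# Compare hinge loss and log-loss as functions of the margin m = y * f(x).
import numpy as np

margins = np.linspace(-2, 3, 11)
hinge = np.maximum(0.0, 1.0 - margins)
logloss = np.log1p(np.exp(-margins))

for m, h, l in zip(margins, hinge, logloss):
    print(f"margin {m:+.1f}:  hinge = {h:.3f}   log-loss = {l:.3f}")
# Hinge loss hits 0 at margin 1 (this is what makes support vectors sparse);
# log-loss stays strictly positive, so every point keeps influencing the fit.
```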



Key Takeaways

Maximum Margin: SVM finds separator with largest distance to nearest points

Support Vectors: Only the points on (or violating) the margin boundaries determine the solution; the rest are irrelevant

Optimization: Convex quadratic program → global optimum guaranteed

Hard Margin: Requires perfect separation (rarely possible)

Soft Margin: Allows violations via slack variables; controlled by hyperparameter C

C Hyperparameter: Large C = less regularization (risk overfit); Small C = more regularization

Hinge Loss: \(\max(0, 1 - y \cdot f(x))\) – no loss if margin ≥ 1

vs Logistic Regression: Similar decision boundaries, but SVM uses hinge loss (sparse solution, no probabilities) while logistic regression uses log-loss (probabilistic outputs)


Practice Problems

Problem 1: Geometric Margin Calculation
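
A possible starting point (the hyperplane and points below are illustrative assumptions, not fixed problem data): compute the geometric margin \(|w \cdot x + b| / \|w\|\) for each point.

```python
# Geometric margin of a few points with respect to a hypothetical hyperplane.
import numpy as np

w = np.array([2.0, 1.0])   # hypothetical hyperplane parameters
b = -1.0
points = np.array([[1.0, 1.0], [0.0, 2.0], [-1.0, 0.5]])

distances = np.abs(points @ w + b) / np.linalg.norm(w)
for p, dist in zip(points, distances):
    print(f"point {p}: distance to hyperplane = {dist:.3f}")
```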


Problem 2: Tune C Parameter
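
A possible starting point for the tuning (scikit-learn assumed available; the synthetic dataset is a stand-in): sweep C on a log scale and compare cross-validated accuracy.

```python
# Cross-validated sweep over C for a linear SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, class_sep=1.0, random_state=0)

for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C:>7}: CV accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```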



Next Steps

Linear SVMs are powerful, but what if data isn't linearly separable?

Next lesson: Kernel Methods – the famous "kernel trick" that lets SVMs handle non-linear decision boundaries without explicitly transforming features!

This is one of the most elegant ideas in machine learning!




Remember: SVMs find the optimal separator by maximizing the margin. They're elegant, theoretically grounded, and still widely used in practice!