Introduction: The Needle in the Haystack
Imagine you're a fraud detection analyst at a bank. Among millions of normal transactions, you need to find the suspicious ones. Or you're monitoring server infrastructure – most metrics are normal, but you need to catch the anomalies before they cause outages.
Anomaly detection is the art of finding rare items, events, or observations that differ significantly from the majority of the data. It's crucial for:
- Fraud detection: Unusual credit card transactions
- Network security: Detecting intrusions
- Healthcare: Identifying rare diseases
- Manufacturing: Finding defective products
- System monitoring: Detecting performance issues
Key Insight: Anomalies are rare but impactful. Catching them early can save millions of dollars or even lives!
Learning Objectives
- Understand different types of anomalies
- Master Isolation Forest algorithm
- Apply One-Class SVM for anomaly detection
- Use statistical methods for outlier detection
- Handle high-dimensional anomaly detection
- Evaluate anomaly detection performance
- Choose the right method for different scenarios
1. Types of Anomalies
Point Anomalies
Individual instances that deviate from the norm.
Example: A credit card transaction far outside the account's typical $20-$100 range.
Contextual Anomalies
Normal in one context but anomalous in another.
Example: Temperature of 80°F is normal in summer but anomalous in winter.
Collective Anomalies
A collection of instances is anomalous together.
Example: A series of small withdrawals that together drain an account.
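To make the three types concrete, here is a minimal sketch in NumPy. The specific figures (a $10,000 transaction, the monthly temperatures) are illustrative assumptions, not real data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Point anomaly: one transaction far outside the usual $20-$100 range
transactions = np.append(rng.uniform(20, 100, size=99), 10_000)  # 10_000 is a planted outlier
print(f"Point: max=${transactions.max():,.0f} vs median=${np.median(transactions):,.0f}")

# Contextual anomaly: the same 80°F reading, judged against different contexts
for month, typical in [("July", 82), ("January", 35)]:  # typical temps are assumptions
    print(f"Contextual: 80°F in {month} (typical {typical}°F)")

# Collective anomaly: each $50 withdrawal looks normal on its own,
# but twenty of them in a single day drain the account together
withdrawals = np.full(20, 50.0)
print(f"Collective: {len(withdrawals)} withdrawals totaling ${withdrawals.sum():,.0f}")
```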
2. Isolation Forest
The Intuition
Isolation Forest is based on a beautiful idea: anomalies are easier to isolate than normal points.
Analogy: Imagine you're at a party. To isolate a popular person, you need many questions ("Are you near the food?" "Are you talking to Sarah?"). But to isolate someone standing alone in the corner? Just one question: "Are you in the corner?"
Key Idea: Anomalies require fewer random splits to isolate than normal points.
How Isolation Forest Works
- Build isolation trees: Randomly select a feature and split value
- Measure path length: How many splits to isolate each point?
- Score anomalies: Shorter paths = more anomalous
Anomaly Score:

$$s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}$$

Where:
- $h(x)$ = path length of point $x$ (number of splits needed to isolate it)
- $c(n)$ = average path length for a tree with $n$ points (normalization term)
- $s$ close to 1 = anomaly
- $s$ close to 0 = normal
Interactive Exploration
Try this:
- Generate data with "Add Anomalies" button
- Watch how Isolation Forest isolates outliers
- Adjust Contamination – expected % of anomalies
- Try different datasets – when does it work best?
Implementation Example
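The interactive runtime isn't available in this text version. As a stand-in, here is a minimal sketch using scikit-learn's `IsolationForest`; the synthetic dataset and parameter values (`n_estimators=100`, `contamination=0.05`) are assumptions chosen for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Normal data: a dense 2-D cluster; anomalies: points scattered far away (synthetic)
X_normal = rng.normal(loc=0, scale=1, size=(300, 2))
X_anomalies = rng.uniform(low=-6, high=6, size=(15, 2))
X = np.vstack([X_normal, X_anomalies])

# contamination = expected fraction of anomalies in the data
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = iso.fit_predict(X)     # +1 = normal, -1 = anomaly
scores = iso.score_samples(X)   # lower (more negative) = more anomalous

print(f"Flagged {np.sum(labels == -1)} of {len(X)} points as anomalies")
print(f"Score range: [{scores.min():.3f}, {scores.max():.3f}]")
```

Note that `score_samples` follows scikit-learn's sign convention (lower = more anomalous), which is inverted relative to the $s$ score above.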
Isolation Forest Strengths & Limitations
| Strengths | Limitations |
|---|---|
| ✅ Fast and scalable | ❌ Performance depends on contamination parameter |
| ✅ Works well in high dimensions | ❌ Struggles with local anomalies in dense clusters |
| ✅ No distance metric needed | ❌ Less interpretable than statistical methods |
| ✅ Handles mixed data types | ❌ May miss subtle anomalies |
3. One-Class SVM
The Intuition
One-Class SVM learns a boundary around normal data. Anything outside that boundary is anomalous.
Analogy: Like drawing a fence around your property. Anything inside the fence is yours, anything outside is not.
Key Idea: Find a hyperplane that separates normal data from the origin with maximum margin.
How One-Class SVM Works
- Map data to high-dimensional space (using kernel trick)
- Find hyperplane with maximum margin from origin
- Normal data is on one side, anomalies on the other
Decision Function:

$$f(x) = \operatorname{sign}\left(\sum_{i} \alpha_i K(x_i, x) - \rho\right)$$

- Positive score = normal
- Negative score = anomaly
Interactive Exploration
Try this:
- Compare One-Class SVM with Isolation Forest
- Adjust nu parameter – controls boundary tightness
- Try different kernel functions (RBF, linear, polynomial)
- Notice how SVM creates smooth decision boundaries
Implementation Example
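Again as a stand-in for the interactive runtime, a minimal sketch with scikit-learn's `OneClassSVM`; the `nu=0.05` setting and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 2))                      # assume training data is mostly normal
X_test = np.vstack([rng.normal(size=(50, 2)),            # normal points
                    rng.uniform(-6, 6, size=(10, 2))])   # likely anomalies

# Scaling matters for RBF kernels; nu bounds the fraction of training
# points allowed outside the boundary (controls its tightness)
scaler = StandardScaler().fit(X_train)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(scaler.transform(X_train))

pred = ocsvm.predict(scaler.transform(X_test))              # +1 = normal, -1 = anomaly
scores = ocsvm.decision_function(scaler.transform(X_test))  # >0 normal, <0 anomaly

print(f"Flagged {np.sum(pred == -1)} of {len(X_test)} test points as anomalies")
```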
4. Statistical Methods
Z-Score Method
Idea: Points far from the mean (in terms of standard deviations) are anomalous.
Rule of Thumb: Compute $z = \frac{x - \mu}{\sigma}$; if $|z| > 3$, it's an outlier (99.7% of the data lies within 3σ in a normal distribution).
IQR (Interquartile Range) Method
Idea: Use quartiles to define outlier thresholds.
Outliers: Points outside $[Q_1 - 1.5 \times \text{IQR},\ Q_3 + 1.5 \times \text{IQR}]$
Where $\text{IQR} = Q_3 - Q_1$
Interactive Exploration
Try this:
- Compare statistical methods with ML approaches
- Adjust the threshold (Z-score or IQR multiplier)
- Notice how statistical methods assume distribution shape
- See when they fail on complex data
Implementation Example
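A minimal sketch of both rules in plain NumPy; the synthetic dataset (a Gaussian sample with two planted outliers) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.append(rng.normal(loc=50, scale=5, size=200), [95, 110])  # two planted outliers

# Z-score method: flag points more than 3 standard deviations from the mean
z = (x - x.mean()) / x.std()
z_outliers = x[np.abs(z) > 3]

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = x[(x < lo) | (x > hi)]

print(f"Z-score outliers: {np.round(z_outliers, 1)}")
print(f"IQR outliers:     {np.round(iqr_outliers, 1)}")
```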
5. Comparing Methods
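In place of the interactive comparison, a minimal sketch that runs both ML methods on the same labeled synthetic data and scores them with precision, recall, and F1 (possible here only because the labels are known by construction; the dataset and parameters are assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(300, 2)), rng.uniform(-6, 6, size=(15, 2))])
y_true = np.array([0] * 300 + [1] * 15)  # 1 = anomaly (known, since the data is synthetic)

models = {
    "IsolationForest": IsolationForest(contamination=0.05, random_state=0),
    "OneClassSVM": OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"),
}

for name, model in models.items():
    y_pred = (model.fit_predict(X) == -1).astype(int)  # map -1 (anomaly) to label 1
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    print(f"{name:>16}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

When true labels aren't available, as in most real deployments, these metrics can't be computed directly; evaluation then relies on domain review of the flagged points.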
Decision Guide
Use Isolation Forest when:
- High-dimensional data
- Need scalability
- Anomalies are global (far from normal data)
- No assumptions about data distribution
Use One-Class SVM when:
- Need smooth decision boundaries
- Complex, non-linear patterns
- Small to medium datasets
- Can tune hyperparameters
Use Statistical Methods when:
- Data follows known distribution (e.g., Gaussian)
- Interpretability is crucial
- Simple, univariate outlier detection
- Need explainable thresholds
Key Takeaways
✅ Anomaly detection finds rare, unusual patterns in data
✅ Isolation Forest: Fast, scalable, isolates anomalies efficiently
✅ One-Class SVM: Creates decision boundaries around normal data
✅ Statistical Methods: Simple, interpretable, assume distribution
✅ Evaluation: Use precision, recall, F1-score when labels available
✅ Real-world: Choose method based on data characteristics and constraints
What's Next?
Next lesson: Gaussian Mixture Models – probabilistic clustering and soft anomaly detection!