Introduction: The Curse of High Dimensions
Imagine trying to visualize a dataset with 1,000 features. Impossible, right? Or training a model where each sample has 10,000 dimensions – slow, memory-intensive, and prone to overfitting.
Dimensionality reduction solves this by projecting high-dimensional data into fewer dimensions while preserving essential structure. It's like looking at shadows: a 3D object casts a 2D shadow that captures its key shape.
In this lesson, we'll master three powerful techniques: PCA (linear, fast), t-SNE (non-linear, great for visualization), and UMAP (balances both, fast).
Prerequisites
- Completed ML Fundamentals and Clustering Algorithms
- Linear algebra: vectors, matrices, eigenvectors
- Python with NumPy, scikit-learn
- Nice to have: Understanding of variance and covariance
1. Why Dimensionality Reduction?
The Curse of Dimensionality
As dimensions increase, strange things happen:
- Volume explodes: A 10D hypercube has 1,024 corners!
- Distance loses meaning: Nearest and farthest neighbors end up almost equally far apart
- Data becomes sparse: Most of the volume is empty
- Models suffer: Need exponentially more data to generalize
Analogy: In a 1D line, 10 points fill it well. Matching that density in 2D takes 100 points, and in 10D it takes 10 billion – a million points barely scratch the surface!
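A quick numerical sketch of the "distance loses meaning" point: sample random points in a unit hypercube and compare nearest to farthest neighbor distances as the dimension grows (the dimensions and sample size below are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))  # 500 random points in the d-dimensional unit cube
    # Distances from the first point to all the others
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    ratio = dists.min() / dists.max()  # a ratio near 1 means "everything is equally far"
    print(f"d={d:5d}  nearest/farthest distance ratio = {ratio:.3f}")
```

As the dimension grows, the ratio climbs toward 1 – exactly why distance-based methods struggle in high dimensions.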
Benefits of Dimensionality Reduction
Benefit | Example |
---|---|
Visualization | Plot 1000D data in 2D/3D for exploration |
Speed | Faster training with fewer features |
Memory | Store less data |
Noise reduction | Remove irrelevant dimensions |
Multicollinearity | Remove correlated features |
Better generalization | Avoid overfitting from too many features |
Two Approaches
- Feature Selection: Choose a subset of original features (we covered this in ML Fundamentals)
- Feature Extraction: Create new features by combining originals (this lesson!)
2. Principal Component Analysis (PCA)
The Intuition
PCA finds directions of maximum variance in your data. Think of it as rotating your coordinate system to align with the data's natural spread.
Analogy: Imagine photographing a flat disk from different angles. One angle (face-on) captures the most information. That's the first principal component!
Key Idea: Project data onto fewer dimensions that capture the most variance.
Mathematical Foundation
Given a data matrix $X \in \mathbb{R}^{n \times p}$ ($n$ samples × $p$ features):
- Center the data: $X_c = X - \bar{X}$ (subtract each feature's mean)
- Compute covariance matrix: $C = \frac{1}{n-1} X_c^\top X_c$
- Find eigenvectors/eigenvalues: $C v_i = \lambda_i v_i$
- Sort by eigenvalues: Largest $\lambda_i$ → most variance
- Project: $Z = X_c W_k$ (keep the top $k$ eigenvectors as the columns of $W_k$)
Explained Variance of component $i$: $\dfrac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}$
Interactive Exploration
Try this:
- Switch between datasets – see how PCA projects 3D → 2D
- Check "Explained Variance" – how much information is retained?
- Compare PCA with t-SNE and UMAP on the same data
Implementation from Scratch
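A minimal from-scratch sketch of these steps in NumPy (the function name and the synthetic data are just for illustration):

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project X (n samples x p features) onto its top-k principal components."""
    # 1. Center the data
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix (p x p)
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition (symmetric matrix -> eigh)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenpairs by eigenvalue, largest first
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Project onto the top-k eigenvectors
    Z = X_centered @ eigenvectors[:, :k]
    explained_ratio = eigenvalues[:k] / eigenvalues.sum()
    return Z, explained_ratio

# Quick check on random correlated data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated features
Z, ratio = pca_fit_transform(X, k=2)
print("Explained variance ratio:", np.round(ratio, 3))
```

scikit-learn's PCA computes the same projection (via SVD) and adds conveniences such as whitening and inverse_transform.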
Choosing the Number of Components
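A common recipe with scikit-learn: fit PCA with all components, then look at the cumulative explained variance curve. The digits dataset here is just an illustrative stand-in:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)           # 1797 samples x 64 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA().fit(X_scaled)                     # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components reaching 95% explained variance
k_95 = np.argmax(cumulative >= 0.95) + 1
print(f"{k_95} of {X.shape[1]} components explain 95% of the variance")
```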
Rule of Thumb: Keep enough components to explain 90-95% of variance.
PCA Strengths & Limitations
Strengths | Limitations |
---|---|
✅ Fast and scalable | ❌ Only captures linear relationships |
✅ Mathematically principled | ❌ Assumes variance = importance |
✅ Reversible (can reconstruct) | ❌ Sensitive to scaling |
✅ Few hyperparameters (just the number of components) | ❌ Components can be hard to interpret |
✅ Noise reduction | ❌ Fails on non-linear manifolds |
3. t-SNE: t-Distributed Stochastic Neighbor Embedding
The Intuition
t-SNE preserves local structure – keeping nearby points close and far points far. Unlike PCA, it can capture non-linear relationships.
Analogy: Imagine a crumpled sheet of paper (high-D) that you carefully unfold into 2D while keeping track of which parts were touching.
Key Idea: Model pairwise similarities in high-D, then find low-D layout that preserves those similarities.
How t-SNE Works
- High-dimensional similarities: Model pairwise affinities with a Gaussian
- Low-dimensional similarities: Model them with a t-distribution (heavy-tailed)
- Optimize: Minimize the KL-divergence between P and Q using gradient descent
Why a t-distribution? Its heavy tails prevent the "crowding problem" – moderately distant points get room to spread out in low dimensions instead of being crushed together.
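For reference, the standard formulation (each bandwidth $\sigma_i$ is chosen so that the Gaussian around point $i$ matches the user-specified perplexity):

$$
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}, \qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}
$$

$$
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
$$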
Interactive Exploration
Try this:
- Adjust Perplexity – balances local vs global structure (typical: 5-50)
- Notice t-SNE is stochastic – different runs give different results
- Compare with PCA on the same "Swiss Roll" data
Implementation Example
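A minimal scikit-learn sketch on the digits dataset (the dataset and parameter values are illustrative choices):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # 1797 samples, 64 features

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_2d = tsne.fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
plt.colorbar(label="digit")
plt.title("t-SNE embedding of the digits dataset")
plt.show()
```

Note that scikit-learn's TSNE has no separate transform() method – you can only embed the data you fit on.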
The Perplexity Parameter
Perplexity roughly corresponds to the number of close neighbors for each point.
- Low perplexity (5-15): Focuses on local structure, may create fragmented clusters
- High perplexity (30-50): Considers more global structure, may merge clusters
- Rule of thumb: Start with 30, adjust based on dataset size
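One way to build intuition is to embed the same data at several perplexities (a sketch; t-SNE runtime grows quickly with sample count):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, perplexity in zip(axes, [5, 15, 30, 50]):
    X_2d = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=42).fit_transform(X)
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(f"perplexity = {perplexity}")
plt.tight_layout()
plt.show()
```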
t-SNE Strengths & Limitations
Strengths | Limitations |
---|---|
✅ Excellent for visualization | ❌ Very slow for large datasets |
✅ Captures non-linear structure | ❌ Non-deterministic (random init) |
✅ Preserves local neighborhoods | ❌ Doesn't preserve global structure |
✅ Beautiful visualizations | ❌ Can't transform new data |
✅ Works well for clusters | ❌ Perplexity needs tuning |
Key Limitation: t-SNE is for visualization only – cluster sizes and between-cluster distances in the 2D embedding are not meaningful, so don't feed them into downstream tasks!
4. UMAP: Uniform Manifold Approximation and Projection
The Intuition
UMAP aims for the best of both worlds: it preserves local structure like t-SNE, but runs faster and retains more global structure.
Key Idea: Build a graph of the data manifold, then find a low-dimensional layout that preserves the graph structure.
Mathematical Approach
- Construct fuzzy topological graph in high dimensions
- Optimize layout in low dimensions to match graph structure
- Builds on Riemannian geometry and algebraic topology (we only need the intuition here; the full theory is beyond this lesson)
Advantages over t-SNE:
- Faster (can handle millions of points)
- Preserves more global structure
- Can transform new data
- Fewer hyperparameters
Interactive Exploration
Try this:
- Adjust N Neighbors – similar to perplexity in t-SNE
- Notice UMAP is faster than t-SNE
- Compare global structure preservation across all three methods
Implementation Example
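A minimal sketch using the umap-learn package (installed separately, e.g. `pip install umap-learn`); the dataset is again just an example:

```python
import matplotlib.pyplot as plt
import umap  # provided by the umap-learn package
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_2d = reducer.fit_transform(X)

# Unlike t-SNE, a fitted UMAP model can embed new samples:
# X_new_2d = reducer.transform(X_new)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
plt.title("UMAP embedding of the digits dataset")
plt.show()
```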
UMAP Hyperparameters
- n_neighbors: Size of local neighborhood (default 15). Smaller → more local focus
- min_dist: Minimum distance between points in embedding (default 0.1). Smaller → tighter clusters
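To see how the two interact, a small grid sweep helps (a sketch; the specific values are chosen only for contrast):

```python
import matplotlib.pyplot as plt
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
for i, nn in enumerate([5, 50]):          # local vs broader neighborhoods
    for j, md in enumerate([0.0, 0.5]):   # tight vs spread-out clusters
        emb = umap.UMAP(n_neighbors=nn, min_dist=md,
                        random_state=42).fit_transform(X)
        axes[i, j].scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
        axes[i, j].set_title(f"n_neighbors={nn}, min_dist={md}")
plt.tight_layout()
plt.show()
```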
UMAP Strengths & Limitations
Strengths | Limitations |
---|---|
✅ Fast – scales to millions | ❌ Less mature than PCA/t-SNE |
✅ Preserves local AND global | ❌ Still stochastic |
✅ Can transform new data | ❌ Hyperparameters affect results |
✅ Strong theoretical foundation | ❌ Requires an additional library (umap-learn) |
✅ Great for visualization | ❌ Distances in the embedding shouldn't be over-interpreted |
5. Method Comparison
Side-by-Side Comparison
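A sketch that runs all three methods on the same dataset and times each one (exact timings depend on your machine and library versions):

```python
import time
import matplotlib.pyplot as plt
import umap
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

methods = {
    "PCA": PCA(n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=30, init="pca", random_state=42),
    "UMAP": umap.UMAP(n_components=2, random_state=42),
}

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, model) in zip(axes, methods.items()):
    start = time.perf_counter()
    X_2d = model.fit_transform(X)
    elapsed = time.perf_counter() - start
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(f"{name} ({elapsed:.1f}s)")
plt.tight_layout()
plt.show()
```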
Decision Matrix
Criterion | PCA | t-SNE | UMAP |
---|---|---|---|
Speed | ⚡⚡⚡ Very Fast | ⚡ Slow | ⚡⚡ Fast |
Scalability | Millions | Thousands | Millions |
Non-linear | ❌ No | ✅ Yes | ✅ Yes |
Global structure | ✅ Yes | ❌ No | ✅ Moderate |
Local structure | ❌ No | ✅ Yes | ✅ Yes |
Deterministic | ✅ Yes | ❌ No | ❌ No |
Transform new data | ✅ Yes | ❌ No | ✅ Yes |
Best for | Preprocessing | Visualization | General purpose |
When to Use Each
Use PCA when:
- You need speed and scalability
- Data is mostly linear
- You need to transform new data
- You want components with a clear mathematical meaning (directions of maximum variance)
- Preprocessing before other algorithms
Use t-SNE when:
- Primary goal is visualization
- Dataset is small-medium (<10k samples)
- Local structure is most important
- You have time for hyperparameter tuning
Use UMAP when:
- You want visualization + modeling
- Large datasets
- Both local and global structure matter
- You need to transform new data
- Modern, general-purpose choice
6. Real-World Application: Image Dataset Exploration
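One possible workflow, sketched on the digits dataset as a stand-in for a larger image collection: PCA first for compression and noise reduction, then UMAP for the final 2-D map.

```python
import matplotlib.pyplot as plt
import umap
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 8x8 grayscale digits, 64 features

# Step 1: PCA keeps ~95% of the variance in far fewer dimensions
pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = pipeline.fit_transform(X)
print(f"PCA: 64 -> {X_reduced.shape[1]} dimensions")

# Step 2: UMAP maps the compressed data to 2-D for exploration
X_2d = umap.UMAP(n_neighbors=15, min_dist=0.1,
                 random_state=42).fit_transform(X_reduced)

plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
plt.legend(*scatter.legend_elements(), title="digit", loc="best", fontsize=8)
plt.title("PCA (95% variance) + UMAP on the digits dataset")
plt.show()
```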
Practice Problems
Problem 1: Explain the Difference
A colleague says "PCA and t-SNE are both dimensionality reduction, so they're interchangeable." Explain why this is incorrect.
Problem 2: Curse of Dimensionality
Show empirically that as dimensions increase, the ratio of distances between nearest and farthest neighbors approaches 1.
Problem 3: Face Recognition Pipeline
Build a face recognition system using:
- PCA for dimensionality reduction
- K-Means or DBSCAN for clustering
- Evaluate using silhouette score
Key Takeaways
✅ Dimensionality reduction makes high-D data manageable and visualizable
✅ PCA: Fast, linear, mathematically clean. Perfect for preprocessing.
✅ t-SNE: Best for visualization, captures non-linear structure, preserves local neighborhoods
✅ UMAP: Modern alternative, faster than t-SNE, balances local and global structure
✅ Choose wisely: PCA for speed/preprocessing, t-SNE for visualization, UMAP for both
✅ Applications: Visualization, preprocessing, noise reduction, feature engineering
What's Next?
Next up: Anomaly Detection – using dimensionality reduction and distance metrics to identify outliers!