Feature Selection and Dimensionality Reduction

Introduction: Less Can Be More

Imagine cleaning your closet. You have 500 items but only wear 50 regularly. The other 450 just add clutter, make it hard to find things, and waste space!

Feature selection works the same way: remove features that don't help your model (or that actively hurt it). The result? Faster training, often better generalization, and easier interpretation.

More features ≠ better model. Often, fewer good features beat many mediocre ones!

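Key Insight: Irrelevant and redundant features add noise, increase the risk of overfitting, and slow down training. Feature selection searches for a small subset of features that keeps (or improves) predictive performance. Checking every possible subset is usually infeasible, so practical methods look for a good subset rather than a guaranteed optimal one.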

Learning Objectives

Understand why feature selection matters
Master filter methods (correlation, mutual information)
Apply wrapper methods (RFE, sequential selection)
Use embedded methods (L1 regularization, tree-based)
Handle multicollinearity
Avoid selection bias in cross-validation
Choose appropriate methods for different problems

1. Why Feature Selection Matters

The Curse of Dimensionality

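Problem: As the number of features grows, a fixed number of samples covers the feature space more and more sparsely, distances between points become less informative, and models find it easier to overfit.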
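One way to see the sparsity effect is to measure how much the nearest and farthest neighbors of a point differ. The sketch below (NumPy only, with uniformly random synthetic points; the helper `distance_contrast` is a hypothetical name for illustration) computes the relative gap (max distance − min distance) / min distance for a random query point. As the dimension grows, this contrast shrinks, so "near" and "far" become hard to tell apart.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Relative gap between the farthest and nearest neighbor of a random query.

    Hypothetical helper for illustration; data are uniform random points in [0, 1]^d.
    """
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distance to each point
    return (dists.max() - dists.min()) / dists.min()

contrasts = {d: distance_contrast(d) for d in [2, 10, 100, 1000]}
for d, c in contrasts.items():
    print(f"{d:>5} dims: contrast = {c:.3f}")
```

The exact numbers depend on the random seed, but the downward trend is what matters: in high dimensions every point is roughly the same distance from every other point.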

2. Filter Methods: Independent of Model

Based on Statistical Properties

Idea: Score features independently, keep top-k

Advantages: Fast, model-agnostic
Disadvantages: Ignores feature interactions (each feature is scored on its own, so features that are only useful in combination are missed)
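A small synthetic example makes the interaction blind spot concrete. Below (an illustrative sketch, assuming NumPy and scikit-learn), the target is the product of two ±1 features, an XOR-style pattern. Each feature on its own has essentially zero correlation with the target, so a univariate filter would discard both, yet a decision tree that sees both features predicts the target perfectly.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.choice([-1, 1], size=n)
x2 = rng.choice([-1, 1], size=n)
y = x1 * x2  # target depends ONLY on the interaction of x1 and x2

# Univariate view: each feature alone looks useless
corr_x1 = np.corrcoef(x1, y)[0, 1]
corr_x2 = np.corrcoef(x2, y)[0, 1]
print(f"corr(x1, y) = {corr_x1:.3f}, corr(x2, y) = {corr_x2:.3f}")

# Joint view: a model that sees both features recovers y exactly
tree = DecisionTreeClassifier(random_state=0)
acc_x1_only = cross_val_score(tree, x1.reshape(-1, 1), y, cv=5).mean()
acc_both = cross_val_score(tree, np.column_stack([x1, x2]), y, cv=5).mean()
print(f"accuracy with x1 only: {acc_x1_only:.2f}, with x1 and x2: {acc_both:.2f}")
```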

Variance Threshold

Remove features with low variance (nearly constant). Variance depends on a feature's scale, so choose the threshold with the features' units in mind.
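A minimal sketch with scikit-learn's `VarianceThreshold` on a tiny hand-made matrix (the data and the threshold of 0.2 are arbitrary choices for illustration). Column 0 is constant (variance 0) and column 2 is almost constant (variance 0.1875), so both fall at or below the threshold; only column 1 survives.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([
    [0, 2.0, 1],
    [0, 1.5, 1],
    [0, 3.1, 1],
    [0, 0.4, 0],
])

# Keep only features whose training-set variance is above the threshold
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)

print(selector.variances_)    # per-column variance
print(selector.get_support())  # boolean mask of kept columns
print(X_reduced.shape)
```

Because this filter never looks at the target, it is safe to use as a cheap first pass before any supervised method.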

Correlation with Target

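Rank features by the absolute value of their Pearson correlation with the target and keep the strongest. Note that Pearson correlation only detects linear relationships.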
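Here is a sketch of correlation-based top-k selection using pandas on synthetic data (the column names `signal`, `weak`, and `noise` and the coefficients are made up for illustration). The target is built from `signal` and `weak`, so those two should rank above `noise`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "weak": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = 3 * X["signal"] + 1.0 * X["weak"] + rng.normal(size=n)

# Absolute Pearson correlation of each column with the target, strongest first
corr = X.corrwith(y).abs().sort_values(ascending=False)
k = 2
top_k = corr.index[:k].tolist()

print(corr)
print("selected:", top_k)
```

scikit-learn's `SelectKBest` with `f_regression` gives an equivalent linear ranking and plugs directly into a `Pipeline`.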

Mutual Information

Measures any statistical dependency between a feature and the target, including non-linear ones that correlation misses.
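To contrast mutual information with correlation, the sketch below (synthetic data, scikit-learn's `mutual_info_regression`) uses a target that is roughly the square of a feature on a symmetric interval. Pearson correlation with that feature is close to zero, but its estimated mutual information is clearly higher than that of a pure-noise feature.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
x_quadratic = rng.uniform(-1, 1, size=n)
x_noise = rng.uniform(-1, 1, size=n)
X = np.column_stack([x_quadratic, x_noise])
y = x_quadratic ** 2 + 0.05 * rng.normal(size=n)  # non-linear, symmetric relationship

# Linear view: near-zero correlation for BOTH features
pearson = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
# MI view: k-nearest-neighbor estimate (in nats) picks up the dependency
mi = mutual_info_regression(X, y, random_state=0)

print("pearson:", np.round(pearson, 3))
print("mutual information:", np.round(mi, 3))
```

For classification targets the analogous function is `mutual_info_classif`; either can be passed as the score function to `SelectKBest`.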

3. Wrapper Methods: Model-Based Selection

Recursive Feature Elimination (RFE)

Algorithm:

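Train the model on all current features
Rank features by the model's importance (coefficients or feature importances) and remove the least important one
Retrain on the remaining features and repeat until the desired number of features is left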
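The loop above is what scikit-learn's `RFE` class implements. A minimal sketch on synthetic data from `make_classification`, with logistic regression as the ranking model (its coefficients serve as importances; the dataset sizes and `n_features_to_select=3` are arbitrary choices here):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# 10 features, only 3 informative; shuffle=False puts the informative ones first
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, shuffle=False,
                           random_state=0)

# step=1: drop one feature per iteration, refitting the model each time
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

print("kept features:", rfe.support_.nonzero()[0])
print("ranking (1 = kept):", rfe.ranking_)
```

`RFECV` runs the same procedure while choosing the number of features by cross-validation.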

Forward/Backward Sequential Selection

Forward: Start with no features and iteratively add the one that most improves the (cross-validated) score
Backward: Start with all features and iteratively remove the one whose removal hurts the score least
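Both directions are available through scikit-learn's `SequentialFeatureSelector`. The sketch below uses the bundled Iris dataset and a k-nearest-neighbors classifier (an arbitrary model choice for illustration); each candidate subset is scored by 5-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward: grow the subset one feature at a time
forward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction="forward", cv=5)
forward.fit(X, y)

# Backward: shrink from all 4 features down to 2
backward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                     direction="backward", cv=5)
backward.fit(X, y)

print("forward picks:", forward.get_support(indices=True))
print("backward picks:", backward.get_support(indices=True))
```

The two directions need not agree, and both refit the model many times, which is why wrapper methods get slow on wide datasets.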

4. Embedded Methods: Selection During Training

L1 Regularization (Lasso)

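The L1 penalty (α Σ|wⱼ|) drives some coefficients to exactly zero → automatic feature selection! A larger α zeroes out more features. Standardize features first, because the penalty depends on each feature's scale.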
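A minimal sketch of Lasso-based selection on synthetic regression data (20 features, 5 informative; `alpha=1.0` is an arbitrary choice, not a tuned value). A `StandardScaler` runs first so the penalty treats all features on the same scale; the features with non-zero coefficients are the selected ones.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Scale first so the L1 penalty is comparable across features
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

coef = model.named_steps["lasso"].coef_
selected = np.flatnonzero(coef)  # indices of features that survived the penalty
print(f"{selected.size} of {coef.size} coefficients are non-zero: {selected}")
```

In practice `alpha` is usually tuned (for example with `LassoCV`), and `SelectFromModel` can wrap the fitted Lasso to turn the non-zero pattern into a reusable transformer.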

Tree-Based Feature Importance

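Decision trees and tree ensembles provide built-in (impurity-based) feature importances. These are computed on the training data and can favor features with many distinct values, so permutation importance on held-out data is a useful cross-check.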
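The sketch below compares the two kinds of importance for a random forest on synthetic data (dataset sizes and `n_estimators=200` are arbitrary illustrative choices). `feature_importances_` comes from the fitted trees; `permutation_importance` measures how much the test score drops when one feature's values are shuffled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances (normalized to sum to 1, computed on training data)
impurity_imp = forest.feature_importances_
# Permutation importances on held-out data (mean drop in accuracy over 10 shuffles)
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)

for j in impurity_imp.argsort()[::-1]:
    print(f"feature {j}: impurity={impurity_imp[j]:.3f} "
          f"permutation={perm.importances_mean[j]:.3f}")
```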

5. Handling Multicollinearity

Detecting and Removing Correlated Features

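Problem: Highly correlated features are redundant. They carry largely the same information, make linear-model coefficients unstable and hard to interpret, and add cost without adding signal.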
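A common recipe is to compute the absolute correlation matrix, look only at its upper triangle so each pair is checked once, and drop one feature from every pair above a threshold. The helper `drop_correlated` and the temperature/humidity columns below are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9):
    """Drop one feature from every pair whose |Pearson r| exceeds threshold.

    Hypothetical helper: the LATER column of each correlated pair is dropped.
    """
    corr = df.corr().abs()
    # Upper triangle only (k=1 skips the diagonal), so each pair appears once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

rng = np.random.default_rng(0)
temp_c = rng.normal(20, 5, size=500)
df = pd.DataFrame({
    "temp_c": temp_c,
    "temp_f": temp_c * 1.8 + 32 + rng.normal(scale=0.1, size=500),  # near-duplicate
    "humidity": rng.normal(60, 10, size=500),                        # independent
})

reduced, dropped = drop_correlated(df, threshold=0.9)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which member of a pair gets dropped is arbitrary here; with domain knowledge you may prefer to keep the more interpretable or cheaper-to-collect one.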

6. Avoiding Selection Bias

The Right Way: Selection Inside CV

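Problem: If you select features on the full dataset and then run CV, the selector has already seen the labels of every test fold. The chosen features are partly fit to noise in those rows, so CV overestimates performance!

Solution: Feature selection must be done inside each CV fold, fitting the selector only on that fold's training data (a scikit-learn Pipeline does this automatically)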
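The effect is easy to reproduce on pure noise, where no honest method should beat 50% accuracy. This sketch (scikit-learn, with `SelectKBest`/`f_classif` as the selector and k=20 as an arbitrary choice) compares selecting on all rows before CV against putting the selector inside a `Pipeline`.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))   # 5000 pure-noise features
y = rng.integers(0, 2, size=100)   # labels unrelated to X

# WRONG: the selector sees every row, including future test folds
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# RIGHT: the selector is refit on the training part of each fold
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("clf", LogisticRegression(max_iter=1000)),
])
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")
print(f"honest CV accuracy: {honest:.2f}")
```

The leaky estimate lands far above chance even though the features contain no information, while the pipeline's estimate stays near 0.5.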

Key Takeaways

✓ Why: Remove irrelevant/redundant features → faster, better generalization, interpretability

✓ Filter Methods: Fast, model-agnostic (variance, correlation, mutual information)

✓ Wrapper Methods: Model-based, considers feature combinations (RFE, sequential)

✓ Embedded Methods: Selection during training (L1 regularization, tree importance)

✓ Multicollinearity: Drop one feature from each highly correlated pair (a common rule of thumb is |r| > 0.9)

✓ Avoid Bias: Always do selection INSIDE CV folds (use Pipeline)

✓ Trade-offs: Filter (fast, simple) vs Wrapper (slow, thorough) vs Embedded (integrated)

Practice Problems

Problem 1: Complete Feature Selection Pipeline

Problem 2: Compare Selection Methods

Next Steps

You've mastered feature selection! 🎉

Final lesson: End-to-End ML Project – putting everything together in a real-world workflow!

This will tie together everything you've learned in the course!