Classical Machine Learning: Supervised Learning Foundations · Lesson 14 of 15 · Intermediate · 60 min

Feature Selection and Dimensionality Reduction

Select the most useful features with filter, wrapper, and embedded methods, and reduce dimensionality with PCA and feature importance.

Introduction: Less Can Be More

Imagine cleaning your closet. You have 500 items but only wear 50 regularly. The other 450 just add clutter, make it hard to find things, and waste space!

Feature selection works the same way: remove features that don't help your model, or that actively hurt it. The result? Faster training, better generalization, and easier interpretation.

More features ≠ better model. Often, fewer good features beat many mediocre ones!

Key Insight: Irrelevant and redundant features add noise, increase overfitting risk, and slow down training. Feature selection searches for a smaller subset of features that performs as well as, or better than, the full set.

Learning Objectives

  • Understand why feature selection matters
  • Master filter methods (correlation, mutual information)
  • Apply wrapper methods (RFE, sequential selection)
  • Use embedded methods (L1 regularization, tree-based)
  • Handle multicollinearity
  • Avoid selection bias in cross-validation
  • Choose appropriate methods for different problems

1. Why Feature Selection Matters

The Curse of Dimensionality

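Problem: As the number of dimensions grows, a fixed number of samples covers the feature space more and more sparsely. Distances between points also become less informative, and models overfit more easily.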

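One symptom of the curse of dimensionality is easy to see directly. As dimensions are added, a point's nearest and farthest neighbours end up at almost the same distance, so distance-based reasoning stops working well. The sketch below assumes NumPy and uniformly random points in a unit hypercube. `distance_contrast` is a hypothetical helper written only for this illustration.

```python
import numpy as np

def distance_contrast(n_dims, n_points=500, seed=0):
    """Relative gap between the farthest and nearest neighbour of a random query.

    Assumption for illustration: points are uniform in the unit hypercube [0, 1]^n_dims.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n_points, n_dims))          # n_points samples in n_dims dimensions
    query = rng.random(n_dims)                  # one random query point
    dists = np.linalg.norm(X - query, axis=1)   # Euclidean distance to every sample
    return (dists.max() - dists.min()) / dists.min()

for dims in [2, 10, 100, 1000]:
    print(f"{dims:>5} dims: contrast = {distance_contrast(dims):.3f}")
```

The contrast shrinks as the number of dimensions grows. The exact values depend on the random seed.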

2. Filter Methods: Independent of Model

Based on Statistical Properties

Idea: Score features independently, keep top-k

Advantages: Fast, model-agnostic

Disadvantages: Ignores feature interactions

Variance Threshold

Remove features with low variance (nearly constant)

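Here is a minimal sketch using scikit-learn's `VarianceThreshold`. It assumes a small made-up matrix and an illustrative threshold of 0.05. Variance depends on each feature's units, so one shared threshold only makes sense when the features are on comparable scales.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data (made up for illustration): columns 0 and 2 are constant,
# column 3 is nearly constant, and only column 1 really varies.
X = np.array([
    [0, 2.0, 1, 3.10],
    [0, 1.5, 1, 2.90],
    [0, 3.2, 1, 3.00],
    [0, 0.7, 1, 3.20],
])

# Drop features whose variance falls below the (illustrative) 0.05 threshold.
selector = VarianceThreshold(threshold=0.05)
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-column variance computed during fit
print(selector.get_support())  # boolean mask of the columns that were kept
print(X_reduced.shape)
```

Only column 1 survives. Column 3 has a small but non-zero variance, and that is still below the threshold.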

Correlation with Target

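Rank features by the absolute value of their correlation with the target and keep the strongest. Pearson correlation only detects linear relationships.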

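The following sketch ranks features by correlation using NumPy. It assumes synthetic data in which the true dependence is known. Features are ranked by |r| because a strong negative correlation is just as useful as a positive one.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y depends strongly on
# x_strong, moderately on x_moderate, and not at all on x_noise.
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
names = np.array(["x_strong", "x_moderate", "x_noise"])
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Pearson correlation of each column with the target.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Rank by absolute correlation (the sign only gives the direction) and keep the top k.
k = 2
order = np.argsort(-np.abs(corr))
top_k = names[order[:k]]

for name, r in zip(names, corr):
    print(f"{name:<11} r = {r:+.3f}")
print("selected:", list(top_k))
```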

Mutual Information

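Captures non-linear dependencies. Mutual information is zero only when a feature and the target are independent, so it also detects relationships that correlation misses.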

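This sketch assumes scikit-learn's `mutual_info_regression` and synthetic data. The target is a parabola in one feature. Pearson correlation comes out close to zero because the relationship is symmetric, while the mutual information estimate is clearly positive.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Synthetic data (assumed for illustration): y is a noisy parabola in column 0
# and independent of column 1.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(-3, 3, size=(n, 2))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=n)

# Pearson correlation misses the symmetric, non-linear relationship...
pearson = [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]

# ...while the nearest-neighbour-based mutual information estimate picks it up.
mi = mutual_info_regression(X, y, random_state=0)

for name, r, m in zip(["x_quad", "x_noise"], pearson, mi):
    print(f"{name:<8} pearson = {r:+.3f}   MI = {m:.3f}")
```

For classification targets, `mutual_info_classif` plays the same role. Either scorer can be passed to `SelectKBest` to keep the top-k features.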

3. Wrapper Methods: Model-Based Selection

Recursive Feature Elimination (RFE)

Algorithm:

  1. Train the model on all features
  2. Remove the least important feature (smallest coefficient magnitude or feature importance)
  3. Repeat on the remaining features until the desired number is left
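The three steps above correspond to scikit-learn's `RFE` class. This sketch uses synthetic data from `make_classification`. The estimator, `n_features_to_select=3`, and `step=1` are illustrative choices. Importance here comes from the magnitude of the logistic-regression coefficients, so the features should be on comparable scales.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative): 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)

# RFE refits the estimator and drops `step` features per round, ranked by
# coefficient magnitude, until 3 features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

print("kept columns:", [i for i, keep in enumerate(rfe.support_) if keep])
print("ranking (1 = kept, higher = eliminated earlier):", rfe.ranking_)
```

Each step can remove more than one feature by setting `step` higher. This trades some accuracy in the ranking for fewer refits.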

Forward/Backward Sequential Selection

Forward: Start with no features and repeatedly add the one that most improves the validation score

Backward: Start with all features and repeatedly remove the one whose removal hurts the score least

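Both directions are available in scikit-learn's `SequentialFeatureSelector` (version 0.24 or later). This sketch assumes the bundled diabetes dataset, a plain linear regression, and a target of 3 features. Every candidate addition or removal is scored with 5-fold cross-validation, which is why wrapper methods become expensive as the number of features grows.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)

selected = {}
for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(
        LinearRegression(),
        n_features_to_select=3,   # stop once 3 features are selected / remain
        direction=direction,
        scoring="r2",
        cv=5,                     # each candidate step is scored with 5-fold CV
    )
    sfs.fit(X, y)
    selected[direction] = list(X.columns[sfs.get_support()])

for direction, cols in selected.items():
    print(f"{direction:<8}: {cols}")
```

Both searches are greedy. Forward and backward selection can therefore end up with different subsets.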

4. Embedded Methods: Selection During Training

L1 Regularization (Lasso)

L1 penalty drives some coefficients to exactly zero → automatic feature selection!

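Here is a minimal sketch, assuming synthetic regression data and an illustrative `alpha=1.0`. The features are standardized first so that a single L1 penalty applies evenly to all of them. The features whose coefficients end up exactly zero are the ones Lasso has dropped.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative): 20 features, only 5 of which influence y.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

# Standardize first so the L1 penalty treats every coefficient on the same scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

coef = model[-1].coef_
selected = np.flatnonzero(coef)   # indices of coefficients that are not exactly zero
print(f"{selected.size} of {coef.size} coefficients are non-zero: {selected.tolist()}")
```

A larger `alpha` zeroes out more coefficients. In practice `alpha` is usually tuned by cross-validation, for example with `LassoCV`.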

Tree-Based Feature Importance

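Decision trees and tree ensembles provide built-in (impurity-based) feature importance. These scores are computed on training data and are biased toward features with many possible split points. Permutation importance on held-out data is therefore a useful cross-check.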

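This sketch assumes the bundled breast cancer dataset and a 200-tree random forest, both illustrative choices. It prints the built-in impurity-based importances next to permutation importances computed on a held-out split. Where the two columns disagree, the held-out permutation score is usually the more trustworthy signal.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=0
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Built-in impurity-based importance: derived from the training data, sums to 1.
impurity_imp = forest.feature_importances_

# Permutation importance: drop in held-out accuracy when one column is shuffled.
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)

top5 = np.argsort(impurity_imp)[::-1][:5]
for idx in top5:
    print(f"{data.feature_names[idx]:<24} impurity = {impurity_imp[idx]:.3f}   "
          f"permutation = {perm.importances_mean[idx]:.3f}")
```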

5. Handling Multicollinearity

Detecting and Removing Correlated Features

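Problem: Highly correlated features are redundant. They carry largely the same information, make linear-model coefficients unstable, and split importance scores between them.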

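A common recipe is sketched here with pandas, assuming a 0.9 threshold on absolute Pearson correlation. It scans the upper triangle of the correlation matrix and drops any column that is highly correlated with an earlier one. `drop_correlated` is a hypothetical helper, and the data is synthetic, with `a_copy` built as a noisy duplicate of `a`.

```python
import numpy as np
import pandas as pd

# Synthetic data (illustrative): a_copy is almost identical to a; b and c are independent.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
df = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=n),
    "b": rng.normal(size=n),
    "c": rng.normal(size=n),
})

def drop_correlated(frame, threshold=0.9):
    """Hypothetical helper: drop columns whose |r| with an earlier column exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle (above the diagonal) so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return frame.drop(columns=to_drop), to_drop

reduced, dropped = drop_correlated(df)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Here, which member of a pair survives depends only on column order. In practice you might keep the one that is cheaper to collect or easier to interpret.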

6. Avoiding Selection Bias

The Right Way: Selection Inside CV

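Problem: If you select features on the full dataset and then run cross-validation, the validation folds have already influenced which features were chosen, so you overestimate performance!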

Solution: Fit the feature selector inside each CV fold, using only that fold's training data (e.g., as a step in a scikit-learn Pipeline)

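The effect is easy to reproduce on pure noise, where no method should beat 50% accuracy. This sketch uses synthetic random data, and `SelectKBest` with `k=20` plus logistic regression are illustrative choices. It compares two setups. The first selects features on the full dataset before cross-validation. The second puts the selector inside a `Pipeline` so that it is refit on the training part of every fold.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise (illustrative): 100 samples, 5000 random features, random labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))
y = rng.integers(0, 2, size=100)

# WRONG: the selector sees every sample, including future validation folds.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

# RIGHT: selection is refit inside each fold, on that fold's training data only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_scores = cross_val_score(pipe, X, y, cv=5)

print(f"leaky CV accuracy:  {leaky_scores.mean():.2f}")
print(f"honest CV accuracy: {honest_scores.mean():.2f}")
```

The leaky estimate typically lands well above chance even though the labels are random. The pipeline's estimate stays near 50%.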

Key Takeaways

Why: Remove irrelevant/redundant features → faster, better generalization, interpretability

Filter Methods: Fast, model-agnostic (variance, correlation, mutual information)

Wrapper Methods: Model-based, considers feature combinations (RFE, sequential)

Embedded Methods: Selection during training (L1 regularization, tree importance)

Multicollinearity: Drop one feature from each highly correlated pair (|r| > 0.9 is a common threshold)

Avoid Bias: Always do selection INSIDE CV folds (use Pipeline)

Trade-offs: Filter (fast, simple) vs Wrapper (slow, thorough) vs Embedded (integrated)


Practice Problems

Problem 1: Complete Feature Selection Pipeline


Problem 2: Compare Selection Methods


Next Steps

You've mastered feature selection! 🎉

Final lesson: End-to-End ML Project, where you'll tie together everything you've learned in this course in a real-world workflow!



Remember: "It is not the strongest features that survive, nor the most intelligent, but the ones most responsive to change." – Adapted from a saying often misattributed to Darwin