Introduction: The Wisdom of Crowds
Imagine you're trying to guess the number of jelly beans in a jar. Instead of relying on one person's guess, you ask 100 people and average their guesses. Surprisingly, this average is often more accurate than any individual guess – even experts!
This is the wisdom of crowds: combining many diverse opinions often beats individual expertise.
Random Forests apply this principle to machine learning: instead of one decision tree, we train many trees on slightly different data and average their predictions. This ensemble is more accurate and robust than any single tree!
Key Insight: Random Forests reduce overfitting and variance by combining predictions from multiple diverse trees trained through bootstrap aggregating (bagging).
Learning Objectives
- Understand ensemble learning principles
- Master bootstrap sampling and bagging
- Build Random Forests from scratch
- Tune forest hyperparameters
- Understand feature randomness and its benefits
- Compare Random Forests with single trees
1. Ensemble Learning: Combining Models
Why Ensembles?
A single decision tree is:
- ✅ Interpretable
- ✅ Fast to train
- ❌ High variance (unstable)
- ❌ Tends to overfit
Solution: Train multiple trees and combine them!
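To make the variance point concrete, here is a minimal sketch (assuming NumPy and scikit-learn, with a synthetic dataset from make_classification chosen purely for illustration). It trains ten unconstrained trees on different random halves of the training data and compares their individual test accuracies with a simple majority vote:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem, used only for illustration
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
predictions = []
for i in range(10):
    # Each tree sees a different random half of the training data -> unstable fits
    idx = rng.choice(len(X_train), size=len(X_train) // 2, replace=False)
    tree = DecisionTreeClassifier(random_state=i).fit(X_train[idx], y_train[idx])
    pred = tree.predict(X_test)
    predictions.append(pred)
    print(f"tree {i}: accuracy = {(pred == y_test).mean():.3f}")

# Majority vote across the 10 trees is usually at least as good as most single trees
vote = (np.mean(predictions, axis=0) >= 0.5).astype(int)
print(f"majority vote: accuracy = {(vote == y_test).mean():.3f}")
```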
2. Bootstrap Aggregating (Bagging)
Bootstrap Sampling
Bootstrap: Sample $n$ data points with replacement from a dataset of size $n$.
- Some points appear multiple times
- Some points don't appear at all: each one is left out with probability $(1 - 1/n)^n \approx e^{-1} \approx 37\%$
- Each bootstrap sample is therefore slightly different
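A short sketch (assuming NumPy) that draws one bootstrap sample and checks the ~37% figure empirically:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
data_indices = np.arange(n)

# One bootstrap sample: n draws with replacement
boot = rng.choice(data_indices, size=n, replace=True)

# Count how many original points never appear in the bootstrap sample
unique = np.unique(boot)
left_out_fraction = 1 - len(unique) / n
print(f"distinct points in bootstrap sample: {len(unique)}")
print(f"fraction left out: {left_out_fraction:.3f}  (theory: (1 - 1/n)^n ≈ {np.exp(-1):.3f})")
```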
Bagging Algorithm
Bootstrap Aggregating:
- Create $B$ bootstrap samples from the training data
- Train one model (tree) on each bootstrap sample
- Aggregate predictions:
- Classification: Majority vote
- Regression: Average
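A minimal from-scratch sketch of bagging, assuming scikit-learn's DecisionTreeClassifier as the base learner and a synthetic binary classification dataset; the function names bagging_fit and bagging_predict are illustrative, not a standard API:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

def bagging_fit(X, y, n_trees=50, seed=0):
    """Train one unpruned tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # n draws with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Aggregate by majority vote (binary 0/1 labels assumed)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)

trees = bagging_fit(X_train, y_train)
single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"single tree accuracy: {(single.predict(X_test) == y_test).mean():.3f}")
print(f"bagging accuracy:     {(bagging_predict(trees, X_test) == y_test).mean():.3f}")
```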
3. Random Forests: Adding Feature Randomness
The Extra Ingredient
Problem: Bagging helps, but trees can still be too similar (correlated).
Solution: When splitting each node, consider only a random subset of the features!
Random Forest = Bagging + Feature Randomness
At each split:
- Select $m$ of the $d$ features at random (typically $m = \sqrt{d}$ for classification)
- Find the best split among these $m$ features only
- This decorrelates the trees → more diversity → a better ensemble
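One way to see the effect of feature subsampling is to compare plain bagging with a Random Forest on the same data. The sketch below assumes scikit-learn; BaggingClassifier uses full-feature decision trees by default, while RandomForestClassifier restricts each split to max_features='sqrt'. Exact numbers will vary with the synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           n_redundant=10, random_state=2)

# Bagging: every tree may split on any of the 40 features -> trees stay correlated
# (the default base estimator of BaggingClassifier is a decision tree)
bagging = BaggingClassifier(n_estimators=200, random_state=2)

# Random Forest: each split considers only sqrt(40) ≈ 6 random features -> decorrelated trees
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=2)

print("bagging cv accuracy:", cross_val_score(bagging, X, y, cv=5).mean().round(3))
print("forest  cv accuracy:", cross_val_score(forest, X, y, cv=5).mean().round(3))
```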
4. Out-of-Bag (OOB) Error
Free Cross-Validation
Remember: ~37% of data is not in each bootstrap sample.
OOB samples: Samples not used to train a particular tree.
OOB Error: For each training sample, aggregate the predictions of only those trees that did not see it during training, then measure the error of these aggregated predictions.
Benefit: Get validation error estimate without separate validation set!
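A short sketch (assuming scikit-learn) showing that the OOB estimate tracks held-out accuracy without ever touching the test set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=6, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

# oob_score=True scores each training sample using only the trees that did not train on it
forest = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=3)
forest.fit(X_train, y_train)

print(f"OOB accuracy estimate:  {forest.oob_score_:.3f}")
print(f"held-out test accuracy: {forest.score(X_test, y_test):.3f}")
```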
5. Hyperparameter Tuning
Key Hyperparameters
| Parameter | What it Controls | Typical Values |
|---|---|---|
| n_estimators | Number of trees | 100-1000 (more is better, with diminishing returns) |
| max_depth | Maximum tree depth | 10-30, or None for fully grown trees |
| max_features | Features considered per split | 'sqrt' (classification), 'log2', or an integer |
| min_samples_split | Minimum samples required to split a node | 2-10 |
| min_samples_leaf | Minimum samples in a leaf | 1-5 |
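As a sketch of how these knobs are typically tuned with cross-validation (assuming scikit-learn's GridSearchCV; the grid values below are illustrative, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1500, n_features=25, n_informative=8, random_state=4)

# A small, illustrative grid over the key hyperparameters from the table above
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [10, 20, None],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=4),
    param_grid,
    cv=5,
    n_jobs=-1,   # candidate fits can run in parallel
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best cv accuracy:", round(search.best_score_, 3))
```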
6. Feature Importance
Random Forests automatically calculate feature importance!
Method (mean decrease in impurity): for each feature, sum the impurity decrease (Gini/entropy) at every node that splits on that feature, weighted by the fraction of samples reaching that node, then average the result across all trees.
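A brief sketch using scikit-learn's feature_importances_ attribute, which exposes exactly this impurity-based ranking (the breast cancer dataset is used here only as a convenient example):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=300, random_state=5).fit(data.data, data.target)

# feature_importances_ holds the normalized mean decrease in impurity per feature
order = np.argsort(forest.feature_importances_)[::-1]
for i in order[:5]:
    print(f"{data.feature_names[i]:<25} {forest.feature_importances_[i]:.3f}")
```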
7. Advantages and Limitations
✅ Advantages
- High Accuracy: Often best off-the-shelf performance
- Robust: Handles outliers and noisy data well
- No Overfitting from More Trees: Adding trees does not increase overfitting (unlike deepening a single tree)
- Feature Importance: Automatic ranking
- Handles Mixed Data: Numerical and categorical features
- Parallel: Trees train independently (fast on multiple cores)
- OOB Error: Built-in validation
❌ Limitations
- Black Box: Less interpretable than single tree
- Memory: Stores many trees (can be large)
- Slow Prediction: Must query all trees
- Not for Extrapolation: Regression predictions cannot go beyond the range of the training targets
- Importance Bias: Impurity-based feature importance favors high-cardinality (many-category) features
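The extrapolation limitation is easy to demonstrate. The sketch below (assuming scikit-learn) fits a forest to y = 2x on x in [0, 10]; because forest predictions are averages of training targets, the model cannot predict values far outside the range it has seen:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on x in [0, 10] with a simple linear target y = 2x
rng = np.random.default_rng(6)
X_train = rng.uniform(0, 10, size=(500, 1))
y_train = 2 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=200, random_state=6).fit(X_train, y_train)

# Predictions are averages of training targets, so they stay within the training range
print("prediction at x=5 :", forest.predict([[5.0]])[0].round(2), "(true: 10)")
print("prediction at x=20:", forest.predict([[20.0]])[0].round(2), "(true: 40, but capped near 20)")
```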
Key Takeaways
✓ Random Forests: Ensemble of decision trees via bagging + feature randomness
✓ Bagging: Bootstrap sampling + aggregating (majority vote or average)
✓ Feature Randomness: Consider only subset of features at each split → decorrelates trees
✓ OOB Error: Free validation using samples not in bootstrap → no need for separate val set
✓ Hyperparameters: n_estimators (more is better, with diminishing returns), max_depth, max_features
✓ Feature Importance: Automatic ranking of feature relevance
✓ Strengths: High accuracy, robust, handles mixed data, parallelizable
✓ Limitations: Black box, can't extrapolate, slower prediction than single tree
Practice Problems
Problem 1: Implement Simple Bagging
Problem 2: Compare Bagging vs Random Forest
Next Steps
Random Forests are powerful, but there's an ensemble method that often performs even better: Boosting!
Next lesson:
- Lesson 8: Gradient Boosting – sequentially building trees that fix previous errors
Boosting methods like XGBoost and LightGBM dominate ML competitions!
Further Reading
- Original Paper: Random Forests by Leo Breiman (2001)
- Tutorial: Random Forests in scikit-learn
- Book: The Elements of Statistical Learning (Chapter 15)
- Practical: Feature Importance Methods
Remember: Random Forests combine simplicity (trees) with power (ensembles). They're often the first model to try on tabular data!