Decision Trees: Intuition and Implementation

Introduction: Playing 20 Questions

Remember playing "20 Questions"? You ask yes/no questions to narrow down possibilities:

"Is it bigger than a breadbox?" → Yes
"Is it electronic?" → No
"Can you eat it?" → Yes
"Is it a fruit?" → Yes
"Is it red?" → Yes
"Is it an apple?" → YES!

Decision trees work exactly like this! They ask a series of questions about features to classify or predict outcomes. Unlike linear models that draw straight boundaries, trees create rectangular decision boundaries by partitioning feature space.

Key Insight: Decision trees are interpretable, handle non-linear relationships naturally, and don't require feature scaling. They're the foundation for powerful ensemble methods!

Learning Objectives

Understand tree-based decision making
Master entropy and information gain
Build decision trees from scratch
Prune trees to prevent overfitting
Compare trees with linear models
Visualize tree structures and decisions

1. From Linear to Non-Linear

SEE

🏡 The definitive interactive explainer: before we code a single split, scroll through A Visual Introduction to Machine Learning — Part 1 (R2D3). It teaches every idea in this lesson — splits, entropy, depth, overfitting — using a San-Francisco-vs-New-York housing classifier. Five minutes of scrolling and the rest feels obvious.

Why Trees?

Linear models create straight decision boundaries. But what if your data looks like this?

FIG. 02Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 02Interactive Python code execution environment

2. How Decision Trees Work

The Tree Structure

A decision tree consists of:

Root node: Starting point (all data)
Internal nodes: Decision points (if feature_x > threshold?)
Leaf nodes: Final predictions (class label or value)
Branches: Outcomes of decisions (yes/no)

FIG. 04Flow Diagram

DIAGRAM

LOADING INSTRUMENT

Fig. 04Flow diagrams, timelines, and process visualizations

Making Predictions

To classify a new sample:

Start at root
Follow branches based on feature values
Arrive at leaf node
Return leaf's prediction

FIG. 06Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 06Interactive Python code execution environment

3. Entropy and Information Gain

Measuring Impurity

To build a tree, we need to decide which feature to split on and where to split it. We use entropy to measure "messiness" or impurity.

Entropy for a node: [ H(S) = -\sum_{c=1}^{C} p_c \log_2(p_c) ]

Where (p_c) is the proportion of samples in class (c).

Properties:

(H = 0): Pure node (all same class) – best!
(H = 1): Maximum impurity (50/50 split for binary) – worst!

FIG. 08Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 08Interactive Python code execution environment

Information Gain

Information Gain = How much does splitting reduce entropy?

[ IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v) ]

Where:

(S): Parent node samples
(A): Feature to split on
(S_v): Subset after split on value (v)

Goal: Choose split that maximizes information gain (reduces entropy most).

FIG. 10Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 10Interactive Python code execution environment

4. Building a Decision Tree (ID3 Algorithm)

Algorithm Steps

Start with all data at root
For each node:
- If pure (all same class) → make leaf
- If max depth reached → make leaf
- Else:
  - Find best split (maximum information gain)
  - Create child nodes
  - Recursively repeat
Stop when: all leaves are pure or max depth reached

FIG. 12Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 12Interactive Python code execution environment

5. Overfitting and Pruning

The Overfitting Problem

Deep trees can memorize training data perfectly but fail on new data!

FIG. 14Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 14Interactive Python code execution environment

Pruning Strategies

Pre-pruning (stop early):

Limit max_depth
Require min_samples_split (minimum samples to split)
Require min_samples_leaf (minimum samples in leaf)
Require minimum information gain

Post-pruning (build full tree, then trim):

Remove branches that don't improve validation performance
More complex but often better

6. Advantages and Disadvantages

✅ Advantages

Interpretable: Easy to understand and visualize
No preprocessing: Don't need to scale features
Handle non-linear: Naturally capture non-linear relationships
Mixed data: Can handle numerical and categorical features
Feature importance: Automatically ranks features
Fast prediction: (O(\log n)) depth traversal

❌ Disadvantages

Overfitting: Can create overly complex trees
Instability: Small changes in data → completely different tree
Bias: Greedy algorithm may not find globally optimal tree
Imbalanced: Biased toward dominant classes
Smooth functions: Poor at learning smooth relationships

FIG. 16Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 16Interactive Python code execution environment

Key Takeaways

✓ Decision Trees: Make sequential binary decisions to classify/predict

✓ Entropy: Measures impurity/disorder in a node (0 = pure, 1 = maximum impurity)

✓ Information Gain: Reduction in entropy from a split (choose split that maximizes this)

✓ Building Trees: Recursive algorithm (ID3, C4.5, CART) that greedily selects best splits

✓ Overfitting: Deep trees memorize training data → use max_depth, min_samples, or pruning

✓ Interpretability: Trees are "white box" models – easy to understand decision logic

✓ Non-linear: Naturally handle non-linear relationships with rectangular boundaries

Practice Problems

Problem 1: Calculate Entropy

FIG. 18Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 18Interactive Python code execution environment

Problem 2: Find Best Split

FIG. 20Python Code Executor

INTERACTIVE

LOADING INSTRUMENT

Fig. 20Interactive Python code execution environment

Next Steps

Decision trees are powerful on their own, but their real strength comes from ensembles!

Next lessons:

Lesson 7: Random Forests – combining many trees through bagging
Lesson 8: Gradient Boosting – sequentially improving trees

These ensemble methods are among the most powerful ML techniques available!

Decision Trees: Intuition and Implementation

Introduction: Playing 20 Questions

Learning Objectives

1. From Linear to Non-Linear

Why Trees?

2. How Decision Trees Work

The Tree Structure

Making Predictions

3. Entropy and Information Gain

Measuring Impurity

Information Gain

4. Building a Decision Tree (ID3 Algorithm)

Algorithm Steps

5. Overfitting and Pruning

The Overfitting Problem

Pruning Strategies

6. Advantages and Disadvantages

✅ Advantages

❌ Disadvantages

Key Takeaways

Practice Problems

Problem 1: Calculate Entropy

Problem 2: Find Best Split

Next Steps

Further Reading

Interactive Visualizations

Video Tutorials

Papers & Articles

Documentation & Books