Introduction: Soft Clustering with Probabilities
Remember K-Means? It assigns each point to exactly ONE cluster – a hard assignment. But what if a point is right between two clusters? What if clusters overlap?
Gaussian Mixture Models (GMMs) solve this with soft clustering: each point can belong to multiple clusters with different probabilities!
Key Insight: GMMs model data as a mixture of Gaussian distributions, providing probabilistic cluster assignments and enabling anomaly detection, density estimation, and generative modeling.
Learning Objectives
- Understand Gaussian distributions and mixture models
- Master the Expectation-Maximization (EM) algorithm
- Compare GMMs with K-Means
- Use GMMs for anomaly detection
- Apply GMMs to real-world clustering problems
- Handle model selection (choosing number of components)
1. From Gaussian to Mixture
Single Gaussian Distribution
A Gaussian (normal) distribution is defined by:
- Mean $\mu$: center
- Covariance $\Sigma$: spread and orientation

Probability density:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$$

where $d$ is the dimensionality of $x$.
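To make this concrete, here is a minimal sketch that evaluates a 2D Gaussian density with SciPy's `multivariate_normal`; the mean, covariance, and query points below are illustrative values chosen for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters: centered at (0, 0), stretched along x with a slight tilt
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

gaussian = multivariate_normal(mean=mu, cov=Sigma)

# Density is highest at the mean and decays with distance (in the Mahalanobis sense)
print(gaussian.pdf([0.0, 0.0]))   # near the peak
print(gaussian.pdf([3.0, 3.0]))   # far from the mean -> much smaller
```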
Mixture of Gaussians
A mixture is a weighted sum of multiple Gaussians:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Where:
- $K$ = number of components
- $\pi_k$ = mixing coefficient (weight) for component $k$
- $\pi_k \ge 0$ and $\sum_{k=1}^{K} \pi_k = 1$

Interpretation: Data is generated by first choosing a cluster $k$ with probability $\pi_k$, then sampling from $\mathcal{N}(x \mid \mu_k, \Sigma_k)$.
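Here is a small NumPy sketch of that generative story for a 1D mixture with two components; the weights, means, and standard deviations are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D mixture with K = 2 components
pi = np.array([0.3, 0.7])        # mixing coefficients, sum to 1
mu = np.array([-2.0, 3.0])       # component means
sigma = np.array([0.5, 1.0])     # component standard deviations

# Generative process: pick a component k with probability pi_k,
# then sample from N(mu_k, sigma_k^2)
n_samples = 1000
k = rng.choice(len(pi), size=n_samples, p=pi)
x = rng.normal(mu[k], sigma[k])

# Empirical component proportions should be close to pi
print(np.bincount(k) / n_samples)
```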
2. The EM Algorithm
Since we don't know which component generated each point, we use Expectation-Maximization (EM):
E-Step (Expectation)
Compute the responsibility $\gamma_{ik}$ = probability that point $x_i$ belongs to component $k$:

$$\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$

Interpretation: "How much" does component $k$ explain point $x_i$?
M-Step (Maximization)
Update parameters using the responsibilities:

$$N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad \pi_k = \frac{N_k}{N}$$

$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^\top$$
Algorithm Steps
- Initialize: Random $\pi_k$, $\mu_k$, $\Sigma_k$
- E-step: Compute responsibilities
- M-step: Update parameters using responsibilities
- Repeat until convergence (log-likelihood stops improving)
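Below is a compact, from-scratch NumPy sketch of these steps for a 1D mixture. To keep it short it runs a fixed number of iterations instead of checking the log-likelihood for convergence; in practice you would use a tested implementation such as scikit-learn's `GaussianMixture`.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1D Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(x, K, n_iter=100):
    """Fit a 1D Gaussian mixture with plain EM (illustrative, not production code)."""
    rng = np.random.default_rng(0)
    N = len(x)
    # Initialize: means drawn from the data, shared variance, uniform weights
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, np.var(x))
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] proportional to pi_k * N(x_i | mu_k, var_k)
        gamma = pi * gaussian_pdf(x[:, None], mu, var)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from responsibility-weighted data
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / N

    return pi, mu, var

# Toy data: two well-separated groups
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
print(fit_gmm_1d(x, K=2))
```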
3. GMM vs K-Means
Comparison
| Aspect | K-Means | GMM |
|---|---|---|
| Assignment | Hard (0 or 1) | Soft (probabilities) |
| Cluster shape | Spherical | Elliptical (any orientation) |
| Algorithm | Lloyd's algorithm | EM algorithm |
| Output | Cluster labels | Probabilities + density model |
| Generative | No | Yes (can sample new data) |
| Anomaly detection | Difficult | Natural (low likelihood) |
| Speed | Faster | Slower |
When to Use Each
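As a rough rule of thumb: prefer K-Means when speed matters and clusters are roughly spherical; prefer a GMM when clusters overlap or are elongated, or when you need probabilities, a density model, or the ability to sample. The sketch below contrasts the two with scikit-learn on synthetic, anisotropically stretched blobs; the dataset and parameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic blobs, then stretched so the clusters become elliptical
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])  # anisotropic transform

# K-Means: hard labels, implicitly assumes spherical clusters
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# GMM with full covariances: soft assignments, elliptical clusters
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42).fit(X)
gmm_labels = gmm.predict(X)        # hard labels (argmax of probabilities)
gmm_probs = gmm.predict_proba(X)   # soft assignments, one row per point

print(gmm_probs[:3].round(3))      # points near a boundary split their probability
```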
4. Anomaly Detection with GMM
GMMs naturally support anomaly detection: points with low likelihood under the model are anomalies!
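A minimal sketch with scikit-learn: fit a GMM on "normal" data, then flag points whose log-likelihood (from `score_samples`) falls below a chosen threshold. The synthetic data and the 1% percentile cutoff below are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# "Normal" data from two clusters, plus a few far-away anomalies
X_normal = np.vstack([rng.normal([0, 0], 0.5, size=(300, 2)),
                      rng.normal([5, 5], 0.7, size=(300, 2))])
X_anomalies = rng.uniform(-10, 15, size=(5, 2))
X = np.vstack([X_normal, X_anomalies])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_normal)

# score_samples returns the log-likelihood of each point; low values = unlikely under the model
log_lik = gmm.score_samples(X)
threshold = np.percentile(log_lik, 1)   # e.g. flag the lowest 1%
anomalies = X[log_lik < threshold]
print(f"Flagged {len(anomalies)} points as anomalies")
```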
5. Model Selection: Choosing K
How many components should we use? Use information criteria:
Bayesian Information Criterion (BIC)
$$\mathrm{BIC} = -2 \ln \hat{L} + p \ln N$$

Where:
- $\hat{L}$ = maximized likelihood of the model
- $p$ = number of parameters
- $N$ = number of samples
Lower BIC = better model. BIC penalizes model complexity.
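A short sketch that fits GMMs over a range of K and picks the one with the lowest BIC, using scikit-learn's built-in `bic` method; the synthetic data and the candidate range of K are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with 3 true clusters (illustrative)
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.2, random_state=7)

# Fit a GMM for each candidate K and record its BIC
bics = []
for k in range(1, 8):
    gmm = GaussianMixture(n_components=k, random_state=7).fit(X)
    bics.append(gmm.bic(X))   # lower BIC is better

best_k = int(np.argmin(bics)) + 1
print(dict(zip(range(1, 8), np.round(bics, 1))))
print("Best K by BIC:", best_k)
```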
Key Takeaways
✅ GMMs model data as a mixture of Gaussian distributions
✅ Soft clustering: Points can belong to multiple clusters with probabilities
✅ EM algorithm: Iteratively estimates responsibilities (E-step) and updates parameters (M-step)
✅ Advantages over K-Means: Elliptical clusters, probabilistic assignments, generative model
✅ Anomaly detection: Natural with likelihood-based scoring
✅ Model selection: Use BIC or AIC to choose number of components
What's Next?
Next lesson: Neural Networks Fundamentals – from biological inspiration to backpropagation!