Learning Objectives: After this lesson, you'll master NumPy's linear algebra capabilities, random number generation, array reshaping, and advanced indexing techniques essential for data science and machine learning.
Reshaping Arrays
Reshaping is fundamental to data manipulation: it changes an array's shape (how its elements are arranged across dimensions) without changing the underlying values or their order.
When you reshape a 12-element array into a (3, 4) matrix, the elements are distributed row by row: the first four elements form row 0, the next four form row 1, and the last four form row 2.
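A minimal sketch of that row-by-row layout, assuming the 12-element array is simply the integers 0 through 11 (the variable names are illustrative):

```python
import numpy as np

# Hypothetical example array: the integers 0 through 11
arr = np.arange(12)

# Reshape into 3 rows and 4 columns; elements fill row by row (C order)
matrix = arr.reshape(3, 4)
print(matrix)

# Passing -1 lets NumPy infer that dimension from the total size
inferred = arr.reshape(2, -1)
print(inferred.shape)  # (2, 6)
```

The new shape must account for exactly 12 elements; a shape such as (5, 3) would raise a ValueError.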
Flatten and Ravel
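Both flatten() and ravel() convert a multi-dimensional array back to 1D. flatten() always returns a copy, while ravel() returns a view of the original data whenever possible.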
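The copy-versus-view difference can be checked directly. This is a small sketch on an illustrative 2×3 array, not a full treatment of memory layout:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

flat_copy = matrix.flatten()  # always a new, independent array
flat_view = matrix.ravel()    # a view here, because matrix is contiguous

flat_copy[0] = 100  # does not affect matrix
flat_view[1] = 200  # writes through to matrix

print(matrix[0, 0])  # 0
print(matrix[0, 1])  # 200
```

Prefer ravel() when you only need to read the data and want to avoid a copy; prefer flatten() when you plan to modify the result independently.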
Transpose and Axis Swapping
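As an illustration, the sketch below transposes a small matrix and reorders the axes of a hypothetical 4D batch (the shapes are made up for the example):

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)
transposed = matrix.T  # rows become columns: shape (3, 2)

# Hypothetical batch: 2 images, 3 rows, 4 columns, 5 channels
batch = np.zeros((2, 3, 4, 5))

# Swap exactly two axes
swapped = np.swapaxes(batch, 1, 3)  # shape (2, 5, 4, 3)

# Reorder all axes at once (channels-last to channels-first)
channels_first = np.transpose(batch, (0, 3, 1, 2))  # shape (2, 5, 3, 4)
```

Both operations return views, so no data is copied.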
Stacking and Splitting Arrays
Stacking (Combining Arrays)
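A short sketch of the three most common combining functions, using two illustrative 1D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked_rows = np.vstack([a, b])  # stack as rows: shape (2, 3)
joined = np.hstack([a, b])        # join end to end: shape (6,)

# concatenate joins along an existing axis that you choose
wide = np.concatenate([stacked_rows, stacked_rows], axis=1)  # shape (2, 6)
tall = np.concatenate([stacked_rows, stacked_rows], axis=0)  # shape (4, 3)
```

All dimensions other than the one being joined must match, otherwise NumPy raises a ValueError.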
Splitting Arrays
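Splitting is the inverse of stacking. The sketch below shows equal splits, splits at chosen indices, and uneven splits on an illustrative array:

```python
import numpy as np

arr = np.arange(12)

# Three equal parts of four elements each
thirds = np.split(arr, 3)

# Split at explicit indices: arr[:2], arr[2:7], arr[7:]
head, middle, tail = np.split(arr, [2, 7])

# array_split tolerates sizes that do not divide evenly
uneven = np.array_split(arr, 5)
print([len(part) for part in uneven])  # [3, 3, 2, 2, 2]

# Split a matrix into left and right halves by columns
left, right = np.hsplit(arr.reshape(3, 4), 2)
```

np.split raises a ValueError when an equal division is impossible, which is why np.array_split is used for the five-way split.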
Linear Algebra Operations
Linear algebra is the mathematical foundation of machine learning. NumPy supports it through the @ operator and the np.linalg module. The diagram below shows common linear algebra workflows:
Matrix Multiplication
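A common source of bugs is confusing matrix multiplication with element-wise multiplication. This sketch contrasts the two on small illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Matrix product: rows of A combined with columns of B
product = A @ B
print(product)  # [[19 22]
                #  [43 50]]

# Element-wise product: multiplies matching positions only
elementwise = A * B
print(elementwise)  # [[ 5 12]
                    #  [21 32]]
```

For A @ B to be defined, the number of columns in A must equal the number of rows in B.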
Dot Product of Vectors
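For 1D arrays, np.dot and the @ operator both compute the sum of element-wise products. The sketch below also derives cosine similarity from it, as one example application:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

dot = np.dot(u, v)  # 1*4 + 2*5 + 3*6 = 32
same = u @ v        # equivalent for 1D arrays

# Cosine similarity: the dot product scaled by both vector lengths
cosine = dot / (np.linalg.norm(u) * np.linalg.norm(v))
```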
Matrix Properties
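The sketch below computes several standard properties of an illustrative 2×2 matrix with np.linalg; the matrix values are chosen only to keep the arithmetic easy to follow:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

det = np.linalg.det(A)            # 4*6 - 7*2 = 10
inverse = np.linalg.inv(A)        # exists because det != 0
rank = np.linalg.matrix_rank(A)   # 2: the rows are linearly independent
trace = np.trace(A)               # sum of the diagonal: 4 + 6 = 10

# A matrix times its inverse gives the identity (up to rounding)
identity = A @ inverse
```

Because of floating-point rounding, compare results with np.allclose or np.isclose rather than ==.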
Eigenvalues and Eigenvectors
Eigenvalues are crucial for understanding data transformations and dimensionality reduction (like PCA):
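An eigenvector v of a matrix A satisfies A v = λ v, where λ is the matching eigenvalue. The sketch below checks that relationship on an illustrative symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigenvectors[:, i] is the eigenvector for eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eig(A)

# Verify A v = lambda v for every pair at once
lhs = A @ eigenvectors
rhs = eigenvectors * eigenvalues  # scales each column by its eigenvalue

print(np.sort(eigenvalues))  # the eigenvalues of this matrix are 1 and 3
```

The order of the returned eigenvalues is not guaranteed, so sort them if order matters. For symmetric matrices such as covariance matrices in PCA, np.linalg.eigh is the specialized alternative.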
Solving Linear Systems
Solving Ax = b is fundamental to many algorithms:
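As a worked example, take the illustrative system 2x + y = 5 and x + 3y = 10. The sketch below solves it with np.linalg.solve:

```python
import numpy as np

# Coefficient matrix and right-hand side for:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

solution = np.linalg.solve(A, b)
print(solution)  # x = 1, y = 3 (up to floating-point rounding)

# Check the result by substituting back
residual = A @ solution - b
```

np.linalg.solve is generally preferable to computing np.linalg.inv(A) @ b, since it avoids forming the inverse explicitly.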
Advanced Indexing
Using np.where()
np.where() is like a vectorized if-else statement:
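A minimal sketch with made-up exam scores and an assumed pass mark of 60:

```python
import numpy as np

scores = np.array([45, 82, 67, 91, 58])

# Three-argument form: pick from the second or third argument per element
labels = np.where(scores >= 60, "pass", "fail")
print(labels)  # ['fail' 'pass' 'pass' 'pass' 'fail']

# One-argument form: returns the indices where the condition holds
passing_indices = np.where(scores >= 60)[0]
print(passing_indices)  # [1 2 3]
```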
np.select() for Multiple Conditions
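When there are more than two outcomes, np.select evaluates a list of conditions in order and takes the first match. The grade boundaries below are assumptions made for the example:

```python
import numpy as np

scores = np.array([45, 82, 67, 91, 58])

# Conditions are checked in order; the first True one wins
conditions = [scores >= 90, scores >= 70, scores >= 60]
choices = ["A", "B", "C"]

grades = np.select(conditions, choices, default="F")
print(grades)  # ['F' 'B' 'C' 'A' 'F']
```

Order the conditions from most to least specific; otherwise a broad condition such as scores >= 60 would capture values meant for a later one.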
np.clip() for Bounding Values
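np.clip limits every value to a given range. A short sketch with hypothetical sensor readings that should lie between 0 and 100:

```python
import numpy as np

readings = np.array([-5, 12, 47, 103, 88])

# Values below 0 become 0; values above 100 become 100
clipped = np.clip(readings, 0, 100)
print(clipped)  # [  0  12  47 100  88]
```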
Random Number Generation
NumPy's random module is essential for simulations, sampling, and machine learning. Here's a visualization of different distribution types:
Random Number Generators
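The sketch below uses np.random.default_rng, the Generator interface that NumPy recommends for new code. The seed value is arbitrary and serves only to make the results reproducible:

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)

uniform = rng.random(3)                # floats in [0, 1)
integers = rng.integers(0, 10, size=5) # integers from 0 to 9
gaussian = rng.normal(loc=0.0, scale=1.0, size=(2, 3))

# A second generator with the same seed repeats the sequence
rng_again = np.random.default_rng(seed=42)
repeated = rng_again.random(3)
```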
Sampling and Shuffling
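A brief sketch of sampling with and without replacement and of shuffling, using an illustrative array of ten items and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)

# Sample 4 distinct items (no repeats)
sample = rng.choice(data, size=4, replace=False)

# Bootstrap sample: same size as the data, repeats allowed
bootstrap = rng.choice(data, size=len(data), replace=True)

# permutation returns a shuffled copy; the original is unchanged
shuffled_copy = rng.permutation(data)

# shuffle reorders the array in place and returns None
in_place = data.copy()
rng.shuffle(in_place)
```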
Statistical Distributions
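The Generator object exposes many distributions as methods. The sketch below draws from a few common ones; all parameter values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

heights = rng.normal(loc=50.0, scale=10.0, size=n)   # bell curve
waits = rng.uniform(low=2.0, high=8.0, size=n)       # flat between 2 and 8
successes = rng.binomial(n=20, p=0.3, size=n)        # successes in 20 trials
arrivals = rng.poisson(lam=4.0, size=n)              # event counts per interval
lifetimes = rng.exponential(scale=2.0, size=n)       # time between events

# With a large sample, the sample mean approaches the theoretical mean
print(heights.mean(), successes.mean(), arrivals.mean())
```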
Practical Example: Data Preprocessing Pipeline
Let's combine everything into a realistic data preprocessing example:
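One possible version of such a pipeline is sketched below. The dataset is synthetic, and the specific steps (mean imputation, clipping at three standard deviations, an 80/20 split) are assumptions chosen for illustration rather than a prescribed recipe:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical raw data: 100 samples, 3 features on very different scales
X = rng.normal(loc=[50.0, 0.5, 1000.0], scale=[10.0, 0.1, 200.0], size=(100, 3))
missing_rows = rng.choice(100, size=5, replace=False)
X[missing_rows, 0] = np.nan  # simulate missing values in the first feature

# 1. Impute missing values with each column's mean (ignoring NaNs)
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# 2. Clip outliers to within 3 standard deviations of the column mean
mean, std = X.mean(axis=0), X.std(axis=0)
X = np.clip(X, mean - 3 * std, mean + 3 * std)

# 3. Standardize each feature to zero mean and unit variance (broadcasting)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# 4. Shuffle the row indices and split 80/20 into train and test sets
indices = rng.permutation(len(X_scaled))
split = int(0.8 * len(X_scaled))
X_train = X_scaled[indices[:split]]
X_test = X_scaled[indices[split:]]

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

In a real project, the scaling statistics would normally be computed on the training set only and then applied to the test set, to avoid leaking information between the two.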
Key Takeaways
✅ Reshaping transforms array dimensions with reshape(), flatten(), ravel(), and transpose()
✅ Stacking and splitting combine or divide arrays with vstack(), hstack(), concatenate(), split()
✅ Linear algebra operations include matrix multiplication (@), inverse, determinant, eigenvalues, and solving systems
✅ Advanced indexing with np.where(), np.select(), np.clip() enables powerful conditional operations
✅ Random generation provides tools for sampling, shuffling, and various statistical distributions
✅ Data preprocessing pipelines combine these tools for real-world data preparation
Connections: Advanced NumPy in Practice
🔗 Connection to Machine Learning
| ML Task | NumPy Operation |
|---|---|
| Feature scaling | Broadcasting + axis operations |
| PCA | Eigenvalue decomposition |
| Linear regression | Solving Ax = b |
| Train/test split | Shuffling + slicing |
| Data augmentation | Random transformations |
🔗 Connection to Deep Learning
NumPy operations mirror neural network computations:
- Forward pass: Matrix multiplications (@ operator)
- Batch normalization: Mean/std along batch axis
- Dropout: Random masking
- Weight initialization: Random distributions
🔗 Connection to Statistics
| Statistical Concept | NumPy Function |
|---|---|
| Covariance matrix | np.cov() |
| Correlation | np.corrcoef() |
| Monte Carlo simulation | Random sampling |
| Bootstrapping | np.random.choice() with replacement |
Practice Exercises
Exercise 1: Matrix Operations
Exercise 2: Data Simulation
Next Steps
Now that you've worked through NumPy's advanced features, you're ready for pandas, the library that builds on NumPy to provide intuitive data structures for real-world data analysis.
Ready to work with real datasets? Pandas is next!