Python for Data Science: From Arrays to Analysis
Lesson 01 of 10 · Intermediate · 75 min

NumPy Fundamentals: The Foundation of Scientific Python

Learn the core of numerical computing in Python. Master NumPy arrays, vectorized operations, and broadcasting—the building blocks of data science.

Learning Objectives: After this lesson, you'll be able to create and index NumPy arrays, apply vectorized operations, and use broadcasting, the essential building blocks for data science and machine learning in Python.

Why NumPy?

Imagine you need to multiply every element in a list of 1 million numbers by 2. With pure Python, you'd loop through each element one by one. With NumPy, you write a single expression, and the loop runs in optimized compiled code, typically many times faster.

NumPy (Numerical Python) is the foundation of the Python data science ecosystem. Libraries like pandas and scikit-learn build on NumPy arrays, and frameworks like TensorFlow interoperate with them.

[Fig. 02: Interactive Python code executor]
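As a minimal sketch of the comparison described above (the variable names are illustrative, not taken from the lesson), the same doubling can be written both ways:

```python
import numpy as np

# Illustrative data: one million integers, the size mentioned above.
numbers = list(range(1_000_000))

# Pure Python: visit each element in an explicit loop.
doubled_list = [x * 2 for x in numbers]

# NumPy: one vectorized expression; the loop runs in compiled code.
arr = np.arange(1_000_000)
doubled_arr = arr * 2

# Both approaches produce the same values.
print(doubled_list[:5])
print(doubled_arr[:5])
```

Timing the two versions (for example with the standard-library timeit module) is a good way to see the difference on your own machine.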

Creating NumPy Arrays

Before we dive into code, let's visualize how arrays work. NumPy arrays store elements of a single data type in a contiguous block of memory, which makes numerical operations efficient:

[Fig. 04: Interactive data structure visualizer]

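Try the Append and Pop buttons above to see how sequence operations work visually. Notice the index labels and memory addresses: elements are laid out in order, one after another. Keep in mind that append and pop are Python list operations; a NumPy array has a fixed size, so np.append() returns a new array rather than growing the original in place.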

From Python Lists

The most common way to create arrays is from existing Python lists:

[Fig. 06: Interactive Python code executor]
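A short sketch of creating arrays from lists follows. The values are arbitrary examples, not the lesson's original code:

```python
import numpy as np

# A flat list becomes a 1D array.
a = np.array([1, 2, 3, 4])

# A list of equal-length lists becomes a 2D array.
m = np.array([[1, 2, 3], [4, 5, 6]])

# Mixing ints and floats upcasts every element to float.
f = np.array([1, 2.5, 3])

print(a.shape)  # (4,)
print(m.shape)  # (2, 3)
print(f.dtype)  # float64
```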

Using NumPy Creation Functions

NumPy provides convenient functions to create common array patterns:

[Fig. 08: Interactive Python code executor]
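The creation functions named in this lesson can be sketched as below. The shapes and ranges are arbitrary choices for illustration:

```python
import numpy as np

zeros = np.zeros((2, 3))           # 2x3 array filled with 0.0
ones = np.ones(4)                  # [1., 1., 1., 1.]
steps = np.arange(0, 10, 2)        # [0, 2, 4, 6, 8]; the stop value is excluded
points = np.linspace(0.0, 1.0, 5)  # [0., 0.25, 0.5, 0.75, 1.]; the stop value is included
identity = np.eye(3)               # 3x3 identity matrix

print(steps)
print(points)
```

Note the difference between the last two range-style functions: np.arange() takes a step size, while np.linspace() takes the number of points.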

Random Arrays

Random numbers are essential for simulations, sampling, and machine learning:

[Fig. 10: Interactive Python code executor]
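One way to generate random arrays is NumPy's Generator interface, sketched below. The lesson's original example may have used the older np.random functions instead; the seed, sizes, and values here are assumptions chosen for illustration:

```python
import numpy as np

# A seeded generator makes the results reproducible.
rng = np.random.default_rng(seed=42)

uniform = rng.random((2, 3))                     # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=5)  # standard normal samples
dice = rng.integers(1, 7, size=10)               # ints 1..6; the high value is excluded
sample = rng.choice([10, 20, 30, 40], size=2, replace=False)  # 2 distinct picks

print(uniform.shape)
print(dice)
```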

Array Properties

Understanding array properties is crucial for working with data:

[Fig. 12: Interactive Python code executor]
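The most commonly inspected properties can be sketched on a small example array (the data is arbitrary):

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(data.ndim)      # 2: number of dimensions
print(data.shape)     # (2, 3): rows, columns
print(data.size)      # 6: total number of elements
print(data.dtype)     # float64: element type
print(data.itemsize)  # 8: bytes per element
print(data.nbytes)    # 48: size * itemsize
```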

Indexing and Slicing

NumPy arrays support powerful indexing—think of it as selecting data from a spreadsheet. Use the interactive visualizer below to see how element positions work:

[Fig. 14: Interactive data structure visualizer]

Click on elements and try Sort, Reverse, or Insert operations. Notice how indices change when the array is modified!

Basic Indexing

[Fig. 16: Interactive Python code executor]
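A minimal sketch of basic indexing and slicing on 1D and 2D arrays, using made-up values:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
first = a[0]          # 10
last = a[-1]          # 50
middle = a[1:4]       # [20, 30, 40]
every_other = a[::2]  # [10, 30, 50]

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
element = grid[1, 2]    # 6 (row 1, column 2)
second_row = grid[1]    # [4, 5, 6]
first_col = grid[:, 0]  # [1, 4, 7]
block = grid[:2, 1:]    # [[2, 3], [5, 6]]

# Basic slices are views: writing through one changes the original.
middle[0] = 99
print(a)  # [10 99 30 40 50]
```

The view behavior differs from Python lists, where a slice is always a copy. Call .copy() on a slice when you need an independent array.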

Boolean Indexing (Filtering)

One of NumPy's most powerful features—select elements based on conditions:

[Fig. 18: Interactive Python code executor]
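Filtering with a boolean mask might look like the following sketch. The scores and thresholds are hypothetical:

```python
import numpy as np

scores = np.array([55, 82, 91, 47, 76])

mask = scores >= 60     # [False, True, True, False, True]
passing = scores[mask]  # [82, 91, 76]

# Combine conditions with & and |, wrapping each condition in parentheses.
mid_range = scores[(scores >= 50) & (scores < 80)]  # [55, 76]

# Assign through a mask to modify only the matching elements.
capped = scores.copy()
capped[capped > 90] = 90  # [55, 82, 90, 47, 76]

print(passing)
print(mid_range)
```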

Fancy Indexing

Select multiple specific elements or rows:

[Fig. 20: Interactive Python code executor]
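Fancy indexing passes a list or array of indices instead of a single index or slice. A small sketch with arbitrary values:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
picked = a[[0, 3, 4]]  # [10, 40, 50]

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rows = grid[[0, 2]]                 # rows 0 and 2: [[1, 2, 3], [7, 8, 9]]
pairs = grid[[0, 1, 2], [2, 1, 0]]  # elements (0,2), (1,1), (2,0): [3, 5, 7]

# Unlike basic slices, fancy indexing returns a copy.
picked[0] = -1
print(a[0])  # 10
```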

Vectorized Operations

The key to NumPy's speed: operations apply to entire arrays at once, with the looping done in compiled code instead of explicit Python loops.

Element-wise Operations

[Fig. 22: Interactive Python code executor]
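Arithmetic operators act on matching positions of equally shaped arrays. A sketch with illustrative values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b     # [11, 22, 33, 44]
product = a * b   # [10, 40, 90, 160]
ratio = b / a     # [10., 10., 10., 10.]; true division always yields floats
squared = a ** 2  # [1, 4, 9, 16]
shifted = a - 1   # [0, 1, 2, 3]

print(total)
print(ratio)
```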

Mathematical Functions

NumPy provides optimized versions of common math functions:

[Fig. 24: Interactive Python code executor]
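A few of these functions (often called universal functions, or ufuncs) are sketched below on example inputs chosen to give round results:

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])

roots = np.sqrt(x)      # [0., 1., 2., 3.]
growth = np.exp(x[:2])  # [1., e]
logs = np.log(x[1:])    # natural log of [1., 4., 9.]; skips 0 to avoid -inf

angles = np.array([0.0, np.pi / 2])
sines = np.sin(angles)  # approximately [0., 1.]

rounded = np.round(np.array([1.234, 5.678]), 1)  # [1.2, 5.7]

print(roots)
print(rounded)
```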

Comparison Operations

[Fig. 26: Interactive Python code executor]
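Comparisons are also element-wise and return boolean arrays, which can then be counted or reduced. A sketch with arbitrary values:

```python
import numpy as np

a = np.array([1, 5, 3, 8])
b = np.array([2, 5, 1, 8])

greater = a > b              # [False, False, True, False]
equal = a == b               # [False, True, False, True]
n_equal = np.sum(a == b)     # 2, because True counts as 1
any_big = np.any(a > 7)      # True
all_pos = np.all(a > 0)      # True
same = np.array_equal(a, b)  # False: a single bool for the whole arrays

print(greater)
print(n_equal)
```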

Aggregation Functions

Summarize data with statistical operations:

[Fig. 28: Interactive Python code executor]
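Common aggregations on a 1D array can be sketched as follows (the numbers are made up for the example):

```python
import numpy as np

values = np.array([4, 8, 15, 16, 23, 42])

total = values.sum()         # 108
mean = values.mean()         # 18.0
median = np.median(values)   # 15.5
spread = values.std()        # population standard deviation (ddof=0 by default)
lowest = values.min()        # 4
highest = values.max()       # 42
where_max = values.argmax()  # 5, the index of the largest value

print(total, mean, median)
```

Pass ddof=1 to std() when you want the sample standard deviation instead.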

Aggregation Along Axes

For 2D arrays, you can aggregate along specific axes:

[Fig. 30: Interactive Python code executor]
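The axis argument names the dimension that gets collapsed. A small sketch on a 2x3 example array:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)    # [5, 7, 9]: collapse rows, one result per column
row_sums = m.sum(axis=1)    # [6, 15]: collapse columns, one result per row
row_means = m.mean(axis=1)  # [2., 5.]
col_max = m.max(axis=0)     # [4, 5, 6]

# keepdims=True keeps the collapsed axis with length 1, giving shape (2, 1).
kept = m.sum(axis=1, keepdims=True)

print(col_sums)
print(kept.shape)
```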

Broadcasting

Broadcasting lets NumPy apply element-wise operations to arrays of different but compatible shapes by virtually stretching the smaller array, without copying its data:

[Fig. 32: Interactive Python code executor]
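Three typical broadcasting cases (scalar, row, and column) are sketched here with illustrative values:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# A scalar is stretched to every element.
scaled = m * 10

# A shape (3,) row is added to each row of m.
row = np.array([100, 200, 300])
plus_row = m + row  # [[101, 202, 303], [104, 205, 306]]

# A shape (2, 1) column is added to each column of m.
col = np.array([[10], [20]])
plus_col = m + col  # [[11, 12, 13], [24, 25, 26]]

print(plus_row)
print(plus_col)
```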

Broadcasting Rules

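Broadcasting follows specific rules. Shapes are compared from right to left, and two dimensions are compatible when they are equal or when one of them is 1; missing leading dimensions are treated as 1: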

[Fig. 34: Interactive Python code executor]
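The rules can be checked without building large arrays by using np.broadcast_shapes(), sketched below. This helper assumes NumPy 1.20 or newer, and the example shapes are arbitrary:

```python
import numpy as np

# Shapes are aligned on the right; each pair of dimensions
# must be equal, or one of them must be 1.
s1 = np.broadcast_shapes((2, 3), (3,))       # (2, 3)
s2 = np.broadcast_shapes((4, 1), (1, 5))     # (4, 5)
s3 = np.broadcast_shapes((2, 1, 3), (4, 3))  # (2, 4, 3)
print(s1, s2, s3)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
incompatible = False
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as err:
    incompatible = True
    print("Cannot broadcast:", err)
```

Reshaping the second array to (2, 1) would make the last example valid, because the trailing pair then becomes 3 and 1.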

Practical Example: Data Analysis

Let's apply what we've learned to analyze some data. Here's an interactive visualization of sales patterns:

[Fig. 36: Interactive graph plotter for visualizing data and relationships]

Now let's analyze this data with NumPy:

[Fig. 38: Interactive Python code executor]
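The lesson's original sales dataset is not reproduced here, so the sketch below uses a small hypothetical table (3 products by 4 months) to show how axis aggregation, indexing, and broadcasting combine in an analysis. All figures and names are invented for illustration:

```python
import numpy as np

# Hypothetical monthly sales: rows are 3 products, columns are 4 months.
sales = np.array([[120, 135, 150, 160],
                  [ 80,  75,  90, 110],
                  [200, 210, 190, 230]])

total_per_product = sales.sum(axis=1)  # [565, 355, 830]
total_per_month = sales.sum(axis=0)    # [400, 420, 430, 500]
best_month = total_per_month.argmax()  # 3, the last month

# Percentage change from the first month to the last, per product.
growth = (sales[:, -1] - sales[:, 0]) / sales[:, 0] * 100

# Broadcasting: divide each column by that month's total to get market share.
share = sales / total_per_month

# Boolean mask: months where a product beat its own average.
above_avg = sales > sales.mean(axis=1, keepdims=True)

print(total_per_product)
print(np.round(growth, 1))
```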

Key Takeaways

NumPy arrays are the foundation of scientific Python: for numerical data, they are faster and more memory-efficient than lists

Create arrays using np.array(), np.zeros(), np.ones(), np.arange(), np.linspace(), and random functions

Indexing and slicing work like lists but with powerful additions: boolean indexing and fancy indexing

Vectorized operations apply to entire arrays without loops—this is the key to NumPy's speed

Broadcasting allows operations between arrays of different shapes when each pair of dimensions, compared from right to left, is equal or one of them is 1

Aggregation functions like np.sum(), np.mean(), np.std() can operate along specific axes

Connections: NumPy Across Domains

🔗 Connection to Mathematics

NumPy arrays are essentially mathematical vectors and matrices:

| Math Concept | NumPy Implementation |
|---|---|
| Vector $\vec{v} = [1, 2, 3]$ | np.array([1, 2, 3]) |
| Matrix multiplication | np.dot(A, B) or A @ B |
| Transpose $A^T$ | A.T |
| Element-wise operations | Standard operators: +, -, *, / |
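The correspondences in the table can be tried directly. In this sketch the matrices A and B are arbitrary examples:

```python
import numpy as np

v = np.array([1, 2, 3])
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

matmul = A @ B       # matrix product: [[2, 1], [4, 3]]
same = np.dot(A, B)  # gives the same result as @ for 2D arrays
transposed = A.T     # [[1, 3], [2, 4]]
elementwise = A * B  # [[0, 2], [3, 0]]; this is not a matrix product
dot = v @ v          # 14, the dot product of v with itself

print(matmul)
print(elementwise)
```

The most common mistake here is using * when a matrix product is intended; * always multiplies matching elements.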

🔗 Connection to Data Science

NumPy is the foundation for the entire Python data science stack:

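  • pandas: DataFrames and Series are built largely on NumPy arrays
  • scikit-learn: Estimators accept NumPy arrays as input and return them as output
  • TensorFlow/PyTorch: Tensor APIs closely mirror NumPy's, and tensors convert to and from NumPy arrays
  • matplotlib: Plotting functions accept array-like data and convert it to NumPy arrays internally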

🔗 Connection to Machine Learning

Understanding NumPy prepares you for ML concepts:

| ML Concept | NumPy Foundation |
|---|---|
| Feature matrix X | 2D array (samples × features) |
| Target vector y | 1D array |
| Model weights | 1D or 2D arrays |
| Gradient descent | Array arithmetic |
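To make the last table row concrete, here is a rough sketch of gradient descent for a one-feature linear model with a mean squared error loss, written only with array arithmetic. The data, learning rate, and iteration count are assumptions chosen so the example converges; this is not code from the lesson:

```python
import numpy as np

# Hypothetical data generated from y = 2*x + 1, with no noise.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # feature matrix: 4 samples x 1 feature
y = np.array([1.0, 3.0, 5.0, 7.0])          # target vector

w = np.zeros(1)  # model weights, one per feature
b = 0.0          # bias term
lr = 0.1         # learning rate (assumed value)

for _ in range(2000):
    error = X @ w + b - y                  # prediction minus target, shape (4,)
    grad_w = 2 * (X.T @ error) / len(y)    # gradient of the MSE with respect to w
    grad_b = 2 * error.mean()              # gradient of the MSE with respect to b
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # close to [2.] and 1.0
```

Every step in the loop is a vectorized operation from this lesson: a matrix-vector product, element-wise subtraction, and an aggregation.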

Practice Exercises

Exercise 1: Temperature Analysis

[Fig. 40: Interactive Python code executor]

Exercise 2: Grade Normalization

[Fig. 42: Interactive Python code executor]

Next Steps

In the next lesson, we'll dive deeper into Advanced NumPy: linear algebra operations, random number generation, reshaping arrays, and advanced indexing techniques that are essential for data manipulation and machine learning.


Ready to master more NumPy? The next lesson will unlock even more powerful array operations!


Further Reading

Modern Numerical Python

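  • JAX: a NumPy-like API (jax.numpy) that runs on GPU/TPU and adds automatic differentiation. It mirrors most of NumPy's interface, though JAX arrays are immutable.
  • Polars: a Rust-backed DataFrame library, an alternative to pandas for tabular data; it converts to and from NumPy arrays.
  • Array API Standard: an emerging cross-library standard so that the same code can work on NumPy, JAX, PyTorch, and CuPy.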

Books

  • Python Data Science Handbook, Jake VanderPlas. Chapter 2 ("Introduction to NumPy"). Free online.
  • Python for Data Analysis (3rd ed., 2022), Wes McKinney (author of pandas). Chapter 4 covers NumPy.