Python for Data Science: From Arrays to Analysis
Lesson 03 of 10 · Intermediate · 75 min

Pandas Basics: Series and DataFrames

Master the pandas library fundamentals. Learn to create, index, and manipulate Series and DataFrames—the workhorses of data analysis.

TIP

Learning Objectives: After this lesson, you should be able to create Series and DataFrames, inspect their properties, select rows and columns, filter with boolean conditions, and modify, rename, and sort data.

What is Pandas?

TIP

🐼 Keep this tab open while you study pandas: Pandas Tutor shows step by step what each operation does to your DataFrame, such as rows being reordered, columns being added, and groups being combined. A few minutes there makes later pandas operations much easier to picture.

If NumPy is for numerical arrays, pandas is for structured, tabular data such as spreadsheets or database tables. The name comes from "panel data", an econometrics term. Pandas builds on NumPy and adds labeled rows and columns, so you can refer to data by name rather than by position alone.

Series: The 1D Building Block

A Series is a one-dimensional labeled array. Think of it as a single column of data with an index.

Creating Series
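A minimal sketch of three common ways to build a Series. The values, labels, and variable names are illustrative assumptions, not data from the lesson:

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
temps = pd.Series([21.5, 23.0, 19.8])

# From a list with explicit labels and a name (illustrative values)
prices = pd.Series([1.20, 0.50, 2.75],
                   index=["apple", "banana", "cherry"],
                   name="price")

# From a dict: the keys become the index
stock = pd.Series({"apple": 10, "banana": 25, "cherry": 7})

print(temps)
print(prices)
print(stock)
```

The dict form is often the most readable when the labels matter more than the order.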

Series Properties and Access
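The sketch below inspects a Series and reads values from it by label and by position. The `prices` Series is a made-up example:

```python
import pandas as pd

prices = pd.Series([1.20, 0.50, 2.75],
                   index=["apple", "banana", "cherry"],
                   name="price")

# Basic properties
print(prices.values)  # underlying NumPy array
print(prices.index)   # the row labels
print(prices.dtype)   # element type
print(prices.shape)   # (number of elements,)
print(prices.name)

# Access by label and by integer position
by_label = prices.loc["banana"]
by_position = prices.iloc[0]

# Several labels at once returns a new Series
subset = prices[["apple", "cherry"]]

# Label slices include the end label, unlike positional slices
label_slice = prices.loc["apple":"banana"]
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index itself contains integers.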

Series Operations
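A short sketch of element-wise arithmetic, aggregation, and label alignment, using two small made-up Series:

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2, 3], index=["y", "z", "w"])

# Element-wise arithmetic applies to every value
doubled = a * 2

# Aggregations reduce the Series to a single number
total = a.sum()
average = a.mean()

# Arithmetic between two Series aligns on labels, not positions;
# labels present in only one Series produce NaN
aligned = a + b

# fill_value treats a missing label as 0 instead of producing NaN
filled = a.add(b, fill_value=0)

print(aligned)
print(filled)
```

Label alignment is the main behavioral difference from plain NumPy arrays, which combine strictly by position.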

DataFrames: The 2D Powerhouse

A DataFrame is a two-dimensional table with labeled rows and columns. It's the primary pandas data structure.

Creating DataFrames
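A minimal sketch of building DataFrames from a dict of columns, from a list of records, and with custom row labels. All names and values are illustrative assumptions:

```python
import pandas as pd

# From a dict of columns: each key becomes a column name
df_from_dict = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [30, 25, 35],
    "city": ["Oslo", "Lima", "Pune"],
})

# From a list of records: each dict becomes one row
df_from_records = pd.DataFrame([
    {"name": "Ana", "age": 30},
    {"name": "Ben", "age": 25},
])

# With explicit row labels instead of the default 0..n-1 index
df_labeled = pd.DataFrame({"score": [88, 92]}, index=["ana", "ben"])

print(df_from_dict)
print(df_from_records)
print(df_labeled)
```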

DataFrame Properties
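The attributes below describe a DataFrame's size and structure. The small `df` is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [30, 25, 35],
    "salary": [95000.0, 60000.0, 85000.0],
})

print(df.shape)    # (rows, columns)
print(df.columns)  # column labels
print(df.index)    # row labels
print(df.dtypes)   # one dtype per column
print(df.size)     # total number of cells
print(df.ndim)     # number of dimensions
```

These are attributes rather than methods, so they take no parentheses.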

Quick Data Exploration
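A first look at an unfamiliar DataFrame usually involves a handful of calls. This sketch assumes a small synthetic table:

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 11),
    "value": [x * 1.5 for x in range(10)],
})

first_rows = df.head(3)  # first 3 rows (default is 5)
last_rows = df.tail(2)   # last 2 rows

# Column names, non-null counts, dtypes, and memory usage
df.info()

# Number of missing values per column
missing = df.isna().sum()

print(first_rows)
print(last_rows)
print(missing)
```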

Statistical Summary

Use this interactive explorer to understand DataFrame statistics. Switch between views to see:

  • Table: Raw data in spreadsheet format
  • Statistics: Summary stats (mean, std, quartiles)
  • Distributions: Histograms for each feature
  • Correlations: Relationship heatmap between variables
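The Statistics and Correlations views listed above correspond roughly to `describe()` and `corr()`. A minimal sketch with made-up measurements (the column names are assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160, 170, 180, 190],
    "weight_kg": [55, 65, 75, 85],
})

# Count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()

# Pairwise Pearson correlation between numeric columns
correlations = df.corr()

print(summary)
print(correlations)
```

For the Distributions view, `df.hist()` draws one histogram per numeric column, but it requires matplotlib to be installed.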

Selecting Data

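Pandas offers several ways to select data: by column name, by row label with .loc, by integer position with .iloc, and by boolean condition. Most analysis code combines these, so it pays to know what each one returns.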

Column Selection
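A sketch of the common column-selection forms, using a hypothetical three-column table:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [30, 25, 35],
    "city": ["Oslo", "Lima", "Pune"],
})

# Single brackets with one name return a Series
ages = df["age"]

# Dot notation is a shortcut; it only works when the column name is a
# valid Python identifier and does not clash with a DataFrame attribute
ages_attr = df.age

# A list of names returns a DataFrame
subset = df[["name", "city"]]

# A one-element list still returns a DataFrame, not a Series
one_col_frame = df[["age"]]
```

Bracket notation works for every column name, so it is the safer default in scripts.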

Row Selection with .loc and .iloc
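The sketch below contrasts label-based `.loc` with position-based `.iloc`. The row labels and values are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 25, 35], "city": ["Oslo", "Lima", "Pune"]},
    index=["ana", "ben", "cy"],
)

# .loc selects by label
row = df.loc["ben"]                # one row as a Series
cell = df.loc["ben", "city"]       # one cell
label_slice = df.loc["ana":"ben"]  # label slices include the end label

# .iloc selects by integer position
first = df.iloc[0]                 # first row as a Series
pos_slice = df.iloc[0:2]           # position slices exclude the end
cell_pos = df.iloc[2, 0]           # third row, first column
```

Note the asymmetry: `df.loc["ana":"ben"]` and `df.iloc[0:2]` both return two rows here, but for different reasons.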

Boolean Indexing (Filtering)
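A minimal sketch of filtering rows with boolean masks. The data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "age": [30, 25, 35, 41],
    "city": ["Oslo", "Lima", "Pune", "Oslo"],
})

# A comparison produces a boolean Series (the mask)
mask = df["age"] > 28
older = df[mask]

# Combine conditions with & (and), | (or), ~ (not);
# each condition needs its own parentheses
older_in_oslo = df[(df["age"] > 28) & (df["city"] == "Oslo")]
not_oslo = df[~(df["city"] == "Oslo")]

# isin tests membership in a list of values
in_cities = df[df["city"].isin(["Lima", "Pune"])]
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the bitwise operators are used.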

Modifying DataFrames

Adding and Removing Columns
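A sketch of adding columns by assignment and with `assign`, then removing one with `drop`. Column names and values are assumptions:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 5]})

# Assigning to a new name adds a column in place
df["total"] = df["price"] * df["qty"]

# assign returns a new DataFrame, which suits method chaining
df = df.assign(discounted=df["total"] * 0.9)

# drop returns a new DataFrame without the listed columns
df = df.drop(columns=["qty"])

print(df)
```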

Modifying Values
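The sketch below updates a single cell, a conditional subset, and whole columns. The scores are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["ana", "ben", "cy"],
    "score": [55, 82, 91],
})

# Update one cell by row label and column name
df.loc[0, "score"] = 60

# Update every row that matches a condition
df.loc[df["score"] < 70, "score"] = 70

# Transform a whole column
df["score"] = df["score"] + 5

# String methods are available through the .str accessor
df["name"] = df["name"].str.upper()

print(df)
```

Writing through a single `.loc[rows, column]` call is more reliable than chaining two selections such as `df[mask]["score"] = 70`, which may modify a temporary copy instead of `df`.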

Renaming Columns
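Two common renaming approaches, sketched with hypothetical column names: a mapping for selected columns, and a rule applied to all of them.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Ana"], "AGE": [30]})

# rename takes an {old: new} mapping and returns a new DataFrame
renamed = df.rename(columns={"First Name": "first_name", "AGE": "age"})

# Replace all column labels at once with a cleaned-up list
cleaned = df.copy()
cleaned.columns = [c.lower().replace(" ", "_") for c in cleaned.columns]

print(renamed.columns.tolist())
print(cleaned.columns.tolist())
```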

Sorting Data
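A sketch of sorting by one column, by several columns with mixed directions, and by the index. The table is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [90, 70, 85, 75],
})

# Sort by one column, highest first
by_salary = df.sort_values("salary", ascending=False)

# Sort by department A-Z, then by salary high-to-low within each department
by_dept_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# Restore the original row order by sorting on the index
by_index = by_salary.sort_index()

print(by_salary)
print(by_dept_salary)
```

`sort_values` returns a new DataFrame and keeps the original index labels, which is why `sort_index` can undo the reordering.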

Practical Example: Employee Analysis
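One way such an analysis might look, combining the techniques from this lesson. The `employees` table, its column names, and every value in it are hypothetical and are not the lesson's original dataset:

```python
import pandas as pd

# Hypothetical employee records for illustration only
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di", "Eve"],
    "department": ["Engineering", "Sales", "Engineering", "Sales", "Marketing"],
    "salary": [95000, 60000, 85000, 72000, 68000],
    "years": [5, 2, 3, 7, 4],
})

# Add a derived column
employees["salary_k"] = employees["salary"] / 1000

# Filter: employees earning above the overall mean salary
mean_salary = employees["salary"].mean()
above_mean = employees[employees["salary"] > mean_salary]

# Filter and select: names of engineers
engineers = employees.loc[employees["department"] == "Engineering", "name"]

# Sort: top three earners
top_three = employees.sort_values("salary", ascending=False).head(3)

# Count employees per department
headcount = employees["department"].value_counts()

print(above_mean)
print(top_three[["name", "salary"]])
print(headcount)
```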

Key Takeaways

  • A Series is a 1D labeled array, similar to a dictionary or a single DataFrame column.
  • A DataFrame is a 2D table with labeled rows and columns, and is the core pandas structure.
  • Column selection uses brackets, df['col'], or dot notation, df.col. Dot notation only works when the name is a valid Python identifier and does not clash with a DataFrame attribute or method.
  • Row selection uses .loc (by label) or .iloc (by integer position).
  • Boolean indexing filters rows with a condition: df[df['col'] > value].
  • Modification operations add, remove, and transform columns.
  • sort_values() orders rows by one or more columns.

Connections: Pandas in the Data Science Ecosystem

🔗 Connection to Spreadsheets

Spreadsheet | Pandas
Workbook | Multiple DataFrames
Worksheet | DataFrame
Column | Series
Cell reference (A1) | .loc[row, col]
Filter | Boolean indexing
Sort | sort_values()

🔗 Connection to SQL

SQL | Pandas
SELECT columns | df[['col1', 'col2']]
WHERE | Boolean indexing
ORDER BY | sort_values()
DISTINCT | unique() or drop_duplicates()
COUNT | len(df) or count(); value_counts() for COUNT with GROUP BY
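To make the SQL mapping concrete, the sketch below pairs a few SQL statements (shown as comments) with a pandas equivalent. The table `df` and its columns are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Di"],
    "dept": ["eng", "sales", "eng", "sales"],
    "salary": [95000, 60000, 85000, 72000],
})

# SELECT name, salary FROM df WHERE salary > 70000 ORDER BY salary DESC
result = (
    df.loc[df["salary"] > 70000, ["name", "salary"]]
      .sort_values("salary", ascending=False)
)

# SELECT DISTINCT dept FROM df
depts = df["dept"].unique()

# SELECT COUNT(*) FROM df
n_rows = len(df)

# SELECT dept, COUNT(*) FROM df GROUP BY dept
per_dept = df["dept"].value_counts()

print(result)
print(depts, n_rows)
print(per_dept)
```

Note that `value_counts()` counts rows per distinct value, so it matches a grouped count rather than a plain `COUNT(*)`.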

🔗 Connection to Machine Learning

  • Feature matrix X: DataFrame with feature columns
  • Target vector y: Series with labels
  • Train/test split: DataFrame slicing
  • Feature selection: Column selection
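The four items above can be sketched with plain pandas. The column names `feature_a`, `feature_b`, and `label` are hypothetical, the values are synthetic, and the 80/20 split ratio is an arbitrary choice:

```python
import pandas as pd

# Synthetic data for illustration only
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "feature_b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0],
    "label": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Feature selection is column selection
feature_cols = ["feature_a", "feature_b"]
X = df[feature_cols]  # feature matrix (DataFrame)
y = df["label"]       # target vector (Series)

# Train/test split by shuffling rows, then slicing by position
shuffled = df.sample(frac=1, random_state=42)
cut = int(len(shuffled) * 0.8)
train = shuffled.iloc[:cut]
test = shuffled.iloc[cut:]

X_train, y_train = train[feature_cols], train["label"]
X_test, y_test = test[feature_cols], test["label"]
```

In practice a library helper such as scikit-learn's `train_test_split` is the usual choice; the manual version here only shows that the split is ordinary DataFrame slicing.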

Practice Exercises

Exercise 1: Create and Explore

Exercise 2: Data Transformation

Next Steps

In the next lesson, Data Wrangling with Pandas, we'll cover merging datasets, grouping operations, pivot tables, and handling missing data.

Ready to transform and clean real-world data? Let's wrangle!


Further Reading

Visualize It

  • Pandas Tutor — like Python Tutor but for pandas: paste any DataFrame operation and step through to see rows move, filter, and group. Indispensable.
  • Just Pandas Things — tiny visual recipe site with copy-pasteable patterns.

Modern Alternatives & Companions

  • Polars — Rust-backed DataFrame library with lazy evaluation. Much faster on large data; pandas-like but not identical API.
  • DuckDB — run SQL against DataFrames (and Parquet files) with duckdb.sql("SELECT ... FROM df"). Often faster than pure pandas.
  • ydata-profiling — one-line EDA report generator.

Books

  • Book: Python for Data Analysis (3rd ed., 2022) — Wes McKinney (free online). Author of pandas. The definitive reference.
  • Book: Effective Pandas — Matt Harrison. Modern idioms, chained operations, categoricals.