Pandas Basics: Series and DataFrames
Master the pandas library fundamentals. Learn to create, index, and manipulate Series and DataFrames—the workhorses of data analysis.
TIP: Learning Objectives: After this lesson, you'll be able to create, index, and manipulate Series and DataFrames, the essential data structures for data analysis in Python.
What is Pandas?
TIP: 🐼 Keep Pandas Tutor open in a tab while you study pandas. It shows step by step what each operation does to your DataFrame: rows shuffle, columns appear, groups merge. A few minutes there makes later pandas operations much easier to picture.
If NumPy is for numerical arrays, pandas is for structured, tabular data—think spreadsheets or database tables. Named after "Panel Data" (an econometrics term), pandas makes data analysis intuitive and powerful.
Series: The 1D Building Block
A Series is a one-dimensional labeled array. Think of it as a single column of data with an index.
Creating Series
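A minimal sketch of three common ways to build a Series. All names and values here (`temps`, `scores`, `prices`) are made up for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index (0, 1, 2, ...)
temps = pd.Series([22.5, 24.0, 19.8])

# From a list with custom index labels and a name
scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# From a dict: the keys become the index labels
prices = pd.Series({"apple": 1.2, "banana": 0.5, "cherry": 3.0})

print(scores)
```

The index is what separates a Series from a plain list: every value carries a label that later operations can use for lookup and alignment.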
Series Properties and Access
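The sketch below inspects a small, made-up Series and reads values from it by label and by position. The variable names are illustrative only.

```python
import pandas as pd

scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# Basic properties
n_items = scores.size            # number of elements
labels = list(scores.index)      # index labels
values = scores.to_numpy()       # underlying NumPy array
print(scores.dtype)              # an integer dtype for this data

# Access
by_label = scores["bob"]                 # label-based lookup
by_loc = scores.loc["carol"]             # explicit label-based lookup
by_position = scores.iloc[0]             # position-based lookup
subset = scores[["alice", "carol"]]      # a list of labels returns a Series
```

Preferring `.loc` and `.iloc` over bare brackets makes it explicit whether a lookup is by label or by position.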
Series Operations
DataFrames: The 2D Powerhouse
A DataFrame is a two-dimensional table with labeled rows and columns. It's the primary pandas data structure.
Creating DataFrames
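A minimal sketch of three ways to create a DataFrame. The people, ages, and cities are invented sample data, and the CSV is an in-memory string standing in for a real file.

```python
import io

import pandas as pd

# From a dict of columns: each key is a column name
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# From a list of row dicts: each dict is one row
rows = [{"name": "Dan", "age": 41}, {"name": "Eve", "age": 29}]
df_rows = pd.DataFrame(rows)

# From CSV text (pd.read_csv also accepts a file path)
csv_text = "name,age\nFay,33\nGus,47\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df)
```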
DataFrame Properties
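The attributes below describe a DataFrame's size and layout. This sketch uses the same kind of made-up table as elsewhere in the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

shape = df.shape                 # (rows, columns)
n_rows = len(df)                 # number of rows
column_names = list(df.columns)  # column labels
row_labels = list(df.index)      # row labels (default: 0, 1, 2, ...)

print(df.dtypes)                 # one dtype per column
```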
Quick Data Exploration
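A first look at an unfamiliar DataFrame usually starts with a handful of calls. The sketch below runs them on a small generated table; the column names `x` and `y` are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

first = df.head(3)    # first 3 rows (default is 5)
last = df.tail(2)     # last 2 rows
df.info()             # prints column names, non-null counts, dtypes, memory use
n_unique_y = df["y"].nunique()   # number of distinct values in a column

print(first)
```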
Statistical Summary
Use this interactive explorer to understand DataFrame statistics. Switch between views to see:
- Table: Raw data in spreadsheet format
- Statistics: Summary stats (mean, std, quartiles)
- Distributions: Histograms for each feature
- Correlations: Relationship heatmap between variables
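The summary statistics and correlations listed above can also be computed directly. This is a sketch with made-up measurements; histograms are available through `df.hist()`, which needs matplotlib and is left out here.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160, 170, 180, 190],
    "weight_kg": [55, 65, 75, 85],
})

# count, mean, std, min, quartiles, and max for each numeric column
summary = df.describe()
mean_height = summary.loc["mean", "height_cm"]

# Pairwise Pearson correlation between numeric columns
corr = df.corr()

print(summary)
print(corr)
```

In this toy data weight is an exact linear function of height, so the correlation between the two columns is 1 (up to floating-point rounding).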
Selecting Data
Pandas offers multiple ways to select data. Understanding these is crucial for data analysis.
Column Selection
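A quick sketch of column selection on made-up data. Note the difference between single and double brackets.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

ages = df["age"]              # single brackets with one name -> Series
subset = df[["name", "age"]]  # a list of names -> DataFrame
same_ages = df.age            # dot notation, equivalent to df["age"] here
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method (a column called `count` or `first name` needs brackets), so bracket notation is the safer default.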
Row Selection with .loc and .iloc
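The sketch below contrasts `.loc` (labels) with `.iloc` (integer positions) on a made-up table that uses names as row labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 25, 35], "city": ["Paris", "Berlin", "Madrid"]},
    index=["alice", "bob", "carol"],
)

row_bob = df.loc["bob"]                    # one row by label -> Series
label_slice = df.loc["alice":"bob", "age"] # label slices include BOTH endpoints
first_row = df.iloc[0]                     # one row by position -> Series
pos_slice = df.iloc[0:2, 0]                # position slices exclude the end
cell = df.loc["carol", "city"]             # single cell by row and column label
```

Both slices return the same two rows here, but for different reasons: the `.loc` slice stops at the label `"bob"` inclusive, while the `.iloc` slice stops before position 2.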
Boolean Indexing (Filtering)
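A minimal sketch of filtering rows with boolean conditions, using invented sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# A comparison yields a boolean Series; indexing with it keeps the True rows
over_28 = df[df["age"] > 28]

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
over_28_not_madrid = df[(df["age"] > 28) & (df["city"] != "Madrid")]

# Membership test against a list of values
in_cities = df[df["city"].isin(["Paris", "Berlin"])]
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the bitwise operators are used instead.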
Modifying DataFrames
Adding and Removing Columns
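A short sketch of adding and dropping columns. The `price` and `qty` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# Add a column by assignment (modifies df in place)
df["total"] = df["price"] * df["qty"]

# assign() returns a new DataFrame with the extra column
df = df.assign(discounted=df["total"] * 0.9)

# drop() also returns a new DataFrame; the original is left unchanged
df = df.drop(columns=["qty"])

print(df)
```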
Modifying Values
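The sketch below updates values in three ways: a conditional assignment through `.loc`, a whole-column arithmetic update, and a string transformation. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]})

# Conditional update: select rows and the target column in ONE .loc call
df.loc[df["name"] == "Bob", "age"] = 26

# Whole-column update
df["age"] = df["age"] + 1

# Element-wise string transformation through the .str accessor
df["name"] = df["name"].str.upper()

print(df)
```

Writing the conditional update as a single `.loc[rows, column] = value` avoids chained indexing such as `df[mask]["age"] = 26`, which may assign to a temporary copy and leave `df` unchanged.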
Renaming Columns
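A minimal sketch of two renaming approaches, with hypothetical column names: an explicit mapping via `rename()`, and a bulk clean-up by reassigning `.columns`.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Alice"], "AGE": [30]})

# rename() with a mapping returns a new DataFrame; df itself is unchanged
renamed = df.rename(columns={"First Name": "first_name", "AGE": "age"})

# Bulk clean-up: lowercase every name and replace spaces with underscores
cleaned = df.copy()
cleaned.columns = [c.lower().replace(" ", "_") for c in cleaned.columns]
```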
Sorting Data
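A sketch of sorting by one column, by several columns, and by index. The employees and salaries are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [100, 130, 120, 90],
})

# Single column, highest first
by_salary = df.sort_values("salary", ascending=False)

# Several columns: dept A-Z, then salary high-to-low within each dept
by_dept_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# Restore the original row order by sorting on the index
restored = by_salary.sort_index()
```

`sort_values()` returns a new DataFrame and keeps the original index labels attached to each row, which is why `sort_index()` can undo the sort.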
Practical Example: Employee Analysis
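One possible version of such an analysis, combining the operations from this lesson. The dataset, column names, and the five-year seniority threshold are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical employee records
employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan", "Eve"],
    "department": ["Engineering", "Sales", "Engineering", "Sales", "HR"],
    "salary": [95000, 60000, 105000, 65000, 58000],
    "years": [5, 2, 8, 3, 1],
})

# Add a derived boolean column (threshold chosen arbitrarily)
employees["senior"] = employees["years"] >= 5

# Filter: employees earning more than the overall average
avg_salary = employees["salary"].mean()
above_avg = employees[employees["salary"] > avg_salary]

# Sort the filtered rows and keep only the columns of interest
report = above_avg.sort_values("salary", ascending=False)[["name", "department", "salary"]]

print(f"Average salary: {avg_salary:.0f}")
print(report)
```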
Key Takeaways
✅ Series is a 1D labeled array—like a dictionary or a single DataFrame column
✅ DataFrame is a 2D table with labeled rows and columns—the core pandas structure
✅ Column selection uses brackets df['col'] or dot notation df.col
✅ Row selection uses .loc (by label) or .iloc (by position)
✅ Boolean indexing filters data with conditions df[df['col'] > value]
✅ Modification operations allow adding, removing, and transforming columns
✅ Sorting with sort_values() orders data by one or more columns
Connections: Pandas in the Data Science Ecosystem
🔗 Connection to Spreadsheets
| Spreadsheet | Pandas |
|---|---|
| Workbook | Multiple DataFrames |
| Worksheet | DataFrame |
| Column | Series |
| Cell reference (A1) | .loc[row, col] |
| Filter | Boolean indexing |
| Sort | sort_values() |
🔗 Connection to SQL
| SQL | Pandas |
|---|---|
| SELECT columns | df[['col1', 'col2']] |
| WHERE | Boolean indexing |
| ORDER BY | sort_values() |
| DISTINCT | unique() or drop_duplicates() |
| COUNT(*) with GROUP BY | value_counts() |
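The mappings in the table can be sketched side by side, with the rough SQL equivalent in each comment. The `city` and `sales` columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
    "sales": [10, 30, 20, 5],
})

selected = df[["city", "sales"]]                     # SELECT city, sales FROM df
filtered = df[df["sales"] > 8]                       # ... WHERE sales > 8
ordered = df.sort_values("sales", ascending=False)   # ... ORDER BY sales DESC
distinct = df["city"].unique()                       # SELECT DISTINCT city
counts = df["city"].value_counts()                   # SELECT city, COUNT(*) ... GROUP BY city
n_rows = len(df)                                     # SELECT COUNT(*) FROM df
```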
🔗 Connection to Machine Learning
- Feature matrix X: DataFrame with feature columns
- Target vector y: Series with labels
- Train/test split: DataFrame slicing
- Feature selection: Column selection
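A rough sketch of how these pieces map onto pandas code, using generated placeholder data. The column names and the 80/20 split ratio are assumptions; in practice a library helper such as scikit-learn's `train_test_split` is commonly used instead of manual slicing.

```python
import pandas as pd

# Placeholder dataset: two feature columns and a binary label
df = pd.DataFrame({
    "f1": range(10),
    "f2": [v * 2 for v in range(10)],
    "label": [0, 1] * 5,
})

X = df[["f1", "f2"]]   # feature matrix: DataFrame (column selection)
y = df["label"]        # target vector: Series

# Shuffle the rows reproducibly, then slice by position for an 80/20 split
shuffled = df.sample(frac=1, random_state=0)
split = int(len(shuffled) * 0.8)
train = shuffled.iloc[:split]
test = shuffled.iloc[split:]
```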
Practice Exercises
Exercise 1: Create and Explore
Exercise 2: Data Transformation
Next Steps
In the next lesson, we'll dive into Data Wrangling with Pandas—merging datasets, grouping operations, pivot tables, and handling missing data like a pro.
Ready to transform and clean real-world data? Let's wrangle!
Further Reading
Visualize It
- Pandas Tutor — like Python Tutor but for pandas: paste any DataFrame operation and step through to see rows move, filter, and group. Indispensable.
- Just Pandas Things — tiny visual recipe site with copy-pasteable patterns.
Official Docs
- 10 minutes to pandas — the official crash course. Start here.
- Pandas Cookbook — idiomatic recipes for every common task.
- Comparison with SQL — if you think in SQL, this is the fastest on-ramp.
Tutorials
- Real Python — The pandas DataFrame — the canonical primer.
- Kaggle — Pandas micro-course — 5 interactive notebooks, ~4 hours total.
- Modern Pandas (7-part series) — Tom Augspurger (pandas core dev). Idiomatic modern style.
Modern Alternatives & Companions
- Polars: Rust-backed DataFrame library with lazy evaluation. Much faster on large data; pandas-like but not identical API.
- DuckDB: run SQL against DataFrames (and Parquet files) with duckdb.sql("SELECT ... FROM df"). Often faster than pure pandas.
- ydata-profiling: one-line EDA report generator.
Books
- Book: Python for Data Analysis (3rd ed., 2022) — Wes McKinney (free online). Author of pandas. The definitive reference.
- Book: Effective Pandas — Matt Harrison. Modern idioms, chained operations, categoricals.