Learning Objectives: After this lesson, you'll be able to create, index, and manipulate Series and DataFrames, the two core pandas data structures for data analysis in Python.
What is Pandas?
If NumPy is for numerical arrays, pandas is for structured, tabular data: think spreadsheets or database tables. The name derives from "panel data" (an econometrics term). Pandas adds labeled rows and columns, so data can be selected, aligned, and combined by name rather than by position alone.
Series: The 1D Building Block
A Series is a one-dimensional labeled array. Think of it as a single column of data with an index.
Creating Series
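A minimal sketch of three common ways to build a Series. The values, labels, and variable names are made up for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
temps = pd.Series([21.5, 23.0, 19.8])

# From a list with explicit labels and a name
scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# From a dict: the keys become the index labels
prices = pd.Series({"apple": 0.5, "banana": 0.25})

print(scores)
```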
Series Properties and Access
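The sketch below shows the attributes most often inspected on a Series and the usual ways to read values from it. The sample data is hypothetical.

```python
import pandas as pd

scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

print(scores.index)   # the labels
print(scores.values)  # the underlying NumPy array
print(scores.dtype)   # the element type
print(scores.shape)   # a one-element tuple: (3,)

by_label = scores["bob"]             # access by label
by_loc = scores.loc["bob"]           # explicit label-based access
by_pos = scores.iloc[1]              # explicit position-based access
subset = scores[["alice", "carol"]]  # a list of labels returns a Series
```

Preferring `.loc` and `.iloc` over plain brackets makes it explicit whether a lookup is by label or by position.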
Series Operations
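A short sketch of element-wise arithmetic, aggregation, filtering, and index alignment on Series. The two Series here are invented so that their labels only partly overlap.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

doubled = a * 2     # arithmetic applies to every element
total = a.sum()     # aggregate to a single value
mean = a.mean()
large = a[a > 1]    # boolean mask keeps only matching elements

# Arithmetic between two Series aligns on index labels;
# labels present on only one side produce NaN
combined = a + b
print(combined)
```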
DataFrames: The 2D Powerhouse
A DataFrame is a two-dimensional table with labeled rows and columns. It's the primary pandas data structure.
Creating DataFrames
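One way to sketch DataFrame construction is from a dict of columns, a list of row dicts, or with an explicit row index. All names and values below are illustrative.

```python
import pandas as pd

# From a dict of columns: each key becomes a column name
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# From a list of row dicts: each dict becomes one row
rows = [{"name": "Dan", "age": 41}, {"name": "Eve", "age": 29}]
df_rows = pd.DataFrame(rows)

# With custom row labels instead of the default 0..n-1
df_indexed = pd.DataFrame({"age": [30, 25]}, index=["alice", "bob"])

print(df)
```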
DataFrame Properties
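The basic structural attributes of a DataFrame can be inspected as in this sketch, which uses a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "salary": [50000.0, 42000.0, 61000.0],
})

print(df.shape)    # (rows, columns)
print(df.columns)  # column labels
print(df.index)    # row labels (a RangeIndex by default)
print(df.dtypes)   # one dtype per column

n_rows = len(df)   # number of rows
```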
Quick Data Exploration
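A first look at an unfamiliar DataFrame usually starts with a few rows and a type overview. The sketch below assumes a small synthetic table with `id` and `value` columns.

```python
import pandas as pd

df = pd.DataFrame({
    "id": range(1, 11),
    "value": [x * 1.5 for x in range(1, 11)],
})

first_rows = df.head(3)   # first 3 rows (default is 5)
last_rows = df.tail(2)    # last 2 rows
df.info()                 # prints dtypes and non-null counts per column
missing = df.isna().sum() # number of missing values per column
```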
Statistical Summary
A DataFrame's statistics can be explored from four complementary views:
- Table: Raw data in spreadsheet format
- Statistics: Summary stats (mean, std, quartiles)
- Distributions: Histograms for each feature
- Correlations: Relationship heatmap between variables
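The views listed above map roughly onto the pandas calls sketched here. The column names and values are assumptions, and binned counts stand in for a plotted histogram so the example needs no plotting library.

```python
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160.0, 170.0, 180.0, 190.0],
    "weight_kg": [55.0, 65.0, 75.0, 85.0],
})

table = df.head()        # Table: the raw rows
summary = df.describe()  # Statistics: count, mean, std, min, quartiles, max
corr = df.corr()         # Correlations: pairwise Pearson coefficients

# Distributions: counts per equal-width bin, a text stand-in for a histogram
bins = df["height_cm"].value_counts(bins=2, sort=False)

print(summary)
```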
Selecting Data
Pandas offers several ways to select data: by column name, by row label, by row position, and by boolean condition. Knowing which one applies avoids most indexing mistakes.
Column Selection
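A brief sketch of the column selection forms, using an invented three-column table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

ages = df["age"]               # single brackets with one name -> Series
subset = df[["name", "city"]]  # a list of names -> DataFrame

# Attribute access works only when the column name is a valid Python
# identifier and does not clash with an existing DataFrame attribute
ages_attr = df.age
```

Bracket notation is the safer default, since it works for any column name.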
Row Selection with .loc and .iloc
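The difference between label-based and position-based selection can be sketched as follows. The row labels and data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [30, 25, 35], "city": ["Paris", "Berlin", "Madrid"]},
    index=["alice", "bob", "carol"],
)

row_label = df.loc["bob"]             # one row by label -> Series
row_pos = df.iloc[1]                  # the same row by position
cell = df.loc["carol", "city"]        # a single value by row and column label

label_slice = df.loc["alice":"bob"]   # label slices include both endpoints
pos_slice = df.iloc[0:2]              # position slices exclude the end
```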
Boolean Indexing (Filtering)
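Filtering with boolean masks might look like this sketch. The table is made up, and note that conditions are combined with `&`, `|`, and `~` (each wrapped in parentheses) rather than `and`, `or`, and `not`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Paris"],
})

over_28 = df[df["age"] > 28]                                     # one condition
paris_over_30 = df[(df["age"] > 30) & (df["city"] == "Paris")]   # AND
in_cities = df[df["city"].isin(["Berlin", "Madrid"])]            # membership test
not_paris = df[~(df["city"] == "Paris")]                         # negation
```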
Modifying DataFrames
Adding and Removing Columns
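A minimal sketch of adding derived columns and dropping one, assuming a small order table with `price` and `qty` columns.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# Assigning to a new name adds a column in place
df["total"] = df["price"] * df["qty"]

# assign() returns a new DataFrame with the extra column
df = df.assign(discounted=df["total"] * 0.9)

# drop() also returns a new DataFrame unless reassigned
df = df.drop(columns=["qty"])

print(df)
```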
Modifying Values
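Values can be updated by label, by condition, or column-wide, as in this sketch with illustrative data. Using `.loc` for assignments avoids the ambiguity of chained indexing such as `df[mask]["col"] = value`.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]})

df.loc[1, "age"] = 26                 # one cell, by row label and column name
df.loc[df["age"] > 30, "age"] = 30    # every row matching a condition
df["name"] = df["name"].str.upper()   # transform a whole column

print(df)
```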
Renaming Columns
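Two common renaming approaches are sketched below: a mapping passed to `rename()`, and replacing all labels at once. The column names are invented.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Alice"], "AGE": [30]})

# rename() with a mapping returns a new DataFrame; df is unchanged
renamed = df.rename(columns={"First Name": "first_name", "AGE": "age"})

# Replace every label at once by assigning to .columns on a copy
cleaned = df.copy()
cleaned.columns = [c.lower().replace(" ", "_") for c in cleaned.columns]
```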
Sorting Data
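Sorting by one column, by several columns with mixed directions, and by index can be sketched like this. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [90, 130, 120, 75],
})

# Single column, highest first
by_salary = df.sort_values("salary", ascending=False)

# Department A-Z, then salary high-to-low within each department
by_dept_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# Restore the original row order using the index
by_index = by_salary.sort_index()
```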
Practical Example: Employee Analysis
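The original example for this section is not available here, so the following is only a sketch of what such an analysis might look like. The `employees` table, its columns, and every value in it are invented for illustration.

```python
import pandas as pd

# Hypothetical employee records
employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan", "Eve"],
    "department": ["Engineering", "Sales", "Engineering", "Sales", "HR"],
    "salary": [95000, 60000, 105000, 65000, 58000],
    "years": [5, 2, 8, 4, 3],
})

# Add a derived column
employees["senior"] = employees["years"] >= 5

# Filter to one department
engineers = employees[employees["department"] == "Engineering"]

# Sort and keep the two highest earners, showing only two columns
top_two = employees.sort_values("salary", ascending=False).head(2)[["name", "salary"]]

# Aggregate a column
avg_salary = employees["salary"].mean()

print(top_two)
print(f"Average salary: {avg_salary:.0f}")
```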
Key Takeaways
✅ Series is a 1D labeled array—like a dictionary or a single DataFrame column
✅ DataFrame is a 2D table with labeled rows and columns—the core pandas structure
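✅ Column selection uses brackets df['col'] or dot notation df.col (dot notation works only when the name is a valid Python identifier and does not clash with a DataFrame attribute or method)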
✅ Row selection uses .loc (by label) or .iloc (by position)
✅ Boolean indexing filters data with conditions df[df['col'] > value]
✅ Modification operations allow adding, removing, and transforming columns
✅ Sorting with sort_values() orders data by one or more columns
Connections: Pandas in the Data Science Ecosystem
🔗 Connection to Spreadsheets
| Spreadsheet | Pandas |
|---|---|
| Workbook | Multiple DataFrames |
| Worksheet | DataFrame |
| Column | Series |
| Cell reference (A1) | .loc[row, col] |
| Filter | Boolean indexing |
| Sort | sort_values() |
🔗 Connection to SQL
| SQL | Pandas |
|---|---|
| SELECT columns | df[['col1', 'col2']] |
| WHERE | Boolean indexing |
| ORDER BY | sort_values() |
| DISTINCT | unique() or drop_duplicates() |
| COUNT(*) | len(df), or value_counts() for counts per group (GROUP BY) |
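The SQL correspondences can be sketched side by side as below. The `df` table and the SQL in the comments are illustrative, with `df` standing in for a hypothetical table of the same name.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
    "sales": [100, 250, 175, 90],
})

selected = df[["city", "sales"]]                    # SELECT city, sales FROM df
filtered = df[df["sales"] > 95]                     # ... WHERE sales > 95
ordered = df.sort_values("sales", ascending=False)  # ... ORDER BY sales DESC
distinct = df["city"].unique()                      # SELECT DISTINCT city FROM df
row_count = len(df)                                 # SELECT COUNT(*) FROM df
per_city = df["city"].value_counts()                # SELECT city, COUNT(*) ... GROUP BY city
```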
🔗 Connection to Machine Learning
- Feature matrix X: DataFrame with feature columns
- Target vector y: Series with labels
- Train/test split: DataFrame slicing
- Feature selection: Column selection
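A rough sketch of how these pieces fit together, assuming a made-up housing table with a `price` target. The split here is a plain positional slice with no shuffling; real workflows typically shuffle or use a library helper.

```python
import pandas as pd

df = pd.DataFrame({
    "sqft": [800, 950, 1200, 1500, 2000],
    "rooms": [2, 2, 3, 4, 5],
    "price": [150, 180, 240, 310, 420],
})

X = df[["sqft", "rooms"]]   # feature matrix: a DataFrame of feature columns
y = df["price"]             # target vector: a Series

# Positional 80/20 split using .iloc slicing
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]
```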
Practice Exercises
Exercise 1: Create and Explore
Exercise 2: Data Transformation
Next Steps
In the next lesson, we'll dive into Data Wrangling with Pandas—merging datasets, grouping operations, pivot tables, and handling missing data like a pro.
Ready to transform and clean real-world data? Let's wrangle!