Pandas Basics: Series and DataFrames

Learning Objectives: After this lesson, you'll master pandas fundamentals—creating, indexing, and manipulating Series and DataFrames, the essential data structures for data analysis in Python.

What is Pandas?

If NumPy is for numerical arrays, pandas is for structured, tabular data: think spreadsheets or database tables. The name derives from "panel data" (an econometrics term). Pandas attaches row and column labels to array-style storage, so you can select, filter, and combine data by name rather than by position.

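A minimal sketch of the difference between a NumPy array and a pandas table. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd  # "pd" is the conventional alias

# A plain NumPy array: values are addressed only by position.
raw = np.array([[1, 85.5], [2, 92.0]])

# The same values as a DataFrame: each column now has a name.
# The column names are hypothetical, chosen for this illustration.
table = pd.DataFrame(raw, columns=["student_id", "score"])

print(raw[1, 1])              # position-based access
print(table.loc[1, "score"])  # label-based access
```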

Series: The 1D Building Block

A Series is a one-dimensional labeled array. Think of it as a single column of data with an index.

Creating Series

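Three common ways to build a Series, sketched with made-up values: from a list, from a list with explicit labels, and from a dictionary.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
from_list = pd.Series([10, 20, 30])

# From a list with explicit labels
with_labels = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From a dict: the keys become the index
from_dict = pd.Series({"apples": 3, "pears": 5})

print(from_list)
print(with_labels)
print(from_dict)
```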

Series Properties and Access

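A short sketch of the most commonly used Series attributes and of label-based versus position-based access. The fruit prices are invented sample data.

```python
import pandas as pd

prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"], name="price")

print(prices.values)  # the underlying NumPy array
print(prices.index)   # the labels
print(prices.dtype)   # element type
print(prices.shape)   # a one-element tuple, since a Series is 1D
print(prices.name)

by_label = prices.loc["pear"]             # access by label
by_position = prices.iloc[0]              # access by integer position
label_slice = prices.loc["apple":"pear"]  # label slices include the end label
position_slice = prices.iloc[0:2]         # position slices exclude the end
```

Note that both slices here return the same two elements, but for different reasons: `.loc` includes its end label, while `.iloc` stops before its end position.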

Series Operations

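A sketch of typical Series operations: element-wise arithmetic, arithmetic between two Series (which aligns on labels, not positions), aggregation, and filtering. The regional figures are made up.

```python
import pandas as pd

q1 = pd.Series([100, 200, 300], index=["north", "south", "east"])
q2 = pd.Series([10, 20, 30], index=["south", "east", "west"])

doubled = q1 * 2   # applied to every element

# Adding two Series matches values by label.
# Labels present in only one Series ("north", "west") produce NaN.
total = q1 + q2

# add() with fill_value treats a missing label as 0 instead
total_filled = q1.add(q2, fill_value=0)

mean_q1 = q1.mean()   # aggregation
large = q1[q1 > 150]  # keep only elements that satisfy the condition

print(total)
print(total_filled)
```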

DataFrames: The 2D Powerhouse

A DataFrame is a two-dimensional table with labeled rows and columns. It's the primary pandas data structure.

Creating DataFrames

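A sketch of three ways to construct a DataFrame, using invented names and ages: from a dictionary of columns, from a list of row dictionaries, and with custom row labels.

```python
import pandas as pd

# From a dict of lists: each key becomes a column
from_columns = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [34, 28, 45],
})

# From a list of dicts: each dict becomes a row
from_rows = pd.DataFrame([
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": 28},
])

# With explicit row labels instead of the default 0, 1, 2, ...
labeled = pd.DataFrame({"age": [34, 28]}, index=["ana", "ben"])

print(from_columns)
print(from_rows)
print(labeled)
```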

DataFrame Properties

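A sketch of the attributes that describe a DataFrame's size and structure, shown on a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [34, 28, 45],
    "salary": [72000.0, 58000.0, 91000.0],
})

print(df.shape)    # (rows, columns)
print(df.columns)  # column labels
print(df.index)    # row labels
print(df.dtypes)   # one dtype per column
print(df.size)     # total number of cells
print(df.ndim)     # number of dimensions
```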

Quick Data Exploration

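A sketch of a first look at an unfamiliar DataFrame: peek at the first and last rows, print a structural overview, and count distinct values. The city temperatures are invented sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Lima", "Oslo", "Lima", "Pune", "Oslo", "Lima"],
    "temp_c": [19.0, 4.5, 21.0, 30.5, 6.0, 20.0],
})

first_rows = df.head(3)  # first 3 rows (default is 5)
last_rows = df.tail(2)   # last 2 rows

df.info()                # prints column names, non-null counts, and dtypes

city_counts = df["city"].value_counts()  # how often each value occurs
n_cities = df["city"].nunique()          # number of distinct values

print(first_rows)
print(city_counts)
```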

Statistical Summary

A statistical summary looks at a DataFrame from several angles:

Table: Raw data in spreadsheet format
Statistics: Summary stats (mean, std, quartiles)
Distributions: Histograms for each feature
Correlations: Relationship heatmap between variables

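The summary statistics and correlations can also be computed directly in code. A minimal sketch with made-up study data follows; histograms are left out because they need a plotting library.

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [52, 60, 71, 79],
    "group": ["a", "b", "a", "b"],
})

# describe() reports count, mean, std, min, quartiles, and max.
# On a mixed DataFrame it covers the numeric columns only.
summary = df.describe()

# Pairwise Pearson correlation between the numeric columns
correlations = df[["hours", "score"]].corr()

print(summary)
print(correlations)
```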

Selecting Data

Pandas offers multiple ways to select data. Understanding these is crucial for data analysis.

Column Selection

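A sketch of column selection on a made-up table. Single brackets with one name return a Series; a list of names returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "age": [34, 28, 45],
    "city": ["Lima", "Oslo", "Pune"],
})

name_col = df["name"]         # one column -> Series
subset = df[["name", "age"]]  # list of columns -> DataFrame
age_attr = df.age             # dot notation, same data as df["age"]

print(type(name_col).__name__)
print(type(subset).__name__)
```

Bracket notation works for any column name. Dot notation only works when the name is a valid Python identifier and does not collide with an existing DataFrame attribute or method.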

Row Selection with .loc and .iloc

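A sketch contrasting `.loc` (labels) with `.iloc` (integer positions). The table uses letter row labels, chosen for this illustration, so the two are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy", "Dee"], "score": [88, 92, 79, 95]},
    index=["a", "b", "c", "d"],
)

row_b = df.loc["b"]            # one row by label -> Series
rows_a_to_c = df.loc["a":"c"]  # label slice, end label included
first_two = df.iloc[0:2]       # position slice, end position excluded

cell_by_label = df.loc["c", "score"]  # row label, column label
cell_by_pos = df.iloc[3, 1]           # row position, column position

print(rows_a_to_c)
print(first_two)
```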

Boolean Indexing (Filtering)

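A sketch of filtering rows with boolean masks, using invented employee data. Conditions are combined with `&` (and), `|` (or), and `~` (not), and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "dept": ["Eng", "Sales", "Eng", "Ops"],
    "salary": [95000, 60000, 72000, 58000],
})

mask = df["salary"] > 65000  # Series of True/False, one per row
high_paid = df[mask]         # keep rows where the mask is True

# Combine conditions; the parentheses are required
eng_high = df[(df["dept"] == "Eng") & (df["salary"] > 80000)]

# Match against several allowed values
sales_or_ops = df[df["dept"].isin(["Sales", "Ops"])]

# Negate a condition
not_eng = df[~(df["dept"] == "Eng")]

print(high_paid)
```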

Modifying DataFrames

Adding and Removing Columns

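A sketch of adding columns (by assignment and with `assign`) and removing one with `drop`. The column names, including `temp_id`, are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "salary": [72000, 58000],
    "temp_id": [1, 2],
})

df["bonus"] = 5000                    # a scalar is repeated for every row
df["salary_k"] = df["salary"] / 1000  # computed from an existing column

# assign() returns a new DataFrame and leaves df untouched
with_total = df.assign(total=df["salary"] + df["bonus"])

# drop() also returns a new DataFrame by default
trimmed = df.drop(columns=["temp_id"])

print(with_total)
print(trimmed)
```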

Modifying Values

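A sketch of changing values in place, using invented data. Writing through `.loc` with both a row selector and a column name avoids the pitfalls of chained indexing such as `df[mask]["col"] = ...`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "dept": ["Eng", "Sales", "Eng"],
    "salary": [72000, 58000, 91000],
})

# Set a single cell: row label 1, column "dept"
df.loc[1, "dept"] = "Marketing"

# Update only the rows that match a condition
df.loc[df["dept"] == "Eng", "salary"] += 5000

# Replace specific values across a column
df["dept"] = df["dept"].replace({"Eng": "Engineering"})

# Transform every value in a text column
df["name"] = df["name"].str.upper()

print(df)
```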

Renaming Columns

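A sketch of two renaming approaches on a table with deliberately messy, made-up headers: `rename` for selected columns, and a bulk cleanup of all labels through `df.columns`.

```python
import pandas as pd

df = pd.DataFrame({"First Name": ["Ana"], "AGE": [34], "Home City": ["Lima"]})

# Rename selected columns; unlisted columns keep their names.
# rename() returns a new DataFrame.
renamed = df.rename(columns={"First Name": "first_name", "AGE": "age"})

# Clean every label at once: lowercase, spaces to underscores
cleaned = df.copy()
cleaned.columns = cleaned.columns.str.lower().str.replace(" ", "_")

print(list(renamed.columns))
print(list(cleaned.columns))
```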

Sorting Data

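A sketch of sorting by one column, in descending order, by several columns, and back into index order. The data is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "dept": ["Eng", "Sales", "Eng", "Sales"],
    "salary": [72000, 58000, 91000, 64000],
})

by_salary = df.sort_values("salary")                        # ascending
by_salary_desc = df.sort_values("salary", ascending=False)  # descending

# Sort by dept A-Z, then by salary high-to-low within each dept
by_dept_then_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# sort_index() restores the original row order
restored = by_salary.sort_index()

print(by_dept_then_salary)
```

`sort_values` returns a new DataFrame and keeps each row's original index label, which is why `sort_index` can undo the sort.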

Practical Example: Employee Analysis

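One possible end-to-end sketch that combines the techniques from this lesson. The employee records, column names, and thresholds are all hypothetical.

```python
import pandas as pd

# Hypothetical employee records
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
    "dept": ["Eng", "Sales", "Eng", "Ops", "Sales"],
    "salary": [95000, 60000, 72000, 58000, 65000],
    "years": [6, 2, 3, 8, 4],
})

# Aggregate: company-wide average salary
avg_salary = employees["salary"].mean()

# Add a column: flag everyone earning more than the average
employees["above_avg"] = employees["salary"] > avg_salary

# Filter: employees with at least 4 years of service
experienced = employees[employees["years"] >= 4]

# Filter rows and pick a column in one step, then aggregate
eng_avg = employees.loc[employees["dept"] == "Eng", "salary"].mean()

# Sort and select: the three highest salaries
top_earners = (
    employees.sort_values("salary", ascending=False)
    .head(3)[["name", "salary"]]
)

print(f"Average salary: {avg_salary:.0f}")
print(f"Engineering average: {eng_avg:.0f}")
print(top_earners)
```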

Key Takeaways

✅ Series is a 1D labeled array—like a dictionary or a single DataFrame column

✅ DataFrame is a 2D table with labeled rows and columns—the core pandas structure

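✅ Column selection uses brackets df['col'] or dot notation df.col (dot notation works only when the name is a valid Python identifier and does not clash with a DataFrame attribute or method)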

✅ Row selection uses .loc (by label) or .iloc (by position)

✅ Boolean indexing filters data with conditions df[df['col'] > value]

✅ Modification operations allow adding, removing, and transforming columns

✅ Sorting with sort_values() orders data by one or more columns

Connections: Pandas in the Data Science Ecosystem

🔗 Connection to Spreadsheets

Spreadsheet	Pandas
Workbook	Multiple DataFrames
Worksheet	DataFrame
Column	Series
Cell reference (A1)	`.loc[row, col]`
Filter	Boolean indexing
Sort	`sort_values()`

🔗 Connection to SQL

SQL	Pandas
SELECT columns	`df[['col1', 'col2']]`
WHERE	Boolean indexing
ORDER BY	`sort_values()`
DISTINCT	`unique()` or `drop_duplicates()`
COUNT	`count()` or `len(df)`; `value_counts()` for COUNT with GROUP BY

🔗 Connection to Machine Learning

Feature matrix X: DataFrame with feature columns
Target vector y: Series with labels
Train/test split: DataFrame slicing
Feature selection: Column selection

Practice Exercises

Exercise 1: Create and Explore


Exercise 2: Data Transformation


Next Steps

In the next lesson, we'll dive into Data Wrangling with Pandas—merging datasets, grouping operations, pivot tables, and handling missing data like a pro.

Ready to transform and clean real-world data? Let's wrangle!
