Pandas Basics: Series and DataFrames

Learning Objectives: After this lesson, you'll master pandas fundamentals—creating, indexing, and manipulating Series and DataFrames, the essential data structures for data analysis in Python.

What is Pandas?

If NumPy is for numerical arrays, pandas is for structured, tabular data: think spreadsheets or database tables. The name derives from "panel data" (an econometrics term). On top of array-style computation, pandas adds row and column labels and lets each column hold its own data type.

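To make the contrast with NumPy concrete, here is a minimal sketch that wraps the same values first as a plain array and then as a labeled DataFrame. The labels (row_a, x, and so on) are made up for illustration; pd is simply the conventional import alias.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: values only, addressed by integer position
arr = np.array([[1.0, 2.0], [3.0, 4.0]])

# The same values as a DataFrame: rows and columns carry labels
# (these labels are hypothetical, chosen only for this example)
df = pd.DataFrame(arr, index=["row_a", "row_b"], columns=["x", "y"])

print(arr[1, 0])             # positional access
print(df.loc["row_b", "x"])  # label-based access to the same value
```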

Series: The 1D Building Block

A Series is a one-dimensional labeled array. Think of it as a single column of data with an index.

Creating Series

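A Series can be built from several kinds of input. The sketch below shows four common routes; all values and labels are invented for illustration.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
temps = pd.Series([21.5, 23.0, 19.8])

# From a list with explicit labels and a name
scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# From a dict: the keys become the index
prices = pd.Series({"apple": 0.5, "banana": 0.25, "cherry": 3.0})

# From a scalar: the value is repeated for every index label
zeros = pd.Series(0, index=["a", "b", "c"])

print(scores)
```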

Series Properties and Access

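The following sketch inspects a few Series attributes and then reads values by label and by position. The scores Series is a hypothetical example.

```python
import pandas as pd

scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# Basic properties
print(scores.dtype)       # element type of the values
print(scores.shape)       # (3,)
print(scores.index)       # the labels
print(scores.to_numpy())  # the underlying values as a NumPy array

# Access by label
by_label = scores["bob"]
by_label_loc = scores.loc["bob"]

# Access by integer position
by_position = scores.iloc[0]

# Label slices include both endpoints; position slices exclude the end
label_slice = scores.loc["alice":"bob"]
pos_slice = scores.iloc[0:2]
```

Note that both slices above return the same two rows, even though the label slice names its last element and the position slice stops one before its end.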

Series Operations

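Series arithmetic is element-wise, and when two Series are combined, pandas aligns them on their labels rather than on position. A small sketch with made-up data:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Vectorized arithmetic applies to every element
doubled = a * 2

# Addition aligns on labels; labels present in only one Series give NaN
total = a + b

# add() with fill_value treats a missing label as 0 instead
total_filled = a.add(b, fill_value=0)

# Aggregations reduce the Series to a single value
mean_a = a.mean()

# A boolean condition can be used to filter
large = a[a > 1]

print(total)
print(total_filled)
```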

DataFrames: The 2D Powerhouse

A DataFrame is a two-dimensional table with labeled rows and columns. It's the primary pandas data structure.

Creating DataFrames

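DataFrames are most often built from a dict of columns, a list of records, or a list of rows. The sketch below shows all three with invented data.

```python
import pandas as pd

# From a dict of columns: each key is a column name
people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# From a list of records: each dict is one row
records = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
from_records = pd.DataFrame(records)

# From a list of rows, with explicit column names and row labels
from_rows = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=["a", "b"],
    index=["r1", "r2"],
)

print(people)
```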

DataFrame Properties

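A few attributes describe a DataFrame's size and layout. A minimal sketch on a hypothetical people table:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

print(people.shape)    # (rows, columns)
print(people.ndim)     # number of dimensions, always 2 for a DataFrame
print(people.size)     # total number of cells
print(people.columns)  # column labels
print(people.index)    # row labels
print(people.dtypes)   # one data type per column
```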

Quick Data Exploration

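For a first look at an unfamiliar dataset, a handful of methods go a long way. The sketch below uses a small synthetic table; the column names are assumptions made for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "id": range(1, 11),
    "group": list("aabbbcccca"),
    "value": [x * 1.5 for x in range(10)],
})

first = data.head(3)  # first 3 rows
last = data.tail(2)   # last 2 rows

# Column names, non-null counts, and dtypes, printed to stdout
data.info()

# How often each distinct value occurs, most frequent first
counts = data["group"].value_counts()

# Number of distinct values per column
distinct = data.nunique()

print(first)
print(counts)
```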

Statistical Summary

A DataFrame's statistics can be examined through several complementary views:

  • Table: Raw data in spreadsheet format
  • Statistics: Summary stats (mean, std, quartiles)
  • Distributions: Histograms for each feature
  • Correlations: Relationship heatmap between variables
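
The summary statistics and correlations listed above can also be computed directly in pandas. This is a minimal sketch on made-up measurements; binned counts stand in for a histogram so that no plotting library is needed.

```python
import pandas as pd

# Hypothetical data: weight is an exact linear function of height
body = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [50.0, 58.0, 66.0, 74.0, 82.0],
})

# Table: the raw data
print(body)

# Statistics: count, mean, std, min, quartiles, max per numeric column
summary = body.describe()
print(summary)

# Distributions: counts per equal-width bin, a text stand-in for a histogram
height_bins = body["height_cm"].value_counts(bins=2, sort=False)
print(height_bins)

# Correlations: pairwise Pearson correlation between numeric columns
corr = body.corr()
print(corr)
```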

Selecting Data

Pandas offers multiple ways to select data. Understanding these is crucial for data analysis.

Column Selection

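Selecting one column returns a Series, while selecting a list of columns returns a DataFrame. A short sketch with hypothetical data:

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# Single brackets with one name: a Series
ages = people["age"]

# Double brackets (a list of names): a DataFrame
subset = people[["name", "age"]]

# Dot notation: works here because "age" is a valid identifier
# that does not clash with any DataFrame attribute or method
also_ages = people.age

print(type(ages).__name__, type(subset).__name__)
```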

Row Selection with .loc and .iloc

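The difference between .loc and .iloc is easiest to see when the row labels are not integers. In this sketch the labels a, b, c are invented for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carol"], "age": [30, 25, 35]},
    index=["a", "b", "c"],
)

# .loc selects by label
row_b = people.loc["b"]       # one row, returned as a Series
cell = people.loc["c", "age"]  # one cell: row label, column label

# .iloc selects by integer position
first_row = people.iloc[0]

# Label slices include the end label; position slices exclude the end
label_slice = people.loc["a":"b", ["name"]]
pos_slice = people.iloc[0:2, 0:1]

print(label_slice)
```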

Boolean Indexing (Filtering)

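A boolean filter is a Series of True/False values used to pick rows. The sketch below, on hypothetical data, shows a single condition, combined conditions, membership tests, and negation.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# A comparison yields a boolean Series, one value per row
mask = people["age"] > 28
older = people[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_in_paris = people[(people["age"] > 28) & (people["city"] == "Paris")]

# isin() tests membership in a list of values
in_cities = people[people["city"].isin(["Berlin", "Madrid"])]

# ~ negates a condition
not_paris = people[~(people["city"] == "Paris")]

print(older)
```

Python's plain `and` / `or` keywords do not work element-wise on Series, which is why the operators & and | are used here.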

Modifying DataFrames

Adding and Removing Columns

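Columns can be added by assignment or with assign(), and removed with drop(). A minimal sketch on an invented orders table:

```python
import pandas as pd

orders = pd.DataFrame({
    "item": ["pen", "book", "lamp"],
    "price": [2.0, 5.0, 10.0],
    "qty": [3, 1, 2],
})

# Assigning to a new key adds a column in place; the arithmetic is element-wise
orders["total"] = orders["price"] * orders["qty"]

# assign() returns a new DataFrame with the extra column
orders = orders.assign(discounted=orders["total"] * 0.9)

# drop(columns=...) returns a new DataFrame without the listed columns
orders = orders.drop(columns=["qty"])

print(orders)
```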

Modifying Values

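Values can be changed one cell at a time, for all rows matching a condition, or for a whole column at once. The sketch below uses hypothetical data and does each update in a single .loc or .at call rather than chaining two selections, since chained assignment may silently fail to update the original.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 35],
    "city": ["Paris", "Berlin", "Madrid"],
})

# Conditional update: row filter and column label in one .loc call
people.loc[people["name"] == "Bob", "age"] = 26

# Single cell by row label and column label (the default row labels are 0, 1, 2)
people.at[0, "city"] = "Lyon"

# Whole-column transformation
people["age"] = people["age"] + 1

# String methods are available through the .str accessor
people["name"] = people["name"].str.upper()

print(people)
```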

Renaming Columns

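rename() accepts either an explicit mapping or a function applied to every column name. A short sketch with deliberately messy, made-up column names:

```python
import pandas as pd

raw = pd.DataFrame({"First Name": ["Alice"], "AGE": [30]})

# Explicit mapping from old names to new names
renamed = raw.rename(columns={"First Name": "first_name", "AGE": "age"})

# A function applied to each column name
cleaned = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# rename() returns a new DataFrame; the original is unchanged
print(list(raw.columns))
print(list(renamed.columns))
```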

Sorting Data

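sort_values() orders rows by one or more columns, and sort_index() orders them by row label. A minimal sketch on a hypothetical staff table:

```python
import pandas as pd

staff = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan"],
    "dept": ["eng", "ops", "eng", "ops"],
    "salary": [100, 80, 120, 90],
})

# Ascending by a single column (the default)
by_salary = staff.sort_values("salary")

# Descending
by_salary_desc = staff.sort_values("salary", ascending=False)

# Several columns, each with its own direction
by_dept_then_salary = staff.sort_values(
    ["dept", "salary"], ascending=[True, False]
)

# Sorting by the row labels restores the original order here
restored = by_dept_then_salary.sort_index()

print(by_dept_then_salary)
```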

Practical Example: Employee Analysis

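One way such an analysis might look, combining the selection, filtering, modification, and sorting steps from this lesson. The employee records below are entirely invented, and the thresholds (5 years, 90,000) are arbitrary choices for the example.

```python
import pandas as pd

# Hypothetical employee data
employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dan", "Eve"],
    "department": ["Engineering", "Sales", "Engineering", "Sales", "HR"],
    "salary": [120000, 70000, 95000, 82000, 64000],
    "years": [8, 3, 5, 10, 2],
})

# Explore
print(employees.shape)
print(employees["salary"].describe())

# Derive a new column
employees["salary_per_year"] = employees["salary"] / employees["years"]

# Filter: one department
engineers = employees[employees["department"] == "Engineering"]

# Filter: experienced and well paid
senior_high = employees[(employees["years"] >= 5) & (employees["salary"] > 90000)]

# Sort and take the top earners
top3 = employees.sort_values("salary", ascending=False).head(3)

# Rows above the average salary, keeping only two columns
avg_salary = employees["salary"].mean()
above_avg = employees.loc[employees["salary"] > avg_salary, ["name", "salary"]]

print(top3[["name", "salary"]])
print(above_avg)
```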

Key Takeaways

Series is a 1D labeled array—like a dictionary or a single DataFrame column

DataFrame is a 2D table with labeled rows and columns—the core pandas structure

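Column selection uses brackets df['col'] or dot notation df.col (dot notation works only when the name is a valid Python identifier that does not clash with a DataFrame attribute or method)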

Row selection uses .loc (by label) or .iloc (by position)

Boolean indexing filters data with conditions, for example df[df['col'] > value]

Modification operations allow adding, removing, and transforming columns

Sorting with sort_values() orders data by one or more columns

Connections: Pandas in the Data Science Ecosystem

🔗 Connection to Spreadsheets

Spreadsheet          | Pandas
Workbook             | Multiple DataFrames
Worksheet            | DataFrame
Column               | Series
Cell reference (A1)  | .loc[row, col]
Filter               | Boolean indexing
Sort                 | sort_values()

🔗 Connection to SQL

SQL                           | Pandas
SELECT columns                | df[['col1', 'col2']]
WHERE                         | Boolean indexing
ORDER BY                      | sort_values()
DISTINCT                      | unique() or drop_duplicates()
COUNT(*) ... GROUP BY col     | df['col'].value_counts()
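
The correspondences above can be sketched side by side, with the rough SQL equivalent in a comment next to each pandas call. The sales table and its column names are hypothetical.

```python
import pandas as pd

sales = pd.DataFrame({
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
    "amount": [10, 20, 30, 40],
})

# SELECT city, amount FROM sales
selected = sales[["city", "amount"]]

# SELECT * FROM sales WHERE amount > 15
filtered = sales[sales["amount"] > 15]

# SELECT * FROM sales ORDER BY amount DESC
ordered = sales.sort_values("amount", ascending=False)

# SELECT DISTINCT city FROM sales
distinct_cities = sales["city"].unique()

# SELECT COUNT(*) FROM sales
n_rows = len(sales)

# SELECT city, COUNT(*) FROM sales GROUP BY city
per_city = sales["city"].value_counts()

print(per_city)
```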

🔗 Connection to Machine Learning

  • Feature matrix X: DataFrame with feature columns
  • Target vector y: Series with labels
  • Train/test split: DataFrame slicing
  • Feature selection: Column selection
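
As a rough sketch of how these pieces map onto pandas operations, the example below separates features from the target and splits the rows by position. The housing data is invented, and the 80/20 split by row order is a simplifying assumption: it is only reasonable if the rows are already in random order, and in practice a shuffling helper from a machine learning library is commonly used instead.

```python
import pandas as pd

# Hypothetical dataset: two features and one target column
houses = pd.DataFrame({
    "sqft": [800, 950, 1200, 1500, 2000],
    "rooms": [2, 3, 3, 4, 5],
    "price": [150, 180, 230, 290, 400],
})

# Feature matrix X: a DataFrame of the feature columns
X = houses[["sqft", "rooms"]]

# Target vector y: a Series
y = houses["price"]

# Train/test split by position: first 80% of rows train, the rest test
split = int(len(houses) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

print(X_train.shape, X_test.shape)
```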

Practice Exercises

Exercise 1: Create and Explore


Exercise 2: Data Transformation


Next Steps

In the next lesson, we'll dive into Data Wrangling with Pandas—merging datasets, grouping operations, pivot tables, and handling missing data like a pro.


Ready to transform and clean real-world data? Let's wrangle!