Data Input/Output: Loading and Saving Data

Learning Objectives: After this lesson, you'll master reading and writing data in various formats—CSV, JSON, Excel, and more. You'll also learn to fetch data from web APIs and handle different file encodings.

The Data Lifecycle

Data analysis starts with loading data and ends with saving results. Pandas makes this seamless across formats.

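The full cycle can be shown in a few lines — create a DataFrame, save it, and load it back (the file name here is just illustrative):

```python
import pandas as pd

# A minimal load/save round trip (the file name is illustrative)
df = pd.DataFrame({"city": ["Oslo", "Lima"], "pop": [709_000, 10_000_000]})
df.to_csv("cities.csv", index=False)   # save results
loaded = pd.read_csv("cities.csv")     # load them back
print(loaded.shape)  # (2, 2)
```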

Reading CSV Files

CSV (Comma-Separated Values) is the most common data format. Pandas handles it effortlessly.

Basic CSV Reading

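In the simplest case you pass a path (or any file-like object — here `StringIO` stands in for a file on disk) and pandas infers the column names and types:

```python
import pandas as pd
from io import StringIO

# read_csv accepts a path or any file-like object;
# StringIO stands in for a real file here
csv_text = "name,age,city\nAlice,30,Oslo\nBob,25,Lima\n"
df = pd.read_csv(StringIO(csv_text))

print(df.head())    # first rows
print(df.dtypes)    # inferred types: age becomes int64
```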

Common CSV Options

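A sketch of the options you will reach for most often — all are standard `read_csv` parameters; the sample data is made up:

```python
import pandas as pd
from io import StringIO

raw = "id;name;joined;score\n1;Ana;2021-03-01;NA\n2;Ben;2022-07-15;88\n"

df = pd.read_csv(
    StringIO(raw),
    sep=";",                   # non-comma delimiter
    parse_dates=["joined"],    # parse this column as datetime
    na_values=["NA"],          # treat "NA" as missing
    dtype={"id": "int64"},     # force a column dtype
    usecols=["id", "name", "joined", "score"],  # read only these columns
)
```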

Handling Large Files

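For files too big to fit in memory, `chunksize` turns `read_csv` into an iterator of smaller DataFrames, so you can aggregate as you go. A sketch with an in-memory stand-in for a large file:

```python
import pandas as pd
from io import StringIO

# Build a small "large" file in memory for demonstration
raw = "x\n" + "\n".join(str(i) for i in range(1000))

# chunksize yields DataFrames of at most 250 rows each
total = 0
for chunk in pd.read_csv(StringIO(raw), chunksize=250):
    total += chunk["x"].sum()   # aggregate chunk by chunk

print(total)  # 499500
```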

Writing CSV Files

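`to_csv` mirrors `read_csv`; the options you will most often need are `index=False` (skip the row index) and `na_rep` (how to write missing values). The output path is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, None]})

# index=False drops the row index; na_rep controls how NaN is written
df.to_csv("out.csv", index=False, na_rep="missing")

with open("out.csv") as f:
    text = f.read()
print(text)
```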

Working with JSON

JSON is common for web APIs and nested data structures.

Reading JSON

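Flat, record-oriented JSON maps straight onto a DataFrame with `read_json`; for nested objects, `pd.json_normalize()` flattens the structure into dotted column names. Both snippets below use made-up data:

```python
import pandas as pd
from io import StringIO

# Flat records: one JSON object per row
records = '[{"name": "Ana", "score": 91}, {"name": "Ben", "score": 85}]'
df = pd.read_json(StringIO(records))

# Nested records: json_normalize flattens inner objects
nested = [{"name": "Ana", "address": {"city": "Oslo", "zip": "0150"}}]
flat = pd.json_normalize(nested)   # columns: name, address.city, address.zip
```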

Writing JSON

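`to_json` works in the other direction; its `orient` parameter controls the layout, and `orient="records"` (a list of row objects) is the shape most web services expect:

```python
import json
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "score": [91, 85]})

# orient="records" serializes each row as one JSON object
as_records = df.to_json(orient="records")
parsed = json.loads(as_records)
print(parsed[0])  # {'name': 'Ana', 'score': 91}
```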

Excel Files

Pandas can read and write Excel files (requires openpyxl for .xlsx).

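A sketch of the workflow — `ExcelWriter` lets you put several sheets in one workbook, and `read_excel` pulls a sheet back out. The guard handles the case where openpyxl is not installed; the file name is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"q": [1, 2], "r": [3, 4]})

try:
    # Two sheets in one workbook (needs the openpyxl package)
    with pd.ExcelWriter("report.xlsx") as writer:
        df.to_excel(writer, sheet_name="data", index=False)
        df.describe().to_excel(writer, sheet_name="summary")

    back = pd.read_excel("report.xlsx", sheet_name="data")
    ok = back.equals(df)
except ImportError:
    ok = None  # openpyxl not installed
```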

Working with SQL Databases

Pandas can run queries directly into DataFrames and write DataFrames back as database tables.

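The core pair is `df.to_sql()` to write a table and `pd.read_sql()` to run a query. A self-contained sketch using an in-memory SQLite database from the standard library:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")   # in-memory database for the example

df = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
df.to_sql("sales", conn, index=False)                       # write a table
back = pd.read_sql("SELECT * FROM sales WHERE amount > 15", conn)  # query it
conn.close()
```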

Fetching Data from Web APIs

APIs return data (usually JSON) that can be converted to DataFrames.

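The pattern is: fetch, parse the JSON, then normalize into a DataFrame. The `requests` call and URL below are shown as comments (the URL is hypothetical); the example parses an equivalent payload locally so it runs without network access:

```python
import json
import pandas as pd

# In a real script you would fetch with the requests package:
#   resp = requests.get("https://api.example.com/users")  # hypothetical URL
#   payload = resp.json()
# Here we parse an equivalent payload so the example runs offline.
payload = json.loads('{"users": [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]}')

df = pd.json_normalize(payload["users"])
```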

Pagination and Multiple API Calls

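Many APIs return results one page at a time, so you loop until a page comes back empty and concatenate the pieces. In this sketch, `fetch_page` is a hypothetical stand-in for a real API call (e.g. `requests.get` with a page parameter):

```python
import pandas as pd

# fetch_page is a stand-in for a real API call; it returns one
# page of records, or an empty list when the pages run out
def fetch_page(page, page_size=2):
    data = [{"id": i} for i in range(1, 6)]   # pretend server-side data
    start = (page - 1) * page_size
    return data[start:start + page_size]

frames, page = [], 1
while True:
    records = fetch_page(page)
    if not records:               # stop on the first empty page
        break
    frames.append(pd.DataFrame(records))
    page += 1

all_rows = pd.concat(frames, ignore_index=True)
print(len(all_rows))  # 5
```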

Handling File Encodings

Different files use different character encodings. Understanding this prevents errors.

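The fix is to pass the file's actual encoding to `read_csv`. A sketch that writes a Latin-1 file and reads it back correctly (the file name is illustrative):

```python
import pandas as pd

# Write a file in Latin-1, then read it back with the right encoding
with open("latin.csv", "w", encoding="latin-1") as f:
    f.write("name\nJosé\n")

df = pd.read_csv("latin.csv", encoding="latin-1")   # correct
# With the default utf-8, the 0xE9 byte for "é" would raise UnicodeDecodeError
print(df.loc[0, "name"])  # José
```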

Binary Formats: Parquet and Pickle

For performance, consider binary formats.

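Pickle ships with pandas and preserves dtypes exactly; Parquet is compressed, columnar, and language-agnostic but needs an extra package. A sketch of the pickle round trip, with the Parquet calls shown as comments:

```python
import pandas as pd

df = pd.DataFrame({"a": range(3), "b": ["x", "y", "z"]})

# Pickle: exact round trip, no extra packages needed
df.to_pickle("frame.pkl")
restored = pd.read_pickle("frame.pkl")

# Parquet: compressed and columnar, but requires pyarrow (or fastparquet):
# df.to_parquet("frame.parquet")
# pd.read_parquet("frame.parquet")
```

Note that pickle files are Python-specific and should only be loaded from sources you trust.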

Practical Example: Data Pipeline

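A small end-to-end sketch under assumed data: load a messy CSV, clean up whitespace and missing values, and save a tidy copy (the output path is illustrative):

```python
import pandas as pd
from io import StringIO

# Messy raw data: stray spaces and a missing score
raw = "name, score\n Ana ,91\nBen,\n"

df = pd.read_csv(StringIO(raw), skipinitialspace=True)
df["name"] = df["name"].str.strip()                 # normalize whitespace
df["score"] = df["score"].fillna(0).astype(int)     # fill missing scores
df.to_csv("clean.csv", index=False)                 # save the tidy result
```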

Key Takeaways

CSV is universal—use pd.read_csv() and df.to_csv() with appropriate options

JSON handles nested data—use pd.json_normalize() for complex structures

Excel requires openpyxl—supports multiple sheets with ExcelWriter

SQL integration is seamless—use pd.read_sql() and df.to_sql()

APIs return JSON—convert to DataFrames after parsing

Encodings matter—specify encoding for international data

Binary formats (Parquet, Pickle) are faster for large data

Connections: Data I/O in Practice

🔗 Connection to Data Engineering

Task            | Pandas Function
ETL Pipeline    | read_* → transform → to_*
Data Lake       | read_parquet() / to_parquet()
Data Warehouse  | read_sql() / to_sql()
API Integration | requests + read_json()

🔗 Connection to Machine Learning

Loading data is the first step in any ML workflow:

    # Typical ML data loading
    train = pd.read_csv('train.csv')
    X = train.drop('target', axis=1)
    y = train['target']

Practice Exercises

Exercise 1: Multi-Format Pipeline

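The original interactive exercise is not available here; a possible starter, assuming the task is to move one dataset through several formats and verify the round trip:

```python
import pandas as pd

# Hypothetical starter: the exact task from the interactive exercise is assumed
# 1. Create a small DataFrame
df = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})

# 2. Save it as CSV and as JSON
df.to_csv("data.csv", index=False)
df.to_json("data.json", orient="records")

# 3. Reload both and check they agree
from_csv = pd.read_csv("data.csv")
from_json = pd.read_json("data.json")
assert from_csv.equals(from_json)
```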

Next Steps

Now that you can load and save data, you're ready for Data Visualization with Matplotlib—turning your data into compelling visual stories.


Ready to visualize your data? Let's create beautiful charts!