Data Input/Output: Loading and Saving Data

Learning Objectives: After this lesson, you'll be able to read and write data in the most common formats (CSV, JSON, Excel, SQL databases, Parquet, and Pickle), fetch data from web APIs, and handle different file encodings.

The Data Lifecycle

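Data analysis starts with loading data and ends with saving results. Pandas follows the same pattern for most formats: a pd.read_*() function loads data into a DataFrame, and a matching DataFrame.to_*() method writes it back out (for example, pd.read_csv() and df.to_csv()).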

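A minimal sketch of that load/save symmetry. An in-memory io.StringIO buffer stands in for a real file so the example runs anywhere; the small dataset is made up for illustration.

```python
import io

import pandas as pd

# Small made-up dataset standing in for real input
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4.5, 19.0]})

# Save: to_csv() accepts a path or any file-like object
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Load: rewind the buffer and read it back with the matching reader
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored.equals(df))  # True: same values and dtypes after the round trip
```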

Reading CSV Files

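CSV (Comma-Separated Values) is one of the most common formats for tabular data. By default, pd.read_csv() loads a CSV file into a DataFrame, uses the first row as column names, and infers each column's type.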

Basic CSV Reading

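A minimal example of the basic call. pd.read_csv() accepts a file path, a URL, or any file-like object; here the CSV text is wrapped in io.StringIO so the snippet is self-contained. The column names and values are illustrative only.

```python
import io

import pandas as pd

csv_text = """name,age,city
Alice,30,Paris
Bob,25,Tokyo
Chen,35,Toronto
"""

# In real code this would be pd.read_csv("people.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (3, 3): rows, columns
print(df.dtypes)   # age is inferred as int64; name and city as text
print(df.head())   # first five rows (here, all three)
```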

Common CSV Options

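The sketch below combines several commonly used pd.read_csv() parameters (sep, skiprows, na_values, parse_dates, usecols, dtype) on a made-up semicolon-separated export. Which options you need depends entirely on the file in front of you.

```python
import io

import pandas as pd

# Made-up export: a title line, semicolons, and "missing" for absent values
raw = """Sales export
date;region;units;price
2024-01-05;North;12;9.99
2024-01-06;South;missing;12.50
2024-01-07;North;7;9.99
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                              # field delimiter
    skiprows=1,                           # skip the title line above the header
    na_values=["missing"],                # extra strings to treat as NaN
    parse_dates=["date"],                 # parse this column as datetimes
    usecols=["date", "region", "units"],  # load only these columns
    dtype={"region": "category"},         # set a column's dtype explicitly
)

print(df.dtypes)
print(df["units"].isna().sum())  # 1
```

Note that "N/A", "NA", "null", and an empty field are already recognized as missing by default; na_values only adds to that list.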

Handling Large Files

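One common approach, sketched here, is to pass chunksize so pd.read_csv() returns an iterator of smaller DataFrames, aggregate each chunk, and combine the partial results. Restricting usecols and choosing compact dtypes reduces memory further. The generated 10,000-row input is a stand-in for a genuinely large file.

```python
import io

import pandas as pd

# Stand-in for a large file: 10,000 generated rows
rows = "\n".join(f"{i},{i % 3}" for i in range(10_000))
big_csv = io.StringIO("id,group\n" + rows)

partial_counts = []
# With chunksize, read_csv returns an iterator of DataFrames, so only
# one 2,500-row chunk is held in memory at a time
with pd.read_csv(
    big_csv,
    chunksize=2_500,
    usecols=["group"],        # skip columns you don't need
    dtype={"group": "int8"},  # a compact dtype saves memory
) as reader:
    for chunk in reader:
        partial_counts.append(chunk["group"].value_counts())

# Combine the per-chunk counts into overall totals per group
totals = pd.concat(partial_counts).groupby(level=0).sum()
print(totals)
```

This works because counts can be summed across chunks; statistics such as a median cannot be combined this simply.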

Writing CSV Files

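A short sketch of DataFrame.to_csv() with a few frequently used options. Writing to a StringIO buffer lets us inspect the output; with a real file you would pass a path instead (the "products.csv" name in the comment is hypothetical).

```python
import io

import pandas as pd

df = pd.DataFrame({"product": ["Pen", "Notebook", "Bag"],
                   "price": [1.5, 3.25, None]})

buffer = io.StringIO()  # with a real file: df.to_csv("products.csv", ...)
df.to_csv(
    buffer,
    index=False,          # don't write the row index as an extra column
    na_rep="NA",          # text written for missing values
    float_format="%.2f",  # two decimal places for floats
)
print(buffer.getvalue())
```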

Working with JSON

JSON is common for web APIs and nested data structures.

Reading JSON

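Two cases are sketched below: a flat JSON array of records, which pd.read_json() maps directly to rows, and nested objects, which pd.json_normalize() flattens into dotted column names. The sample records are invented. Literal JSON text is wrapped in io.StringIO because recent pandas versions deprecate passing a raw JSON string to read_json().

```python
import io
import json

import pandas as pd

# Flat array of records: each object becomes one row
flat = '[{"id": 1, "score": 88}, {"id": 2, "score": 92}]'
scores = pd.read_json(io.StringIO(flat))

# Nested objects, as often returned by APIs
payload = json.loads("""
[
  {"id": 1, "name": "Ana", "address": {"city": "Porto", "zip": "4000"}},
  {"id": 2, "name": "Ben", "address": {"city": "Leeds", "zip": "LS1"}}
]
""")
# json_normalize flattens nested keys into dotted column names
users = pd.json_normalize(payload)

print(scores)
print(users.columns.tolist())
```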

Writing JSON

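DataFrame.to_json() returns a string when no path is given, and its orient parameter controls the layout. A minimal sketch of the record-oriented layouts often used for data exchange, plus reading JSON Lines back:

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})

# orient="records": a JSON array with one object per row
records_json = df.to_json(orient="records")
print(records_json)  # [{"id":1,"name":"Ana"},{"id":2,"name":"Ben"}]

# lines=True (JSON Lines): one object per line; requires orient="records"
jsonl = df.to_json(orient="records", lines=True)

# Reading JSON Lines back uses the same flag
back = pd.read_json(io.StringIO(jsonl), lines=True)

# indent pretty-prints the output for humans
print(df.to_json(orient="records", indent=2))
```

JSON Lines is convenient for large or streaming data because each line can be parsed on its own.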

Excel Files

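pd.read_excel() and DataFrame.to_excel() read and write Excel workbooks. For .xlsx files pandas relies on the openpyxl package, which must be installed separately (for example, pip install openpyxl).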

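A sketch of writing two DataFrames to separate sheets of one workbook with pd.ExcelWriter, then reading every sheet back with sheet_name=None. Because openpyxl is an optional dependency, the example checks for it first; the file name and figures are made up.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd

HAS_OPENPYXL = importlib.util.find_spec("openpyxl") is not None

q1 = pd.DataFrame({"region": ["North", "South"], "sales": [120, 95]})
q2 = pd.DataFrame({"region": ["North", "South"], "sales": [130, 88]})

sheets = {}
if HAS_OPENPYXL:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "report.xlsx"
        # ExcelWriter lets several DataFrames share one workbook
        with pd.ExcelWriter(path, engine="openpyxl") as writer:
            q1.to_excel(writer, sheet_name="Q1", index=False)
            q2.to_excel(writer, sheet_name="Q2", index=False)
        # sheet_name=None reads every sheet into a dict of DataFrames
        sheets = pd.read_excel(path, sheet_name=None)
    print(list(sheets))
else:
    print("Install openpyxl to read and write .xlsx files")
```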

Working with SQL Databases

pd.read_sql() runs a SQL query and returns the result as a DataFrame, and DataFrame.to_sql() writes a DataFrame to a database table.

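A self-contained sketch using Python's built-in sqlite3 module and an in-memory database. For other databases (PostgreSQL, MySQL, and so on) you would normally pass a SQLAlchemy engine or connection, since pandas only officially supports raw driver connections for SQLite. The table name and data are illustrative.

```python
import sqlite3
from contextlib import closing

import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["ana", "ben", "ana"],
    "amount": [20.0, 35.5, 12.0],
})

# closing() guarantees the connection is closed when the block ends
with closing(sqlite3.connect(":memory:")) as conn:
    # Write the DataFrame to a table, replacing it if it already exists
    orders.to_sql("orders", conn, index=False, if_exists="replace")

    # Query it back; values go through params rather than string formatting
    totals = pd.read_sql(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "WHERE amount > ? GROUP BY customer ORDER BY customer",
        conn,
        params=(15,),
    )

print(totals)
```

Passing values through params lets the driver handle quoting, which avoids SQL injection when values come from users.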

Fetching Data from Web APIs

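Web APIs usually return JSON. After parsing the response into Python lists and dicts, you can build a DataFrame with pd.DataFrame() for flat records or pd.json_normalize() for nested ones.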

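A minimal sketch that uses only the standard library (urllib.request plus json) so it has no extra dependencies; with the popular requests package the download step would be requests.get(url, timeout=10).json(). The endpoint URL and the "results" key are hypothetical, so the demo runs on an inline sample payload instead of the network.

```python
import json
import urllib.request

import pandas as pd


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def to_frame(payload, records_key="results"):
    """Turn an API payload into a DataFrame.

    Assumes the records sit under a hypothetical "results" key when the
    payload is a dict; adjust records_key to match the real API.
    """
    records = payload[records_key] if isinstance(payload, dict) else payload
    return pd.json_normalize(records)


# Usage against a real endpoint (hypothetical URL):
# df = to_frame(fetch_json("https://api.example.com/v1/users"))

# Offline demo with a payload shaped like a typical response
sample = {"results": [{"id": 1, "profile": {"country": "PE"}},
                      {"id": 2, "profile": {"country": "NO"}}]}
df = to_frame(sample)
print(df)
```

urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so failed requests do not silently produce an empty DataFrame.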

Pagination and Multiple API Calls

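Many APIs split large result sets across pages. The sketch below assumes a simple page-number scheme in which an empty page signals the end; the fetch_page callable is hypothetical and is replaced by an offline fake so the example runs without a network. Real APIs may use cursors or "next" links instead, and may need a pause between requests to respect rate limits.

```python
import pandas as pd


def fetch_all_pages(fetch_page, max_pages=100):
    """Collect records from a paginated API into one DataFrame.

    fetch_page(page) is any callable returning a list of records for a
    1-based page number, and an empty list once the pages run out.
    """
    frames = []
    for page in range(1, max_pages + 1):  # hard cap guards against endless loops
        records = fetch_page(page)
        if not records:
            break
        frames.append(pd.DataFrame(records))
    if not frames:
        return pd.DataFrame()
    # ignore_index renumbers rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True)


# Offline stand-in for an HTTP call: 5 records served 2 per page
DATA = [{"id": i, "value": i * 10} for i in range(1, 6)]


def fake_fetch_page(page, per_page=2):
    start = (page - 1) * per_page
    return DATA[start:start + per_page]


df = fetch_all_pages(fake_fetch_page)
print(len(df))  # 5
```

Collecting each page as a DataFrame and concatenating once at the end is cheaper than appending to a growing DataFrame inside the loop.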

Handling File Encodings

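Text files can be stored in different character encodings, such as UTF-8, Latin-1, or Windows-1252. pd.read_csv() assumes UTF-8 by default, so reading a file saved in another encoding can raise a UnicodeDecodeError or produce garbled characters unless you pass the correct encoding argument.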

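A sketch that reproduces the problem on purpose: it writes a small CSV in Latin-1, shows the default UTF-8 read failing, then reads the file correctly by naming its encoding. Other encodings you may meet include cp1252 (Windows) and utf-8-sig (UTF-8 with a byte-order mark, which Excel uses to recognize UTF-8 CSV files).

```python
import tempfile
from pathlib import Path

import pandas as pd

utf8_failed = False
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cities.csv"
    # Simulate a legacy file saved as Latin-1 rather than UTF-8
    text = "city,country\nSão Paulo,Brasil\nZürich,Schweiz\n"
    path.write_bytes(text.encode("latin-1"))

    try:
        pd.read_csv(path)  # default encoding is UTF-8
    except UnicodeDecodeError as err:
        utf8_failed = True
        print("UTF-8 failed:", err.reason)

    df = pd.read_csv(path, encoding="latin-1")  # name the actual encoding
    print(df["city"].tolist())

    # Write it back out as UTF-8, the safest choice for sharing
    df.to_csv(path, index=False, encoding="utf-8")
```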

Binary Formats: Parquet and Pickle

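For large datasets, binary formats are usually faster to read and write than CSV, and they keep column data types intact. Parquet is a compressed, columnar format that many data tools can read (pandas needs pyarrow or fastparquet to use it). Pickle can serialize any pandas object but is Python-specific, and you should never load a pickle from an untrusted source, because unpickling can execute arbitrary code.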

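A sketch comparing both formats on a DataFrame with datetime and categorical columns, types that a CSV round trip would not preserve. The Parquet step runs only if one of its optional engines (pyarrow or fastparquet) is installed; the file names are arbitrary.

```python
import importlib.util
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "label": pd.Categorical(["a", "b"]),
    "value": [1.5, 2.5],
})

with tempfile.TemporaryDirectory() as tmp:
    # Pickle: built in, restores dtypes exactly, Python-only.
    # Only unpickle files you trust: loading a pickle can run code.
    pkl = Path(tmp) / "data.pkl"
    df.to_pickle(pkl)
    from_pickle = pd.read_pickle(pkl)

    # Parquet: columnar and compressed; needs pyarrow or fastparquet
    if importlib.util.find_spec("pyarrow") or importlib.util.find_spec("fastparquet"):
        pq = Path(tmp) / "data.parquet"
        df.to_parquet(pq, index=False)
        from_parquet = pd.read_parquet(pq)
        print(from_parquet.dtypes)

print(from_pickle.dtypes)  # datetime and category dtypes survive
```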

Practical Example: Data Pipeline

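A compact extract-transform-load sketch that ties the pieces together: read messy CSV data, clean and aggregate it, and write the result to a SQLite table. The extract/transform/load function names, the sample rows, and the cleaning rules are illustrative choices, not a prescribed structure.

```python
import io
import sqlite3
from contextlib import closing

import pandas as pd

# Extract source: made-up CSV with stray spaces, a missing amount,
# inconsistent capitalization, and a duplicated row
RAW = """order_id,date,customer,amount
1,2024-03-01, Ana ,20.00
2,2024-03-01,Ben,
3,2024-03-02,ana,15.50
3,2024-03-02,ana,15.50
"""


def extract(source):
    return pd.read_csv(source, parse_dates=["date"])


def transform(df):
    df = df.drop_duplicates()              # drop exact duplicate rows
    df = df.dropna(subset=["amount"])      # drop rows without an amount
    df = df.assign(customer=df["customer"].str.strip().str.title())
    # One row per customer per day
    return df.groupby(["date", "customer"], as_index=False)["amount"].sum()


def load(df, conn):
    df.to_sql("daily_sales", conn, index=False, if_exists="replace")


with closing(sqlite3.connect(":memory:")) as conn:
    daily = transform(extract(io.StringIO(RAW)))
    load(daily, conn)
    check = pd.read_sql("SELECT * FROM daily_sales", conn)  # verify the load

print(daily)
```

Keeping each stage in its own function makes it easy to swap the source (say, read_json instead of read_csv) or the destination without touching the cleaning logic.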

Key Takeaways

✅ CSV is universal—use pd.read_csv() and df.to_csv() with appropriate options

✅ JSON handles nested data—use pd.json_normalize() for complex structures

✅ Excel requires openpyxl—supports multiple sheets with ExcelWriter

✅ SQL databases: use pd.read_sql() to run queries and df.to_sql() to write tables

✅ APIs return JSON—convert to DataFrames after parsing

✅ Encodings matter—specify encoding for international data

✅ Binary formats (Parquet, Pickle) are faster for large data

Connections: Data I/O in Practice

🔗 Connection to Data Engineering

Task	Pandas Function
ETL Pipeline	read_* → transform → to_*
Data Lake	read_parquet() / to_parquet()
Data Warehouse	read_sql() / to_sql()
API Integration	requests + read_json()

🔗 Connection to Machine Learning

Loading data is the first step in any ML workflow:

# Typical ML data loading
import pandas as pd

train = pd.read_csv('train.csv')
X = train.drop(columns='target')  # features: every column except the label
y = train['target']               # label column

Practice Exercises

Exercise 1: Multi-Format Pipeline


Next Steps

Now that you can load and save data, you're ready for Data Visualization with Matplotlib—turning your data into compelling visual stories.

Ready to visualize your data? Let's create beautiful charts!

Python for Data Science: From Arrays to Analysis
