Learning Objectives: After this lesson, you'll master reading and writing data in various formats—CSV, JSON, Excel, and more. You'll also learn to fetch data from web APIs and handle different file encodings.
The Data Lifecycle
Data analysis starts with loading data and ends with saving results. Pandas covers both ends with paired read_* functions and to_* methods, one pair per format.
Reading CSV Files
CSV (Comma-Separated Values) is one of the most common formats for tabular data. Pandas reads it with pd.read_csv() and writes it with df.to_csv().
Basic CSV Reading
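A minimal sketch of a basic read. The sample data and column names are made up for illustration, and io.StringIO stands in for a file on disk; in practice you would pass a path such as 'data.csv' instead.

```python
import io

import pandas as pd

# Hypothetical sample data; normally this would live in a file on disk.
csv_text = "name,age,city\nAda,36,London\nLinus,29,Helsinki\n"

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first rows, for a quick look
print(df.dtypes)    # column types inferred by pandas
```

Checking head() and dtypes right after loading is a cheap way to catch a wrong delimiter or a numeric column that was read as text.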
Common CSV Options
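The sketch below combines several frequently used read_csv() parameters in one call. The semicolon-delimited data, the column names, and the "missing" placeholder are all invented for this example.

```python
import io

import pandas as pd

# Hypothetical export that uses ';' as the delimiter and 'missing' for gaps.
raw = (
    "id;signup_date;plan;monthly_fee\n"
    "1;2024-01-15;basic;9.99\n"
    "2;2024-02-03;pro;missing\n"
    "3;2024-02-20;basic;9.99\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                                      # non-comma delimiter
    usecols=["id", "signup_date", "monthly_fee"], # load only these columns
    index_col="id",                               # use 'id' as the row index
    parse_dates=["signup_date"],                  # convert to datetime64
    na_values=["missing"],                        # treat this token as NaN
    dtype={"monthly_fee": "float64"},             # force the column type
)

print(df)
print(df.dtypes)
```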
Handling Large Files
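One common approach for files that do not fit in memory is to pass chunksize, which makes read_csv() return an iterator of smaller DataFrames. The ten-row input below is only a stand-in for a large file, and the running sum is an illustrative aggregation.

```python
import io

import pandas as pd

# Stand-in for a large file: ten rows, processed four at a time.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
row_count = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()
    row_count += len(chunk)

print(f"rows={row_count}, total={total}")
```

Only one chunk is held in memory at a time, so this pattern suits aggregations that can be updated incrementally (sums, counts, filtered appends).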
Writing CSV Files
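A short sketch of writing a DataFrame to CSV and reading it back. The file name scores.csv and the data are hypothetical; a temporary directory is used so the example leaves nothing behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [91.5, 88.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.csv")  # hypothetical file name

    # index=False leaves out the row index, which is rarely wanted in the file.
    df.to_csv(path, index=False)

    # Read it back to confirm the round trip.
    restored = pd.read_csv(path)

print(restored)
```

Without index=False, the index is written as an extra unnamed column, which then shows up as "Unnamed: 0" when the file is read again.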
Working with JSON
JSON is common for web APIs and nested data structures.
Reading JSON
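The sketch below shows two cases: a flat list of records read with pd.read_json(), and a nested structure flattened with pd.json_normalize(). Both JSON snippets are invented for illustration.

```python
import io
import json

import pandas as pd

# Flat records: read_json can parse these directly.
flat_json = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]'
flat_df = pd.read_json(io.StringIO(flat_json))

# Nested records: parse first, then flatten into dotted column names.
nested_json = '[{"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}}]'
nested_df = pd.json_normalize(json.loads(nested_json))

print(flat_df)
print(nested_df.columns.tolist())
```

json_normalize() joins nested keys with a dot by default, so the city ends up in a column named user.address.city.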
Writing JSON
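A brief sketch of two output layouts from df.to_json(): a single array of records, and JSON Lines (one object per line). The data is hypothetical. When no path is given, to_json() returns the JSON as a string.

```python
import json

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Linus"]})

# One JSON array containing an object per row.
records_json = df.to_json(orient="records")

# JSON Lines: one object per line, convenient for logs and streaming.
lines_json = df.to_json(orient="records", lines=True)

print(records_json)
print(lines_json)

# The result parses back into ordinary Python objects.
parsed = json.loads(records_json)
```

The orient parameter controls the layout; "records" is often the easiest for other tools to consume, while the default for a DataFrame groups values by column.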
Excel Files
Pandas can read and write Excel files (requires openpyxl for .xlsx).
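As a rough sketch, the example below writes two sheets with pd.ExcelWriter and reads them back. It assumes openpyxl is installed; the sheet names and data are made up, and an in-memory buffer is used in place of a file such as report.xlsx.

```python
import io

import pandas as pd

sales = pd.DataFrame({"region": ["North", "South"], "revenue": [1200, 950]})
costs = pd.DataFrame({"region": ["North", "South"], "cost": [700, 620]})

# In-memory buffer standing in for a path like 'report.xlsx'.
buffer = io.BytesIO()

# ExcelWriter lets several DataFrames share one workbook.
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    sales.to_excel(writer, sheet_name="Sales", index=False)
    costs.to_excel(writer, sheet_name="Costs", index=False)

# sheet_name=None loads every sheet into a dict keyed by sheet name.
buffer.seek(0)
sheets = pd.read_excel(buffer, sheet_name=None)

print(list(sheets))
print(sheets["Sales"])
```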
Working with SQL Databases
Pandas reads query results directly into a DataFrame with pd.read_sql() and writes a DataFrame to a table with df.to_sql().
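A minimal sketch using an in-memory SQLite database from the standard library, so no server is needed. The orders table and its columns are hypothetical. For other databases you would typically pass a SQLAlchemy engine instead of a sqlite3 connection.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # throwaway database for the example

orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer": ["ada", "linus", "ada"],
        "amount": [30.0, 12.5, 7.5],
    }
)

# Write the DataFrame to a table, replacing it if it already exists.
orders.to_sql("orders", conn, index=False, if_exists="replace")

# Let the database do the aggregation, then load only the result.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""
totals = pd.read_sql(query, conn)
conn.close()

print(totals)
```

Pushing filters and aggregations into the SQL query keeps the amount of data transferred into pandas small.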
Fetching Data from Web APIs
APIs return data (usually JSON) that can be converted to DataFrames.
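The sketch below separates the two steps: fetching JSON and turning it into a DataFrame. To stay dependency-free it uses urllib from the standard library; with the requests package the fetch would usually be requests.get(url).json(). The function names, the "results" key, and the sample payload are assumptions for illustration, not part of any particular API.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_frame(payload, record_key="results"):
    """Flatten the list stored under record_key into a DataFrame."""
    return pd.json_normalize(payload[record_key])


# Hypothetical response body, shaped like what fetch_json() might return.
sample_payload = {
    "count": 2,
    "results": [
        {"id": 1, "owner": {"login": "ada"}},
        {"id": 2, "owner": {"login": "linus"}},
    ],
}

users = records_to_frame(sample_payload)
print(users)
```

Keeping the conversion in its own function makes it easy to test against saved responses without touching the network.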
Pagination and Multiple API Calls
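Many APIs return results one page at a time. One possible pattern is to request pages until an empty one comes back, then combine everything with pd.concat(). The fetch_page function below is a stand-in for a real HTTP call, and the page-number scheme is an assumption; real APIs may use cursors or "next" links instead.

```python
import pandas as pd

# Fake API data: page number -> list of records (illustrative only).
FAKE_PAGES = {
    1: [{"id": 1}, {"id": 2}],
    2: [{"id": 3}],
}


def fetch_page(page):
    """Stand-in for an HTTP request; returns [] past the last page."""
    return FAKE_PAGES.get(page, [])


def fetch_all(fetch, max_pages=100):
    """Collect pages until one is empty, then concatenate them."""
    frames = []
    for page in range(1, max_pages + 1):
        records = fetch(page)
        if not records:
            break
        frames.append(pd.DataFrame(records))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


all_rows = fetch_all(fetch_page)
print(all_rows)
```

The max_pages cap guards against looping forever if the API never returns an empty page. With a real service, adding a short delay between calls helps stay within rate limits.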
Handling File Encodings
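Different files use different character encodings. Reading a file with the wrong one either raises a UnicodeDecodeError or silently produces garbled characters, so it pays to specify the encoding explicitly.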
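To make this concrete, the sketch below builds a small Latin-1 encoded CSV in memory, shows that the default UTF-8 read fails, and then reads it correctly with the encoding parameter. The sample data is invented.

```python
import io

import pandas as pd

# Bytes as they would appear in a Latin-1 (ISO-8859-1) encoded file.
raw_bytes = "name,city\nJosé,São Paulo\n".encode("latin-1")

# read_csv assumes UTF-8 unless told otherwise, so this attempt fails.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Naming the real encoding fixes the problem.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(decode_failed)
print(df)
```

When the encoding is unknown, "utf-8", "latin-1", and "cp1252" are common candidates to try, and "utf-8-sig" handles files that start with a byte order mark.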
Binary Formats: Parquet and Pickle
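For large datasets, binary formats are usually faster to read and write than text formats, and they preserve column dtypes. Only load Pickle files from sources you trust, since unpickling can execute arbitrary code.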
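A small sketch of round trips through both formats. Pickle needs no extra packages; Parquet needs an engine such as pyarrow or fastparquet, so that part is wrapped in a try block and skipped if none is installed. File names and data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {
        "id": range(3),
        "when": pd.date_range("2024-01-01", periods=3),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    # Pickle: Python-only format, keeps dtypes exactly.
    pickle_path = os.path.join(tmp, "data.pkl")
    df.to_pickle(pickle_path)
    from_pickle = pd.read_pickle(pickle_path)

    # Parquet: columnar and portable, but requires pyarrow or fastparquet.
    parquet_path = os.path.join(tmp, "data.parquet")
    try:
        df.to_parquet(parquet_path)
        from_parquet = pd.read_parquet(parquet_path)
    except ImportError:
        from_parquet = None  # no Parquet engine available

print(from_pickle.dtypes)
```

Unlike a CSV round trip, the datetime column comes back as a datetime without any parse_dates argument.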
Practical Example: Data Pipeline
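One way to structure a small pipeline is as three functions: extract, transform, and load. The sketch below reads CSV, drops incomplete rows, aggregates by region, and emits JSON. The column names, the data, and the function names are illustrative assumptions, not a prescribed design.

```python
import io

import pandas as pd

# Hypothetical raw input; order 2 is missing its amount.
RAW_CSV = (
    "order_id,region,amount\n"
    "1,north,100.0\n"
    "2,south,\n"
    "3,north,50.5\n"
    "4,south,20.0\n"
)


def extract(source):
    """Load raw orders from a path or file-like object."""
    return pd.read_csv(source)


def transform(df):
    """Drop rows without an amount and total the rest per region."""
    cleaned = df.dropna(subset=["amount"])
    return cleaned.groupby("region", as_index=False)["amount"].sum()


def load(df):
    """Serialize the summary as a JSON array of records."""
    return df.to_json(orient="records")


summary = transform(extract(io.StringIO(RAW_CSV)))
report_json = load(summary)

print(summary)
print(report_json)
```

Because each stage takes and returns plain DataFrames, the input or output format can be swapped (for example, Parquet in and SQL out) without touching the transformation logic.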
Key Takeaways
✅ CSV is universal—use pd.read_csv() and df.to_csv() with appropriate options
✅ JSON handles nested data—use pd.json_normalize() for complex structures
✅ Excel requires openpyxl—supports multiple sheets with ExcelWriter
✅ SQL integration is seamless—use pd.read_sql() and df.to_sql()
✅ APIs return JSON—convert to DataFrames after parsing
✅ Encodings matter—specify encoding for international data
✅ Binary formats (Parquet, Pickle) are faster for large data
Connections: Data I/O in Practice
🔗 Connection to Data Engineering
| Task | Pandas Function |
|---|---|
| ETL Pipeline | read_* → transform → to_* |
| Data Lake | read_parquet() / to_parquet() |
| Data Warehouse | read_sql() / to_sql() |
| API Integration | requests + json_normalize() |
🔗 Connection to Machine Learning
Loading data is the first step in any ML workflow:
    import pandas as pd

    # Typical ML data loading
    train = pd.read_csv('train.csv')
    X = train.drop('target', axis=1)  # feature columns
    y = train['target']               # label column
Practice Exercises
Exercise 1: Multi-Format Pipeline
Next Steps
Now that you can load and save data, you're ready for Data Visualization with Matplotlib—turning your data into compelling visual stories.
Ready to visualize your data? Let's create beautiful charts!