Multiprocessing: True Parallelism in Python
Bypass the GIL with multiprocessing. Use process pools, manage shared state safely, and understand inter-process communication.
Learning Objectives: After this lesson, you'll understand how to bypass the GIL with multiprocessing, use process pools efficiently, manage shared state safely, and implement inter-process communication patterns.
Why Multiprocessing?
Multiprocessing runs work in separate Python processes. Each process has its own interpreter and its own GIL, so CPU-bound code can run in parallel on multiple cores.
Multiprocessing vs Threading
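One way to see the difference is to time the same CPU-bound job under a thread pool and a process pool. The sketch below is illustrative only: `cpu_task`, `timed`, and the job sizes are hypothetical names and values, and the timings depend on the machine and on whether the interpreter is a standard (GIL) CPython build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n):
    """Pure-Python CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))


def timed(executor_cls, jobs, workers=4):
    """Run cpu_task over jobs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(cpu_task, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    _, thread_secs = timed(ThreadPoolExecutor, jobs)
    _, process_secs = timed(ProcessPoolExecutor, jobs)
    print(f"threads:   {thread_secs:.2f}s")
    print(f"processes: {process_secs:.2f}s")
```

On a multi-core machine with a GIL build, the process version would usually be expected to finish faster for this kind of workload, because threads take turns holding the GIL while processes do not.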
Basic Process Creation
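A minimal sketch of starting processes by hand with `multiprocessing.Process`: create the object, call `start()`, then `join()` to wait for it. The `describe` and `worker` functions are hypothetical helpers used only for illustration.

```python
import os
from multiprocessing import Process


def describe(name):
    """Build a message identifying the process that runs it."""
    return f"{name} running in PID {os.getpid()}"


def worker(name):
    print(describe(name))


if __name__ == "__main__":
    processes = [Process(target=worker, args=(f"worker-{i}",)) for i in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()  # wait for the child to exit
    print(describe("parent"))
    print("exit codes:", [p.exitcode for p in processes])
```

Each worker reports a different PID from the parent, and an exit code of 0 indicates the child finished without an unhandled exception.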
Process Pools
Process pools manage a fixed number of worker processes for efficient task distribution.
Using Pool
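A minimal sketch of `multiprocessing.Pool` used as a context manager, assuming four workers and a hypothetical `square` function. `pool.map` blocks until every result is back and preserves input order.

```python
from multiprocessing import Pool


def square(n):
    return n * n


if __name__ == "__main__":
    # Leaving the with-block shuts the pool down.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.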
Pool Methods
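The sketch below walks through four commonly used `Pool` methods: `map`, `starmap`, `imap_unordered`, and `apply_async`. The `cube` and `power` functions are hypothetical stand-ins for real work.

```python
from multiprocessing import Pool


def cube(n):
    return n ** 3


def power(base, exponent):
    return base ** exponent


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # map: blocks and returns results in input order.
        print(pool.map(cube, [1, 2, 3]))                # [1, 8, 27]
        # starmap: like map, but unpacks each tuple into arguments.
        print(pool.starmap(power, [(2, 3), (3, 2)]))    # [8, 9]
        # imap_unordered: lazy iterator that yields results as they finish,
        # so the order is not guaranteed; sorted() makes the output stable.
        print(sorted(pool.imap_unordered(cube, [1, 2, 3])))
        # apply_async: submit one call and get an AsyncResult handle back.
        handle = pool.apply_async(power, (2, 10))
        print(handle.get(timeout=10))                   # 1024
```

As a rough guide, `map` suits ordered batch work, `imap_unordered` suits streaming results as soon as they are ready, and `apply_async` suits submitting individual calls.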
Shared State and Communication
Processes don't share memory by default. Here's how to share data.
Shared Memory Values
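A minimal sketch of `Value` and `Array`, which place C-typed data in shared memory. The `record` function is a hypothetical worker; the lock around the increment matters because `+=` on a `Value` is a read followed by a write, not a single atomic step.

```python
from multiprocessing import Array, Process, Value


def record(counter, slots, index):
    """Increment the shared counter and fill this worker's slot."""
    with counter.get_lock():  # guard the read-modify-write
        counter.value += 1
    slots[index] = index * index


if __name__ == "__main__":
    counter = Value("i", 0)  # shared C int
    slots = Array("i", 4)    # shared C int array, zero-initialised
    workers = [Process(target=record, args=(counter, slots, i)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value, slots[:])  # 4 [0, 1, 4, 9]
```

Each worker writes to its own array slot, so only the counter needs explicit locking here.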
Manager for Complex Shared Objects
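A minimal sketch of a `Manager`, which runs a server process and hands out proxy objects for a shared `dict` and `list`. The `tally` function and the sample words are hypothetical.

```python
from multiprocessing import Manager, Process


def tally(shared_counts, shared_log, word):
    """Record a word's length in the shared dict and log the word."""
    shared_counts[word] = len(word)
    shared_log.append(word)


if __name__ == "__main__":
    with Manager() as manager:
        counts = manager.dict()
        log = manager.list()
        workers = [
            Process(target=tally, args=(counts, log, w)) for w in ("fork", "spawn")
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Copy out of the proxies before the manager shuts down.
        print(sorted(counts.items()), sorted(log))
```

Every proxy operation is a round trip to the manager process, so this approach trades speed for flexibility compared with `Value` and `Array`.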
Queues for Communication
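A minimal producer/consumer sketch using `multiprocessing.Queue`. The sentinel convention (`None` marks the end of the stream) and the function names are assumptions made for illustration, not the only way to signal completion.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # tells the consumer there is no more work


def producer(queue, items):
    for item in items:
        queue.put(item)
    queue.put(SENTINEL)


def consumer(inbox, outbox):
    """Double every item from inbox until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            break
        outbox.put(item * 2)
    outbox.put(SENTINEL)


if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    workers = [
        Process(target=producer, args=(inbox, [1, 2, 3])),
        Process(target=consumer, args=(inbox, outbox)),
    ]
    for w in workers:
        w.start()
    # Drain the results before joining so no child blocks on unread data.
    results = list(iter(outbox.get, SENTINEL))
    for w in workers:
        w.join()
    print(results)  # [2, 4, 6]
```

Reading the output queue before calling `join()` follows the guidance in the `multiprocessing` docs: a process that has put items on a queue may not exit until that data has been consumed.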
Pipes for Two-Way Communication
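A minimal sketch of a duplex `Pipe`: each end can both send and receive. The `echo_upper` worker and the `None` stop signal are hypothetical choices for this example.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Reply to each message in upper case until None arrives."""
    while True:
        message = conn.recv()
        if message is None:
            break
        conn.send(message.upper())
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # duplex by default
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    for text in ("ping", "hello"):
        parent_conn.send(text)
        print(parent_conn.recv())  # PING, then HELLO
    parent_conn.send(None)  # ask the child to stop
    child.join()
```

A pipe connects exactly two endpoints; when several producers or consumers are involved, a `Queue` is usually the easier fit.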
Practical Patterns
Map-Reduce Pattern
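A minimal map-reduce sketch: the map step counts words per chunk in parallel, and the reduce step merges the partial counts in the parent. The function names and the sample text are hypothetical.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_words(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def merge(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def word_count(chunks, processes=2):
    with Pool(processes=processes) as pool:
        partials = pool.map(count_words, chunks)
    return reduce(merge, partials, Counter())


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(chunks).most_common(1))  # [('the', 3)]
```

The reduce step runs in the parent here because merging a handful of counters is cheap; only the map step is worth distributing.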
Parallel Pipeline
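One possible shape for a parallel pipeline is a chain of stages, each in its own process, linked by queues. In this sketch the generic `stage` function, the `parse` and `square` steps, and the `STOP` sentinel are all hypothetical.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel passed down the pipeline


def parse(line):
    return int(line.strip())


def square(n):
    return n * n


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result on."""
    for item in iter(inbox.get, STOP):
        outbox.put(func(item))
    outbox.put(STOP)  # tell the next stage we are finished


if __name__ == "__main__":
    raw, parsed, squared = Queue(), Queue(), Queue()
    stages = [
        Process(target=stage, args=(parse, raw, parsed)),
        Process(target=stage, args=(square, parsed, squared)),
    ]
    for s in stages:
        s.start()
    for line in [" 1\n", "2\n", "3 "]:
        raw.put(line)
    raw.put(STOP)
    print(list(iter(squared.get, STOP)))  # [1, 4, 9]
    for s in stages:
        s.join()
```

Because the stages run concurrently, the second stage can start squaring early items while the first is still parsing later ones.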
Worker Pool with Callbacks
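A minimal sketch of `Pool.apply_async` with `callback` and `error_callback`. Both callbacks run in the parent process, so they can append to ordinary lists. The `risky_sqrt` function is a hypothetical task that fails on negative input.

```python
from multiprocessing import Pool


def risky_sqrt(n):
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return n ** 0.5


if __name__ == "__main__":
    results, errors = [], []
    with Pool(processes=2) as pool:
        handles = [
            pool.apply_async(
                risky_sqrt,
                (n,),
                callback=results.append,       # runs in the parent on success
                error_callback=errors.append,  # runs in the parent on failure
            )
            for n in (4, 9, -1)
        ]
        for handle in handles:
            handle.wait()  # keep the pool alive until every task is done
    print(sorted(results))
    print([type(e).__name__ for e in errors])
```

Callbacks should return quickly, since they run on the pool's result-handling thread and a slow callback delays every later result.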
Best Practices
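A short sketch that combines three practices mentioned elsewhere in this lesson: guard the entry point with `if __name__ == "__main__":`, prefer the high-level `ProcessPoolExecutor`, and batch many small tasks with `chunksize`. The `tiny_task` function and the sizes are hypothetical, and a good `chunksize` depends on the workload.

```python
from concurrent.futures import ProcessPoolExecutor


def tiny_task(n):
    return n + 1


def run(n_items=10_000, chunksize=500):
    """Send tasks to workers in batches to reduce per-task IPC overhead."""
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(tiny_task, range(n_items), chunksize=chunksize))


if __name__ == "__main__":  # keeps child processes from re-running this block
    results = run()
    print(len(results), results[:3])  # 10000 [1, 2, 3]
```

With the default `chunksize` of 1, each tiny task would be pickled and sent on its own, which can cost more than the work itself.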
Key Takeaways
| Concept | Description |
|---|---|
| Process | Separate Python interpreter, own GIL |
| Pool | Reusable worker processes |
| ProcessPoolExecutor | Modern, clean pool interface |
| Value/Array | Shared memory primitives |
| Manager | Shared complex objects (dict, list) |
| Queue | Thread/process-safe communication |
| Pipe | Two-way process communication |
Multiprocessing vs Alternatives
| Scenario | Best Choice |
|---|---|
| CPU-heavy calculation | Multiprocessing |
| Network I/O | Threading or asyncio |
| File I/O | Threading |
| Many small tasks | Pool with chunksize |
| Large shared data | SharedMemory (Python 3.8+) |
| NumPy operations | NumPy itself (many operations release the GIL; BLAS-backed ones are often multithreaded) |
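For the "Large shared data" row, the sketch below shows the basic `multiprocessing.shared_memory` lifecycle using plain bytes: the owner creates a block, a child attaches by name and writes into it, and the owner unlinks it at the end. The `write_block` helper and the 16-byte size are hypothetical.

```python
from multiprocessing import Process, shared_memory


def write_block(name, payload):
    """Attach to an existing block by name and copy payload into it."""
    block = shared_memory.SharedMemory(name=name)
    try:
        block.buf[: len(payload)] = payload
    finally:
        block.close()  # detach only; the creator still owns the block


if __name__ == "__main__":
    owner = shared_memory.SharedMemory(create=True, size=16)
    try:
        child = Process(target=write_block, args=(owner.name, b"hello"))
        child.start()
        child.join()
        print(bytes(owner.buf[:5]))  # b'hello'
    finally:
        owner.close()
        owner.unlink()  # free the block once every user is done
```

Only the block's name crosses the process boundary, so the payload itself is never pickled.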
Next Steps
In the next lesson, we'll explore Async/Await Fundamentals—understand event loops, write coroutines, use asyncio for concurrent I/O, and build async patterns for production code.
Ready for non-blocking Python? Async/await awaits!
Further Reading
Official Docs
- Python: `multiprocessing` module. Covers `Process`, `Pool`, `Queue`, `Pipe`, `Manager`, `shared_memory`.
- Python: `concurrent.futures.ProcessPoolExecutor`. The modern high-level API. Use this before raw `Process`.
- Python: `multiprocessing.shared_memory`. Python 3.8+. Share NumPy arrays without pickling overhead.
Tutorials
- Real Python — Speed Up Your Python Program with Concurrency — the canonical comparison of threading / multiprocessing / asyncio.
- Real Python: `multiprocessing.Pool`, the pool patterns deep-dive.
Modern Parallel Python
- `joblib`: the parallelism library used by scikit-learn. Cleaner API than raw multiprocessing for embarrassingly-parallel work.
- `ray`: distributed Python. Scales from one machine to a cluster with `@ray.remote`.
- `dask`: parallel computing for analytics; familiar pandas/numpy APIs.
- `mpire`: modern multiprocessing wrapper with progress bars and worker reuse.
Common Pitfalls
- The "fork" vs "spawn" vs "forkserver" debate: macOS defaults to `spawn` since 3.8 for good reasons. Know which you're using.
- Pickling errors: only picklable objects can cross process boundaries. Lambdas and local functions can't.
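Both pitfalls can be checked from a script. The sketch below prints the active start method and uses a hypothetical `is_picklable` helper to show which callables could be sent to a worker; `make_local` exists only to produce a locally defined function.

```python
import multiprocessing as mp
import pickle


def is_picklable(obj):
    """Return True if obj can be pickled, i.e. sent to another process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_local():
    def local(x):  # defined inside another function: not importable by name
        return x
    return local


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    print("available:", mp.get_all_start_methods())
    print("builtin len:", is_picklable(len))                # True
    print("lambda:", is_picklable(lambda x: x))             # False
    print("local function:", is_picklable(make_local()))    # False
```

Functions are pickled by reference to their importable name, which is why module-level functions work and lambdas or nested functions do not.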
Books
- Book: Fluent Python (2nd ed.) — Chapter 20 ("Concurrent Executors").
- Book: High Performance Python — Gorelick & Ozsvald (2nd ed., 2020). Covers when multiprocessing actually helps vs. just adding overhead.