Merged
87 changes: 87 additions & 0 deletions Cargo.lock


6 changes: 6 additions & 0 deletions Cargo.toml
@@ -1,11 +1,17 @@
[workspace]
members = ["exercises/*/*", "verifier"]
exclude = [
"exercises/03_concurrency/00_introduction",
"exercises/03_concurrency/01_python_threads",
]
resolver = "2"

[workspace.dependencies]
anyhow = "1"
duct = "0.13"
primes = "0.4"
pyo3 = "0.23.3"
rayon = "1"
semver = "1.0.23"
serde = "1.0.204"
serde_json = "1.0.120"
67 changes: 67 additions & 0 deletions book/src/03_concurrency/00_introduction.md
@@ -0,0 +1,67 @@
# Concurrency

All our code so far has been designed for sequential execution, on both the Python and Rust side.
It's time to spice things up a bit and explore concurrency[^scope]!

We won't dive straight into Rust this time.\
We'll start by solving a few parallel processing problems in Python, to get a feel for Python's capabilities and limitations.
Once we have a good grasp of what's possible there, we'll port our solutions over to Rust.

## Multiprocessing

If you've ever tried to write parallel code in Python, you've probably come across the `multiprocessing` module.
Before we dive into the details, let's take a step back and review the terminology we'll be using.

### Processes

A **process** is an instance of a running program.\
The precise anatomy of a process depends on the underlying **operating system** (e.g. Windows or Linux).
Some characteristics are common across most operating systems, though. In particular, a process typically consists of:

- The program's code
- Its memory space, allocated by the operating system
- A set of resources (file handles, sockets, etc.)

```ascii
+------------------------+
|         Memory         |
|                        |
| +--------------------+ |
| |  Process A Space   | | <-- Each process has a separate memory space.
| +--------------------+ |
|                        |
| +--------------------+ |
| |  Process B Space   | |
| |                    | |
| +--------------------+ |
|                        |
| +--------------------+ |
| |  Process C Space   | |
| +--------------------+ |
+------------------------+
```

There can be multiple processes running the same program, each with its own memory space and resources, fully
isolated from one another.\
The **operating system's scheduler** is in charge of deciding which process to run at any given time, partitioning CPU time
among them to maximize throughput and/or responsiveness.

### The `multiprocessing` module

Python's `multiprocessing` module allows us to spawn new processes, each running its own Python interpreter.

A process is created by invoking the `Process` constructor with a target function to execute as well as
any arguments that function might need.
The process is launched by calling its `start` method, and we can wait for it to finish by calling `join`.

If we want to communicate between processes, we can use `Queue` objects, which are shared between processes.
These queues try to abstract away the complexities of inter-process communication, allowing us to pass messages
between our processes in a relatively straightforward manner.
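A minimal sketch of this API (the worker function and its input string are illustrative, not part of the exercise):

```python
from multiprocessing import Process, Queue

# Illustrative worker: counts the words in its chunk and reports the result.
def word_count_task(chunk: str, result_queue: Queue) -> None:
    result_queue.put(len(chunk.split()))

if __name__ == "__main__":
    result_queue = Queue()
    p = Process(target=word_count_task, args=("hello from a child process", result_queue))
    p.start()  # launch the child process
    p.join()   # wait for it to finish
    print(result_queue.get())  # → 5
```

The `if __name__ == "__main__":` guard matters here: on platforms where new processes are spawned rather than forked, the child re-imports the module, and the guard prevents it from recursively spawning more processes.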

## References:

- [`multiprocessing` module](https://docs.python.org/3/library/multiprocessing.html)
- [`Process` class](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process)
- [`Queue` class](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue)

[^scope]: We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`.
96 changes: 96 additions & 0 deletions book/src/03_concurrency/01_python_threads.md
@@ -0,0 +1,96 @@
# Threads

## The overhead of multiprocessing

Let's have a look at the solution for the previous exercise:

```python
from multiprocessing import Process, Queue

def word_count(text: str, n_processes: int) -> int:
    result_queue = Queue()
    processes = []
    for chunk in split_into_chunks(text, n_processes):
        p = Process(target=word_count_task, args=(chunk, result_queue))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    results = [result_queue.get() for _ in range(len(processes))]
    return sum(results)
```

Let's focus, in particular, on process creation:

```python
p = Process(target=word_count_task, args=(chunk, result_queue))
```

The parent process (the one executing `word_count`) doesn't share memory with the child process (the one
spawned via `p.start()`). As a result, the child process can't access `chunk` or `result_queue` directly.
Instead, it needs to be provided a **deep copy** of these objects[^pickle].\
That's not a major issue if the data is small, but it can become a problem on larger datasets.\
For example, if we're working with 8 GB of text, we'll end up with at least 16 GB of memory usage: 8 GB for the
parent process and 8 GB split among the child processes. Not ideal!

We could try to circumvent this issue[^mmap], but that's not always possible or easy to do.\
A more straightforward solution is to use **threads** instead of processes.

## Threads

A **thread** is an execution context **within a process**.\
Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate
and share data with one another more easily than processes can.

```ascii
+------------------------+
|         Memory         |
|                        |
| +--------------------+ |
| |  Process A Space   | | <-- Each process has its own memory space.
| | +-------------+    | |     Threads share the memory space
| | |  Thread 1   |    | |     of the process that spawned them.
| | |  Thread 2   |    | |
| | |  Thread 3   |    | |
| | +-------------+    | |
| +--------------------+ |
|                        |
| +--------------------+ |
| |  Process B Space   | |
| | +-------------+    | |
| | |  Thread 1   |    | |
| | |  Thread 2   |    | |
| | +-------------+    | |
| +--------------------+ |
+------------------------+
```

Threads, just like processes, are operating system constructs.\
The operating system's scheduler is in charge of deciding which thread to run at any given time, partitioning CPU time
among them.

## The `threading` module

Python's `threading` module provides a high-level interface for working with threads.\
The API of the `Thread` class, in particular, mirrors what you already know from the `Process` class:

- A thread is created by calling the `Thread` constructor and passing it a target function to execute as well as
any arguments that function might need.
- The thread is launched by calling its `start` method, and we can wait for it to finish by calling `join`.
- If we want to communicate between threads, we can use `Queue` objects, from the `queue` module, which are shared between threads.
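Putting those three points together, a minimal sketch (the worker function and its input strings are illustrative):

```python
from queue import Queue
from threading import Thread

# Illustrative worker: counts the words in its chunk and reports the result.
def word_count_task(chunk: str, result_queue: Queue) -> None:
    result_queue.put(len(chunk.split()))

result_queue: Queue = Queue()
threads = []
for chunk in ("the quick brown fox", "jumps over the lazy dog"):
    t = Thread(target=word_count_task, args=(chunk, result_queue))
    t.start()  # launch the thread
    threads.append(t)
for t in threads:
    t.join()  # wait for each thread to finish

total = sum(result_queue.get() for _ in threads)
print(total)  # → 9
```

Unlike the `multiprocessing` version, no serialization takes place: each thread reads its `chunk` and writes to `result_queue` directly, in shared memory.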

## References:

- [`threading` module](https://docs.python.org/3/library/threading.html)
- [`Thread` class](https://docs.python.org/3/library/threading.html#threading.Thread)
- [`Queue` class](https://docs.python.org/3/library/queue.html)

[^pickle]: To be more precise, the `multiprocessing` module uses the `pickle` module to serialize the objects
that must be passed as arguments to the child process.
The serialized data is then sent to the child process, as a byte stream, over an operating system pipe.
On the other side of the pipe, the child process deserializes the byte stream back into Python objects using `pickle`
and passes them to the target function.\
This whole system has higher overhead than a "simple" deep copy.

[^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite
difficult to work with. They also suffer from portability issues, as they rely on OS-specific features.
66 changes: 66 additions & 0 deletions book/src/03_concurrency/02_gil.md
@@ -0,0 +1,66 @@
# The GIL problem

## Concurrent, yes, but not parallel

On the surface, our thread-based solution addresses all the issues we identified in the `multiprocessing` module:

```python
from queue import Queue
from threading import Thread

def word_count(text: str, n_threads: int) -> int:
    result_queue = Queue()
    threads = []

    for chunk in split_into_chunks(text, n_threads):
        t = Thread(target=word_count_task, args=(chunk, result_queue))
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    results = [result_queue.get() for _ in range(len(threads))]
    return sum(results)
```

When a thread is created, we no longer clone the text chunk, nor do we incur the overhead of inter-process communication:

```python
t = Thread(target=word_count_task, args=(chunk, result_queue))
```

Since the spawned threads share the same memory space as the parent thread, they can access the `chunk` and `result_queue` directly.

Nonetheless, there's a major issue with this code: **it won't actually use multiple CPU cores**.\
It will run sequentially, even if we pass `n_threads > 1` and multiple CPU cores are available.
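You can check this claim empirically. The sketch below (sizes and names are illustrative) splits a CPU-bound sum across threads: the result is always correct, but on a GIL-enabled interpreter the single-threaded and four-threaded runs typically take about the same time.

```python
import time
from queue import Queue
from threading import Thread

# CPU-bound worker: sums the squares in [lo, hi).
def sum_squares(lo: int, hi: int, out: Queue) -> None:
    out.put(sum(i * i for i in range(lo, hi)))

def run(n: int, n_threads: int) -> int:
    out: Queue = Queue()
    step = n // n_threads
    threads = [
        Thread(target=sum_squares, args=(i * step, (i + 1) * step, out))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(out.get() for _ in threads)

n = 2_000_000
start = time.perf_counter()
single = run(n, 1)
t_single = time.perf_counter() - start

start = time.perf_counter()
multi = run(n, 4)
t_multi = time.perf_counter() - start

assert single == multi  # same answer either way
# On a standard (GIL-enabled) interpreter, t_multi is usually no smaller
# than t_single, despite the four threads.
```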

## Python concurrency

You guessed it: the infamous Global Interpreter Lock (GIL) is to blame.
As we discussed in the [GIL chapter](../01_intro/05_gil.md),
Python's GIL prevents multiple threads from executing Python code simultaneously[^free-threading].

As a result, thread-based parallelism has historically
seen limited use in Python, as it doesn't provide the performance benefits one might expect from a
multithreaded application.

That's why the `multiprocessing` module is so popular: it allows Python developers to bypass the GIL.
Each process has its own Python interpreter, and thus its own GIL. The operating system schedules these processes
independently, allowing them to run in parallel on multicore CPUs.

But, as we've seen, multiprocessing comes with its own set of challenges.

## Native extensions

There's a third way to achieve parallelism in Python: **native extensions**.\
We must [be holding the GIL](../01_intro/05_gil.md#pythonpy) when we invoke a Rust function from Python, but
pure Rust threads are not affected by the GIL, as long as they don't need to interact with Python objects.

Let's rewrite our `word_count` function once more, this time in Rust!

[^free-threading]: This is the current state of Python's concurrency model. There are some exciting changes on the horizon, though!
[`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature
that aims to remove the GIL entirely.
It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing.
We won't cover the new free-threading mode in this course, but it's worth keeping an eye on it as it matures out of the experimental phase.