From ed21d0632022b3075b7f2865c82da157ea96c8e0 Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Tue, 7 Jan 2025 14:44:02 +0100 Subject: [PATCH 01/10] Concurrency intro --- book/src/03_concurrency/00_concurrency.md | 75 +++++++++++++++++++++++ book/src/SUMMARY.md | 1 + 2 files changed, 76 insertions(+) create mode 100644 book/src/03_concurrency/00_concurrency.md diff --git a/book/src/03_concurrency/00_concurrency.md b/book/src/03_concurrency/00_concurrency.md new file mode 100644 index 0000000..d6e7caf --- /dev/null +++ b/book/src/03_concurrency/00_concurrency.md @@ -0,0 +1,75 @@ +# Concurrency + +Up until now, we've kept things quite simple: all our code was designed for sequential execution, on both the Python and Rust side.\ +It's time to spice things up a bit and explore concurrency! + +In particular, we want to look at: + +- How to run multithreaded routines in Rust, with Python code waiting for them to finish +- How to perform some processing in Rust, while allowing Python code to perform other tasks in the meantime +- How we can synchronize across threads in Rust, keeping Python's GIL in mind + +We'll limit our exploration to threads, without venturing into the realm of `async`/`await`. + +## Threads and processes + +Throughout this chapter we'll often refer to **threads** and **processes**.\ +Let's make sure we're all on the same page about what these terms mean before moving on. + +### Processes + +A **process** is an instance of a running program.\ +The precise anatomy of a process depends on the underlying **operating system** (e.g. Windows or Linux). +Some characteristics are common across most operating systems, though. In particular, a process typically consists of: + +- The program's code +- Its memory space, allocated by the operating system +- A set of resources (file handles, sockets, etc.) 
+ +There can be multiple processes running the same program, each with its own memory space and resources, fully +isolated from one another. + +### Threads + +A **thread** is an execution context **within a process**.\ +Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate +and share data with one another more easily than processes can. + +### Scheduling + +Threads, just like processes, are a logical construct managed by the operating system.\ +In the end, you can only run one set of instructions at a time on a CPU core, the physical execution unit. +Since there can be many more threads than there are CPU cores, the **operating system's scheduler** is in charge of +deciding which thread to run at any given time, partitioning CPU time among them to maximize throughput and responsiveness. + +## Python concurrency + +Let's start by looking at Python's concurrency model.\ +As we discussed in the [Global Interpreter Lock](../01_intro/05_gil.md) chapter, +Python's GIL prevents multiple threads from executing Python code simultaneously. + +As a result, [thread-based parallelism](https://docs.python.org/3/library/threading.html) has historically +seen limited use in Python, as it doesn't provide the performance benefits one might expect from a +multithreaded application. + +To work around the GIL, Python developers have turned to [**multiprocessing**](https://docs.python.org/3/library/multiprocessing.html): +rather than using multiple threads, they spawn multiple **processes**. +Each process has its own Python interpreter, and thus its own GIL. The operating system schedules these processes +independently, allowing them to run in parallel on multicore CPUs. + +The multiprocessing paradigm is quite powerful, but it's not a good fit for every use case. +In particular, it's not well-suited for problems that require a lot of inter-process communication, since processes +don't share the same memory space. 
This can lead to performance bottlenecks and/or increased complexity[^mmap]. + +That's where native extensions come in: they can **bypass the GIL** (under certain conditions) and allow us to run +multithreaded code, without the overhead of spawning and coordinating multiple processes. +We'll explore what this looks like for Rust in the next sections. + +### Free-threading mode + +Before moving on it's worth mentioning that Python's concurrency model is likely to undergo some significant changes +in the future due to the introduction of [`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html). +We won't cover it in this book, but it's worth keeping an eye on it as it matures out of the experimental phase. + +[^mmap]: Common workaround include memory-mapped files and shared-memory objects, but these can be quite + difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. \ No newline at end of file diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index dce39c7..c8058bf 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -17,3 +17,4 @@ - [Inheritance](02_classes/05_inheritance.md) - [Parent class](02_classes/06_parent.md) - [Outro](02_classes/07_outro.md) +- [Concurrency](03_concurrency/00_concurrency.md) From 65c1db2d9e6b32883ef38d3e51d992d5fd152d62 Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 8 Jan 2025 11:30:56 +0100 Subject: [PATCH 02/10] Introduction to multithreading --- .../00_introduction.md} | 17 +++-- book/src/SUMMARY.md | 2 +- .../00_introduction/.gitignore | 10 +++ .../00_introduction/README.md | 0 .../00_introduction/pyproject.toml | 18 +++++ .../src/mprocessing/__init__.py | 72 +++++++++++++++++++ .../00_introduction/tests/test_sample.py | 24 +++++++ .../03_multithreading/00_introduction/uv.lock | 68 ++++++++++++++++++ 8 files changed, 204 insertions(+), 7 deletions(-) rename 
book/src/{03_concurrency/00_concurrency.md => 03_multithreading/00_introduction.md} (81%) create mode 100644 exercises/03_multithreading/00_introduction/.gitignore create mode 100644 exercises/03_multithreading/00_introduction/README.md create mode 100644 exercises/03_multithreading/00_introduction/pyproject.toml create mode 100644 exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py create mode 100644 exercises/03_multithreading/00_introduction/tests/test_sample.py create mode 100644 exercises/03_multithreading/00_introduction/uv.lock diff --git a/book/src/03_concurrency/00_concurrency.md b/book/src/03_multithreading/00_introduction.md similarity index 81% rename from book/src/03_concurrency/00_concurrency.md rename to book/src/03_multithreading/00_introduction.md index d6e7caf..c80e9cc 100644 --- a/book/src/03_concurrency/00_concurrency.md +++ b/book/src/03_multithreading/00_introduction.md @@ -1,4 +1,4 @@ -# Concurrency +# Multithreading Up until now, we've kept things quite simple: all our code was designed for sequential execution, on both the Python and Rust side.\ It's time to spice things up a bit and explore concurrency! @@ -9,7 +9,7 @@ In particular, we want to look at: - How to perform some processing in Rust, while allowing Python code to perform other tasks in the meantime - How we can synchronize across threads in Rust, keeping Python's GIL in mind -We'll limit our exploration to threads, without venturing into the realm of `async`/`await`. +We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`. ## Threads and processes @@ -67,9 +67,14 @@ We'll explore what this looks like for Rust in the next sections. ### Free-threading mode -Before moving on it's worth mentioning that Python's concurrency model is likely to undergo some significant changes -in the future due to the introduction of [`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html). 
-We won't cover it in this book, but it's worth keeping an eye on it as it matures out of the experimental phase. +The section above captures the current state of Python's concurrency model. There are some exciting changes on the horizon, though! -[^mmap]: Common workaround include memory-mapped files and shared-memory objects, but these can be quite +[`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature +that aims to remove the GIL entirely.\ +It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing. + +We won't cover the new free-threading mode in this course, +but it's worth keeping an eye on it as it matures out of the experimental phase. + +[^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. \ No newline at end of file diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index c8058bf..9690d1e 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -17,4 +17,4 @@ - [Inheritance](02_classes/05_inheritance.md) - [Parent class](02_classes/06_parent.md) - [Outro](02_classes/07_outro.md) -- [Concurrency](03_concurrency/00_concurrency.md) +- [Multithreading](03_multithreading/00_introduction.md) diff --git a/exercises/03_multithreading/00_introduction/.gitignore b/exercises/03_multithreading/00_introduction/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_multithreading/00_introduction/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_multithreading/00_introduction/README.md b/exercises/03_multithreading/00_introduction/README.md new file mode 100644 index 0000000..e69de29 diff --git 
a/exercises/03_multithreading/00_introduction/pyproject.toml b/exercises/03_multithreading/00_introduction/pyproject.toml new file mode 100644 index 0000000..0bc6fa6 --- /dev/null +++ b/exercises/03_multithreading/00_introduction/pyproject.toml @@ -0,0 +1,18 @@ +[project] +name = "mprocessing" +version = "0.1.0" +dependencies = [] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.hatch.build.targets.wheel] +packages = ["src/mprocessing"] + +[dependency-groups] +dev = [ + "pytest>=8.2.2", +] diff --git a/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py b/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py new file mode 100644 index 0000000..6ea8c9f --- /dev/null +++ b/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py @@ -0,0 +1,72 @@ +from multiprocessing import Process, Queue + + +# Before diving into how Rust can make our lives easier, let's first get a taste +# of what it feels like to work with multiprocessing in Python. +# +# Return the number of words in `text` using `n_processes` processes. +# You'll need to: +# - create a result queue to store the results of each process +# - launch up to `n` processes in a loop, storing each process handle in a list +# - join each process in a loop, to wait for them to finish +# - drain the result queue into a list +# - sum the results in the list to get the final count +# +# We provide a function to split the text into chunks as well as +# a function to perform the counting in each process. 
+# +# Relevant links: +# - https://docs.python.org/3/library/multiprocessing.html +def word_count(text: str, n_processes: int) -> int: + result_queue = Queue() + processes = [] + for chunk in split_into_chunks(text, n_processes): + p = Process(target=word_count_task, args=(chunk, result_queue)) + p.start() + processes.append(p) + for p in processes: + p.join() + results = [result_queue.get() for _ in range(len(processes))] + return sum(results) + + +# Compute the number of words in `text` and push the result into `result_queue`. +# This function should be used as the target function for a `Process`. +def word_count_task(text: str, result_queue: 'Queue[int]') -> None: + n_words = len(text.split()) + result_queue.put(n_words) + + +# Splits a string into `n` chunks, ensuring splits occur at whitespace. +def split_into_chunks(s: str, n: int): + if n <= 0: + raise ValueError("Number of chunks 'n' must be greater than 0") + + avg_length = len(s) // n + length = len(s) + start = 0 + + for _ in range(n): + if start >= length: + return # No more content to yield + + # Calculate tentative end index + end = start + avg_length + + # Ensure we don't exceed the string length + if end >= length: + yield s[start:] + return + + # Adjust the end index to the nearest whitespace + while end < length and not s[end].isspace(): + end += 1 + + # If no whitespace was found, return the rest of the string + if end == length: + yield s[start:] + return + + # Yield the chunk and update the start index + yield s[start:end].strip() + start = end + 1 # Move past the whitespace diff --git a/exercises/03_multithreading/00_introduction/tests/test_sample.py b/exercises/03_multithreading/00_introduction/tests/test_sample.py new file mode 100644 index 0000000..733187b --- /dev/null +++ b/exercises/03_multithreading/00_introduction/tests/test_sample.py @@ -0,0 +1,24 @@ +# Modify the Python package under `src` to satisfy the tests. +# Do NOT modify the tests themselves! 
+import pytest + +from mprocessing import word_count + +def test_word_count_single_process(): + text = "hello world" + assert word_count(text, 1) == 2 + + +def test_word_count_multiple_processes(): + text = "hello world" + assert word_count(text, 2) == 2 + + +def test_word_count_multiple_processes_long_text(): + text = "hello world " * 1000 + assert word_count(text, 2) == 2000 + + +def test_more_processes_than_words(): + text = "hello world" + assert word_count(text, 10) == 2 diff --git a/exercises/03_multithreading/00_introduction/uv.lock b/exercises/03_multithreading/00_introduction/uv.lock new file mode 100644 index 0000000..50cce7d --- /dev/null +++ b/exercises/03_multithreading/00_introduction/uv.lock @@ -0,0 +1,68 @@ +version = 1 +requires-python = ">=3.11" + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 }, +] + +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 
5892 }, +] + +[[package]] +name = "mprocessing" +version = "0.1.0" +source = { editable = "." } + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 1445919 } 
+wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = "sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] From 08b7bb3bb76a1e75d9135cc3fccb4f3853ab1dd2 Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 8 Jan 2025 13:47:51 +0100 Subject: [PATCH 03/10] Cover both Python threads and processes --- book/src/03_multithreading/00_introduction.md | 19 +++-- .../03_multithreading/01_python_threads.md | 52 +++++++++++++ book/src/SUMMARY.md | 1 + .../src/mprocessing/__init__.py | 3 - .../01_python_threads/.gitignore | 10 +++ .../01_python_threads/README.md | 0 .../01_python_threads/pyproject.toml | 18 +++++ .../src/mthreading/__init__.py | 74 +++++++++++++++++++ .../01_python_threads/tests/test_sample.py | 24 ++++++ .../01_python_threads/uv.lock | 68 +++++++++++++++++ 10 files changed, 256 insertions(+), 13 deletions(-) create mode 100644 book/src/03_multithreading/01_python_threads.md create mode 100644 exercises/03_multithreading/01_python_threads/.gitignore create mode 100644 exercises/03_multithreading/01_python_threads/README.md create mode 100644 exercises/03_multithreading/01_python_threads/pyproject.toml create mode 100644 exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py create mode 100644 exercises/03_multithreading/01_python_threads/tests/test_sample.py create mode 100644 exercises/03_multithreading/01_python_threads/uv.lock diff --git a/book/src/03_multithreading/00_introduction.md b/book/src/03_multithreading/00_introduction.md index c80e9cc..1c0068f 100644 --- a/book/src/03_multithreading/00_introduction.md +++ b/book/src/03_multithreading/00_introduction.md @@ -1,19 +1,16 @@ # Multithreading -Up until now, we've kept things quite simple: all our code was designed for sequential execution, on both the Python and Rust side.\ 
-It's time to spice things up a bit and explore concurrency! +Up until now, we've kept things quite simple: all our code was designed for sequential execution, on both the Python and Rust side. +It's time to spice things up a bit and explore concurrency[^scope]! -In particular, we want to look at: - -- How to run multithreaded routines in Rust, with Python code waiting for them to finish -- How to perform some processing in Rust, while allowing Python code to perform other tasks in the meantime -- How we can synchronize across threads in Rust, keeping Python's GIL in mind - -We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`. +We won't dive straight into Rust this time.\ +We'll start by solving a few parallel processing problems in Python, to get a feel for both Python's +multiprocessing and multithreading modules. Once we have a good grasp of how they work, we'll port our solutions +over to Rust. ## Threads and processes -Throughout this chapter we'll often refer to **threads** and **processes**.\ +Throughout this chapter we'll often mention **threads** and **processes**.\ Let's make sure we're all on the same page about what these terms mean before moving on. ### Processes @@ -76,5 +73,7 @@ It would allow multiple threads to execute Python code simultaneously, without f We won't cover the new free-threading mode in this course, but it's worth keeping an eye on it as it matures out of the experimental phase. +[^scope]: We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`. + [^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. 
\ No newline at end of file diff --git a/book/src/03_multithreading/01_python_threads.md b/book/src/03_multithreading/01_python_threads.md new file mode 100644 index 0000000..719fa84 --- /dev/null +++ b/book/src/03_multithreading/01_python_threads.md @@ -0,0 +1,52 @@ +# Python's threads + +## The overhead of multiprocessing + +Let's have a look at the solution for the previous exercise: + +```python +from multiprocessing import Process, Queue + +def word_count(text: str, n_processes: int) -> int: + result_queue = Queue() + processes = [] + for chunk in split_into_chunks(text, n_processes): + p = Process(target=word_count_task, args=(chunk, result_queue)) + p.start() + processes.append(p) + for p in processes: + p.join() + results = [result_queue.get() for _ in range(len(processes))] + return sum(results) +``` + +Let's focus, in particular, on process creation: + +```python + p = Process(target=word_count_task, args=(chunk, result_queue)) +``` + +The parent process (the one executing `word_count`) doesn't share memory with the child process (the one +spawned via `p.start()`). As a result, the child process can't access `chunk` or `result_queue` directly. +Instead, it needs to be provided a **deep copy** of these objects[^pickle].\ +That's not a major issue if the data is small, but it can become a problem on larger datasets.\ +For example, if we're working with 8 GB of text, we'll end up with at least 16 GB of memory usage: 8 GB for the +parent process and 8 GB split among the child processes. Not ideal! + +We could try to circumvent this issue by using [shared memory](https://docs.python.org/3/library/multiprocessing.shared_memory.html), +but that's not always possible nor easy to do. + +## Threads to the rescue + +As we discussed in the previous chapter, distinct processes don't share memory. Threads within the same process, on the other hand, do. +If we restructure our solution to use threads instead of processes, we can avoid the overhead of deep copying data. 
+ +Let's try! + + +[^pickle]: To be more precise, the `multiprocessing` module uses the `pickle` module to serialize the objects + that must be passed as arguments to the child process. + The serialized data is then sent to the child process, as a byte stream, over an operating system pipe. + On the other side of the pipe, the child process deserializes the byte stream back into Python objects using `pickle` + and passes them to the target function.\ + This whole system has higher overhead than a "simple" deep copy. diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 9690d1e..a8b075a 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -18,3 +18,4 @@ - [Parent class](02_classes/06_parent.md) - [Outro](02_classes/07_outro.md) - [Multithreading](03_multithreading/00_introduction.md) + - [Python's threads](03_multithreading/01_python_threads.md) diff --git a/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py b/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py index 6ea8c9f..688270c 100644 --- a/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py +++ b/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py @@ -1,9 +1,6 @@ from multiprocessing import Process, Queue -# Before diving into how Rust can make our lives easier, let's first get a taste -# of what it feels like to work with multiprocessing in Python. -# # Return the number of words in `text` using `n_processes` processes. 
# You'll need to: # - create a result queue to store the results of each process diff --git a/exercises/03_multithreading/01_python_threads/.gitignore b/exercises/03_multithreading/01_python_threads/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_multithreading/01_python_threads/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_multithreading/01_python_threads/README.md b/exercises/03_multithreading/01_python_threads/README.md new file mode 100644 index 0000000..e69de29 diff --git a/exercises/03_multithreading/01_python_threads/pyproject.toml b/exercises/03_multithreading/01_python_threads/pyproject.toml new file mode 100644 index 0000000..6ffa8c3 --- /dev/null +++ b/exercises/03_multithreading/01_python_threads/pyproject.toml @@ -0,0 +1,18 @@ +[project] +name = "mthreading" +version = "0.1.0" +dependencies = [] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.hatch.build.targets.wheel] +packages = ["src/mthreading"] + +[dependency-groups] +dev = [ + "pytest>=8.2.2", +] diff --git a/exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py b/exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py new file mode 100644 index 0000000..d213a74 --- /dev/null +++ b/exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py @@ -0,0 +1,74 @@ +from threading import Thread +from queue import Queue + + +# Return the number of words in `text` using `n_threads` threads. 
+# You'll need to: +# - create a queue to store the results of each thread +# - launch up to `n` threads in a loop, storing each thread handle in a list +# - join each thread in a loop, to wait for them to finish +# - drain the result queue into a list +# - sum the results in the list to get the final count +# +# We provide a function to split the text into chunks as well as +# a function to perform the counting in each thread. +# +# Relevant links: +# - https://docs.python.org/3/library/threading.html +# - https://docs.python.org/3/library/queue.html +def word_count(text: str, n_threads: int) -> int: + result_queue = Queue() + threads = [] + + for chunk in split_into_chunks(text, n_threads): + t = Thread(target=word_count_task, args=(chunk, result_queue)) + t.start() + threads.append(t) + + for t in threads: + t.join() + + results = [result_queue.get() for _ in range(len(threads))] + return sum(results) + + +# Compute the number of words in `text` and push the result into `result_queue`. +# This function should be used as the target function for a `Thread`. +def word_count_task(text: str, result_queue: 'Queue[int]') -> None: + n_words = len(text.split()) + result_queue.put(n_words) + + +# Splits a string into `n` chunks, ensuring splits occur at whitespace. 
+def split_into_chunks(s: str, n: int): + if n <= 0: + raise ValueError("Number of chunks 'n' must be greater than 0") + + avg_length = len(s) // n + length = len(s) + start = 0 + + for _ in range(n): + if start >= length: + return # No more content to yield + + # Calculate tentative end index + end = start + avg_length + + # Ensure we don't exceed the string length + if end >= length: + yield s[start:] + return + + # Adjust the end index to the nearest whitespace + while end < length and not s[end].isspace(): + end += 1 + + # If no whitespace was found, return the rest of the string + if end == length: + yield s[start:] + return + + # Yield the chunk and update the start index + yield s[start:end].strip() + start = end + 1 # Move past the whitespace diff --git a/exercises/03_multithreading/01_python_threads/tests/test_sample.py b/exercises/03_multithreading/01_python_threads/tests/test_sample.py new file mode 100644 index 0000000..cb21d40 --- /dev/null +++ b/exercises/03_multithreading/01_python_threads/tests/test_sample.py @@ -0,0 +1,24 @@ +# Modify the Python package under `src` to satisfy the tests. +# Do NOT modify the tests themselves! 
+import pytest + +from mthreading import word_count + +def test_word_count_single_process(): + text = "hello world" + assert word_count(text, 1) == 2 + + +def test_word_count_multiple_processes(): + text = "hello world" + assert word_count(text, 2) == 2 + + +def test_word_count_multiple_processes_long_text(): + text = "hello world " * 1000 + assert word_count(text, 2) == 2000 + + +def test_more_processes_than_words(): + text = "hello world" + assert word_count(text, 10) == 2 diff --git a/exercises/03_multithreading/01_python_threads/uv.lock b/exercises/03_multithreading/01_python_threads/uv.lock new file mode 100644 index 0000000..526fa18 --- /dev/null +++ b/exercises/03_multithreading/01_python_threads/uv.lock @@ -0,0 +1,68 @@ +version = 1 +requires-python = ">=3.11" + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 }, +] + +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", 
size = 5892 }, +] + +[[package]] +name = "mthreading" +version = "0.1.0" +source = { editable = "." } + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 
1445919 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = "sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] From 4ded7ebee4b2f7e73d65753439cd2c3a22cbb22b Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 8 Jan 2025 15:17:27 +0100 Subject: [PATCH 04/10] Rust threads --- Cargo.lock | 7 ++ Cargo.toml | 4 + book/src/03_concurrency/00_introduction.md | 48 +++++++++++ .../01_python_threads.md | 35 ++++++-- book/src/03_concurrency/02_gil.md | 64 ++++++++++++++ book/src/03_multithreading/00_introduction.md | 79 ------------------ book/src/SUMMARY.md | 5 +- .../00_introduction/.gitignore | 0 .../00_introduction/README.md | 0 .../00_introduction/pyproject.toml | 0 .../src/mprocessing/__init__.py | 0 .../00_introduction/tests/test_sample.py | 0 .../00_introduction/uv.lock | 0 .../01_python_threads/.gitignore | 0 .../01_python_threads/README.md | 0 .../01_python_threads/pyproject.toml | 0 .../src/mthreading/__init__.py | 0 .../01_python_threads/tests/test_sample.py | 0 .../01_python_threads/uv.lock | 0 exercises/03_concurrency/02_gil/Cargo.toml | 10 +++ .../03_concurrency/02_gil/pyproject.toml | 29 +++++++ .../03_concurrency/02_gil/sample/.gitignore | 10 +++ .../03_concurrency/02_gil/sample/README.md | 0 .../02_gil/sample/pyproject.toml | 19 +++++ .../02_gil/sample/src/sample/__init__.py | 1 + .../02_gil/sample/tests/test_sample.py | 24 ++++++ exercises/03_concurrency/02_gil/src/lib.rs | 81 ++++++++++++++++++ exercises/03_concurrency/02_gil/uv.lock | 83 +++++++++++++++++++ 28 files changed, 411 insertions(+), 88 deletions(-) create mode 100644 book/src/03_concurrency/00_introduction.md rename book/src/{03_multithreading => 03_concurrency}/01_python_threads.md (51%) create mode 100644 book/src/03_concurrency/02_gil.md delete mode 100644 
book/src/03_multithreading/00_introduction.md rename exercises/{03_multithreading => 03_concurrency}/00_introduction/.gitignore (100%) rename exercises/{03_multithreading => 03_concurrency}/00_introduction/README.md (100%) rename exercises/{03_multithreading => 03_concurrency}/00_introduction/pyproject.toml (100%) rename exercises/{03_multithreading => 03_concurrency}/00_introduction/src/mprocessing/__init__.py (100%) rename exercises/{03_multithreading => 03_concurrency}/00_introduction/tests/test_sample.py (100%) rename exercises/{03_multithreading => 03_concurrency}/00_introduction/uv.lock (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/.gitignore (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/README.md (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/pyproject.toml (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/src/mthreading/__init__.py (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/tests/test_sample.py (100%) rename exercises/{03_multithreading => 03_concurrency}/01_python_threads/uv.lock (100%) create mode 100644 exercises/03_concurrency/02_gil/Cargo.toml create mode 100644 exercises/03_concurrency/02_gil/pyproject.toml create mode 100644 exercises/03_concurrency/02_gil/sample/.gitignore create mode 100644 exercises/03_concurrency/02_gil/sample/README.md create mode 100644 exercises/03_concurrency/02_gil/sample/pyproject.toml create mode 100644 exercises/03_concurrency/02_gil/sample/src/sample/__init__.py create mode 100644 exercises/03_concurrency/02_gil/sample/tests/test_sample.py create mode 100644 exercises/03_concurrency/02_gil/src/lib.rs create mode 100644 exercises/03_concurrency/02_gil/uv.lock diff --git a/Cargo.lock b/Cargo.lock index 39259fe..a71f064 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -61,6 +61,13 @@ dependencies = [ "pyo3", ] +[[package]] +name = "gil2" +version = "0.1.0" 
+dependencies = [ + "pyo3", +] + [[package]] name = "heck" version = "0.5.0" diff --git a/Cargo.toml b/Cargo.toml index 368b8be..fa09075 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,5 +1,9 @@ [workspace] members = ["exercises/*/*", "verifier"] +exclude = [ + "exercises/03_concurrency/00_introduction", + "exercises/03_concurrency/01_python_threads", +] resolver = "2" [workspace.dependencies] diff --git a/book/src/03_concurrency/00_introduction.md b/book/src/03_concurrency/00_introduction.md new file mode 100644 index 0000000..0540075 --- /dev/null +++ b/book/src/03_concurrency/00_introduction.md @@ -0,0 +1,48 @@ +# Concurrency + +All our code so far has been designed for sequential execution, on both the Python and Rust side. +It's time to spice things up a bit and explore concurrency[^scope]! + +We won't dive straight into Rust this time.\ +We'll start by solving a few parallel processing problems in Python, to get a feel for Python's capabilities and limitations. +Once we have a good grasp of what's possible there, we'll port our solutions over to Rust. + +## Multiprocessing + +If you've ever tried to write parallel code in Python, you've probably come across the `multiprocessing` module. +Before we dive into the details, let's take a step back and review the terminology we'll be using. + +### Processes + +A **process** is an instance of a running program.\ +The precise anatomy of a process depends on the underlying **operating system** (e.g. Windows or Linux). +Some characteristics are common across most operating systems, though. In particular, a process typically consists of: + +- The program's code +- Its memory space, allocated by the operating system +- A set of resources (file handles, sockets, etc.) 
+ +There can be multiple processes running the same program, each with its own memory space and resources, fully +isolated from one another.\ +The **operating system's scheduler** is in charge of deciding which process to run at any given time, partitioning CPU time +among them to maximize throughput and/or responsiveness. + +### The `multiprocessing` module + +Python's `multiprocessing` module allows us to spawn new processes, each running its own Python interpreter. + +A process is created by invoking the `Process` constructor with a target function to execute as well as +any arguments that function might need. +The process is launched by calling its `start` method, and we can wait for it to finish by calling `join`. + +If we want to communicate between processes, we can use `Queue` objects, which are shared between processes. +These queues try to abstract away the complexities of inter-process communication, allowing us to pass messages +between our processes in a relatively straightforward manner. + +## References: + +- [`multiprocessing` module](https://docs.python.org/3/library/multiprocessing.html) +- [`Process` class](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process) +- [`Queue` class](https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Queue) + +[^scope]: We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`. 
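The `Process`/`start`/`join`/`Queue` workflow described in the chapter above can be sketched as follows. This is a minimal illustration, not the exercise solution: `count_words` and the round-robin chunking are hypothetical helpers, and the explicit `"fork"` start method assumes a Unix-like system (it is not available on Windows).

```python
import multiprocessing as mp


def count_words(chunk, results):
    # Runs in the child process; the count travels back through the queue.
    results.put(len(chunk.split()))


def word_count(text, n_processes):
    # Assumption: a Unix-like system, where the "fork" start method exists.
    ctx = mp.get_context("fork")

    # Deliberately naive chunking (hypothetical): deal the words out round-robin.
    words = text.split()
    chunks = [" ".join(words[i::n_processes]) for i in range(n_processes)]

    results = ctx.Queue()
    processes = [
        ctx.Process(target=count_words, args=(chunk, results)) for chunk in chunks
    ]
    for p in processes:
        p.start()

    # Drain the queue before joining, so no child blocks while flushing its result.
    total = sum(results.get() for _ in processes)
    for p in processes:
        p.join()
    return total
```

Each child receives its own copy of the chunk it works on, which is the cost the next chapter discusses.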
diff --git a/book/src/03_multithreading/01_python_threads.md b/book/src/03_concurrency/01_python_threads.md similarity index 51% rename from book/src/03_multithreading/01_python_threads.md rename to book/src/03_concurrency/01_python_threads.md index 719fa84..fb706f7 100644 --- a/book/src/03_multithreading/01_python_threads.md +++ b/book/src/03_concurrency/01_python_threads.md @@ -1,4 +1,4 @@ -# Python's threads +# Threads ## The overhead of multiprocessing @@ -33,16 +33,34 @@ That's not a major issue if the data is small, but it can become a problem on la For example, if we're working with 8 GB of text, we'll end up with at least 16 GB of memory usage: 8 GB for the parent process and 8 GB split among the child processes. Not ideal! -We could try to circumvent this issue by using [shared memory](https://docs.python.org/3/library/multiprocessing.shared_memory.html), -but that's not always possible nor easy to do. +We could try to circumvent this issue[^mmap], but that's not always possible nor easy to do.\ +A more straightforward solution is to use **threads** instead of processes. -## Threads to the rescue +## Threads -As we discussed in the previous chapter, distinct processes don't share memory. Threads within the same process, on the other hand, do. -If we restructure our solution to use threads instead of processes, we can avoid the overhead of deep copying data. +A **thread** is an execution context **within a process**.\ +Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate +and share data with one another more easily than processes can. -Let's try! +Threads, just like processes, are operating system constructs.\ +The operating system's scheduler is in charge of deciding which thread to run at any given time, partitioning CPU time +among them. 
+## The `threading` module + +Python's `threading` module provides a high-level interface for working with threads.\ +The API of the `Thread` class, in particular, mirrors what you already know from the `Process` class: + +- A thread is created by calling the `Thread` constructor and passing it a target function to execute as well as + any arguments that function might need. +- The thread is launched by calling its `start` method, and we can wait for it to finish by calling `join`. +- If we want to communicate between threads, we can use `Queue` objects, from the `queue` module, which are shared between threads. + +## References: + +- [`threading` module](https://docs.python.org/3/library/threading.html) +- [`Thread` class](https://docs.python.org/3/library/threading.html#threading.Thread) +- [`Queue` class](https://docs.python.org/3/library/queue.html) [^pickle]: To be more precise, the `multiprocessing` module uses the `pickle` module to serialize the objects that must be passed as arguments to the child process. @@ -50,3 +68,6 @@ Let's try! On the other side of the pipe, the child process deserializes the byte stream back into Python objects using `pickle` and passes them to the target function.\ This all system has higher overhead than a "simple" deep copy. + +[^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite + difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. 
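The three bullet points in the `threading` section above can be sketched as follows. This is a minimal illustration under stated assumptions: `count_words` and the round-robin chunking are hypothetical helpers, not the exercise's reference solution.

```python
from queue import Queue
from threading import Thread


def count_words(chunk, results):
    # Runs in a worker thread; the queue lives in the memory shared by all threads.
    results.put(len(chunk.split()))


def word_count(text, n_threads):
    # Deliberately naive chunking (hypothetical): deal the words out round-robin.
    words = text.split()
    chunks = [" ".join(words[i::n_threads]) for i in range(n_threads)]

    results = Queue()
    threads = [Thread(target=count_words, args=(chunk, results)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Every thread has finished, so exactly one result per thread is waiting.
    return sum(results.get() for _ in threads)
```

The arguments are handed to each thread by reference, so nothing is pickled or copied along the way.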
diff --git a/book/src/03_concurrency/02_gil.md b/book/src/03_concurrency/02_gil.md new file mode 100644 index 0000000..709c533 --- /dev/null +++ b/book/src/03_concurrency/02_gil.md @@ -0,0 +1,64 @@ +# The GIL problem + +## Concurrent, yes, but not parallel + +On the surface, our thread-based solution seems to address all the issues we identified in the `multiprocessing` module: + +```python +from threading import Thread +from queue import Queue + +def word_count(text: str, n_threads: int) -> int: + result_queue = Queue() + threads = [] + + for chunk in split_into_chunks(text, n_threads): + t = Thread(target=word_count_task, args=(chunk, result_queue)) + t.start() + threads.append(t) + + for t in threads: + t.join() + + results = [result_queue.get() for _ in range(len(threads))] + return sum(results) +``` + +When a thread is created, we are no longer cloning the text chunk nor incurring the overhead of inter-process communication: + +```python + t = Thread(target=word_count_task, args=(chunk, result_queue)) +``` + +Nonetheless, there's a major issue with this code: **it won't actually use multiple CPU cores**.\ +It will run sequentially, even if we pass `n_threads > 1` and multiple CPU cores are available. + +## Python concurrency + +You guessed it: the infamous Global Interpreter Lock (GIL) is to blame. +As we discussed in the [GIL chapter](../01_intro/05_gil.md), +Python's GIL prevents multiple threads from executing Python code simultaneously[^free-threading]. + +As a result, [thread-based parallelism](https://docs.python.org/3/library/threading.html) has historically +seen limited use in Python, as it doesn't provide the performance benefits one might expect from a +multithreaded application. + +That's why the `multiprocessing` module is so popular: it allows Python developers to bypass the GIL. +Each process has its own Python interpreter, and thus its own GIL.
The operating system schedules these processes +independently, allowing them to run in parallel on multicore CPUs. + +But, as we've seen, multiprocessing comes with its own set of challenges. + +## Native extensions + +There's a third way to achieve parallelism in Python: **native extensions**.\ +We must [be holding the GIL](../01_intro/05_gil.html#pythonpy) when we invoke a Rust function from Python, but +pure Rust threads are not affected by the GIL, as long as they don't need to interact with Python objects. + +Let's rewrite our `word_count` function once more, this time in Rust! + +[^free-threading]: This is the current state of Python's concurrency model. There are some exciting changes on the horizon, though! + [`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature + that aims to remove the GIL entirely. + It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing. + We won't cover the new free-threading mode in this course, but it's worth keeping an eye on it as it matures out of the experimental phase. diff --git a/book/src/03_multithreading/00_introduction.md b/book/src/03_multithreading/00_introduction.md deleted file mode 100644 index 1c0068f..0000000 --- a/book/src/03_multithreading/00_introduction.md +++ /dev/null @@ -1,79 +0,0 @@ -# Multithreading - -Up until now, we've kept things quite simple: all our code was designed for sequential execution, on both the Python and Rust side. -It's time to spice things up a bit and explore concurrency[^scope]! - -We won't dive straight into Rust this time.\ -We'll start by solving a few parallel processing problems in Python, to get a feel for both Python's -multiprocessing and multithreading modules. Once we have a good grasp of how they work, we'll port our solutions -over to Rust.
- -## Threads and processes - -Throughout this chapter we'll often mention **threads** and **processes**.\ -Let's make sure we're all on the same page about what these terms mean before moving on. - -### Processes - -A **process** is an instance of a running program.\ -The precise anatomy of a process depends on the underlying **operating system** (e.g. Windows or Linux). -Some characteristics are common across most operating systems, though. In particular, a process typically consists of: - -- The program's code -- Its memory space, allocated by the operating system -- A set of resources (file handles, sockets, etc.) - -There can be multiple processes running the same program, each with its own memory space and resources, fully -isolated from one another. - -### Threads - -A **thread** is an execution context **within a process**.\ -Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate -and share data with one another more easily than processes can. - -### Scheduling - -Threads, just like processes, are a logical construct managed by the operating system.\ -In the end, you can only run one set of instructions at a time on a CPU core, the physical execution unit. -Since there can be many more threads than there are CPU cores, the **operating system's scheduler** is in charge of -deciding which thread to run at any given time, partitioning CPU time among them to maximize throughput and responsiveness. - -## Python concurrency - -Let's start by looking at Python's concurrency model.\ -As we discussed in the [Global Interpreter Lock](../01_intro/05_gil.md) chapter, -Python's GIL prevents multiple threads from executing Python code simultaneously. - -As a result, [thread-based parallelism](https://docs.python.org/3/library/threading.html) has historically -seen limited use in Python, as it doesn't provide the performance benefits one might expect from a -multithreaded application. 
- -To work around the GIL, Python developers have turned to [**multiprocessing**](https://docs.python.org/3/library/multiprocessing.html): -rather than using multiple threads, they spawn multiple **processes**. -Each process has its own Python interpreter, and thus its own GIL. The operating system schedules these processes -independently, allowing them to run in parallel on multicore CPUs. - -The multiprocessing paradigm is quite powerful, but it's not a good fit for every use case. -In particular, it's not well-suited for problems that require a lot of inter-process communication, since processes -don't share the same memory space. This can lead to performance bottlenecks and/or increased complexity[^mmap]. - -That's where native extensions come in: they can **bypass the GIL** (under certain conditions) and allow us to run -multithreaded code, without the overhead of spawning and coordinating multiple processes. -We'll explore what this looks like for Rust in the next sections. - -### Free-threading mode - -The section above captures the current state of Python's concurrency model. There are some exciting changes on the horizon, though! - -[`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature -that aims to remove the GIL entirely.\ -It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing. - -We won't cover the new free-threading mode in this course, -but it's worth keeping an eye on it as it matures out of the experimental phase. - -[^scope]: We'll limit our exploration to threads and processes, without venturing into the realm of `async`/`await`. - -[^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite - difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. 
\ No newline at end of file diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index a8b075a..be5d7ec 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -17,5 +17,6 @@ - [Inheritance](02_classes/05_inheritance.md) - [Parent class](02_classes/06_parent.md) - [Outro](02_classes/07_outro.md) -- [Multithreading](03_multithreading/00_introduction.md) - - [Python's threads](03_multithreading/01_python_threads.md) +- [Concurrency](03_concurrency/00_introduction.md) + - [Python's threads](03_concurrency/01_python_threads.md) + - [The GIL problem](03_concurrency/02_gil.md) diff --git a/exercises/03_multithreading/00_introduction/.gitignore b/exercises/03_concurrency/00_introduction/.gitignore similarity index 100% rename from exercises/03_multithreading/00_introduction/.gitignore rename to exercises/03_concurrency/00_introduction/.gitignore diff --git a/exercises/03_multithreading/00_introduction/README.md b/exercises/03_concurrency/00_introduction/README.md similarity index 100% rename from exercises/03_multithreading/00_introduction/README.md rename to exercises/03_concurrency/00_introduction/README.md diff --git a/exercises/03_multithreading/00_introduction/pyproject.toml b/exercises/03_concurrency/00_introduction/pyproject.toml similarity index 100% rename from exercises/03_multithreading/00_introduction/pyproject.toml rename to exercises/03_concurrency/00_introduction/pyproject.toml diff --git a/exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py b/exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py similarity index 100% rename from exercises/03_multithreading/00_introduction/src/mprocessing/__init__.py rename to exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py diff --git a/exercises/03_multithreading/00_introduction/tests/test_sample.py b/exercises/03_concurrency/00_introduction/tests/test_sample.py similarity index 100% rename from exercises/03_multithreading/00_introduction/tests/test_sample.py 
rename to exercises/03_concurrency/00_introduction/tests/test_sample.py diff --git a/exercises/03_multithreading/00_introduction/uv.lock b/exercises/03_concurrency/00_introduction/uv.lock similarity index 100% rename from exercises/03_multithreading/00_introduction/uv.lock rename to exercises/03_concurrency/00_introduction/uv.lock diff --git a/exercises/03_multithreading/01_python_threads/.gitignore b/exercises/03_concurrency/01_python_threads/.gitignore similarity index 100% rename from exercises/03_multithreading/01_python_threads/.gitignore rename to exercises/03_concurrency/01_python_threads/.gitignore diff --git a/exercises/03_multithreading/01_python_threads/README.md b/exercises/03_concurrency/01_python_threads/README.md similarity index 100% rename from exercises/03_multithreading/01_python_threads/README.md rename to exercises/03_concurrency/01_python_threads/README.md diff --git a/exercises/03_multithreading/01_python_threads/pyproject.toml b/exercises/03_concurrency/01_python_threads/pyproject.toml similarity index 100% rename from exercises/03_multithreading/01_python_threads/pyproject.toml rename to exercises/03_concurrency/01_python_threads/pyproject.toml diff --git a/exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py b/exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py similarity index 100% rename from exercises/03_multithreading/01_python_threads/src/mthreading/__init__.py rename to exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py diff --git a/exercises/03_multithreading/01_python_threads/tests/test_sample.py b/exercises/03_concurrency/01_python_threads/tests/test_sample.py similarity index 100% rename from exercises/03_multithreading/01_python_threads/tests/test_sample.py rename to exercises/03_concurrency/01_python_threads/tests/test_sample.py diff --git a/exercises/03_multithreading/01_python_threads/uv.lock b/exercises/03_concurrency/01_python_threads/uv.lock similarity index 100% 
rename from exercises/03_multithreading/01_python_threads/uv.lock rename to exercises/03_concurrency/01_python_threads/uv.lock diff --git a/exercises/03_concurrency/02_gil/Cargo.toml b/exercises/03_concurrency/02_gil/Cargo.toml new file mode 100644 index 0000000..56942f0 --- /dev/null +++ b/exercises/03_concurrency/02_gil/Cargo.toml @@ -0,0 +1,10 @@ +[package] +name = "gil2" +version = "0.1.0" +edition = "2021" + +[lib] +crate-type = ["cdylib"] + +[dependencies] +pyo3 = { workspace = true } diff --git a/exercises/03_concurrency/02_gil/pyproject.toml b/exercises/03_concurrency/02_gil/pyproject.toml new file mode 100644 index 0000000..8dff496 --- /dev/null +++ b/exercises/03_concurrency/02_gil/pyproject.toml @@ -0,0 +1,29 @@ +[build-system] +requires = ["maturin>=1.8,<2.0"] +build-backend = "maturin" + +[project] +name = "gil2" +requires-python = ">=3.11" +classifiers = [ + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", +] +version = "0.1.0" + +[tool.maturin] +features = ["pyo3/extension-module"] + +[tool.uv.config-settings] +# Faster feedback on Rust builds +build-args = "--profile=dev" + +[tool.uv] +cache-keys = ["pyproject.toml", "Cargo.toml", "src/*.rs"] + +[tool.uv.sources] +gil2 = { workspace = true } + +[tool.uv.workspace] +members = ["sample"] diff --git a/exercises/03_concurrency/02_gil/sample/.gitignore b/exercises/03_concurrency/02_gil/sample/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_concurrency/02_gil/sample/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_concurrency/02_gil/sample/README.md b/exercises/03_concurrency/02_gil/sample/README.md new file mode 100644 index 0000000..e69de29 diff --git a/exercises/03_concurrency/02_gil/sample/pyproject.toml 
b/exercises/03_concurrency/02_gil/sample/pyproject.toml new file mode 100644 index 0000000..7b6a8bd --- /dev/null +++ b/exercises/03_concurrency/02_gil/sample/pyproject.toml @@ -0,0 +1,19 @@ +[project] +name = "gil2_sample" +version = "0.1.0" +dependencies = ["gil2"] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[dependency-groups] +dev = ["pytest>=8.2.2"] + +[tool.hatch.metadata] +allow-direct-references = true + +[tool.hatch.build.targets.wheel] +packages = ["src/sample"] diff --git a/exercises/03_concurrency/02_gil/sample/src/sample/__init__.py b/exercises/03_concurrency/02_gil/sample/src/sample/__init__.py new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/exercises/03_concurrency/02_gil/sample/src/sample/__init__.py @@ -0,0 +1 @@ + diff --git a/exercises/03_concurrency/02_gil/sample/tests/test_sample.py b/exercises/03_concurrency/02_gil/sample/tests/test_sample.py new file mode 100644 index 0000000..f37094e --- /dev/null +++ b/exercises/03_concurrency/02_gil/sample/tests/test_sample.py @@ -0,0 +1,24 @@ +# Modify the Rust extension to get the test below to pass +# Do NOT modify the test itself! 
+import pytest + +from gil2 import word_count + +def test_word_count_single_process(): + text = "hello world" + assert word_count(text, 1) == 2 + + +def test_word_count_multiple_processes(): + text = "hello world" + assert word_count(text, 2) == 2 + + +def test_word_count_multiple_processes_long_text(): + text = "hello world " * 1000 + assert word_count(text, 2) == 2000 + + +def test_more_processes_than_words(): + text = "hello world" + assert word_count(text, 10) == 2 diff --git a/exercises/03_concurrency/02_gil/src/lib.rs b/exercises/03_concurrency/02_gil/src/lib.rs new file mode 100644 index 0000000..2f064ac --- /dev/null +++ b/exercises/03_concurrency/02_gil/src/lib.rs @@ -0,0 +1,81 @@ +use pyo3::prelude::*; + +/// Use `std::thread::scope` to spawn `n_threads` threads to count words in parallel. +/// +/// Rely on: +/// - `word_count_chunk` to count words in each chunk +/// - `split_into_chunks` to split the text into `n_threads` chunks +/// +/// If you've never used `std::thread::scope` before, you can find more information here: +/// https://rust-exercises.com/100-exercises/07_threads/04_scoped_threads.html +#[pyfunction] +fn word_count(text: &str, n_threads: usize) -> usize { + if n_threads == 0 { + panic!("Number of threads 'n_threads' must be greater than 0"); + } + + let chunks = split_into_chunks(text, n_threads); + let mut count = 0; + + std::thread::scope(|scope| { + let mut handles = Vec::with_capacity(n_threads); + for chunk in chunks { + let handle = scope.spawn(move || word_count_chunk(chunk)); + handles.push(handle); + } + + for handle in handles { + count += handle.join().unwrap(); + } + }); + + count +} + +/// Count words in a single chunk of text. +fn word_count_chunk(chunk: &str) -> usize { + chunk.split_whitespace().count() +} + +/// Splits a string into `n` chunks, ensuring splits occur at whitespace. 
+fn split_into_chunks(text: &str, n: usize) -> Vec<&str> { + if n == 0 { + panic!("Number of chunks 'n' must be greater than 0"); + } + + let mut chunks = Vec::new(); + let mut start = 0; + let avg_length = text.len() / n; + + for _ in 0..n { + if start >= text.len() { + break; // No more content to split + } + + // Tentative end index + let mut end = (start + avg_length).min(text.len()); + + // Adjust end to nearest whitespace + while end < text.len() && !text[end..].starts_with(char::is_whitespace) { + end += 1; + } + + // If we hit the end of the string, take the rest + if end >= text.len() { + chunks.push(&text[start..]); + break; + } + + // Add the chunk and move the start index forward + chunks.push(text[start..end].trim()); + start = end + 1; // Move past the whitespace + } + + chunks +} + +#[pymodule] +fn gil2(m: &Bound<'_, PyModule>) -> PyResult<()> { + m.add_function(wrap_pyfunction!(word_count, m)?)?; + Ok(()) +} diff --git a/exercises/03_concurrency/02_gil/uv.lock b/exercises/03_concurrency/02_gil/uv.lock new file mode 100644 index 0000000..ed3b2bd --- /dev/null +++ b/exercises/03_concurrency/02_gil/uv.lock @@ -0,0 +1,83 @@ +version = 1 +requires-python = ">=3.11" + +[manifest] +members = [ + "gil2", + "gil2-sample", +] + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 }, +] + +[[package]] +name = "gil2" +version = "0.1.0" +source = { editable = "." 
} + +[[package]] +name = "gil2-sample" +version = "0.1.0" +source = { editable = "sample" } +dependencies = [ + { name = "gil2" }, +] + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] +requires-dist = [{ name = "gil2", editable = "." }] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] + +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 }, +] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = 
"https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 1445919 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = "sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] From 3434bb76065756ffb521e730352c612aedc54eba Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Thu, 9 Jan 2025 12:33:38 +0100 Subject: [PATCH 05/10] Releasing the GIL --- Cargo.lock | 7 + book/src/03_concurrency/00_introduction.md | 19 +++ book/src/03_concurrency/01_python_threads.md | 24 ++++ book/src/03_concurrency/02_gil.md | 6 +- .../03_concurrency/03_releasing_the_gil.md | 133 ++++++++++++++++++ book/src/SUMMARY.md | 1 + exercises/03_concurrency/02_gil/src/lib.rs | 12 +- .../03_releasing_the_gil/Cargo.toml | 10 ++ .../03_releasing_the_gil/pyproject.toml | 29 ++++ .../03_releasing_the_gil/sample/.gitignore | 10 ++ .../03_releasing_the_gil/sample/README.md | 0 .../sample/pyproject.toml | 19 +++ .../sample/src/sample/__init__.py | 1 + .../sample/tests/test_sample.py | 35 +++++ .../03_releasing_the_gil/src/lib.rs | 35 +++++ .../03_releasing_the_gil/uv.lock | 83 +++++++++++ 16 files changed, 419 insertions(+), 5 
deletions(-) create mode 100644 book/src/03_concurrency/03_releasing_the_gil.md create mode 100644 exercises/03_concurrency/03_releasing_the_gil/Cargo.toml create mode 100644 exercises/03_concurrency/03_releasing_the_gil/pyproject.toml create mode 100644 exercises/03_concurrency/03_releasing_the_gil/sample/.gitignore create mode 100644 exercises/03_concurrency/03_releasing_the_gil/sample/README.md create mode 100644 exercises/03_concurrency/03_releasing_the_gil/sample/pyproject.toml create mode 100644 exercises/03_concurrency/03_releasing_the_gil/sample/src/sample/__init__.py create mode 100644 exercises/03_concurrency/03_releasing_the_gil/sample/tests/test_sample.py create mode 100644 exercises/03_concurrency/03_releasing_the_gil/src/lib.rs create mode 100644 exercises/03_concurrency/03_releasing_the_gil/uv.lock diff --git a/Cargo.lock b/Cargo.lock index a71f064..b9bf3d9 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -254,6 +254,13 @@ dependencies = [ "proc-macro2", ] +[[package]] +name = "release" +version = "0.1.0" +dependencies = [ + "pyo3", +] + [[package]] name = "setters" version = "0.1.0" diff --git a/book/src/03_concurrency/00_introduction.md b/book/src/03_concurrency/00_introduction.md index 0540075..367285d 100644 --- a/book/src/03_concurrency/00_introduction.md +++ b/book/src/03_concurrency/00_introduction.md @@ -22,6 +22,25 @@ Some characteristics are common across most operating systems, though. In partic - Its memory space, allocated by the operating system - A set of resources (file handles, sockets, etc.) +```ascii ++------------------------+ +| Memory | +| | +| +--------------------+ | +| | Process A Space | | <-- Each process has a separate memory space. 
+| +--------------------+ | +| | +| +--------------------+ | +| | Process B Space | | +| | | | +| +--------------------+ | +| | +| +--------------------+ | +| | Process C Space | | +| +--------------------+ | ++------------------------+ +``` + There can be multiple processes running the same program, each with its own memory space and resources, fully isolated from one another.\ The **operating system's scheduler** is in charge of deciding which process to run at any given time, partitioning CPU time diff --git a/book/src/03_concurrency/01_python_threads.md b/book/src/03_concurrency/01_python_threads.md index fb706f7..878c710 100644 --- a/book/src/03_concurrency/01_python_threads.md +++ b/book/src/03_concurrency/01_python_threads.md @@ -42,6 +42,30 @@ A **thread** is an execution context **within a process**.\ Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate and share data with one another more easily than processes can. + +```ascii ++------------------------+ +| Memory | +| | +| +--------------------+ | +| | Process A Space | | <-- Each process has its own memory space. +| | +-------------+ | | Threads share the same memory space +| | | Thread 1 | | | of the process that spawned them. +| | | Thread 2 | | | +| | | Thread 3 | | | +| | +-------------+ | | +| +--------------------+ | +| | +| +--------------------+ | +| | Process B Space | | +| | +-------------+ | | +| | | Thread 1 | | | +| | | Thread 2 | | | +| | +-------------+ | | +| +--------------------+ | ++------------------------+ +``` + Threads, just like processes, are operating system constructs.\ The operating system's scheduler is in charge of deciding which thread to run at any given time, partitioning CPU time among them. 
diff --git a/book/src/03_concurrency/02_gil.md b/book/src/03_concurrency/02_gil.md index 709c533..6f847ed 100644 --- a/book/src/03_concurrency/02_gil.md +++ b/book/src/03_concurrency/02_gil.md @@ -2,7 +2,7 @@ ## Concurrent, yes, but not parallel -On the surface, our thread-based solution seems to address all the issues we identified in the `multiprocessing` module: +On the surface, our thread-based solution addresses all the issues we identified in the `multiprocessing` module: ```python from threading import Process @@ -30,6 +30,8 @@ When a thread is created, we are no longer cloning the text chunk nor incurring t = Thread(target=word_count_task, args=(chunk, result_queue)) ``` +Since the spawned threads share the same memory space as the parent thread, they can access the `chunk` and `result_queue` directly. + Nonetheless, there's a major issue with this code: **it won't actually use multiple CPU cores**.\ It will run sequentially, even if we pass `n_threads > 1` and multiple CPU cores are available. @@ -39,7 +41,7 @@ You guessed it: the infamous Global Interpreter Lock (GIL) is to blame. As we discussed in the [GIL chapter](../01_intro/05_gil.md), Python's GIL prevents multiple threads from executing Python code simultaneously[^free-threading]. -As a result, [thread-based parallelism](https://docs.python.org/3/library/threading.html) has historically +As a result, thread-based parallelism has historically seen limited use in Python, as it doesn't provide the performance benefits one might expect from a multithreaded application. 
diff --git a/book/src/03_concurrency/03_releasing_the_gil.md b/book/src/03_concurrency/03_releasing_the_gil.md new file mode 100644 index 0000000..432429a --- /dev/null +++ b/book/src/03_concurrency/03_releasing_the_gil.md @@ -0,0 +1,133 @@ +# Releasing the GIL + +What happens to our Python code when it calls a Rust function?\ +It waits for the Rust function to return: + +```ascii + Time --> + + +------------+--------------------+------------+--------------------+ + Python: | Execute | Call Rust Function | Idle | Resume Execution | + +------------+--------------------+------------+--------------------+ + │ ▲ + ▼ │ + +------------+--------------------+------------+--------------------+ + Rust: | Idle | Idle | Execute | Return to Python | + +------------+--------------------+------------+--------------------+ +``` + +The schema doesn't change even if the Rust function is multithreaded: + + +```ascii + Time --> + + +------------+--------------------+-------------------+--------------------+ + Python: | Execute | Call Rust Function | Idle | Resume Execution | + +------------+--------------------+-------------------+--------------------+ + │ ▲ + ▼ │ + +------------+--------------------+-------------------+--------------------+ + Rust: | Idle | Idle | Execute Thread 1 | Return to Python | + | | | Execute Thread 2 | | + +------------+--------------------+-------------------+--------------------+ +``` + +It begs the question: can we have Python and Rust code running concurrently?\ +Yes! The focus point, once again, is the GIL. + +## Python access must be serialized + +The GIL's job is to serialize all interactions with Python objects.\ +On the `pyo3` side, this is modeled by the `Python<'py>` token: you can only get an instance of `Python<'py>` if you're holding +the GIL. 
Going further, you can only interact with Python objects via smart pointers like `Bound<'py, T>` or `Borrowed<'a, 'py, T>`, +which internally hold a `Python<'py>` instance.\ +There's no way around it: any interaction with Python objects must be serialized. But, here's the kicker: not all Rust code needs to +interact with Python objects! + +## `Python::allow_threads` + +For example, consider a Rust function that calculates the nth Fibonacci number: + +```rust +#[pyfunction] +fn fibonacci(n: u64) -> u64 { + let mut a = 0; + let mut b = 1; + for _ in 0..n { + let tmp = a; + a = b; + b = tmp + b; + } + a +} +``` + +There's no Python object in sight! We're just offloading a computation to Rust.\ +In principle, we could spawn a thread to run this function while the main thread continues executing Python code: + +```python +from threading import Thread + +def other_work(): + print("I'm doing other work!") + +t = Thread(target=fibonacci, args=(10,)) +t.start() +other_work() +t.join() +``` + +As it stands, `other_work` and `fibonacci` will not be run in parallel: our `fibonacci` routine is still holding the GIL, even though +it doesn't need it.\ +We can fix it by explicitly releasing the GIL: + +```rust +#[pyfunction] +fn fibonacci(py: Python<'_>, n: u64) -> u64 { + py.allow_threads(|| { + let mut a = 0; + let mut b = 1; + for _ in 0..n { + let tmp = a; + a = b; + b = tmp + b; + } + a + }) +} +``` + +`Python::allow_threads` releases the GIL while executing the closure passed to it.\ +This frees up the Python interpreter to run other Python code, such as the `other_work` function in our example, while the Rust +thread is busy calculating the nth Fibonacci number. 
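One way to check from the Python side whether two calls actually overlap is to time them. The snippet below is a minimal sketch under stated assumptions: `run_pair` and `busy_loop` are hypothetical helpers that do not appear in the book, and `time.sleep` stands in for a GIL-releasing extension function such as the `fibonacci` routine above.

```python
import time
from threading import Thread


def run_pair(func, *args):
    """Run `func(*args)` on two threads at once; return elapsed wall-clock seconds."""
    threads = [Thread(target=func, args=args) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def busy_loop(n):
    """Pure-Python work: on a GIL-enabled build it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    # `time.sleep` releases the GIL while waiting, so the two calls overlap
    # and the pair should take roughly as long as a single call.
    print(f"sleep pair: {run_pair(time.sleep, 0.3):.2f}s")
    # Pure-Python work cannot overlap on a GIL-enabled interpreter.
    print(f"busy pair:  {run_pair(busy_loop, 2_000_000):.2f}s")
```

If the extension function releases the GIL, passing it to `run_pair` should give a timing close to that of a single call; if it holds the GIL, the timing should be closer to twice that.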
+ +## `Ungil` + +`Python::allow_threads` is only sound **if the closure doesn't interact with Python objects**.\ +If that's not the case, we end up with undefined behavior: Rust code touching Python objects while the Python interpreter is running +other Python code, assuming nothing else is happening to those objects thanks to the GIL. A recipe for disaster! + +It'd be ideal to rely on the type system to enforce this constraint for us at compile-time, in true Rust fashion: "if it compiles, it's +safe."\ +`pyo3` _tries_ to follow this principle with the [`Ungil` marker trait](https://docs.rs/pyo3/0.23.3/pyo3/marker/trait.Ungil.html): +only types that are safe to access without the GIL can implement `Ungil`. The trait is then used to constrain the arguments of +`Python::allow_threads`: + +```rust +pub fn allow_threads<T, F>(self, f: F) -> T +where + F: Ungil + FnOnce() -> T, + T: Ungil, +{ + // ... +} +``` + +Unfortunately, `Ungil` is not perfect. +On stable Rust, it leans on the `Send` trait, but that allows for some +[unsafe interactions with Python objects](https://github.com/PyO3/pyo3/issues/2141). The tracking is more precise on `nightly` Rust, +but it doesn't catch [every possible misuse of `Python::allow_threads`](https://github.com/PyO3/pyo3/issues/3640). + +My suggestion: if you're using `Python::allow_threads`, trigger an additional run of your CI pipeline using the `nightly` Rust compiler +to catch more issues. On top of that, review your code carefully. 
diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index be5d7ec..a044a1e 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -20,3 +20,4 @@ - [Concurrency](03_concurrency/00_introduction.md) - [Python's threads](03_concurrency/01_python_threads.md) - [The GIL problem](03_concurrency/02_gil.md) + - [Releasing the GIL](03_concurrency/03_releasing_the_gil.md) diff --git a/exercises/03_concurrency/02_gil/src/lib.rs b/exercises/03_concurrency/02_gil/src/lib.rs index 2f064ac..9d76cec 100644 --- a/exercises/03_concurrency/02_gil/src/lib.rs +++ b/exercises/03_concurrency/02_gil/src/lib.rs @@ -1,4 +1,4 @@ -use pyo3::prelude::*; +use pyo3::{prelude::*, types::PyString}; /// Use `std::thread::scope` to spawn `n_threads` threads to count words in parallel. /// @@ -9,11 +9,17 @@ use pyo3::prelude::*; /// If you've never used `std::thread::scope` before, you can find more information here: /// https://rust-exercises.com/100-exercises/07_threads/04_scoped_threads.html #[pyfunction] -fn word_count(text: &str, n_threads: usize) -> usize { +fn word_count(text: Bound<'_, PyString>, n_threads: usize) -> PyResult<usize> { if n_threads == 0 { panic!("Number of threads 'n_threads' must be greater than 0"); } + // Get a Rust view (&str) over the Python string + // This may fail if the string contains invalid UTF-8 + // We go down this route, rather than asking for a `&str` + // directly as an argument, to avoid an extra copy of the string + let text = text.to_str()?; + let chunks = split_into_chunks(text, n_threads); let mut count = 0; @@ -29,7 +35,7 @@ fn word_count(text: &str, n_threads: usize) -> usize { } }); - count + Ok(count) } /// Count words in a single chunk of text. 
diff --git a/exercises/03_concurrency/03_releasing_the_gil/Cargo.toml b/exercises/03_concurrency/03_releasing_the_gil/Cargo.toml new file mode 100644 index 0000000..107872c --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/Cargo.toml @@ -0,0 +1,10 @@ +[package] +name = "release" +version = "0.1.0" +edition = "2021" + +[lib] +crate-type = ["cdylib"] + +[dependencies] +pyo3 = { workspace = true } diff --git a/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml new file mode 100644 index 0000000..eafe975 --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml @@ -0,0 +1,29 @@ +[build-system] +requires = ["maturin>=1.8,<2.0"] +build-backend = "maturin" + +[project] +name = "release" +requires-python = ">=3.11" +classifiers = [ + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", +] +version = "0.1.0" + +[tool.maturin] +features = ["pyo3/extension-module"] + +[tool.uv.config-settings] +# Faster feedback on Rust builds +build-args = "--profile=release" + +[tool.uv] +cache-keys = ["pyproject.toml", "Cargo.toml", "src/*.rs"] + +[tool.uv.sources] +release = { workspace = true } + +[tool.uv.workspace] +members = ["sample"] diff --git a/exercises/03_concurrency/03_releasing_the_gil/sample/.gitignore b/exercises/03_concurrency/03_releasing_the_gil/sample/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/sample/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_concurrency/03_releasing_the_gil/sample/README.md b/exercises/03_concurrency/03_releasing_the_gil/sample/README.md new file mode 100644 index 0000000..e69de29 diff --git 
a/exercises/03_concurrency/03_releasing_the_gil/sample/pyproject.toml b/exercises/03_concurrency/03_releasing_the_gil/sample/pyproject.toml new file mode 100644 index 0000000..e1d5379 --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/sample/pyproject.toml @@ -0,0 +1,19 @@ +[project] +name = "release_sample" +version = "0.1.0" +dependencies = ["release"] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[dependency-groups] +dev = ["pytest>=8.2.2"] + +[tool.hatch.metadata] +allow-direct-references = true + +[tool.hatch.build.targets.wheel] +packages = ["src/sample"] diff --git a/exercises/03_concurrency/03_releasing_the_gil/sample/src/sample/__init__.py b/exercises/03_concurrency/03_releasing_the_gil/sample/src/sample/__init__.py new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/sample/src/sample/__init__.py @@ -0,0 +1 @@ + diff --git a/exercises/03_concurrency/03_releasing_the_gil/sample/tests/test_sample.py b/exercises/03_concurrency/03_releasing_the_gil/sample/tests/test_sample.py new file mode 100644 index 0000000..a01dd74 --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/sample/tests/test_sample.py @@ -0,0 +1,35 @@ +# Modify the Rust extension to get the test below to pass +# Do NOT modify the test itself! +import pytest +import timeit +import math + +from release import nth_prime +from concurrent.futures.thread import ThreadPoolExecutor +from concurrent.futures import wait + +def parallel(executor: 'ThreadPoolExecutor', n: int): + future1 = executor.submit(nth_prime, n) + future2 = executor.submit(nth_prime, n) + wait([future1, future2], return_when='ALL_COMPLETED') + +def serial(executor: 'ThreadPoolExecutor', n: int): + future = executor.submit(nth_prime, n) + future.result() + + +def test_timing(): + # Record how long it takes to compute the n-th prime for a sufficiently + # high `n`. 
+ # Then time how long it takes to run two of those computations in parallel + # with the same input. Ensure that the parallel version doesn't take 2x as long + n = 1600 + n_executions = 10000 + + executor = ThreadPoolExecutor(max_workers=2) + + serial_timing = timeit.timeit(lambda: serial(executor, n), number=n_executions) / n_executions + parallel_timing = timeit.timeit(lambda: parallel(executor, n), number=n_executions) / n_executions + print(f"Serial timing: {serial_timing}") + print(f"Parallel timing: {parallel_timing}") + assert math.isclose(parallel_timing, serial_timing, rel_tol=0.10) diff --git a/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs b/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs new file mode 100644 index 0000000..d0c279c --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs @@ -0,0 +1,35 @@ +use pyo3::prelude::*; + +#[pyfunction] +// Modify this function to release the GIL while computing the nth prime number. +fn nth_prime(python: Python<'_>, n: u64) -> u64 { + python.allow_threads(|| { + let mut count = 0; + let mut num = 2; // Start checking primes from 2 + while count < n { + if is_prime(num) { + count += 1; + } + num += 1; + } + num - 1 // Subtract 1 because we increment after finding the nth prime + }) +} + +fn is_prime(n: u64) -> bool { + if n < 2 { + return false; + } + for i in 2..=(n as f64).sqrt() as u64 { + if n % i == 0 { + return false; + } + } + true +} + +#[pymodule] +fn release(m: &Bound<'_, PyModule>) -> PyResult<()> { + m.add_function(wrap_pyfunction!(nth_prime, m)?)?; + Ok(()) +} diff --git a/exercises/03_concurrency/03_releasing_the_gil/uv.lock b/exercises/03_concurrency/03_releasing_the_gil/uv.lock new file mode 100644 index 0000000..acdaf10 --- /dev/null +++ b/exercises/03_concurrency/03_releasing_the_gil/uv.lock @@ -0,0 +1,83 @@ +version = 1 +requires-python = ">=3.11" + +[manifest] +members = [ + "release", + "release-sample", +] + +[[package]] +name = "colorama" +version = 
"0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 }, +] + +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 }, +] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 1445919 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = "sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] + +[[package]] +name = "release" +version = "0.1.0" +source = { editable = "." } + +[[package]] +name = "release-sample" +version = "0.1.0" +source = { editable = "sample" } +dependencies = [ + { name = "release" }, +] + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] +requires-dist = [{ name = "release", editable = "." 
}] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] From 421c803114b71d441700b92ce155cd91fc843cdc Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Thu, 9 Jan 2025 15:53:57 +0100 Subject: [PATCH 06/10] How to re-acquire the GIL where needed --- Cargo.lock | 66 +++++++ Cargo.toml | 2 + .../03_concurrency/03_releasing_the_gil.md | 26 ++- .../03_concurrency/04_minimize_gil_locking.md | 161 ++++++++++++++++++ book/src/SUMMARY.md | 1 + .../03_releasing_the_gil/pyproject.toml | 1 - .../04_minimize_gil_locking/Cargo.toml | 12 ++ .../04_minimize_gil_locking/pyproject.toml | 29 ++++ .../04_minimize_gil_locking/sample/.gitignore | 10 ++ .../04_minimize_gil_locking/sample/README.md | 0 .../sample/pyproject.toml | 19 +++ .../sample/src/sample/__init__.py | 1 + .../sample/tests/test_sample.py | 16 ++ .../04_minimize_gil_locking/src/lib.rs | 39 +++++ .../04_minimize_gil_locking/uv.lock | 83 +++++++++ 15 files changed, 462 insertions(+), 4 deletions(-) create mode 100644 book/src/03_concurrency/04_minimize_gil_locking.md create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/sample/.gitignore create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/sample/README.md create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/sample/pyproject.toml create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/sample/src/sample/__init__.py create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/sample/tests/test_sample.py create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs create mode 100644 exercises/03_concurrency/04_minimize_gil_locking/uv.lock diff --git a/Cargo.lock b/Cargo.lock index b9bf3d9..a1c885a 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -28,6 
+28,31 @@ dependencies = [ "pyo3", ] +[[package]] +name = "crossbeam-deque" +version = "0.8.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51" +dependencies = [ + "crossbeam-epoch", + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-epoch" +version = "0.9.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e" +dependencies = [ + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-utils" +version = "0.8.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" + [[package]] name = "duct" version = "0.13.7" @@ -40,6 +65,12 @@ dependencies = [ "shared_child", ] +[[package]] +name = "either" +version = "1.13.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "60b1af1c220855b6ceac025d3f6ecdd2b7c4894bfe9cd9bda4fbb4bc7c0d4cf0" + [[package]] name = "exceptions" version = "0.1.0" @@ -109,6 +140,15 @@ dependencies = [ "pyo3", ] +[[package]] +name = "minimize" +version = "0.1.0" +dependencies = [ + "primes", + "pyo3", + "rayon", +] + [[package]] name = "modules" version = "0.1.0" @@ -166,6 +206,12 @@ version = "1.10.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "280dc24453071f1b63954171985a0b0d30058d287960968b9b2aca264c8d4ee6" +[[package]] +name = "primes" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0466ef49edd4a5a4bc9d62804a34e89366810bd8bfc3ed537101e3d099f245c5" + [[package]] name = "proc-macro2" version = "1.0.93" @@ -254,6 +300,26 @@ dependencies = [ "proc-macro2", ] +[[package]] +name = "rayon" +version = "1.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b418a60154510ca1a002a752ca9714984e21e4241e804d32555251faf8b78ffa" 
+dependencies = [ + "either", + "rayon-core", +] + +[[package]] +name = "rayon-core" +version = "1.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1465873a3dfdaa8ae7cb14b4383657caab0b3e8a0aa9ae8e04b044854c8dfce2" +dependencies = [ + "crossbeam-deque", + "crossbeam-utils", +] + [[package]] name = "release" version = "0.1.0" diff --git a/Cargo.toml b/Cargo.toml index fa09075..7d80121 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -10,6 +10,8 @@ resolver = "2" anyhow = "1" duct = "0.13" pyo3 = "0.23.3" +primes = "0.4" +rayon = "1" semver = "1.0.23" serde = "1.0.204" serde_json = "1.0.120" diff --git a/book/src/03_concurrency/03_releasing_the_gil.md b/book/src/03_concurrency/03_releasing_the_gil.md index 432429a..2a83f7a 100644 --- a/book/src/03_concurrency/03_releasing_the_gil.md +++ b/book/src/03_concurrency/03_releasing_the_gil.md @@ -18,7 +18,6 @@ It waits for the Rust function to return: The schema doesn't change even if the Rust function is multithreaded: - ```ascii Time --> @@ -102,6 +101,25 @@ fn fibonacci(py: Python<'_>, n: u64) -> u64 { This frees up the Python interpreter to run other Python code, such as the `other_work` function in our example, while the Rust thread is busy calculating the nth Fibonacci number. 
+Using the same line diagram as before, we have the following: + +```ascii + Time --> + + +------------+--------------------+-------------------+--------------------+ + Python: | Execute | Call Rust Function | other_work() | t.join() | + +------------+--------------------+-------------------+--------------------+ + │ ▲ + ▼ │ + +------------+--------------------+-------------------+--------------------+ + Rust: | Idle | Idle | fibonacci(n) | Return to Python | + +------------+--------------------+-------------------+--------------------+ + ▲ + │ + Python and Rust code + running concurrently here +``` + ## `Ungil` `Python::allow_threads` is only sound **if the closure doesn't interact with Python objects**.\ @@ -126,8 +144,10 @@ where Unfortunately, `Ungil` is not perfect. On stable Rust, it leans on the `Send` trait, but that allows for some -[unsafe interactions with Python objects](https://github.com/PyO3/pyo3/issues/2141). The tracking is more precise on `nightly` Rust, +[unsafe interactions with Python objects](https://github.com/PyO3/pyo3/issues/2141). The tracking is more precise on `nightly` Rust[^nightly], but it doesn't catch [every possible misuse of `Python::allow_threads`](https://github.com/PyO3/pyo3/issues/3640). -My suggestion: if you're using `Python::allow_threads`, trigger an additional run of your CI pipeline using the `nightly` Rust compiler +My recommendation: if you're using `Python::allow_threads`, trigger an additional run of your CI pipeline using the `nightly` Rust compiler to catch more issues. On top of that, review your code carefully. + +[^nightly]: See the [`nightly` feature flag exposed by `pyo3`](https://pyo3.rs/v0.23.3/features.html#nightly). 
diff --git a/book/src/03_concurrency/04_minimize_gil_locking.md b/book/src/03_concurrency/04_minimize_gil_locking.md new file mode 100644 index 0000000..1210fa8 --- /dev/null +++ b/book/src/03_concurrency/04_minimize_gil_locking.md @@ -0,0 +1,161 @@ +# Minimize GIL locking + +All our examples so far fall into two categories: + +- The Rust function holds the GIL for the entire duration of its execution. +- The Rust function doesn't hold the GIL at all, going straight into `Python::allow_threads` mode. + +Real-world applications are often more nuanced, though.\ +You'll need to hold the GIL for some operations (e.g. passing data back to Python), but you're able to release it +for others (e.g. long-running computations). + +The goal is to reduce the time spent holding the GIL to the bare minimum, thus maximizing the potential +parallelism of your application. + +## Strategy 1: isolate the GIL-free section + +Let's look at an example: we're given a list of numbers and we need to modify it in place, +replacing each number with the result of an expensive computation that uses no Python objects. + +To minimize GIL locking, we create a Rust vector from the Python list, release the GIL to perform the computation, +and then re-acquire the GIL to update the Python list in place: + +```rust +#[pyfunction] +fn update_in_place<'py>( + python: Python<'py>, + numbers: Bound<'py, PyList> +) -> PyResult<()> { + // Holding the GIL + let v: Vec<i32> = numbers.extract()?; + let updated_v: Vec<_> = python.allow_threads(|| { + v.iter().map(|&n| expensive_computation(n)).collect() + }); + // Back to holding the GIL + for (i, &n) in updated_v.iter().enumerate() { + numbers.set_item(i, n)?; + } + Ok(()) +} + +fn expensive_computation(n: i32) -> i32 { + // Some heavy number crunching + // [...] +} +``` + +## Strategy 2: manually re-acquire the GIL inside the closure + +In the example above, we've created a whole new vector to decouple the GIL-free section from the GIL-holding one. 
+If the input data is large, this can be a significant overhead. + +Let's explore a different approach: we won't create a new pure-Rust vector. +Instead, we will re-acquire the GIL inside the closure: we'll hold it to access each list element and, after the computation is done, +update it in place. Nothing more. + +Assuming you know nothing about `Ungil`, the naive solution might look like this: + +```rust +#[pyfunction] +fn update_in_place<'py>( + python: Python<'py>, + numbers: Bound<'py, PyList> +) -> PyResult<()> { + python.allow_threads(|| -> PyResult<()> { + let n_numbers = numbers.len(); + for i in 0..n_numbers { + let n = numbers.get_item(i)?.extract::<i32>()?; + let result = expensive_computation(n); + numbers.set_item(i, result)?; + } + Ok(()) + }) +} +``` + +It won't compile, though. We're using a GIL-bound object (`numbers`) in a GIL-free section (inside `python.allow_threads`). +We need to **unbind** it first. + +### `Py<T>` and `Bound<'py, T>` + +Using `Bound<'py, T>::unbind` we get a `Py<T>` object back. It has no `'py` lifetime: it's no longer bound to the GIL. +We can try to use it in the GIL-free section: + +```rust +#[pyfunction] +fn update_in_place<'py>( + python: Python<'py>, + numbers: Bound<'py, PyList> +) -> PyResult<()> { + let numbers = numbers.unbind(); + python.allow_threads(|| -> PyResult<()> { + let n_numbers = numbers.len(); + for i in 0..n_numbers { + let n = numbers.get_item(i)?.extract::<i32>()?; + let result = expensive_computation(n); + numbers.set_item(i, result)?; + } + Ok(()) + }) +} +``` + +But it won't compile either. `numbers.len()`, `numbers.get_item(i)`, and `numbers.set_item(i, result)` all require the GIL. +`Py<T>` is just a pointer to a Python object; it won't allow us to access it if we're not holding the GIL. + +We need to **re-bind** it using a `Python<'py>` token, thus getting a `Bound<'py, PyList>` back. +How do we get a `Python<'py>` token inside the closure, though? 
+Using `Python::with_gil`: it's the opposite
+of `Python::allow_threads`. It acquires the GIL before executing the closure and releases it afterwards.
+The closure is given a `Python` token as argument, which we can use to re-bind the `PyList` object:
+
+```rust
+#[pyfunction]
+fn update_in_place<'py>(
+    python: Python<'py>,
+    numbers: Bound<'py, PyList>
+) -> PyResult<()> {
+    let n_numbers = numbers.len();
+    let numbers_ref = numbers.unbind();
+    // Release the GIL
+    python.allow_threads(|| -> PyResult<()> {
+        for i in 0..n_numbers {
+            // Acquire the GIL again, to access the
+            // i-th element of the list
+            let n = Python::with_gil(|inner_py| {
+                numbers_ref
+                    .bind(inner_py)
+                    .get_item(i)?
+                    .extract::<i32>()
+            })?;
+            // Run the computation without holding the GIL
+            let result = expensive_computation(n);
+            // Re-acquire the GIL to update the list in place
+            Python::with_gil(|inner_py| {
+                numbers_ref.bind(inner_py).set_item(i, result)
+            })?;
+        }
+        Ok(())
+    })
+}
+```
+
+## Be mindful of concurrency
+
+The GIL is there for a reason: to protect Python objects from concurrent access.\
+Whenever you release the GIL, you're allowing other threads to run and potentially modify the
+Python objects you're working with.
+
+In the examples above, another Python thread could modify the `numbers` list while we're computing the result.
+For example, it could remove an element, causing the index `i` to go out of bounds.
+
+This is a common issue in multi-threaded programming, and it's up to you to handle it.\
+Consider using synchronization primitives like [`Lock`](https://docs.python.org/3/library/threading.html#lock-objects)
+to serialize access to the Python objects you're working with.
+In other words, move towards fine-grained locking rather than the lock-the-world approach
+you get with the GIL.
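The locking advice above can be sketched on the Python side. This is a minimal illustration under stated assumptions, not code from the book: `numbers_lock`, `guarded_update`, `guarded_pop` and `update_in_place_stub` are hypothetical names, and the stub is a pure-Python stand-in for the Rust `update_in_place` function. It only works if every thread that touches the list agrees to take the same lock first.

```python
import threading

# Hypothetical lock shared by every thread that touches `numbers`.
numbers_lock = threading.Lock()


def update_in_place_stub(numbers: list[int]) -> None:
    # Stand-in for the Rust `update_in_place` function, which may
    # release the GIL between reading and writing each element.
    for i in range(len(numbers)):
        numbers[i] = numbers[i] * 2


def guarded_update(numbers: list[int]) -> None:
    # Hold the lock for the whole call, so the list can't change
    # size while the extension is indexing into it.
    with numbers_lock:
        update_in_place_stub(numbers)


def guarded_pop(numbers: list[int], popped: list[int]) -> None:
    # Every other mutation goes through the same lock.
    with numbers_lock:
        if numbers:
            popped.append(numbers.pop())


numbers = [1, 2, 3]
popped: list[int] = []
threads = [
    threading.Thread(target=guarded_update, args=(numbers,)),
    threading.Thread(target=guarded_pop, args=(numbers, popped)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread wins the race, the update sees a list of stable length. The final list is `[2, 4]`, and the popped value is either `3` (the pop ran first) or `6` (the update ran first).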
+ +## References + +- [`Py` struct](https://docs.rs/pyo3/0.23.3/pyo3/struct.Py.html) +- [`Python::with_gil` method](https://docs.rs/pyo3/0.23.3/pyo3/marker/struct.Python.html#method.with_gil) +- [`Bound<'py, T>::unbind` method](https://docs.rs/pyo3/0.23.3/pyo3/prelude/struct.Bound.html#method.unbind) diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index a044a1e..bbae141 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -21,3 +21,4 @@ - [Python's threads](03_concurrency/01_python_threads.md) - [The GIL problem](03_concurrency/02_gil.md) - [Releasing the GIL](03_concurrency/03_releasing_the_gil.md) + - [Minimize GIL locking](03_concurrency/04_minimize_gil_locking.md) diff --git a/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml index eafe975..0b7af6f 100644 --- a/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml +++ b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml @@ -16,7 +16,6 @@ version = "0.1.0" features = ["pyo3/extension-module"] [tool.uv.config-settings] -# Faster feedback on Rust builds build-args = "--profile=release" [tool.uv] diff --git a/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml b/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml new file mode 100644 index 0000000..363a42d --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml @@ -0,0 +1,12 @@ +[package] +name = "minimize" +version = "0.1.0" +edition = "2021" + +[lib] +crate-type = ["cdylib"] + +[dependencies] +pyo3 = { workspace = true } +primes = { workspace = true } +rayon = { workspace = true } diff --git a/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml b/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml new file mode 100644 index 0000000..7ad9318 --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml @@ -0,0 +1,29 @@ +[build-system] +requires = ["maturin>=1.8,<2.0"] 
+build-backend = "maturin" + +[project] +name = "minimize" +requires-python = ">=3.11" +classifiers = [ + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", +] +version = "0.1.0" + +[tool.maturin] +features = ["pyo3/extension-module"] + +[tool.uv.config-settings] +# Faster feedback on Rust builds +build-args = "--profile=dev" + +[tool.uv] +cache-keys = ["pyproject.toml", "Cargo.toml", "src/*.rs"] + +[tool.uv.sources] +minimize = { workspace = true } + +[tool.uv.workspace] +members = ["sample"] diff --git a/exercises/03_concurrency/04_minimize_gil_locking/sample/.gitignore b/exercises/03_concurrency/04_minimize_gil_locking/sample/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/sample/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_concurrency/04_minimize_gil_locking/sample/README.md b/exercises/03_concurrency/04_minimize_gil_locking/sample/README.md new file mode 100644 index 0000000..e69de29 diff --git a/exercises/03_concurrency/04_minimize_gil_locking/sample/pyproject.toml b/exercises/03_concurrency/04_minimize_gil_locking/sample/pyproject.toml new file mode 100644 index 0000000..8a0ef08 --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/sample/pyproject.toml @@ -0,0 +1,19 @@ +[project] +name = "minimize_sample" +version = "0.1.0" +dependencies = ["minimize"] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[dependency-groups] +dev = ["pytest>=8.2.2"] + +[tool.hatch.metadata] +allow-direct-references = true + +[tool.hatch.build.targets.wheel] +packages = ["src/sample"] diff --git a/exercises/03_concurrency/04_minimize_gil_locking/sample/src/sample/__init__.py 
b/exercises/03_concurrency/04_minimize_gil_locking/sample/src/sample/__init__.py new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/sample/src/sample/__init__.py @@ -0,0 +1 @@ + diff --git a/exercises/03_concurrency/04_minimize_gil_locking/sample/tests/test_sample.py b/exercises/03_concurrency/04_minimize_gil_locking/sample/tests/test_sample.py new file mode 100644 index 0000000..56191bf --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/sample/tests/test_sample.py @@ -0,0 +1,16 @@ +# Modify the Rust extension to get the test below to pass +# Do NOT modify the test itself! +import pytest + +from minimize import compute_prime_factors + +def test_compute_prime_factors(): + numbers = [387, 2, 75, 452, 562672865058083521] + number2prime_factors = compute_prime_factors(numbers) + assert number2prime_factors == { + 387: [3, 43], + 2: [2], + 75: [3, 5], + 452: [2, 113], + 562672865058083521: [7, 11483119695062929] + } diff --git a/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs b/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs new file mode 100644 index 0000000..83a17d5 --- /dev/null +++ b/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs @@ -0,0 +1,39 @@ +use std::collections::HashMap; + +use primes::factors_uniq; +use pyo3::{ + prelude::*, + types::{IntoPyDict, PyDict, PyList}, +}; +use rayon::prelude::*; + +#[pyfunction] +// You're given a Python list of non-negative numbers. +// You need to return a Python dictionary where the keys are the numbers in the list and the values +// are the unique prime factors of each number, sorted in ascending order. +// +// Constraints: +// - Don't hold the GIL while computing the prime factors +// +// Fun additional challenge: +// - Can you use multiple threads to parallelize the computation? +// Consider using `rayon` to make it easier. 
+fn compute_prime_factors<'python>(
+    python: Python<'python>,
+    numbers: Bound<'python, PyList>,
+) -> PyResult<Bound<'python, PyDict>> {
+    let inputs: Vec<u64> = numbers.extract()?;
+    let m: HashMap<u64, Vec<u64>> = python.allow_threads(|| {
+        inputs
+            .into_par_iter()
+            .map(|number| (number, factors_uniq(number)))
+            .collect()
+    });
+    m.into_py_dict(python)
+}
+
+#[pymodule]
+fn minimize(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(compute_prime_factors, m)?)?;
+    Ok(())
+}
diff --git a/exercises/03_concurrency/04_minimize_gil_locking/uv.lock b/exercises/03_concurrency/04_minimize_gil_locking/uv.lock
new file mode 100644
index 0000000..a6c2ca4
--- /dev/null
+++ b/exercises/03_concurrency/04_minimize_gil_locking/uv.lock
@@ -0,0 +1,83 @@
+version = 1
+requires-python = ">=3.11"
+
+[manifest]
+members = [
+    "minimize",
+    "minimize-sample",
+]
+
+[[package]]
+name = "colorama"
+version = "0.4.6"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 },
+]
+
+[[package]]
+name = "iniconfig"
+version = "2.0.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash =
"sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 }, +] + +[[package]] +name = "minimize" +version = "0.1.0" +source = { editable = "." } + +[[package]] +name = "minimize-sample" +version = "0.1.0" +source = { editable = "sample" } +dependencies = [ + { name = "minimize" }, +] + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] +requires-dist = [{ name = "minimize", editable = "." }] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = 
"packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 1445919 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = "sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] From b8b766de17f8835639224de8d2c46cba5df39c9a Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Thu, 9 Jan 2025 17:49:44 +0100 Subject: [PATCH 07/10] Immutable types --- Cargo.lock | 7 ++ book/src/03_concurrency/05_immutable_types.md | 112 ++++++++++++++++++ book/src/SUMMARY.md | 1 + .../05_immutable_types/Cargo.toml | 10 ++ .../05_immutable_types/pyproject.toml | 29 +++++ .../05_immutable_types/sample/.gitignore | 10 ++ .../05_immutable_types/sample/README.md | 0 .../05_immutable_types/sample/pyproject.toml | 19 +++ .../sample/src/sample/__init__.py | 1 + .../sample/tests/test_sample.py | 16 +++ .../05_immutable_types/src/lib.rs | 24 ++++ .../03_concurrency/05_immutable_types/uv.lock | 83 +++++++++++++ 12 files changed, 312 insertions(+) create mode 100644 book/src/03_concurrency/05_immutable_types.md create mode 100644 exercises/03_concurrency/05_immutable_types/Cargo.toml create mode 100644 exercises/03_concurrency/05_immutable_types/pyproject.toml create mode 100644 exercises/03_concurrency/05_immutable_types/sample/.gitignore create mode 100644 exercises/03_concurrency/05_immutable_types/sample/README.md create mode 100644 exercises/03_concurrency/05_immutable_types/sample/pyproject.toml create mode 100644 exercises/03_concurrency/05_immutable_types/sample/src/sample/__init__.py create mode 100644 exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py 
create mode 100644 exercises/03_concurrency/05_immutable_types/src/lib.rs create mode 100644 exercises/03_concurrency/05_immutable_types/uv.lock diff --git a/Cargo.lock b/Cargo.lock index a1c885a..fac7727 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -105,6 +105,13 @@ version = "0.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea" +[[package]] +name = "immutable" +version = "0.1.0" +dependencies = [ + "pyo3", +] + [[package]] name = "indoc" version = "2.0.5" diff --git a/book/src/03_concurrency/05_immutable_types.md b/book/src/03_concurrency/05_immutable_types.md new file mode 100644 index 0000000..bf4d9d0 --- /dev/null +++ b/book/src/03_concurrency/05_immutable_types.md @@ -0,0 +1,112 @@ +# Immutable types + +Concurrency introduces many new classes of bugs that are not present in single-threaded programs. +Data races are one of the most common: two threads try to access the same memory location at the same time, and at least one of them +is writing to it. What should happen?\ +In most programming languages, the behavior is undefined: the program could crash, or it could produce incorrect results. + +Data races can't happen in a single-threaded program, because only one thread can access the memory at a time. +That's where the GIL comes in: since it serializes the execution of code that accesses Python objects, +it prevents all kinds of data races (albeit with a significant performance cost). + +There's another way to prevent data races though: by making sure that the data is immutable. +There's no need for synchronization if the data can't change! + +## Built-in immutable types + +Python has many immutable types—e.g. `int`, `float`, `str`.\ +Whenever you modify them, you're actually creating a new object, not changing the existing one. 
+ +```python +a = 1 +b = a +a += 1 + +assert a == 2 +# a is a new object, +# b is still 1 +assert b == 1 +``` + +Since they're immutable, they're considered **thread-safe**: you can access them from multiple threads +without worrying about data races and synchronization. + +## Frozen dataclasses + +You can define your own immutable types in Python using `dataclasses` and the `frozen` attribute. + +```python +from dataclasses import dataclass + +@dataclass(frozen=True) +class Point: + x: int + y: int + +p = Point(1, 2) +# This will raise a `FrozenInstanceError` exception +p.x = 3 +``` + +The `frozen` attribute makes the class immutable: you can't modify its attributes after creation. +This goes beyond modifying the _values_ of the existing attributes. You are also forbidden from +adding new attributes, e.g.: + +```python +# This will raise a `FrozenInstanceError` exception +# But would work if `frozen=False` or for a "normal" +# class without the `@dataclass` decorator +p.z = 3 +``` + +### In Rust + +Let's see how we can define a similar immutable type in Rust. + +```rust +use pyo3::prelude::*; + +#[pyclass(frozen)] +struct Point { + x: i32, + y: i32, +} +``` + +The above is not enough to get all the niceties of Python's `dataclasses`, but +it's sufficient to make the class immutable.\ +If a `pyclass` is marked as `frozen`, `pyo3` will allow us to access its fields without +holding the GIL—i.e. 
via `Py` instead of `Bound<'py, T>` + +```rust +#[pyfunction] +fn print_point<'py>(python: Python<'py>, point: Bound<'py, Point>) { + let point: Py = point.unbind(); + python.allow_threads(|| { + // We can now access the fields of the Point struct + // even though we are not holding the GIL + let point: &Point = point.get(); + println!("({}, {})", point.x, point.y); + }); +} +``` + +This wouldn't compile if `Point` wasn't marked as `frozen`, thanks to `Py::get`'s signature: + +```rust +impl Py +where + T: PyClass, +{ + pub fn get(&self) -> &T + where + // `Frozen = True` is where the magic happens! + T: PyClass + Sync, + { /* ... */ } +} +``` + +## Summary + +Immutable types significantly simplify GIL jugglery in `pyo3`. If it fits the constraints of the problem you're solving, +consider using them to make your code easier to reason about (and potentially faster!). diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index bbae141..ea1dd89 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -22,3 +22,4 @@ - [The GIL problem](03_concurrency/02_gil.md) - [Releasing the GIL](03_concurrency/03_releasing_the_gil.md) - [Minimize GIL locking](03_concurrency/04_minimize_gil_locking.md) + - [Immutable types](03_concurrency/05_immutable_types.md) diff --git a/exercises/03_concurrency/05_immutable_types/Cargo.toml b/exercises/03_concurrency/05_immutable_types/Cargo.toml new file mode 100644 index 0000000..0fb9d35 --- /dev/null +++ b/exercises/03_concurrency/05_immutable_types/Cargo.toml @@ -0,0 +1,10 @@ +[package] +name = "immutable" +version = "0.1.0" +edition = "2021" + +[lib] +crate-type = ["cdylib"] + +[dependencies] +pyo3 = { workspace = true } diff --git a/exercises/03_concurrency/05_immutable_types/pyproject.toml b/exercises/03_concurrency/05_immutable_types/pyproject.toml new file mode 100644 index 0000000..3d32113 --- /dev/null +++ b/exercises/03_concurrency/05_immutable_types/pyproject.toml @@ -0,0 +1,29 @@ +[build-system] +requires = 
["maturin>=1.8,<2.0"] +build-backend = "maturin" + +[project] +name = "immutable" +requires-python = ">=3.11" +classifiers = [ + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", +] +version = "0.1.0" + +[tool.maturin] +features = ["pyo3/extension-module"] + +[tool.uv.config-settings] +# Faster feedback on Rust builds +build-args = "--profile=dev" + +[tool.uv] +cache-keys = ["pyproject.toml", "Cargo.toml", "src/*.rs"] + +[tool.uv.sources] +immutable = { workspace = true } + +[tool.uv.workspace] +members = ["sample"] diff --git a/exercises/03_concurrency/05_immutable_types/sample/.gitignore b/exercises/03_concurrency/05_immutable_types/sample/.gitignore new file mode 100644 index 0000000..ae8554d --- /dev/null +++ b/exercises/03_concurrency/05_immutable_types/sample/.gitignore @@ -0,0 +1,10 @@ +# python generated files +__pycache__/ +*.py[oc] +build/ +dist/ +wheels/ +*.egg-info + +# venv +.venv diff --git a/exercises/03_concurrency/05_immutable_types/sample/README.md b/exercises/03_concurrency/05_immutable_types/sample/README.md new file mode 100644 index 0000000..e69de29 diff --git a/exercises/03_concurrency/05_immutable_types/sample/pyproject.toml b/exercises/03_concurrency/05_immutable_types/sample/pyproject.toml new file mode 100644 index 0000000..2096236 --- /dev/null +++ b/exercises/03_concurrency/05_immutable_types/sample/pyproject.toml @@ -0,0 +1,19 @@ +[project] +name = "immutable_sample" +version = "0.1.0" +dependencies = ["immutable"] +readme = "README.md" +requires-python = ">=3.11" + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[dependency-groups] +dev = ["pytest>=8.2.2"] + +[tool.hatch.metadata] +allow-direct-references = true + +[tool.hatch.build.targets.wheel] +packages = ["src/sample"] diff --git a/exercises/03_concurrency/05_immutable_types/sample/src/sample/__init__.py 
b/exercises/03_concurrency/05_immutable_types/sample/src/sample/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/exercises/03_concurrency/05_immutable_types/sample/src/sample/__init__.py
@@ -0,0 +1 @@
+
diff --git a/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py b/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py
new file mode 100644
index 0000000..56191bf
--- /dev/null
+++ b/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py
@@ -0,0 +1,16 @@
+# Modify the Rust extension to get the test below to pass
+# Do NOT modify the test itself!
+import pytest
+
+from minimize import compute_prime_factors
+
+def test_compute_prime_factors():
+    numbers = [387, 2, 75, 452, 562672865058083521]
+    number2prime_factors = compute_prime_factors(numbers)
+    assert number2prime_factors == {
+        387: [3, 43],
+        2: [2],
+        75: [3, 5],
+        452: [2, 113],
+        562672865058083521: [7, 11483119695062929]
+    }
diff --git a/exercises/03_concurrency/05_immutable_types/src/lib.rs b/exercises/03_concurrency/05_immutable_types/src/lib.rs
new file mode 100644
index 0000000..2bc5839
--- /dev/null
+++ b/exercises/03_concurrency/05_immutable_types/src/lib.rs
@@ -0,0 +1,24 @@
+use pyo3::prelude::*;
+
+#[pyclass(frozen)]
+struct Point {
+    x: i32,
+    y: i32,
+}
+
+#[pyfunction]
+fn print_point<'py>(python: Python<'py>, point: Bound<'py, Point>) {
+    let point: Py<Point> = point.unbind();
+    python.allow_threads(|| {
+        // We can now access the fields of the Point struct
+        // even though we are not holding the GIL
+        let point: &Point = point.get();
+        println!("({}, {})", point.x, point.y);
+    });
+}
+
+#[pymodule]
+fn immutable(m: &Bound<'_, PyModule>) -> PyResult<()> {
+    m.add_function(wrap_pyfunction!(print_point, m)?)?;
+    Ok(())
+}
diff --git a/exercises/03_concurrency/05_immutable_types/uv.lock b/exercises/03_concurrency/05_immutable_types/uv.lock
new file mode 100644
index 0000000..a6c2ca4
--- /dev/null
+++
b/exercises/03_concurrency/05_immutable_types/uv.lock @@ -0,0 +1,83 @@ +version = 1 +requires-python = ">=3.11" + +[manifest] +members = [ + "minimize", + "minimize-sample", +] + +[[package]] +name = "colorama" +version = "0.4.6" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d8/53/6f443c9a4a8358a93a6792e2acffb9d9d5cb0a5cfd8802644b7b1c9a02e4/colorama-0.4.6.tar.gz", hash = "sha256:08695f5cb7ed6e0531a20572697297273c47b8cae5a63ffc6d6ed5c201be6e44", size = 27697 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl", hash = "sha256:4f1d9991f5acc0ca119f9d443620b77f9d6b33703e51011c16baf57afb285fc6", size = 25335 }, +] + +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 }, +] + +[[package]] +name = "minimize" +version = "0.1.0" +source = { editable = "." } + +[[package]] +name = "minimize-sample" +version = "0.1.0" +source = { editable = "sample" } +dependencies = [ + { name = "minimize" }, +] + +[package.dev-dependencies] +dev = [ + { name = "pytest" }, +] + +[package.metadata] +requires-dist = [{ name = "minimize", editable = "." 
}] + +[package.metadata.requires-dev] +dev = [{ name = "pytest", specifier = ">=8.2.2" }] + +[[package]] +name = "packaging" +version = "24.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d0/63/68dbb6eb2de9cb10ee4c9c14a0148804425e13c4fb20d61cce69f53106da/packaging-24.2.tar.gz", hash = "sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f", size = 163950 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/ef/eb23f262cca3c0c4eb7ab1933c3b1f03d021f2c48f54763065b6f0e321be/packaging-24.2-py3-none-any.whl", hash = "sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759", size = 65451 }, +] + +[[package]] +name = "pluggy" +version = "1.5.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/96/2d/02d4312c973c6050a18b314a5ad0b3210edb65a906f868e31c111dede4a6/pluggy-1.5.0.tar.gz", hash = "sha256:2cffa88e94fdc978c4c574f15f9e59b7f4201d439195c3715ca9e2486f1d0cf1", size = 67955 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/88/5f/e351af9a41f866ac3f1fac4ca0613908d9a41741cfcf2228f4ad853b697d/pluggy-1.5.0-py3-none-any.whl", hash = "sha256:44e1ad92c8ca002de6377e165f3e0f1be63266ab4d554740532335b9d75ea669", size = 20556 }, +] + +[[package]] +name = "pytest" +version = "8.3.4" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "iniconfig" }, + { name = "packaging" }, + { name = "pluggy" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/05/35/30e0d83068951d90a01852cb1cef56e5d8a09d20c7f511634cc2f7e0372a/pytest-8.3.4.tar.gz", hash = "sha256:965370d062bce11e73868e0335abac31b4d3de0e82f4007408d242b4f8610761", size = 1445919 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/11/92/76a1c94d3afee238333bc0a42b82935dd8f9cf8ce9e336ff87ee14d9e1cf/pytest-8.3.4-py3-none-any.whl", hash = 
"sha256:50e16d954148559c9a74109af1eaf0c945ba2d8f30f0a3d3335edde19788b6f6", size = 343083 }, +] From cc787957f82a3a6527c48a9ae6fb35635e5fd2fd Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 15 Jan 2025 14:33:54 +0100 Subject: [PATCH 08/10] Immutable exercise --- .../sample/tests/test_sample.py | 13 ++----- .../05_immutable_types/src/lib.rs | 38 +++++++++++++------ .../03_concurrency/05_immutable_types/uv.lock | 30 +++++++-------- 3 files changed, 44 insertions(+), 37 deletions(-) diff --git a/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py b/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py index 56191bf..b5833c9 100644 --- a/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py +++ b/exercises/03_concurrency/05_immutable_types/sample/tests/test_sample.py @@ -2,15 +2,8 @@ # Do NOT modify the test itself! import pytest -from minimize import compute_prime_factors +from immutable import compute_area, Rectangle def test_compute_prime_factors(): - numbers = [387, 2, 75, 452, 562672865058083521] - number2prime_factors = compute_prime_factors(numbers) - assert number2prime_factors == { - 387: [3, 43], - 2: [2], - 75: [3, 5], - 452: [2, 113], - 562672865058083521: [7, 11483119695062929] - } + s = Rectangle(10, 12) + assert compute_area(s) == 120 diff --git a/exercises/03_concurrency/05_immutable_types/src/lib.rs b/exercises/03_concurrency/05_immutable_types/src/lib.rs index 2bc5839..f544fac 100644 --- a/exercises/03_concurrency/05_immutable_types/src/lib.rs +++ b/exercises/03_concurrency/05_immutable_types/src/lib.rs @@ -1,24 +1,38 @@ use pyo3::prelude::*; -#[pyclass(frozen)] -struct Point { - x: i32, - y: i32, +#[pyclass] +struct Rectangle { + width: u32, + length: u32, +} + +#[pymethods] +impl Rectangle { + #[new] + fn new(width: u32, length: u32) -> Self { + Self { width, length } + } } #[pyfunction] -fn print_point<'py>(python: Python<'py>, point: 
Bound<'py, Point>) {
-    let point: Py<Point> = point.unbind();
+/// Compute the area of a rectangle while allowing Python to run other threads.
+/// Fill in the body of the function.
+/// Modify `Rectangle`'s definition if necessary.
+///
+/// # Constraints
+///
+/// Do NOT remove the `allow_threads` call. The computation must be done inside
+/// the closure passed to `allow_threads`.
+fn compute_area<'py>(python: Python<'py>, shape: Bound<'py, Rectangle>) -> u32 {
     python.allow_threads(|| {
-        // We can now access the fields of the Point struct
-        // even though we are not holding the GIL
-        let point: &Point = point.get();
-        println!("({}, {})", point.x, point.y);
-    });
+        let area: u32 = todo!();
+        area
+    })
 }
 
 #[pymodule]
 fn immutable(m: &Bound<'_, PyModule>) -> PyResult<()> {
-    m.add_function(wrap_pyfunction!(print_point, m)?)?;
+    m.add_function(wrap_pyfunction!(compute_area, m)?)?;
+    m.add_class::<Rectangle>()?;
     Ok(())
 }
diff --git a/exercises/03_concurrency/05_immutable_types/uv.lock b/exercises/03_concurrency/05_immutable_types/uv.lock
index a6c2ca4..762726c 100644
--- a/exercises/03_concurrency/05_immutable_types/uv.lock
+++ b/exercises/03_concurrency/05_immutable_types/uv.lock
@@ -3,8 +3,8 @@ requires-python = ">=3.11"
 
 [manifest]
 members = [
-    "minimize",
-    "minimize-sample",
+    "immutable",
+    "immutable-sample",
 ]
 
 [[package]]
@@ -17,25 +17,16 @@ wheels = [
 ]
 
 [[package]]
-name = "iniconfig"
-version = "2.0.0"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 },
-]
-
-[[package]]
-name
= "minimize" +name = "immutable" version = "0.1.0" source = { editable = "." } [[package]] -name = "minimize-sample" +name = "immutable-sample" version = "0.1.0" source = { editable = "sample" } dependencies = [ - { name = "minimize" }, + { name = "immutable" }, ] [package.dev-dependencies] @@ -44,11 +35,20 @@ dev = [ ] [package.metadata] -requires-dist = [{ name = "minimize", editable = "." }] +requires-dist = [{ name = "immutable", editable = "." }] [package.metadata.requires-dev] dev = [{ name = "pytest", specifier = ">=8.2.2" }] +[[package]] +name = "iniconfig" +version = "2.0.0" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/d7/4b/cbd8e699e64a6f16ca3a8220661b5f83792b3017d0f79807cb8708d33913/iniconfig-2.0.0.tar.gz", hash = "sha256:2d91e135bf72d31a410b17c16da610a82cb55f6b0477d1a902134b24a455b8b3", size = 4646 } +wheels = [ + { url = "https://files.pythonhosted.org/packages/ef/a6/62565a6e1cf69e10f5727360368e451d4b7f58beeac6173dc9db836a5b46/iniconfig-2.0.0-py3-none-any.whl", hash = "sha256:b6a85871a79d2e3b22d2d1b94ac2824226a63c6b741c88f7ae975f18b6778374", size = 5892 }, +] + [[package]] name = "packaging" version = "24.2" From bbeec3fd76d629f6803000bcdde95d936cc2d08a Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 15 Jan 2025 14:39:57 +0100 Subject: [PATCH 09/10] Remove solutions --- .../src/mprocessing/__init__.py | 11 +------ .../src/mthreading/__init__.py | 14 +-------- exercises/03_concurrency/02_gil/src/lib.rs | 17 +--------- .../03_releasing_the_gil/src/lib.rs | 20 ++++++------ .../04_minimize_gil_locking/src/lib.rs | 31 ++++++++----------- 5 files changed, 25 insertions(+), 68 deletions(-) diff --git a/exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py b/exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py index 688270c..6c9b99a 100644 --- 
a/exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py +++ b/exercises/03_concurrency/00_introduction/src/mprocessing/__init__.py @@ -15,16 +15,7 @@ # Relevant links: # - https://docs.python.org/3/library/multiprocessing.html def word_count(text: str, n_processes: int) -> int: - result_queue = Queue() - processes = [] - for chunk in split_into_chunks(text, n_processes): - p = Process(target=word_count_task, args=(chunk, result_queue)) - p.start() - processes.append(p) - for p in processes: - p.join() - results = [result_queue.get() for _ in range(len(processes))] - return sum(results) + pass # Compute the number of words in `text` and push the result into `result_queue`. diff --git a/exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py b/exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py index d213a74..1d336ef 100644 --- a/exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py +++ b/exercises/03_concurrency/01_python_threads/src/mthreading/__init__.py @@ -17,19 +17,7 @@ # - https://docs.python.org/3/library/threading.html # - https://docs.python.org/3/library/queue.html def word_count(text: str, n_threads: int) -> int: - result_queue = Queue() - threads = [] - - for chunk in split_into_chunks(text, n_threads): - t = Thread(target=word_count_task, args=(chunk, result_queue)) - t.start() - threads.append(t) - - for t in threads: - t.join() - - results = [result_queue.get() for _ in range(len(threads))] - return sum(results) + pass # Compute the number of words in `text` and push the result into `result_queue`. 
diff --git a/exercises/03_concurrency/02_gil/src/lib.rs b/exercises/03_concurrency/02_gil/src/lib.rs index 9d76cec..fdc9a82 100644 --- a/exercises/03_concurrency/02_gil/src/lib.rs +++ b/exercises/03_concurrency/02_gil/src/lib.rs @@ -20,22 +20,7 @@ fn word_count(text: Bound<'_, PyString>, n_threads: usize) -> PyResult { // directly as an argument, to avoid an extra copy of the string let text = text.to_str()?; - let chunks = split_into_chunks(text, n_threads); - let mut count = 0; - - std::thread::scope(|scope| { - let mut handles = Vec::with_capacity(n_threads); - for chunk in chunks { - let handle = scope.spawn(move || word_count_chunk(chunk)); - handles.push(handle); - } - - for handle in handles { - count += handle.join().unwrap(); - } - }); - - Ok(count) + todo!() } /// Count words in a single chunk of text. diff --git a/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs b/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs index d0c279c..d3aa7d9 100644 --- a/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs +++ b/exercises/03_concurrency/03_releasing_the_gil/src/lib.rs @@ -2,18 +2,16 @@ use pyo3::prelude::*; #[pyfunction] // Modify this function to release the GIL while computing the nth prime number. 
-fn nth_prime(python: Python<'_>, n: u64) -> u64 { - python.allow_threads(|| { - let mut count = 0; - let mut num = 2; // Start checking primes from 2 - while count < n { - if is_prime(num) { - count += 1; - } - num += 1; +fn nth_prime(n: u64) -> u64 { + let mut count = 0; + let mut num = 2; // Start checking primes from 2 + while count < n { + if is_prime(num) { + count += 1; } - num - 1 // Subtract 1 because we increment after finding the nth prime - }) + num += 1; + } + num - 1 // Subtract 1 because we increment after finding the nth prime } fn is_prime(n: u64) -> bool { diff --git a/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs b/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs index 83a17d5..5ddf637 100644 --- a/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs +++ b/exercises/03_concurrency/04_minimize_gil_locking/src/lib.rs @@ -1,35 +1,30 @@ -use std::collections::HashMap; - -use primes::factors_uniq; use pyo3::{ prelude::*, - types::{IntoPyDict, PyDict, PyList}, + types::{PyDict, PyList}, }; -use rayon::prelude::*; #[pyfunction] // You're given a Python list of non-negative numbers. // You need to return a Python dictionary where the keys are the numbers in the list and the values // are the unique prime factors of each number, sorted in ascending order. // -// Constraints: -// - Don't hold the GIL while computing the prime factors +// # Resources +// +// You can use `factors_uniq` from the `primes` crate to compute the prime factors of a number. +// +// # Constraints +// +// Don't hold the GIL while computing the prime factors +// +// # Fun additional challenge // -// Fun additional challenge: -// - Can you use multiple threads to parallelize the computation? -// Consider using `rayon` to make it easier. +// Can you use multiple threads to parallelize the computation? +// Consider using `rayon` to make it easier. 
fn compute_prime_factors<'python>( python: Python<'python>, numbers: Bound<'python, PyList>, ) -> PyResult<Bound<'python, PyDict>> { - let inputs: Vec<u64> = numbers.extract()?; - let m: HashMap<u64, Vec<u64>> = python.allow_threads(|| { - inputs - .into_par_iter() - .map(|number| (number, factors_uniq(number))) - .collect() - }); - m.into_py_dict(python) + todo!() } #[pymodule] From 246ed3b6612fd0efcf7eb9f8f63ba080da3f0855 Mon Sep 17 00:00:00 2001 From: LukeMathWalker <20745048+LukeMathWalker@users.noreply.github.com> Date: Wed, 15 Jan 2025 14:42:50 +0100 Subject: [PATCH 10/10] Formatting --- Cargo.toml | 6 +++--- book/src/03_concurrency/01_python_threads.md | 15 +++++++-------- book/src/03_concurrency/02_gil.md | 10 +++++----- exercises/03_concurrency/02_gil/pyproject.toml | 6 +++--- .../03_releasing_the_gil/pyproject.toml | 6 +++--- .../04_minimize_gil_locking/Cargo.toml | 2 +- .../04_minimize_gil_locking/pyproject.toml | 6 +++--- .../05_immutable_types/pyproject.toml | 6 +++--- 8 files changed, 28 insertions(+), 29 deletions(-) diff --git a/Cargo.toml b/Cargo.toml index 7d80121..404fc0c 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,16 +1,16 @@ [workspace] members = ["exercises/*/*", "verifier"] exclude = [ - "exercises/03_concurrency/00_introduction", - "exercises/03_concurrency/01_python_threads", + "exercises/03_concurrency/00_introduction", + "exercises/03_concurrency/01_python_threads", ] resolver = "2" [workspace.dependencies] anyhow = "1" duct = "0.13" -pyo3 = "0.23.3" primes = "0.4" +pyo3 = "0.23.3" rayon = "1" semver = "1.0.23" serde = "1.0.204" diff --git a/book/src/03_concurrency/01_python_threads.md b/book/src/03_concurrency/01_python_threads.md index 878c710..dc2bf9e 100644 --- a/book/src/03_concurrency/01_python_threads.md +++ b/book/src/03_concurrency/01_python_threads.md @@ -23,7 +23,7 @@ def word_count(text: str, n_processes: int) -> int: Let's focus, in particular, on process creation: ```python - p = Process(target=word_count_task, args=(chunk, result_queue)) +p = 
Process(target=word_count_task, args=(chunk, result_queue)) ``` The parent process (the one executing `word_count`) doesn't share memory with the child process (the one @@ -42,7 +42,6 @@ A **thread** is an execution context **within a process**.\ Threads share the same memory space and resources as the process that spawned them, thus allowing them to communicate and share data with one another more easily than processes can. - ```ascii +------------------------+ | Memory | @@ -87,11 +86,11 @@ The API of the `Thread` class, in particular, mirrors what you already know from - [`Queue` class](https://docs.python.org/3/library/queue.html) [^pickle]: To be more precise, the `multiprocessing` module uses the `pickle` module to serialize the objects - that must be passed as arguments to the child process. - The serialized data is then sent to the child process, as a byte stream, over an operating system pipe. - On the other side of the pipe, the child process deserializes the byte stream back into Python objects using `pickle` - and passes them to the target function.\ - This all system has higher overhead than a "simple" deep copy. +that must be passed as arguments to the child process. +The serialized data is then sent to the child process, as a byte stream, over an operating system pipe. +On the other side of the pipe, the child process deserializes the byte stream back into Python objects using `pickle` +and passes them to the target function.\ +This whole system has higher overhead than a "simple" deep copy. [^mmap]: Common workarounds include memory-mapped files and shared-memory objects, but these can be quite - difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. +difficult to work with. They also suffer from portability issues, as they rely on OS-specific features. 
diff --git a/book/src/03_concurrency/02_gil.md b/book/src/03_concurrency/02_gil.md index 6f847ed..086f58a 100644 --- a/book/src/03_concurrency/02_gil.md +++ b/book/src/03_concurrency/02_gil.md @@ -27,7 +27,7 @@ def word_count(text: str, n_threads: int) -> int: When a thread is created, we are no longer cloning the text chunk nor incurring the overhead of inter-process communication: ```python - t = Thread(target=word_count_task, args=(chunk, result_queue)) +t = Thread(target=word_count_task, args=(chunk, result_queue)) ``` Since the spawned threads share the same memory space as the parent thread, they can access the `chunk` and `result_queue` directly. @@ -60,7 +60,7 @@ pure Rust threads are not affected by the GIL, as long as they don't need to int Let's rewrite again our `word_count` function, this time in Rust! [^free-threading]: This is the current state of Python's concurrency model. There are some exciting changes on the horizon, though! - [`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature - that aims to remove the GIL entirely. - It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing. - We won't cover the new free-threading mode in this course, but it's worth keeping an eye on it as it matures out of the experimental phase. +[`CPython`'s free-threading mode](https://docs.python.org/3/howto/free-threading-python.html) is an experimental feature +that aims to remove the GIL entirely. +It would allow multiple threads to execute Python code simultaneously, without forcing developers to rely on multiprocessing. +We won't cover the new free-threading mode in this course, but it's worth keeping an eye on it as it matures out of the experimental phase. 
diff --git a/exercises/03_concurrency/02_gil/pyproject.toml b/exercises/03_concurrency/02_gil/pyproject.toml index 8dff496..f1a63b2 100644 --- a/exercises/03_concurrency/02_gil/pyproject.toml +++ b/exercises/03_concurrency/02_gil/pyproject.toml @@ -6,9 +6,9 @@ build-backend = "maturin" name = "gil2" requires-python = ">=3.11" classifiers = [ - "Programming Language :: Rust", - "Programming Language :: Python :: Implementation :: CPython", - "Programming Language :: Python :: Implementation :: PyPy", + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", ] version = "0.1.0" diff --git a/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml index 0b7af6f..54f8585 100644 --- a/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml +++ b/exercises/03_concurrency/03_releasing_the_gil/pyproject.toml @@ -6,9 +6,9 @@ build-backend = "maturin" name = "release" requires-python = ">=3.11" classifiers = [ - "Programming Language :: Rust", - "Programming Language :: Python :: Implementation :: CPython", - "Programming Language :: Python :: Implementation :: PyPy", + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", ] version = "0.1.0" diff --git a/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml b/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml index 363a42d..574487f 100644 --- a/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml +++ b/exercises/03_concurrency/04_minimize_gil_locking/Cargo.toml @@ -7,6 +7,6 @@ edition = "2021" crate-type = ["cdylib"] [dependencies] -pyo3 = { workspace = true } primes = { workspace = true } +pyo3 = { workspace = true } rayon = { workspace = true } diff --git a/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml 
b/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml index 7ad9318..bc26c00 100644 --- a/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml +++ b/exercises/03_concurrency/04_minimize_gil_locking/pyproject.toml @@ -6,9 +6,9 @@ build-backend = "maturin" name = "minimize" requires-python = ">=3.11" classifiers = [ - "Programming Language :: Rust", - "Programming Language :: Python :: Implementation :: CPython", - "Programming Language :: Python :: Implementation :: PyPy", + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", ] version = "0.1.0" diff --git a/exercises/03_concurrency/05_immutable_types/pyproject.toml b/exercises/03_concurrency/05_immutable_types/pyproject.toml index 3d32113..4e64a02 100644 --- a/exercises/03_concurrency/05_immutable_types/pyproject.toml +++ b/exercises/03_concurrency/05_immutable_types/pyproject.toml @@ -6,9 +6,9 @@ build-backend = "maturin" name = "immutable" requires-python = ">=3.11" classifiers = [ - "Programming Language :: Rust", - "Programming Language :: Python :: Implementation :: CPython", - "Programming Language :: Python :: Implementation :: PyPy", + "Programming Language :: Rust", + "Programming Language :: Python :: Implementation :: CPython", + "Programming Language :: Python :: Implementation :: PyPy", ] version = "0.1.0"