Description
This script reproduces the deadlock fairly reliably with Python 3.7.13 and pulsar-client 3.0 on an x86 MacBook Pro. We originally noticed it in much more complex producer logic in our CI pipeline on Linux, but this minimal script appears to reach the same deadlocked state.
import pulsar
from time import sleep
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://sample/standalone/ns/my-topic')
def send_callback(res, msg):
    print(f"Message '{msg}' published res={res}")
for i in range(30):
    producer.send_async(f"Hello-{i}".encode('utf-8'), callback=send_callback)
# Sleep to allow sends to complete concurrently before closing the connection
sleep(0.5)
client.close()
With the older pulsar-client version 2.10.2, this works as expected:
❯ python deadlock_repro.py
2023-01-24 15:45:26.883 INFO [0x700009c56000] ExecutorService:41 | Run io_service in a single thread
2023-01-24 15:45:26.883 INFO [0x10f520600] ClientConnection:189 | [<none> -> pulsar://127.0.0.1:6660] Create ClientConnection, timeout=10000
2023-01-24 15:45:26.883 INFO [0x10f520600] ConnectionPool:96 | Created connection for pulsar://127.0.0.1:6660
2023-01-24 15:45:26.884 INFO [0x700009c56000] ClientConnection:375 | [127.0.0.1:50620 -> 127.0.0.1:6660] Connected to broker
2023-01-24 15:45:26.887 INFO [0x700009d5c000] ExecutorService:41 | Run io_service in a single thread
2023-01-24 15:45:26.887 INFO [0x700009c56000] HandlerBase:61 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, ] Getting connection from pool
2023-01-24 15:45:26.887 INFO [0x700009c56000] HandlerBase:61 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, ] Getting connection from pool
2023-01-24 15:45:26.887 INFO [0x700009c56000] HandlerBase:61 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, ] Getting connection from pool
2023-01-24 15:45:26.887 INFO [0x700009c56000] HandlerBase:61 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, ] Getting connection from pool
2023-01-24 15:45:26.888 INFO [0x700009c56000] ClientConnection:189 | [<none> -> pulsar://127.0.0.1:6660] Create ClientConnection, timeout=10000
2023-01-24 15:45:26.888 INFO [0x700009c56000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2023-01-24 15:45:26.889 INFO [0x700009c56000] ClientConnection:377 | [127.0.0.1:50621 -> 127.0.0.1:6660] Connected to broker through proxy. Logical broker: pulsar://localhost:6650
2023-01-24 15:45:26.892 INFO [0x700009c56000] ProducerImpl:174 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, ] Created producer on broker [127.0.0.1:50621 -> 127.0.0.1:6660]
2023-01-24 15:45:26.892 INFO [0x700009c56000] ProducerImpl:174 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, ] Created producer on broker [127.0.0.1:50621 -> 127.0.0.1:6660]
2023-01-24 15:45:26.892 INFO [0x700009c56000] ProducerImpl:174 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, ] Created producer on broker [127.0.0.1:50621 -> 127.0.0.1:6660]
2023-01-24 15:45:26.893 INFO [0x700009c56000] ProducerImpl:174 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, ] Created producer on broker [127.0.0.1:50621 -> 127.0.0.1:6660]
Message '(1160,53,1,-1)' published res=Ok
Message '(1160,54,1,-1)' published res=Ok
Message '(1160,55,1,-1)' published res=Ok
Message '(1160,56,1,-1)' published res=Ok
Message '(1160,57,1,-1)' published res=Ok
Message '(1160,58,1,-1)' published res=Ok
Message '(1160,59,1,-1)' published res=Ok
Message '(1160,60,1,-1)' published res=Ok
Message '(1159,56,3,-1)' published res=Ok
Message '(1159,57,3,-1)' published res=Ok
Message '(1159,58,3,-1)' published res=Ok
Message '(1159,59,3,-1)' published res=Ok
Message '(1158,58,0,-1)' published res=Ok
Message '(1158,59,0,-1)' published res=Ok
Message '(1158,60,0,-1)' published res=Ok
Message '(1158,61,0,-1)' published res=Ok
Message '(1158,62,0,-1)' published res=Ok
Message '(1158,63,0,-1)' published res=Ok
Message '(1158,64,0,-1)' published res=Ok
Message '(1158,65,0,-1)' published res=Ok
Message '(1161,53,2,-1)' published res=Ok
Message '(1161,54,2,-1)' published res=Ok
Message '(1161,55,2,-1)' published res=Ok
Message '(1161,56,2,-1)' published res=Ok
Message '(1159,60,3,-1)' published res=Ok
Message '(1161,57,2,-1)' published res=Ok
Message '(1161,58,2,-1)' published res=Ok
Message '(1161,59,2,-1)' published res=Ok
Message '(1159,61,3,-1)' published res=Ok
Message '(1159,62,3,-1)' published res=Ok
2023-01-24 15:45:27.398 INFO [0x10f520600] ClientImpl:505 | Closing Pulsar client with 1 producers and 0 consumers
2023-01-24 15:45:27.398 INFO [0x10f520600] ProducerImpl:651 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, standalone-0-80] Closing producer for topic persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0
2023-01-24 15:45:27.398 INFO [0x10f520600] ProducerImpl:651 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, standalone-0-81] Closing producer for topic persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1
2023-01-24 15:45:27.398 INFO [0x10f520600] ProducerImpl:651 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, standalone-0-82] Closing producer for topic persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2
2023-01-24 15:45:27.398 INFO [0x10f520600] ProducerImpl:651 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, standalone-0-83] Closing producer for topic persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3
2023-01-24 15:45:27.400 INFO [0x700009c56000] ProducerImpl:691 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, standalone-0-80] Closed producer
2023-01-24 15:45:27.400 INFO [0x700009c56000] ProducerImpl:691 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, standalone-0-81] Closed producer
2023-01-24 15:45:27.400 INFO [0x700009c56000] ProducerImpl:691 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, standalone-0-82] Closed producer
2023-01-24 15:45:27.401 INFO [0x700009c56000] ProducerImpl:691 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, standalone-0-83] Closed producer
2023-01-24 15:45:27.401 INFO [0x700009ddf000] ClientConnection:1563 | [127.0.0.1:50620 -> 127.0.0.1:6660] Connection closed
2023-01-24 15:45:27.401 INFO [0x700009ddf000] ClientConnection:263 | [127.0.0.1:50620 -> 127.0.0.1:6660] Destroyed connection
2023-01-24 15:45:27.401 INFO [0x700009ddf000] ClientConnection:1563 | [127.0.0.1:50621 -> 127.0.0.1:6660] Connection closed
2023-01-24 15:45:27.401 INFO [0x700009c56000] ExecutorService:47 | Event loop of ExecutorService exits successfully
2023-01-24 15:45:27.401 INFO [0x700009d5c000] ExecutorService:47 | Event loop of ExecutorService exits successfully
However, if I upgrade to pulsar-client==3.0, the script deadlocks and does not respond to SIGINT:
❯ python deadlock_repro.py
2023-01-24 15:41:21.192 INFO [0x10db56600] ClientConnection:189 | [<none> -> pulsar://127.0.0.1:6660] Create ClientConnection, timeout=10000
2023-01-24 15:41:21.192 INFO [0x10db56600] ConnectionPool:97 | Created connection for pulsar://127.0.0.1:6660
2023-01-24 15:41:21.194 INFO [0x7000048cd000] ClientConnection:379 | [127.0.0.1:50470 -> 127.0.0.1:6660] Connected to broker
2023-01-24 15:41:21.208 INFO [0x7000048cd000] HandlerBase:72 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, ] Getting connection from pool
2023-01-24 15:41:21.208 INFO [0x7000048cd000] HandlerBase:72 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, ] Getting connection from pool
2023-01-24 15:41:21.208 INFO [0x7000048cd000] HandlerBase:72 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, ] Getting connection from pool
2023-01-24 15:41:21.209 INFO [0x7000048cd000] HandlerBase:72 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, ] Getting connection from pool
2023-01-24 15:41:21.210 INFO [0x7000048cd000] ClientConnection:189 | [<none> -> pulsar://127.0.0.1:6660] Create ClientConnection, timeout=10000
2023-01-24 15:41:21.210 INFO [0x7000048cd000] ConnectionPool:97 | Created connection for pulsar://localhost:6650
2023-01-24 15:41:21.211 INFO [0x7000048cd000] ClientConnection:381 | [127.0.0.1:50471 -> 127.0.0.1:6660] Connected to broker through proxy. Logical broker: pulsar://localhost:6650
2023-01-24 15:41:21.217 INFO [0x7000048cd000] ProducerImpl:190 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-3, ] Created producer on broker [127.0.0.1:50471 -> 127.0.0.1:6660]
2023-01-24 15:41:21.217 INFO [0x7000048cd000] ProducerImpl:190 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-0, ] Created producer on broker [127.0.0.1:50471 -> 127.0.0.1:6660]
2023-01-24 15:41:21.217 INFO [0x7000048cd000] ProducerImpl:190 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-1, ] Created producer on broker [127.0.0.1:50471 -> 127.0.0.1:6660]
2023-01-24 15:41:21.217 INFO [0x7000048cd000] ProducerImpl:190 | [persistent://chariot1/chariot_ns_sre--heartbeat/chariot_topic_heartbeat-partition-2, ] Created producer on broker [127.0.0.1:50471 -> 127.0.0.1:6660]
We have observed this on x86 Mac laptops and on Linux (in our CI system, testing a much more complex producer than the script above).
lldb on the Mac shows the following thread list and backtraces for the deadlocked process:
(lldb) thread list
Process 81000 stopped
* thread #1: tid = 0x167b444, 0x00007ff80a193bd2 libsystem_kernel.dylib`__psynch_mutexwait + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
thread #2: tid = 0x167b445, 0x00007ff80a1943ea libsystem_kernel.dylib`__psynch_cvwait + 10
thread #3: tid = 0x167b446, 0x00007ff80a1943ea libsystem_kernel.dylib`__psynch_cvwait + 10
thread #4: tid = 0x167b447, 0x00007ff80a19634e libsystem_kernel.dylib`kevent + 10
(lldb)
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007ff80a193bd2 libsystem_kernel.dylib`__psynch_mutexwait + 10
frame #1: 0x00007ff80a1cbe7e libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 76
frame #2: 0x00007ff80a1c9cbb libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 205
frame #3: 0x00007ff80a12e739 libc++.1.dylib`std::__1::mutex::lock() + 9
frame #4: 0x00000001087b7c25 _pulsar.cpython-37m-darwin.so`pulsar::ClientConnection::sendMessage(pulsar::OpSendMsg const&) + 53
frame #5: 0x00000001088d411a _pulsar.cpython-37m-darwin.so`pulsar::ProducerImpl::sendMessage(pulsar::OpSendMsg const&) + 298
frame #6: 0x00000001088d2916 _pulsar.cpython-37m-darwin.so`pulsar::ProducerImpl::sendAsyncWithStatsUpdate(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)> const&) + 3526
frame #7: 0x00000001088d19bc _pulsar.cpython-37m-darwin.so`pulsar::ProducerImpl::sendAsync(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>) + 364
frame #8: 0x00000001088b9a82 _pulsar.cpython-37m-darwin.so`pulsar::PartitionedProducerImpl::sendAsync(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>) + 914
frame #9: 0x00000001088c8eb5 _pulsar.cpython-37m-darwin.so`pulsar::Producer::sendAsync(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>) + 149
frame #10: 0x0000000108771ac4 _pulsar.cpython-37m-darwin.so`void pybind11::detail::argument_loader<pulsar::Producer*, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)> >::call_impl<void, pybind11::cpp_function::cpp_function<void, pulsar::Producer, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, pybind11::name, pybind11::is_method, pybind11::sibling>(void (pulsar::Producer::*)(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pulsar::Producer*, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>)&, 0ul, 1ul, 2ul, pybind11::detail::void_type>(pulsar::Producer&&, pybind11::detail::index_sequence<0ul, 1ul, 2ul>, pybind11::detail::void_type&&) && + 212
frame #11: 0x0000000108770f2f _pulsar.cpython-37m-darwin.so`void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<void, pulsar::Producer, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, pybind11::name, pybind11::is_method, pybind11::sibling>(void (pulsar::Producer::*)(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pulsar::Producer*, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>), void, pulsar::Producer*, pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, pybind11::name, pybind11::is_method, pybind11::sibling>(void&&, pulsar::Producer (*)(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const + 255
frame #12: 0x000000010871420d _pulsar.cpython-37m-darwin.so`pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 4733
frame #13: 0x000000010597218a python3.7`_PyMethodDef_RawFastCallKeywords + 714
frame #14: 0x00000001059711ac python3.7`_PyObject_FastCallKeywords + 332
frame #15: 0x0000000105a47ca5 python3.7`call_function + 773
frame #16: 0x0000000105a44428 python3.7`_PyEval_EvalFrameDefault + 28344
frame #17: 0x0000000105a489a8 python3.7`_PyEval_EvalCodeWithName + 2888
frame #18: 0x0000000105971444 python3.7`_PyFunction_FastCallKeywords + 228
frame #19: 0x0000000105a47cac python3.7`call_function + 780
frame #20: 0x0000000105a44574 python3.7`_PyEval_EvalFrameDefault + 28676
frame #21: 0x0000000105a489a8 python3.7`_PyEval_EvalCodeWithName + 2888
frame #22: 0x0000000105a3d4d0 python3.7`PyEval_EvalCode + 48
frame #23: 0x0000000105a801de python3.7`PyRun_FileExFlags + 174
frame #24: 0x0000000105a7f77e python3.7`PyRun_SimpleFileExFlags + 270
frame #25: 0x0000000105aa165e python3.7`pymain_main + 6622
frame #26: 0x0000000105aa202f python3.7`_Py_UnixMain + 111
frame #27: 0x000000010dadb52e dyld`start + 462
thread #2
frame #0: 0x00007ff80a1943ea libsystem_kernel.dylib`__psynch_cvwait + 10
frame #1: 0x00007ff80a1cea6f libsystem_pthread.dylib`_pthread_cond_wait + 1249
frame #2: 0x0000000105a3cb9f python3.7`take_gil + 255
frame #3: 0x0000000105a3cfb3 python3.7`PyEval_AcquireThread + 19
frame #4: 0x000000010870be43 _pulsar.cpython-37m-darwin.so`pybind11::gil_scoped_acquire::gil_scoped_acquire() + 83
frame #5: 0x00000001087714f3 _pulsar.cpython-37m-darwin.so`pybind11::detail::type_caster<std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, void>::load(pybind11::handle, bool)::func_handle::func_handle(func_handle const&) + 35
frame #6: 0x00000001087715f1 _pulsar.cpython-37m-darwin.so`std::__1::__function::__func<pybind11::detail::type_caster<std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, void>::load(pybind11::handle, bool)::func_wrapper, std::__1::allocator<pybind11::detail::type_caster<std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>, void>::load(pybind11::handle, bool)::func_wrapper>, void (pulsar::Result, pulsar::MessageId const&)>::__clone() const + 49
frame #7: 0x00000001088daccd _pulsar.cpython-37m-darwin.so`std::__1::__function::__func<pulsar::ProducerImpl::sendAsync(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>)::$_2, std::__1::allocator<pulsar::ProducerImpl::sendAsync(pulsar::Message const&, std::__1::function<void (pulsar::Result, pulsar::MessageId const&)>)::$_2>, void (pulsar::Result, pulsar::MessageId const&)>::__clone() const + 93
frame #8: 0x000000010878f7de _pulsar.cpython-37m-darwin.so`pulsar::OpSendMsg::OpSendMsg(pulsar::OpSendMsg const&) + 126
frame #9: 0x00000001087f0f85 _pulsar.cpython-37m-darwin.so`boost::any::holder<pulsar::OpSendMsg>::clone() const + 53
frame #10: 0x00000001087b864a _pulsar.cpython-37m-darwin.so`pulsar::ClientConnection::sendPendingCommands() + 106
frame #11: 0x00000001087f6812 _pulsar.cpython-37m-darwin.so`boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, pulsar::CompositeSharedBuffer<2>, boost::asio::const_buffer const*, boost::asio::detail::transfer_all_t, AllocHandler<std::__1::__bind<void (pulsar::ClientConnection::*)(boost::system::error_code const&), std::__1::shared_ptr<pulsar::ClientConnection>, std::__1::placeholders::__ph<1> const&> > >::operator()(boost::system::error_code, unsigned long, int) + 434
frame #12: 0x00000001087f6b60 _pulsar.cpython-37m-darwin.so`boost::asio::detail::reactive_socket_send_op<boost::asio::detail::prepared_buffers<boost::asio::const_buffer, 64ul>, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::any_io_executor>, pulsar::CompositeSharedBuffer<2>, boost::asio::const_buffer const*, boost::asio::detail::transfer_all_t, AllocHandler<std::__1::__bind<void (pulsar::ClientConnection::*)(boost::system::error_code const&), std::__1::shared_ptr<pulsar::ClientConnection>, std::__1::placeholders::__ph<1> const&> > >, boost::asio::any_io_executor>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) + 320
frame #13: 0x00000001087c0946 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) + 694
frame #14: 0x00000001087c0481 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::run(boost::system::error_code&) + 321
frame #15: 0x0000000108866ca7 _pulsar.cpython-37m-darwin.so`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, pulsar::ExecutorService::start()::$_0> >(void*) + 119
frame #16: 0x00007ff80a1ce4e1 libsystem_pthread.dylib`_pthread_start + 125
frame #17: 0x00007ff80a1c9f6b libsystem_pthread.dylib`thread_start + 15
thread #3
frame #0: 0x00007ff80a1943ea libsystem_kernel.dylib`__psynch_cvwait + 10
frame #1: 0x00007ff80a1cea6f libsystem_pthread.dylib`_pthread_cond_wait + 1249
frame #2: 0x00000001087c0873 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) + 483
frame #3: 0x00000001087c0481 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::run(boost::system::error_code&) + 321
frame #4: 0x00000001087c0334 _pulsar.cpython-37m-darwin.so`boost::asio::detail::posix_thread::func<boost::asio::detail::resolver_service_base::work_scheduler_runner>::run() + 36
frame #5: 0x00000001087c02e0 _pulsar.cpython-37m-darwin.so`boost_asio_detail_posix_thread_function + 16
frame #6: 0x00007ff80a1ce4e1 libsystem_pthread.dylib`_pthread_start + 125
frame #7: 0x00007ff80a1c9f6b libsystem_pthread.dylib`thread_start + 15
thread #4
frame #0: 0x00007ff80a19634e libsystem_kernel.dylib`kevent + 10
frame #1: 0x00000001087bf587 _pulsar.cpython-37m-darwin.so`boost::asio::detail::kqueue_reactor::run(long, boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>&) + 327
frame #2: 0x00000001087c07b4 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) + 292
frame #3: 0x00000001087c0481 _pulsar.cpython-37m-darwin.so`boost::asio::detail::scheduler::run(boost::system::error_code&) + 321
frame #4: 0x0000000108866ca7 _pulsar.cpython-37m-darwin.so`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, pulsar::ExecutorService::start()::$_0> >(void*) + 119
frame #5: 0x00007ff80a1ce4e1 libsystem_pthread.dylib`_pthread_start + 125
frame #6: 0x00007ff80a1c9f6b libsystem_pthread.dylib`thread_start + 15
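I'm not really a pybind11/Boost expert, but from the backtraces this looks to me like a lock-ordering problem involving the GIL. Thread #1 (the main thread, entered from Python through the pybind11 binding for Producer::sendAsync) is blocked on a std::mutex in ClientConnection::sendMessage. Thread #2 (an ExecutorService I/O thread) is inside ClientConnection::sendPendingCommands, copying an OpSendMsg, and is blocked in pybind11::gil_scoped_acquire because copying the Python callback needs the GIL. If thread #1 still holds the GIL and thread #2 holds the mutex that thread #1 is waiting for, neither can make progress. This deadlock did not occur before pybind11 was introduced.
Do you have a sense of what may be causing the deadlock, and how to fix it?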
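To make the suspected pattern concrete, here is a minimal pure-Python sketch of a lock-order inversion. It does not use pulsar or pybind11 at all: `gil` and `conn_mutex` are hypothetical stand-ins for the Python GIL and the mutex seen in ClientConnection::sendMessage, and the assumption that each thread holds one lock while waiting for the other is my reading of the backtraces, not something I have confirmed in the client source. Timeouts are used so the demo terminates instead of hanging.

```python
import threading


def demo_lock_order_inversion(timeout_s=0.2):
    """Two threads take the same two locks in opposite order.

    Returns a dict mapping each thread name to whether it managed to
    acquire its second lock before the timeout expired.
    """
    gil = threading.Lock()         # stand-in for the Python GIL (assumption)
    conn_mutex = threading.Lock()  # stand-in for the connection mutex (assumption)
    both_hold_first = threading.Barrier(2)
    both_attempted = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            both_hold_first.wait()
            acquired = second.acquire(timeout=timeout_s)
            got_second[name] = acquired
            if acquired:
                second.release()
            # Keep holding `first` until both attempts have finished, so the
            # outcome does not depend on which timeout fires first.
            both_attempted.wait()

    threads = [
        # Like thread #1: holds the "GIL", then wants the connection mutex.
        threading.Thread(target=worker, args=("python_caller", gil, conn_mutex)),
        # Like thread #2: holds the connection mutex, then wants the "GIL".
        threading.Thread(target=worker, args=("io_thread", conn_mutex, gil)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(demo_lock_order_inversion())
```

Neither thread can obtain its second lock while the other still holds it. Without the timeouts, both would block forever, which matches the shape of what threads #1 and #2 show in the lldb output.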