Conversation

@thildenbrand (Author):

The timer queues in the ACE_Proactor work on raw ACE_Handler pointers. In our multicore environment this can lead to situations where a timer element is already being worked on while the ACE_Handler is destructed on a different core.

These situations need to be handled correctly by every single user of the timer system, which has proven to be error-prone.

For example, one core could execute ACE_Proactor_Handle_Timeout_Upcall::timeout and access the ACE_Handler::Proxy_Ptr via the ACE_Handler* while the ACE_Handler was destructed concurrently on another core. This resulted in a double free of the memory managed via the ACE_Handler::Proxy_Ptr.

ACE_Handler::Proxy_Ptr is already a smart pointer, and the situation could be improved by storing an ACE_Handler::Proxy_Ptr instead of an ACE_Handler* in the queues of the ACE_Proactor.

Applications could then safely delete the ACE_Handler while the memory of the ACE_Handler::Proxy would still be valid. Central application code could then try to retrieve the ACE_Handler* in a thread-safe way, and the code using the timer system could free the ACE_Handler at any time, without having to care about potentially outstanding callbacks.
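
A rough sketch of the idea (illustrative only, not the actual patch; the member names match the diff excerpts below):

    // Today: the timer queue holds a raw pointer that can dangle.
    ACE_Handler *handler_;           // may be destructed concurrently on another core

    // Proposed: hold the existing smart pointer instead.
    ACE_Handler::Proxy_Ptr proxy_;   // the proxy memory outlives the handler;
                                     // proxy_->handler () is reset on destruction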

    void reset ()
    {
      acquire ();
      this->handler_ = 0;
Member:

Should use a guard for the acquire/release; no need for public acquire/release; use nullptr.
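
For reference, the guard-based reset the reviewer suggests might look like this (a minimal sketch, assuming the proxy owns an ACE_Thread_Mutex member named lock_):

    #include "ace/Guard_T.h"
    #include "ace/Thread_Mutex.h"

    void reset ()
    {
      // The guard releases lock_ on every exit path, so no public
      // acquire/release pair is needed.
      ACE_Guard<ACE_Thread_Mutex> guard (this->lock_);
      this->handler_ = nullptr;
    }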

Member:

Use std::atomic instead?
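
For comparison, a std::atomic pointer would make the store and load themselves race-free (a sketch; the author's reply below explains why this alone may not be sufficient):

    #include <atomic>

    std::atomic<ACE_Handler *> handler_ {nullptr};

    void reset () { this->handler_.store (nullptr, std::memory_order_release); }
    ACE_Handler *handler () { return this->handler_.load (std::memory_order_acquire); }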

Author:

I considered using a guard here, but I ran into problems when doing the actual callback to the ACE_Handler for the timeout. A possible solution was to allow the code that actually calls the ACE_Handler to lock the mutex.
This leaves only two possibilities: either I get the timeout callback, or I get to reset the handler pointer to nullptr first.

    ACE_Handler::Proxy *p = this->proxy_.get ();
    if (p)
      p->reset ();
    deregister_callback ();
Member:

Why move this to a new public method which is only used here?

@thildenbrand (Author), Sep 19, 2025:

To be able to use it from derived classes; it could therefore be made protected.
Because of the order of destruction, the base class would only execute this after the destructor of the derived class had already run, enabling callbacks to a semi-destructed object.
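
The ordering issue in a nutshell (illustrative; My_Handler is a made-up example class):

    struct My_Handler : ACE_Handler
    {
      ~My_Handler ()
      {
        // Members of My_Handler are destroyed here, but ~ACE_Handler has
        // not run yet.  If only the base destructor resets the proxy, a
        // timeout dispatched at this point still reaches a partially
        // destructed object.  Hence the reset must be callable from here.
      }
    };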

@jwillemsen (Member):

What about a test extension as a reproducer?

      this->handler_ = 0;
      release ();
    }

    ACE_Handler *handler () { return this->handler_; }
Member:

Is the read itself thread-safe?

@thildenbrand (Author), Sep 19, 2025:

That's why I added the acquire/release methods. Reading the pointer alone does not completely solve the problem, since I just end up with another raw pointer, and the other core could again be in the process of destructing the object.

The timeout code looks something like this:

    proxy->acquire();
    if (proxy->handler() != nullptr)
      proxy->handler()->callback();
    proxy->release();

I'm very open to a better solution.

Member:

Keeping a lock during a callback is risky. In taox11 we use shared/weak pointers.
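
A weak-pointer variant would avoid holding a lock across the callback entirely (a sketch, not the taox11 code; it assumes handlers were managed via std::shared_ptr and that the proxy handed out a std::weak_ptr through a hypothetical weak_handler () accessor):

    #include <memory>

    std::weak_ptr<ACE_Handler> weak = proxy->weak_handler ();  // hypothetical accessor
    if (auto h = weak.lock ())        // pins the handler, or yields nullptr if gone
      h->handle_time_out (tv, act);   // no mutex held during the callback;
                                      // tv/act as passed to the timeout upcall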

@thildenbrand (Author), Sep 19, 2025:

I guess replacing the ACE_Handler* completely with some kind of shared pointer could also solve the problem. I'd have to test it out with our code base.

Author:

It would probably be the bigger change for users, since it would require them to use shared pointers to manage the lifetime of their ACE_Handlers, right?

Member:

Changing the public API is pretty much impossible given the number of users out there. Your current solution doesn't seem to be complete, so it is not something to merge.

A long time ago, reference counting was added to ACE_Event_Handler in such a way that the user has to enable it explicitly; maybe that is something for ACE_Handler too? It has been some time since I worked on the proactor and I haven't dived into your issue in detail, so these were just some high-level comments and I'm not sure whether that is the correct path forward. Any possibility you could solve this race condition in application code?
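
For reference, the opt-in reference counting on ACE_Event_Handler is enabled like this today; the suggestion is that something analogous could be offered on ACE_Handler:

    My_Event_Handler *eh = new My_Event_Handler;   // My_Event_Handler is illustrative
    eh->reference_counting_policy ().value (
      ACE_Event_Handler::Reference_Counting_Policy::ENABLED);
    // ... hand eh to the framework; its add_reference()/remove_reference()
    // calls keep the handler alive during dispatch ...
    eh->remove_reference ();   // drop the creator's reference instead of delete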

Author:

Adding reference counting similar to ACE_Event_Handler sounds like a much better idea than mine.

@jwillemsen added the "needs review" label on Sep 19, 2025.
@thildenbrand (Author):

> What about a test extension as a reproducer?

I'll have a look at the existing tests and try to add one to reproduce my scenario.

@jwillemsen marked this pull request as draft on September 22, 2025, 14:41.