Dead lock using SemaphoreSlim inside an akka actor #7835

@Yorie

Version Information
Akka v1.5.49

Describe the bug
Hi, I recently set out to write a timer manager that relies on system timers, and I used SemaphoreSlim as its synchronization mechanism. Everything worked fine in isolated tests, but something went wrong once I started adding timers from inside an Akka actor.

I found that Akka uses ActorTaskScheduler, which has maxConcurrency=1. I understand that design decision and won't argue about its restrictions in server applications. But according to my investigation, because some of the semaphore code and its continuations execute on the ActorTaskScheduler, the result is unexpected behavior and a semaphore deadlock.

I'm not sure what to make of this issue and would be glad to discuss it to clear things up. It may seem obvious that such a limited task scheduler can deadlock a semaphore (note: I was not using anything like the classic .GetAwaiter().GetResult(), which would deadlock for sure). But I think that, at the very least, the documentation does not say enough (or maybe I missed it?) about possible issues with SemaphoreSlim, and probably with other locking primitives.

First of all, I attach a test example below that deadlocks.
I also ran the same code on a very minimal task scheduler with limited concurrency. With max concurrency set to 1, that code doesn't deadlock.
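
For readers who want to try a comparable comparison, here is a minimal sketch of running a similar lock/fire-and-forget pattern on a scheduler that executes one task at a time. It is an assumption-laden stand-in: it uses the BCL's ConcurrentExclusiveSchedulerPair rather than the custom scheduler mentioned above (which is not shown in this report), it is not Akka's ActorTaskScheduler, and the names LimitedSchedulerSketch and WorkAsync are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LimitedSchedulerSketch
{
    public static async Task Main()
    {
        // Assumption: the exclusive half of a ConcurrentExclusiveSchedulerPair
        // stands in for "a task scheduler with max concurrency 1".
        TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
        var mutex = new SemaphoreSlim(1, 1);

        // Hypothetical helper: take the semaphore, do some async work, release it.
        async Task WorkAsync(string name)
        {
            await mutex.WaitAsync();
            try
            {
                await Task.Delay(100);
                Console.WriteLine($"{name} done, TaskScheduler.Current.Id={TaskScheduler.Current.Id}");
            }
            finally
            {
                mutex.Release();
            }
        }

        // Starting the delegate on an explicit scheduler makes it TaskScheduler.Current
        // inside the delegate, so the awaits below resume on that scheduler
        // (there is no SynchronizationContext in a console Main).
        await Task.Factory.StartNew(
            async () =>
            {
                // Started without awaiting first, similar to `_ = ProcessAsync(r)` in the repro.
                Task background = WorkAsync("background");
                await WorkAsync("foreground");
                await background;
            },
            CancellationToken.None,
            TaskCreationOptions.None,
            scheduler).Unwrap();

        Console.WriteLine($"Finished. CurrentCount={mutex.CurrentCount}");
    }
}
```

If this sketch completes while the actor-based test hangs, that would point at how continuations are dispatched by the actor's scheduler rather than at SemaphoreSlim itself, though it does not prove it.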

Here is what I see in the test log:

Lock s1 by AddAsync/add. CurrentCount=0
Release s1 by AddAsync. CurrentCount=0
Lock s1 by ProcessAsync/process. CurrentCount=0
Release s1 by ProcessAsync. CurrentCount=0
Released s1 by ProcessAsync. CurrentCount=1
Released prev=0 s1 by AddAsync. CurrentCount=0

Normally, locking and unlocking should look like locked-unlocked-locked-unlocked.
But here some task continuations are invoked directly from inside SemaphoreSlim.Release(), which is why we see locked-locked-unlocked-unlocked (the test can write the "Released" log line only after the call to Release has returned).

The number of locks and unlocks looks correct: 2 locks and 2 unlocks. But right after that the semaphore ends up in a state where CurrentCount=0 (the number of remaining threads that can enter), and the next semaphore.WaitAsync() deadlocks.
I'm not sure which continuation executes last. It is hard to tell from the semaphore state: it has CurrentCount=0 and no async head/tail continuations registered. It looks as if it ran some continuation task and never got back to updating the CurrentCount property.

To Reproduce
The following test reproduces the error:

[Fact]
public async Task Test1()
{
	var system = ActorSystem.Create("test");
	var actor = system.ActorOf(Props.Create(() => new Actor()));
	var api = new Api();
	// Runner names now match the variable names (they were swapped); they only affect log output.
	var r2 = new Runner(output, "r2", new State("s2", output), async () => await Task.Delay(500));
	var r1 = new Runner(output, "r1", new State("s1", output), async () => await api.AddAsync(r2));
	actor.Tell(new StartMessage(api, r1));

	try
	{
		var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));

		for (;;)
		{
			if (r1.Finished && r2.Finished)
				break;

			cts.Token.ThrowIfCancellationRequested();
			await Task.Delay(1000, cts.Token);
		}
	}
	finally
	{
		output.WriteLine($"R1 finished: {r1.Finished}"); // Should be true
		output.WriteLine($"R2 finished: {r2.Finished}"); // Should be true
	}
}

private record StartMessage(Api Api, Runner Runner);

private class Actor : ReceiveActor
{
	public Actor()
	{
		ReceiveAsync<StartMessage>(async msg =>
		{
			await msg.Api.AddAsync(msg.Runner);
		});
	}
}

private class Api
{
	public async Task AddAsync(Runner r)
	{
		await r.State.LockAsync("add");

		try
		{
			await Task.Delay(500);
			_ = ProcessAsync(r);
			// Replacing previous line with task run eliminates a dead lock.
			//_ = Task.Run(() => ProcessAsync(r));
		}
		finally
		{
			r.State.Release();
		}
	}

	private async Task ProcessAsync(Runner r)
	{
		await r.State.LockAsync("process");

		try
		{}
		finally
		{
			r.State.Release();
		}

		await r.RunAsync();
		
		// await r.State.LockAsync("process-finish");
		//
		// try
		// {}
		// finally
		// {
		// 	r.State.Release();
		// }
	}
}

private class Runner(ITestOutputHelper output, string name, State state, Func<Task> callback)
{
	public readonly State State = state;
	public bool Finished;

	public async Task RunAsync()
	{
		await callback();
		
		output.WriteLine($"Acquire lock in runner {name}. Lock CurrentCount={State.MutexAvailableThreads}");
		await State.LockAsync("run");
		output.WriteLine($"ACQUIRED lock in runner {name}");

		try
		{
			await Task.Delay(500);
		}
		finally
		{
			State.Release();
		}
		Finished = true;
	}
}

private class State(string name, ITestOutputHelper output)
{
	private readonly SemaphoreSlim _mutex = new(1, 1);
	public int MutexAvailableThreads => _mutex.CurrentCount;
	public string? LockedBy { get; private set; }

	public async Task LockAsync(string source, [CallerMemberName] string? caller = null)
	{
		await _mutex.WaitAsync();
		LockedBy = source;
		output.WriteLine($"Lock {name} by {caller}/{source}. CurrentCount={_mutex.CurrentCount}");
	}

	public void Release([CallerMemberName] string? caller = null)
	{
		output.WriteLine($"Release {name} by {caller}. CurrentCount={_mutex.CurrentCount}");
		int released = _mutex.Release();
		output.WriteLine($"Released prev={released} {name} by {caller}. CurrentCount={_mutex.CurrentCount}");
		LockedBy = null;
	}
}

Expected behavior
Code doesn't deadlock.

Actual behavior
The code deadlocks.

Environment
Windows 11, .NET 9.
