Merged
com.unity.ml-agents/CHANGELOG.md (2 additions, 0 deletions)
@@ -54,6 +54,8 @@ and this project adheres to
- Removed several memory allocations that happened during inference. On a test scene, this
reduced the amount of memory allocated by approximately 25%. (#4887)
- Properly catch permission errors when writing timer files. (#4921)
- Unexpected gRPC exceptions during training are now logged before stopping training. If you see
"noisy" logs, please let us know! (#4930)

#### ml-agents / ml-agents-envs / gym-unity (Python)
- Fixed a bug that would cause an exception when `RunOptions` was deserialized via `pickle`. (#4842)
com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs (27 additions, 1 deletion)
@@ -440,6 +440,7 @@ UnityInputProto Exchange(UnityOutputProto unityOutput)
{
    return null;
}

try
{
    var message = m_Client.Exchange(WrapMessage(unityOutput, 200));
@@ -455,8 +456,33 @@ UnityInputProto Exchange(UnityOutputProto unityOutput)
    QuitCommandReceived?.Invoke();
    return message.UnityInput;
}
catch (RpcException rpcException)
{
    // Log more verbose errors if they're something the user can possibly do something about.
    switch (rpcException.Status.StatusCode)
    {
        case StatusCode.Unavailable:
            // This can happen when python disconnects. Ignore it to avoid noisy logs.

Contributor Author:
This happens roughly 20-25% of the time when hitting Ctrl-C from Python. I'm not exactly sure why it happens instead of the non-200 status (handled above).

            break;
        case StatusCode.ResourceExhausted:
            // This happens if the message body is too large. There's no way to
            // gracefully handle this, but at least we can show the message and the
            // user can try to reduce the number of agents or observation sizes.
            Debug.LogError($"GRPC Exception: {rpcException.Message}. Disconnecting from trainer.");
            break;
        default:
            // Other unknown errors. Log at INFO level.
            Debug.Log($"GRPC Exception: {rpcException.Message}. Disconnecting from trainer.");
            break;

Contributor Author:
I could be convinced to log other exceptions here (maybe at Info level instead of Error). That would at least help us catch future unknown unknowns.

    }
    m_IsOpen = false;
    QuitCommandReceived?.Invoke();
    return null;
}
catch (Exception ex)
{
    // Fall-through for other error types

Contributor Author:
Same comment about logging here: I don't think we ever hit it, so it would be nice to know when we do, but it's also potentially noisy for users.

    Debug.LogError($"GRPC Exception: {ex.Message}. Disconnecting from trainer.");
    m_IsOpen = false;
    QuitCommandReceived?.Invoke();
    return null;
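One way to make the logging policy in the `catch (RpcException rpcException)` block easier to unit-test, and to make the reviewer's question about which errors to surface easier to revisit, is to pull the status-code decision into a pure function. The sketch below is a hypothetical refactor, not part of this PR: `RpcStatus`, `LogSeverity`, and `RpcErrorPolicy.Classify` are illustrative names, and `RpcStatus` is a local stand-in for a few `Grpc.Core.StatusCode` values so the example compiles without the gRPC package.

```csharp
using System;

// Hypothetical stand-in for the Grpc.Core.StatusCode values handled above,
// so this sketch compiles without a reference to the gRPC package.
enum RpcStatus
{
    Unavailable,
    ResourceExhausted,
    Internal,
}

// Hypothetical severity levels; the PR itself calls Debug.LogError / Debug.Log directly.
enum LogSeverity
{
    None,  // stay silent (e.g. the Python trainer disconnected)
    Info,  // unknown error, logged quietly
    Error, // something the user can act on
}

static class RpcErrorPolicy
{
    // Mirrors the switch in the catch (RpcException) block of Exchange().
    public static LogSeverity Classify(RpcStatus status)
    {
        switch (status)
        {
            case RpcStatus.Unavailable:
                return LogSeverity.None;
            case RpcStatus.ResourceExhausted:
                return LogSeverity.Error;
            default:
                return LogSeverity.Info;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Print the chosen severity for each stand-in status code.
        foreach (RpcStatus status in Enum.GetValues(typeof(RpcStatus)))
        {
            Console.WriteLine($"{status}: {RpcErrorPolicy.Classify(status)}");
        }
    }
}
```

In the communicator, the `catch` block would then call `Classify` on `rpcException.Status.StatusCode` (using the real enum) and map `Error` to `Debug.LogError` and `Info` to `Debug.Log`, keeping the Unity-specific logging out of the decision logic.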