What does this PR do?
A previous PR that upgraded the EFA installer version changed which ARG determines whether EFA is installed, but that change was not propagated to the build args in the GitHub Actions workflow. As a result, all images were built with EFA, which causes RDMA issues on non-AWS clusters.
Current image (broken):
rdma-broken-1-pWWCyG
New image from this PR (fixed):
rdma-fixed-1-ecrVG0
For the broken run, the logs show a number of warnings that are not present for the fixed run.
Previous action: https://github.com/mosaicml/composer/actions/runs/15005222089/job/42162037261
Action on this PR: https://github.com/mosaicml/composer/actions/runs/15127189449/job/42521298584?pr=3857
The logs show that the previous action installed EFA even though it should not have, since that action builds a non-AWS image, whereas the new action skips the EFA installation.
All the AWS Docker build actions on this PR were also fully cached, since nothing changes for them.
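
The mismatch described above (a Dockerfile ARG being renamed without the workflow's build args following) could in principle be caught by a small consistency check. The following is only a sketch under stated assumptions: the ARG name `EFA_INSTALLER_VERSION`, the matrix layout, and the entry names are all hypothetical and are not taken from this PR or from the composer repository.

```python
# Hypothetical sketch: the ARG name and the matrix layout are assumptions,
# not the actual names used in the Dockerfile or workflow touched by this PR.
EFA_ARG = "EFA_INSTALLER_VERSION"


def efa_mismatches(matrix):
    """Return the names of entries whose EFA build arg disagrees with their target.

    An entry is consistent when AWS images set a non-empty EFA arg and
    non-AWS images leave it empty or unset.
    """
    bad = []
    for entry in matrix:
        wants_efa = entry["aws"]
        has_efa = bool(entry["build_args"].get(EFA_ARG))
        if wants_efa != has_efa:
            bad.append(entry["name"])
    return bad


# Illustrative build matrix with one deliberately inconsistent entry.
matrix = [
    {"name": "aws-image", "aws": True, "build_args": {EFA_ARG: "1.0.0"}},
    {"name": "generic-image", "aws": False, "build_args": {EFA_ARG: ""}},
    {"name": "stale-image", "aws": False, "build_args": {EFA_ARG: "1.0.0"}},
]

print(efa_mismatches(matrix))
```

Here `stale-image` is flagged because it targets a non-AWS cluster yet still passes a non-empty EFA arg, which is the situation this PR corrects.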