Removed most s3 bucket based tests (replaced with UC Volumes) #3869
Commits: removed s3 bucket usage; formatted; moved to read_only; formatted; hopefully works; this should work; hopefully cpu passes; moved tests to gpu since they don't need cpu
Ah, I think it's because dist is initialized with the GPU backend (i.e. NCCL) rather than the CPU backend (i.e. Gloo) when running the GPU tests. It's probably not worth adjusting the test setup to support this; just let the test run on GPU instead of CPU.
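A minimal sketch of the mismatch described above. The helper `pick_dist_backend` is hypothetical (it is not part of Composer or `torch.distributed`); it only encodes the rule that NCCL operates on CUDA tensors, so a session set up for GPU tests uses `"nccl"`, while a CPU-only test would need `"gloo"`.

```python
# Hypothetical helper (not from Composer or torch.distributed): map the
# device a test session was launched on to a collective backend name.
def pick_dist_backend(device: str) -> str:
    """Return the torch.distributed backend name for a device type.

    NCCL only supports CUDA tensors, so GPU runs use "nccl";
    CPU runs use "gloo".
    """
    if device == "gpu":
        return "nccl"
    if device == "cpu":
        return "gloo"
    raise ValueError(f"unsupported device: {device!r}")


# A CPU test running inside a process group that was initialized for the
# GPU suite gets "nccl" instead of the "gloo" it expects.
print(pick_dist_backend("gpu"))  # nccl
print(pick_dist_backend("cpu"))  # gloo
```

Since the process group is created once per test session, moving the test to the GPU suite (as done here) avoids re-initializing dist with a different backend mid-session.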
I'm OK with just adjusting the tolerance for torch 2.6 if that's sufficient for the test to pass. It's not critical to keep exact numerical determinism across torch versions; in fact, on GPU I'd guess this might not be possible.
@dakinggg The daily tests pass when I use separate checkpoints for torch 2.6 and 2.7.
OK, good enough for me. Deterministically resuming a run from an older version of torch is a serious edge case.
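The separate-checkpoint approach can be sketched as a lookup keyed on torch's major.minor version. This is a hypothetical helper: the function name `checkpoint_for_torch` and the checkpoint paths are illustrative assumptions, not the code or paths used in this PR. In a real test the version string would come from `torch.__version__`.

```python
# Hypothetical sketch (not the PR's actual code): pick a reference
# checkpoint saved with the same torch major.minor version.
def checkpoint_for_torch(version: str, checkpoints: dict[str, str]) -> str:
    """Return the checkpoint path registered for `version`.

    `version` is a torch version string such as "2.6.0+cu124"; the local
    build tag after "+" and the patch number are ignored.
    """
    public = version.split("+")[0]                 # "2.6.0+cu124" -> "2.6.0"
    major_minor = ".".join(public.split(".")[:2])  # "2.6.0" -> "2.6"
    if major_minor not in checkpoints:
        raise ValueError(f"no reference checkpoint for torch {major_minor}")
    return checkpoints[major_minor]


# Illustrative paths only; a test would pass torch.__version__ here.
REFERENCE_CHECKPOINTS = {
    "2.6": "checkpoints/torch_2_6/ep1.pt",
    "2.7": "checkpoints/torch_2_7/ep1.pt",
}
print(checkpoint_for_torch("2.6.0+cu124", REFERENCE_CHECKPOINTS))
# checkpoints/torch_2_6/ep1.pt
```

Keying on major.minor means patch releases such as 2.7.1 reuse the 2.7 checkpoint; whether that is strict enough depends on how much numerics change between patch releases.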
Reran the daily tests with the latest fixes to make sure they pass: https://github.com/mosaicml/composer/actions/runs/15500553934
What does this PR do?
Essentially, all usages of `s3_bucket` have been removed outside of `test_object_store.py` and `test_s3_object_store.py`, since we now have the ephemeral and read_only paths in UC Volumes.