Flamefire
Collaborator

The test may fail due to slightly different values caused by a different order of matrices in SGEMM:

Mismatched elements: 1 / 50 (2.0%)
Greatest absolute difference: 1.430511474609375e-05 at index (4, 5) (up to 1e-05 allowed)
Greatest relative difference: 4.65393206065873e-06 at index (4, 5) (up to 1.3e-06 allowed)

Observed on POWER (ppc64le)
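
For context, results can depend on operation order because floating-point addition is not associative. The sketch below is plain Python that emulates float32 accumulation; it is not the SGEMM kernel, and the helper names and input values are made up for illustration. The inputs are deliberately extreme, whereas the mismatch reported above is only about 1.4e-05.

```python
import struct


def f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f32(a, b, order):
    """Accumulate a dot product in float32, visiting terms in the given order."""
    acc = 0.0
    for i in order:
        acc = f32(acc + f32(a[i] * b[i]))
    return acc


# Illustrative inputs (not taken from the failing test).
a = [1e8, 1.0, -1e8]
b = [1.0, 1.0, 1.0]

forward = dot_f32(a, b, [0, 1, 2])    # the 1.0 is absorbed by 1e8 before cancelling
reordered = dot_f32(a, b, [0, 2, 1])  # the large terms cancel first, so 1.0 survives
print(forward, reordered)
```

The same three products are summed in both calls; only the accumulation order differs.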

@pytorch-bot

pytorch-bot bot commented Oct 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86365

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c4e94d5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 6, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Flamefire / name: Alexander Grund (c4e94d5)

Collaborator

@mruberry mruberry left a comment

Approving to unblock testing -- can we guard this decorator using active_if to only apply when running on POWER?
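
A rough sketch of what such a guard amounts to, in plain Python. Everything here is a hypothetical stand-in: `apply_if`, `with_tolerance`, and `IS_PPC64LE` are not PyTorch APIs, and detecting POWER via `platform.machine()` is an assumption.

```python
import platform

# Assumption: POWER (ppc64le) is detected through the machine string.
IS_PPC64LE = platform.machine() == "ppc64le"


def apply_if(decorator, active_if):
    """Return `decorator` when `active_if` is true, otherwise a no-op decorator."""
    if active_if:
        return decorator
    return lambda fn: fn


def with_tolerance(atol, rtol):
    """Stand-in decorator that attaches a tolerance override to a test function."""
    def wrap(fn):
        fn.tolerance = {"atol": atol, "rtol": rtol}
        return fn
    return wrap


# The looser tolerance is attached only when running on ppc64le.
@apply_if(with_tolerance(atol=1.5e-05, rtol=1e-05), active_if=IS_PPC64LE)
def test_out():
    pass
```

On any other platform `test_out` is left unchanged, so the default tolerance would still apply there.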

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Oct 6, 2022
@Flamefire
Collaborator Author

> can we guard this decorator using active_if to only apply when running on POWER?

I don't think this is worth it: POWER may be just one of the platforms where this fails, and other compilers, BLAS libraries, etc. may make it fail elsewhere too.
Also the tolerance is not unreasonable:

  • (Existing) 'TestCommon.test_numpy_refs': tol(atol=1.3e-05, rtol=1.3e-05)
  • (Existing) 'TestConsistency.test_output_match': tol(atol=1e-5, rtol=1e-5)
  • (New) 'TestCommon.test_out': tol(atol=1.5e-05, rtol=1e-05)

So the new tolerance is very close to that of the NumPy reference test, with the relative tolerance being even lower. One could argue for using tol(atol=1.5e-05, rtol=1.3e-05) in all 3 tests for consistency.
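
As a sanity check on these numbers, here is a minimal sketch that replays the reported mismatch against each tolerance. It assumes the closeness criterion is `|actual - expected| <= atol + rtol * |expected|` and that the reported relative difference is the absolute difference divided by `|expected|`; the helper `is_close` and the dictionary keys are illustrative names only.

```python
def is_close(abs_diff, expected_mag, atol, rtol):
    """Assumed criterion: |actual - expected| <= atol + rtol * |expected|."""
    return abs_diff <= atol + rtol * expected_mag


# Values from the reported failure.
abs_diff = 1.430511474609375e-05
rel_diff = 4.65393206065873e-06
# Magnitude of the reference value implied by the two reported differences.
expected_mag = abs_diff / rel_diff

tolerances = {
    "previous limit (from the failure message)": (1e-05, 1.3e-06),
    "TestCommon.test_numpy_refs": (1.3e-05, 1.3e-05),
    "TestConsistency.test_output_match": (1e-05, 1e-05),
    "TestCommon.test_out (new)": (1.5e-05, 1e-05),
}

results = {
    name: is_close(abs_diff, expected_mag, atol, rtol)
    for name, (atol, rtol) in tolerances.items()
}
for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

Under these assumptions only the previous limit rejects the observed difference; all three tolerances listed above would accept it.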

@Flamefire
Collaborator Author

Rebased and new CLA signed

@kit1980
Contributor

kit1980 commented Nov 22, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@Flamefire Flamefire deleted the fix-baddbmm-prec branch November 23, 2022 10:33
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
The test may fail due to slightly different values caused by a different order of matrices in SGEMM:

> Mismatched elements: 1 / 50 (2.0%)
> Greatest absolute difference: 1.430511474609375e-05 at index (4, 5) (up to 1e-05 allowed)
> Greatest relative difference: 4.65393206065873e-06 at index (4, 5) (up to 1.3e-06 allowed)

Observed on POWER (ppc64le)

Pull Request resolved: pytorch#86365
Approved by: https://github.com/mruberry, https://github.com/kit1980