@mgrabban mgrabban commented Apr 2, 2025

Summary

Tuning on XPU: in cross-entropy, if the device is xpu, set MAX_FUSED_SIZE to 4096 instead of the default 65536 // 2 (32768). This gives slightly better performance on xpu.
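For readers skimming the change, the device-dependent choice described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `pick_max_fused_size`, `pick_block_size`, and `next_power_of_2` are hypothetical helper names, and the way the cap is applied to a power-of-two block size over the vocabulary dimension is an assumption about how MAX_FUSED_SIZE is typically used.

```python
# Illustrative sketch only; the helper names are hypothetical, not the library's API.
DEFAULT_MAX_FUSED_SIZE = 65536 // 2  # 32768, the default cited in the summary
XPU_MAX_FUSED_SIZE = 4096            # tuned value for xpu cited in the summary


def pick_max_fused_size(device: str) -> int:
    """Return the fused-size cap for a device name such as "cuda" or "xpu"."""
    return XPU_MAX_FUSED_SIZE if device == "xpu" else DEFAULT_MAX_FUSED_SIZE


def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def pick_block_size(vocab_size: int, device: str) -> int:
    # Assumption: the cap bounds a power-of-two block size over the vocab dimension.
    return min(pick_max_fused_size(device), next_power_of_2(vocab_size))


if __name__ == "__main__":
    for dev in ("cuda", "xpu"):
        print(dev, pick_max_fused_size(dev), pick_block_size(32000, dev))
```

Under these assumptions, a vocabulary of 32000 entries would be processed in one 32768-wide block by default, but in 4096-wide chunks on xpu.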

Testing Done

  • Hardware Type: Intel(R) Data Center GPU Max 1550
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@mgrabban mgrabban mentioned this pull request Apr 2, 2025
mgrabban commented Apr 2, 2025

Very similar to #647

@shivam15s shivam15s merged commit bebe030 into linkedin:main Apr 2, 2025
4 of 8 checks passed
shivam15s pushed a commit that referenced this pull request Apr 2, 2025
## Summary
Tuning on XPU: In fused linear JSD, if device is xpu, set MAX_FUSED_SIZE
to 4096 instead of default 65536 // 2. This gives slightly better
performance on xpu.
Very similar to #645 

## Testing Done

- Hardware Type: Intel(R) Data Center GPU Max 1550
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence