fbgemm async complete cumsum op, jagged and dense conversion ops, jagged_dense_elementwise_add_jagged_output, reorder ops and permute_2d_sparse_data op #2065
base: dev/sycl-free-func
Conversation
Signed-off-by: Jiafu Zhang <[email protected]>
…_indices Signed-off-by: Jiafu Zhang <[email protected]>
@majing921201 @fengyuan14 @gujinghui, all necessary fbgemm ops are now supported on XPU. Please help review.
const OptionalDeviceGuard device_guard(device_of(TENSOR));

Tensor asynchronous_complete_cumsum_xpu(const Tensor& t_in) {
  TORCH_CHECK(t_in.is_contiguous());
Is this limitation in the XPU implementation aligned with the CUDA one?
Which limitation do you refer to?
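For reference, a minimal sketch (not part of the PR) of how the `TORCH_CHECK(t_in.is_contiguous())` requirement looks from the caller's side; the device string "xpu" and the printed values are assumptions, relying on an XPU-enabled PyTorch build and the usual complete-cumsum semantics (length N+1 output with a leading zero):

```python
import torch
# Assumes the fbgemm::asynchronous_complete_cumsum schema is already registered,
# e.g. via `import fbgemm_gpu` or the test-side definitions discussed later in this thread.

lengths = torch.tensor([2, 3, 5, 7], dtype=torch.int64, device="xpu")
strided = lengths[::2]  # [2, 5], a non-contiguous view

# The XPU kernel checks contiguity, so pass a contiguous tensor explicitly.
out = torch.ops.fbgemm.asynchronous_complete_cumsum(strided.contiguous())
# Expected [0, 2, 7] under complete-cumsum semantics (leading zero, length N+1).
```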
src/ATen/native/xpu/FbgemmOps.cpp
Outdated
namespace {

TORCH_LIBRARY(fbgemm, m) {
  m.def("asynchronous_complete_cumsum(Tensor t_in) -> Tensor");
Is there a conflict here when FBGEMM defines the same symbol?
Yes, it could be a problem. I'll verify and figure it out.
As verified, the fbgemm op schemas cannot be registered here; doing so breaks the fbgemm library, and adding a fbgemm library dependency here is not appropriate either. My solution is to add the schema definitions in test_fbgemm_ops_xpu.py, like below:

```python
lib = torch.library.Library("fbgemm", "DEF")
lib.define("asynchronous_complete_cumsum(Tensor t_in) -> Tensor")
...
```

With that, I can run all fbgemm op tests successfully.
On the user side, users `import fbgemm_gpu` as usual and can then use these XPU ops. Considering the schemas are unlikely to change, it should be fine for us to rely on the official fbgemm schemas.
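To make the workaround concrete, here is a minimal sketch (not part of the PR) of what the test-side registration could look like. Only the two schemas that already appear in this PR's diff are spelled out; the schemas for the remaining jagged/dense ops are omitted rather than guessed, and the device string "xpu" assumes an XPU-enabled PyTorch build:

```python
import torch

# Define the fbgemm schemas locally so the XPU kernels registered via
# TORCH_LIBRARY_IMPL(fbgemm, XPU, ...) are callable without installing fbgemm.
lib = torch.library.Library("fbgemm", "DEF")
lib.define("asynchronous_complete_cumsum(Tensor t_in) -> Tensor")
lib.define(
    "permute_2D_sparse_data(Tensor permute, Tensor lengths, Tensor indices, "
    "Tensor? weights=None, int? permuted_lengths_sum=None) "
    "-> (Tensor, Tensor, Tensor?)"
)
# ... define the remaining jagged/dense schemas the same way ...

# With the schemas in place, the XPU implementations dispatch as usual:
lengths = torch.tensor([1, 2, 3], dtype=torch.int64, device="xpu")
offsets = torch.ops.fbgemm.asynchronous_complete_cumsum(lengths)
```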
"permute_2D_sparse_data(Tensor permute, Tensor lengths, Tensor indices, Tensor? weights=None, int? permuted_lengths_sum=None) -> (Tensor, Tensor, Tensor?)"); | ||
} | ||
|
||
TORCH_LIBRARY_IMPL(fbgemm, XPU, m) { |
Please use one TORCH_LIBRARY_IMPL block to contain all impl definitions.
Will do
done
@@ -0,0 +1,1122 @@
# Owner(s): ["module: intel"]
Why not take the CPU result as the reference, like PyTorch UTs do?
Then we would need to install fbgemm-cpu, which is not desired.
If you remove the schema definitions as @fengyuan14 suggested, then you must install fbgemm.
Correct. But it also means our torch-xpu-ops repo has a dependency on fbgemm. Does that sound good?
See my reply to yuanfeng's comments.
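As a side note on the reference question, a minimal sketch (an assumption, not part of the PR) of how a test could use a plain PyTorch CPU computation as the reference instead of fbgemm-cpu, assuming the usual complete-cumsum semantics and an XPU-enabled build:

```python
import torch

def ref_complete_cumsum(lengths: torch.Tensor) -> torch.Tensor:
    # Plain PyTorch reference: [0, l0, l0+l1, ...] computed on CPU.
    zero = torch.zeros(1, dtype=lengths.dtype)
    return torch.cat([zero, torch.cumsum(lengths, dim=0)])

# Assumes the fbgemm schema is registered (import fbgemm_gpu or the
# test-side torch.library definitions mentioned above).
lengths_cpu = torch.randint(0, 10, (8,), dtype=torch.int64)
expected = ref_complete_cumsum(lengths_cpu)
actual = torch.ops.fbgemm.asynchronous_complete_cumsum(lengths_cpu.to("xpu"))
torch.testing.assert_close(actual.cpu(), expected)
```

This keeps the UT free of any fbgemm-cpu install while still cross-checking against an independent implementation.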
Signed-off-by: Jiafu Zhang <[email protected]>
Signed-off-by: Jiafu Zhang <[email protected]>
Signed-off-by: Jiafu Zhang <[email protected]>
Supported the following fbgemm ops on XPU:
fbgemm::asynchronous_complete_cumsum
fbgemm::jagged_to_padded_dense_forward
fbgemm::jagged_to_padded_dense
fbgemm::dense_to_jagged_forward
fbgemm::jagged_dense_elementwise_add_jagged_output
Please make sure the environment variables below are set correctly before running the UT.