
Conversation

@jiafuzha commented Sep 18, 2025

Supported the following fbgemm ops on XPU (a minimal usage sketch follows the list):

fbgemm::asynchronous_complete_cumsum
fbgemm::jagged_to_padded_dense_forward
fbgemm::jagged_to_padded_dense
fbgemm::dense_to_jagged_forward
fbgemm::jagged_dense_elementwise_add_jagged_output
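
For quick reference, here is a minimal usage sketch. It assumes a PyTorch build that includes these XPU ops and an importable fbgemm_gpu package to register the op schemas; the tensor shapes and values are illustrative only.

import torch
import fbgemm_gpu  # noqa: F401  # registers the fbgemm op schemas

# Per-row lengths of a jagged batch, placed on the XPU device.
lengths = torch.tensor([3, 1, 4], dtype=torch.int64, device="xpu")

# A complete cumsum prepends a zero, so the output has len(lengths) + 1 elements.
offsets = torch.ops.fbgemm.asynchronous_complete_cumsum(lengths)
print(offsets.cpu())  # expected: tensor([0, 3, 4, 8])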

Please make sure the environment variables below are set correctly before running the UT.

# Make sure ONEAPI_ROOT is set since it's referenced in umf's vars.sh. Otherwise, you may not be able to see any device.
export ONEAPI_ROOT=.../intel/oneapi/
# DPCPP 2025.3
source .../DPCPP/env/vars.sh
source ~/intel/oneapi/mkl/latest/env/vars.sh
source .../pti_0.12/env/vars.sh
source .../umf/1.0.2/env/vars.sh
export BUILD_SEPARATE_OPS=ON
export BUILD_WITH_CPU=ON
export TORCH_XPU_ARCH_LIST='pvc'
export USE_PTI=ON
export USE_KINETO=ON
export USE_XETLA=OFF

@jiafuzha changed the title to "fbgemm async complete cumsum op, jagged and dense conversion ops, jagged_dense_elementwise_add_jagged_output op" on Sep 22, 2025
@jiafuzha changed the title to "fbgemm async complete cumsum op, jagged and dense conversion ops, jagged_dense_elementwise_add_jagged_output, reorder batched lengths and indices op" on Sep 23, 2025
@jiafuzha changed the title to "fbgemm async complete cumsum op, jagged and dense conversion ops, jagged_dense_elementwise_add_jagged_output, reorder ops and permute_2d_sparse_data op" on Sep 29, 2025
@jiafuzha (Author) commented:

@majing921201 @fengyuan14 @gujinghui, all the necessary fbgemm ops are now supported on XPU. Please help review.

const OptionalDeviceGuard device_guard(device_of(TENSOR));

Tensor asynchronous_complete_cumsum_xpu(const Tensor& t_in) {
TORCH_CHECK(t_in.is_contiguous());

Contributor:

Is the limitation in the XPU implementation aligned with CUDA?

Author:

Which limitation do you refer to?

namespace {

TORCH_LIBRARY(fbgemm, m) {
m.def("asynchronous_complete_cumsum(Tensor t_in) -> Tensor");

Contributor:

Is there a conflict here when FBGEMM defines the same symbol?

Author:

Yes, it could be a problem. I'll verify and figure it out.

Author:

As verified, the fbgemm op schemas cannot be registered here; otherwise, loading the fbgemm library fails. It is also not appropriate to add a fbgemm library dependency here. My solution is to add the schema definitions in test_fbgemm_ops_xpu.py, like below:

lib = torch.library.Library("fbgemm", "DEF")
lib.define("asynchronous_complete_cumsum(Tensor t_in) -> Tensor")
...

Then I can run all the fbgemm op tests successfully.

On the user side, they need to import fbgemm_gpu as normal, and then they can use these XPU ops. Considering the schemas are unlikely to change, it should be fine for us to follow the official fbgemm schemas.
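
For illustration, a minimal self-contained version of that test-side registration could look like the sketch below. Only the two schema strings that appear in this PR are spelled out; the remaining ops would be defined the same way from fbgemm's official schemas.

import torch

# Define the fbgemm op schemas locally so the XPU implementations registered by
# torch-xpu-ops can be dispatched without installing the fbgemm library itself.
lib = torch.library.Library("fbgemm", "DEF")
lib.define("asynchronous_complete_cumsum(Tensor t_in) -> Tensor")
lib.define(
    "permute_2D_sparse_data(Tensor permute, Tensor lengths, Tensor indices, "
    "Tensor? weights=None, int? permuted_lengths_sum=None) -> (Tensor, Tensor, Tensor?)"
)
# ... remaining op schemas defined the same way ...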

"permute_2D_sparse_data(Tensor permute, Tensor lengths, Tensor indices, Tensor? weights=None, int? permuted_lengths_sum=None) -> (Tensor, Tensor, Tensor?)");
}

TORCH_LIBRARY_IMPL(fbgemm, XPU, m) {

Contributor:

Please use one TORCH_LIBRARY_IMPL to contain all impl definitions.

Author:

Will do.

Author:

Done.

@@ -0,0 +1,1122 @@
# Owner(s): ["module: intel"]

Contributor:

Why not take the CPU result as the reference, like PyTorch UTs do?

Author:

Then we would need to install fbgemm CPU, which is not desired.

Contributor:

If you remove the schema definitions as @fengyuan14 suggested, then you must install fbgemm.

Author:

Correct. But it also means our torch-xpu-ops repo would have a dependency on fbgemm. Does that sound good?

Author:

See my reply to yuanfeng's comments.
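
For illustration, one way a test could build a reference without installing fbgemm CPU is to compute it with plain torch ops. The hypothetical check below for asynchronous_complete_cumsum is only a sketch, not necessarily how the actual UT is written.

import torch

def ref_complete_cumsum(lengths: torch.Tensor) -> torch.Tensor:
    # CPU reference built from plain torch ops: a complete cumsum prepends a
    # zero, so the result has one more element than the input.
    lengths_cpu = lengths.cpu()
    zero = torch.zeros(1, dtype=lengths_cpu.dtype)
    return torch.cat([zero, torch.cumsum(lengths_cpu, dim=0)])

lengths = torch.tensor([3, 1, 4], dtype=torch.int64, device="xpu")
out = torch.ops.fbgemm.asynchronous_complete_cumsum(lengths)
# check_dtype=False keeps the comparison value-only in case the op promotes dtypes.
torch.testing.assert_close(out.cpu(), ref_complete_cumsum(lengths), check_dtype=False)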
