
Conversation

@tqchen tqchen commented May 7, 2025

This PR introduces the AutoDLPack feature to the TVM FFI. When an FFI Function takes a Tensor argument that conforms to the DLPack protocol, the argument is automatically imported into an NDArray and passed through.

The feature allows a compiled function to take torch.Tensor directly as an input argument without an extra set of conversions.

We also added a benchmark script to measure the overall FFI overhead. One thing to note is that the underlying DSL compiler still imposes contiguity and alignment requirements, and as of now we use a global value for them. So x.contiguous() is still needed before passing the argument if transpose or other layout-changing ops were performed.
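To illustrate the calling convention, here is a minimal sketch. `my_add` is a hypothetical function name standing in for any function registered through the TVM FFI; it is not something this PR defines:

```python
# Minimal sketch of what AutoDLPack enables. "my_add" is a
# hypothetical registered function used only for illustration.
import torch
import tvm

f = tvm.get_global_func("my_add")

x = torch.randn(8, 8)
y = torch.randn(8, 8)
out = torch.empty(8, 8)

# DLPack-compatible tensors are now imported into NDArray
# automatically, so they can be passed to f as-is:
f(x, y, out)

# Views such as transposes may be non-contiguous. The underlying DSL
# compiler still assumes contiguous, aligned buffers, so make the
# argument contiguous first:
f(x.t().contiguous(), y, out)
```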

@tqchen tqchen changed the title [FFI][FEAT] AutoDLPack for taking external tensor objects. [FFI][FEAT] AutoDLPack for taking external tensor objects May 7, 2025
@tqchen tqchen commented May 7, 2025

Benchmark

Env CPU: AMD Ryzen 9 7950X

> python ffi/scripts/benchmark_dlpack.py

-----------------------------
Benchmark f(x, y, z) overhead
-----------------------------
numpy.add                                1.921653747558594e-07 sec/call
torch.add[cpu]                           6.330013275146484e-07 sec/call
torch.add[cuda]                          2.330756187438965e-06 sec/call
tvm.ffi.nop                              3.983736038208008e-07 sec/call
tvm.ffi.nop+from_dlpack(torch)           4.368019104003906e-06 sec/call
tvm.ffi.nop+from_dlpack(numpy)           1.1694192886352538e-06 sec/call
tvm.ffi.nop+from_dlpack(tvm)             1.4580249786376954e-06 sec/call
tvm.ffi.nop+from_dlpack(torch.utils)     3.2754182815551756e-06 sec/call
tvm.ffi.nop.autodlpack(torch[cpu])       3.567361831665039e-06 sec/call
tvm.ffi.nop.autodlpack(torch[cuda])      3.5606861114501952e-06 sec/call
tvm.ffi.nop.autodlpack(numpy)            1.6696929931640624e-06 sec/call
-------------------------------
Benchmark x.__dlpack__ overhead
-------------------------------
torch.utils.dlpack.to_dlpack             4.5762062072753906e-07 sec/call
torch.__dlpack__                         9.840965270996094e-07 sec/call
numpy.__dlpack__                         5.011558532714844e-08 sec/call
tvm.__dlpack__                           1.5852451324462892e-07 sec/call
---------------------------------------------------
Benchmark x.__dlpack__(max_version=(1,1)) overhead
---------------------------------------------------
torch.__dlpack__(max_version=(1,1))      Tensor.__dlpack__() got an unexpected keyword 'max_version'
numpy.__dlpack__(max_version=(1,1))      6.172657012939454e-08 sec/call
tvm.__dlpack__(max_version=(1,1))        1.720428466796875e-07 sec/call

Discussions

  • First, we can see that the overall Python/C++ FFI overhead is roughly at the 0.2us to 3us level.
    • Notably, each eager torch.add call on CUDA is around 2.4us.
  • AutoDLPack as of now gets to about 3.6us for a call f(x, y, z) that needs three import calls, which aligns reasonably well with the torch eager CUDA overhead.
  • One can observe that the torch.__dlpack__ overhead is larger than that of numpy.__dlpack__.
    • torch.__dlpack__ could use some improvement; tvm.__dlpack__ is backed by a C++ implementation and gives a rough estimate of what is achievable. A timeit sketch of this measurement follows the list.
  • AutoDLPack from numpy arguments has about 1.7us of overhead.
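
For reference, the per-call __dlpack__ numbers can be approximated with a plain timeit loop along these lines (a sketch, not the actual ffi/scripts/benchmark_dlpack.py; assumes numpy >= 1.23, which added ndarray.__dlpack__):

```python
# Rough reproduction of the x.__dlpack__ timings above.
import timeit

import numpy as np
import torch
import torch.utils.dlpack

x_np = np.zeros(16)
x_th = torch.zeros(16)

n = 100_000
for name, fn in [
    ("torch.utils.dlpack.to_dlpack", lambda: torch.utils.dlpack.to_dlpack(x_th)),
    ("torch.__dlpack__", lambda: x_th.__dlpack__()),
    ("numpy.__dlpack__", lambda: x_np.__dlpack__()),
]:
    # Each call produces a fresh DLPack capsule; we report sec/call.
    print(f"{name:40s} {timeit.timeit(fn, number=n) / n} sec/call")
```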

This PR introduces the AutoDLPack feature to the TVM FFI. When an FFI Function takes a Tensor argument that conforms to the DLPack protocol, the argument is automatically imported into an NDArray and passed through.

The feature allows a compiled function to take torch.Tensor directly as an input argument without an extra set of conversions. When a function returns an NDArray, the return value still needs to be converted back via torch.from_dlpack.

However, a common use case is destination passing, where all inputs and outputs are pre-allocated and passed into the function. In that pattern AutoDLPack effectively enables zero-overhead support for a wide range of Python arrays.

We also added a benchmark script to measure the overall FFI overhead. One thing to note is that the underlying DSL compiler still imposes contiguity and alignment requirements, and as of now we use a global value for them. So x.contiguous() is still needed before passing the argument if transpose or other layout-changing ops were performed.
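
As a sketch of that destination-passing pattern (`gemm` is a hypothetical registered function used only for illustration):

```python
# Destination passing: inputs and the output buffer are pre-allocated
# and all passed into the call. "gemm" is a hypothetical FFI function.
import torch
import tvm

gemm = tvm.get_global_func("gemm")

a = torch.randn(128, 128)
b = torch.randn(128, 128)
c = torch.empty(128, 128)  # pre-allocated output

# AutoDLPack imports each tensor argument, so no conversion code is
# needed at the call site.
gemm(a, b, c)

# If a function instead returns an NDArray, converting back to torch
# still takes an explicit step:
#   y = torch.from_dlpack(ret)
```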

@Hzfengsy Hzfengsy merged commit da6d510 into apache:main May 8, 2025
13 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
[FFI][FEAT] AutoDLPack to enable external tensor args.

tqchen added a commit to tqchen/tvm that referenced this pull request Sep 13, 2025
[FFI][FEAT] AutoDLPack to enable external tensor args.

tqchen added a commit to tqchen/tvm that referenced this pull request Sep 13, 2025
[FFI][FEAT] AutoDLPack to enable external tensor args.

tqchen added a commit to tqchen/tvm that referenced this pull request Sep 13, 2025
[FFI][FEAT] AutoDLPack to enable external tensor args.
