Signed-off-by: Vladimir Cherepanov <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
[PyTorch] Disable determinism for sm100 (NVIDIA#2130)
* disable determinism for sm100+ and cudnn<9.14 (see the sketch at the end of this entry)
Signed-off-by: Charlene Yang <[email protected]>
* fix remaining CI failures
Signed-off-by: Charlene Yang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* revert some changes
Signed-off-by: Charlene Yang <[email protected]>
* revert more changes
Signed-off-by: Charlene Yang <[email protected]>
* remove sm100 from determinism table
Signed-off-by: Charlene Yang <[email protected]>
---------
Signed-off-by: Charlene Yang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Varun Thumbe <[email protected]>
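A minimal sketch of the gating described in this entry, assuming a hypothetical helper name; the thresholds come straight from the commit title (determinism is only claimed when the GPU is pre-sm100 or cuDNN >= 9.14):

```python
def attention_is_deterministic(sm_arch: int, cudnn_version: tuple) -> bool:
    """Hypothetical predicate mirroring the commit title:
    determinism is disabled on sm100+ when cuDNN < 9.14."""
    return not (sm_arch >= 100 and cudnn_version < (9, 14))

assert attention_is_deterministic(90, (9, 13))       # pre-Blackwell: unaffected
assert not attention_is_deterministic(100, (9, 13))  # sm100 + old cuDNN: non-deterministic
assert attention_is_deterministic(100, (9, 14))      # sm100 + cuDNN 9.14: deterministic again
```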
[PyTorch] ONNX export of FP8 Current Scaling (NVIDIA#2068)
* Compute amax in normalization forward in current scaling in untuned kernels
Signed-off-by: Jan Bielak <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
Signed-off-by: Pawel Gadzinski <[email protected]>
* fix
Signed-off-by: Pawel Gadzinski <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
Signed-off-by: Pawel Gadzinski <[email protected]>
* code drop
Signed-off-by: Pawel Gadzinski <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
Signed-off-by: Pawel Gadzinski <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix
Signed-off-by: Pawel Gadzinski <[email protected]>
* apply Tim's suggestions
Signed-off-by: Pawel Gadzinski <[email protected]>
---------
Signed-off-by: Jan Bielak <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Co-authored-by: Jan Bielak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Varun Thumbe <[email protected]>
[PyTorch][MOE] Tentative Fix For Replacing from_blob with empty for experts receiving zero tokens (NVIDIA#2134)
use torch.empty for empty shapes instead of from_blob (see the sketch below)
Signed-off-by: zhongboz <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
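For reference, a minimal sketch of the idea behind this fix, with a hypothetical helper name (not the actual TE code): an expert that receives zero tokens gets a freshly allocated empty tensor instead of a from_blob-style view over a zero-sized buffer.

```python
import torch

def expert_output_tensor(buffer_view: torch.Tensor, shape, dtype, device="cuda"):
    """Hypothetical illustration of the NVIDIA#2134 fix."""
    if any(dim == 0 for dim in shape):
        # Zero tokens routed to this expert: viewing a zero-sized region of a
        # shared buffer is fragile, so allocate an (empty) tensor directly.
        return torch.empty(shape, dtype=dtype, device=device)
    # Non-empty case (unchanged path): reuse the existing buffer without a copy.
    return buffer_view.view(shape)
```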
build: pull cached wheels (NVIDIA#2127)
* build: pull cached wheels
Signed-off-by: oliver könig <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update setup.py
Signed-off-by: oliver könig <[email protected]>
---------
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
feat: Add support for multiple quantization modes in the UB communicators (NVIDIA#2043)
Signed-off-by: Varun Thumbe <[email protected]>
[Common] Add checks to CUDA kernel launch and CUDA API calls (NVIDIA#2074)
* add checks to cuda kernel launch and cuda API calls
Signed-off-by: Xin Yao <[email protected]>
* Remove exceptions from destructors
Signed-off-by: Tim Moon <[email protected]>
* fix weird dispatch in ln/rmsnorm
Signed-off-by: Xin Yao <[email protected]>
---------
Signed-off-by: Xin Yao <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
[PyTorch] Support bf16+fp8 cudagraph (NVIDIA#2098)
* support bf16+fp8 model
Signed-off-by: Robin Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: Robin Zhang <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update
Signed-off-by: Robin Zhang <[email protected]>
---------
Signed-off-by: Robin Zhang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
Dropout with 8-bit RNG (NVIDIA#2014)
* Add dropout kernel with 8-bit RNG
Co-authored-by: Vasudevan Rengasamy <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix license
Signed-off-by: Tim Moon <[email protected]>
* Avoid ambiguous types
Signed-off-by: Tim Moon <[email protected]>
* Do not enforce dropout prob is representable in 8 bits
Signed-off-by: Tim Moon <[email protected]>
* Expand error message
Signed-off-by: Tim Moon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix small statistical bug from using less-equal instead of less-than
Refactor kernel implementations and add comments. Interpret masks as bytes rather than 16-bit uints (see the sketch after this entry).
Signed-off-by: Tim Moon <[email protected]>
* Fix linter warning
Signed-off-by: Tim Moon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove unnecessary helper function in PyTorch extensions
Signed-off-by: Tim Moon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Varun Thumbe <[email protected]>
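A rough PyTorch sketch of the dropout variant described above, assuming hypothetical names. The 8-bit RNG gives 256 threshold levels, and the strict less-than comparison (the "less-equal vs less-than" fix) keeps exactly threshold/256 of the elements in expectation rather than (threshold+1)/256:

```python
import torch

def dropout_8bit_rng(x: torch.Tensor, p: float):
    """Sketch only: dropout whose mask comes from 8-bit random values.
    p is the drop probability; the keep threshold is rounded to one of the
    256 representable levels (p itself need not be exactly representable)."""
    keep_prob = 1.0 - p
    threshold = int(round(keep_prob * 256))                   # 0..256
    rng = torch.randint(0, 256, x.shape, dtype=torch.uint8, device=x.device)
    mask = rng.int() < threshold                              # strict less-than
    scale = 1.0 / keep_prob if keep_prob > 0 else 0.0         # inverted dropout
    return x * mask * scale, mask
```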
Create GPU reload buffers on main stream (NVIDIA#2131)
* Create GPU reload buffers on main stream
Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fixed typo
Signed-off-by: Selvaraj Anandaraj <[email protected]>
* Fixed typo
Signed-off-by: Selvaraj Anandaraj <[email protected]>
---------
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Paweł Gadziński <[email protected]>
Signed-off-by: Varun Thumbe <[email protected]>
mxfp8 unfused quant support, refined unit test, remove unnecessary quantization code
Signed-off-by: Varun Thumbe <[email protected]>
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Varun Thumbe <[email protected]>
missed a quant code removal
Signed-off-by: Varun Thumbe <[email protected]>
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Varun Thumbe <[email protected]>
minor bug fix
Signed-off-by: Varun Thumbe <[email protected]>
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Add cuBLASMp-backed GEMM-like API to TE common (NVIDIA#1824)
* Pick up cuBLASMp during build
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Change lib order to fix link error
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Context creation, incomplete...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Test fixure
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* A sanity AgGemm test, failing...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix axes
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Take care of uneven distribution
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Use MPI to get position of local matrices
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Refactor
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Refactor & fixes
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Gemm-RS
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Gemm-AR, not working...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fixes
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Setting all-reduce epilogue for gemm-ar
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Use supported shapes for GEMM-AR
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Tweak tolerance
Signed-off-by: Vladimir Cherepanov <[email protected]>
* First shot at fp8
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Use TensorHolder in tests
Signed-off-by: Vladimir Cherepanov <[email protected]>
* More test configs
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Support comm_sm_count
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Parametrize dtypes for A, B and D separately
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Tweak scaling
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Amax ptr
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Flags parity with cublas_gemm, saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Cleanup
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Bias tests
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix bias test
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Aux, saving...
Signed-off-by: Vladimir Cherepanov <[email protected]>
* aux_ld
Signed-off-by: Vladimir Cherepanov <[email protected]>
* A fix
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Use test::Tensor
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Set scale inv
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Remove unsupported test configs
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Tweak tests
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Replace libcal with NCCL
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Add NVTX markers to API functions
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Tweak GemmAr tests
Signed-off-by: Vladimir Cherepanov <[email protected]>
* More test config
Signed-off-by: Vladimir Cherepanov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix merge fallout
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Remove MPI dependency, comment API, add algo parameter
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix nvshmem dependency
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix nvshmem build
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Exclude CommGemm tests from L0_cppunittest
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Add cpp_distributed sh file for CI
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Adapt to TensorAllocator
Signed-off-by: Vladimir Cherepanov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Skip GemmAr test on unsupported HW
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Oversubscribe is needed on some clusters
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Fix incomplete libcal removal
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Move CI tests to L1
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Rename context to include NVTE prefix
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Remove leftover code
Signed-off-by: Vladimir Cherepanov <[email protected]>
* NVTE_WITH_CUBLASMP off by default
Signed-off-by: Vladimir Cherepanov <[email protected]>
* More detailed NVTE_CHECK diag
Signed-off-by: Vladimir Cherepanov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Comment API
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Include stdbool header for legacy C compilers
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Remove now unused argument
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Abstract away cuBLASMp algo behind our own enum
Signed-off-by: Vladimir Cherepanov <[email protected]>
* More detailed shape diag messages
Signed-off-by: Vladimir Cherepanov <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update transformer_engine/common/include/transformer_engine/comm_gemm.h
Co-authored-by: Przemyslaw Tredak <[email protected]>
Signed-off-by: Vladimir Cherepanov <[email protected]>
* Add license
Signed-off-by: Vladimir Cherepanov <[email protected]>
---------
Signed-off-by: Vladimir Cherepanov <[email protected]>
Signed-off-by: Vladimir Cherepanov <[email protected]>
Co-authored-by: Vladimir Cherepanov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <[email protected]>
FP8 AllGather in FP8 GroupedGEMM + Fix Stream Usage Issue. (NVIDIA#2086)
* FP8 AllGather in FP8 GroupedGEMM
1. Support current scaling FP8 quantization with a given amax.
2. Support FP8 AG in fwd and BF16 RS in bwd.
3. The workflow is AR-max -> FP8 Quant -> FP8 AG -> FP8 GroupedGEMM (see the sketch after this entry).
Signed-off-by: Ming Huang <[email protected]>
* Slightly refactor
Signed-off-by: Ming Huang <[email protected]>
* Adding documentation for new args.
Signed-off-by: Ming Huang <[email protected]>
* Adding unit-tests.
Signed-off-by: Ming Huang <[email protected]>
* Adding license.
Signed-off-by: Ming Huang <[email protected]>
* Move unit-tests to L1.
Signed-off-by: Ming Huang <[email protected]>
* Move quantizer store/reset into FP8 only.
Signed-off-by: Ming Huang <[email protected]>
* Adding all layout support for Blackwell+
Signed-off-by: Ming Huang <[email protected]>
* Adopt the feedback from code-review.
Signed-off-by: Ming Huang <[email protected]>
* Fixed the wrong stream used by d2d in groupedGEMM FFI.
Signed-off-by: Ming Huang <[email protected]>
---------
Signed-off-by: Ming Huang <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
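The forward workflow described in this entry (AR-max -> FP8 Quant -> FP8 AG -> FP8 GroupedGEMM) roughly corresponds to the following pseudocode; every function name here is a placeholder rather than the actual TE API:

```python
import torch
import torch.distributed as dist

def fp8_allgather_grouped_gemm(x_local, expert_weights, tp_group,
                               quantize_fp8, grouped_gemm_fp8):
    """Pseudocode for the forward path of NVIDIA#2086. `quantize_fp8` and
    `grouped_gemm_fp8` stand in for the corresponding TE kernels."""
    # 1. AR-max: agree on a single amax across the tensor-parallel group so
    #    every rank quantizes with the same current-scaling factor.
    amax = x_local.abs().amax()
    dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=tp_group)

    # 2. FP8 quantization with the shared amax (current scaling).
    x_fp8_local, scale_inv = quantize_fp8(x_local, amax)

    # 3. FP8 all-gather: 1-byte elements on the wire instead of BF16.
    world = dist.get_world_size(group=tp_group)
    x_fp8 = torch.empty((world * x_fp8_local.shape[0], *x_fp8_local.shape[1:]),
                        dtype=x_fp8_local.dtype, device=x_fp8_local.device)
    dist.all_gather_into_tensor(x_fp8, x_fp8_local, group=tp_group)

    # 4. FP8 grouped GEMM over the per-expert weights (BF16 RS happens in bwd).
    return grouped_gemm_fp8(x_fp8, expert_weights, scale_inv)
```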
[JAX] Delay MeshResource validation until first usage (NVIDIA#2124)
Delay MeshResource validation until first usage
Signed-off-by: Jeremy Berchtold <[email protected]>
Co-authored-by: Phuong Nguyen <[email protected]>
[JAX] Decouple Recipe and ScalingMode (NVIDIA#1728)
* Decouple recipe and scaling mode
Signed-off-by: Jeremy Berchtold <[email protected]>
* Expose global QuantizeConfig instance as a getter
Signed-off-by: Jeremy Berchtold <[email protected]>
* Format and lint
Signed-off-by: Jeremy Berchtold <[email protected]>
* Merge branch 'main' into dev/jberchtold/jax-scaling-mode-and-recipe-decoupling
Signed-off-by: Jeremy Berchtold <[email protected]>
* Rename UsageType to TensorSource
Signed-off-by: Jeremy Berchtold <[email protected]>
* Update test_layer.py
Signed-off-by: Jeremy Berchtold <[email protected]>
---------
Signed-off-by: Jeremy Berchtold <[email protected]>
Signed-off-by: jberchtold-nvidia <[email protected]>
[JAX] `dot_1_output` sharding constraint + use AXIS_IS_UNSHARDED (NVIDIA#2128)
* add dot_1_output sharding constraint + use AXIS_IS_UNSHARDED
Signed-off-by: Phuong Nguyen <[email protected]>
---------
Signed-off-by: Phuong Nguyen <[email protected]>
[JAX] Add amax input to DBiasQuantizePrimitive and FFI (NVIDIA#2118)
* add amax input to DBiasQuantizePrimitive and FFI
Signed-off-by: Phuong Nguyen <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* make sure amax is init with zero
Signed-off-by: Phuong Nguyen <[email protected]>
* fix sharding rule
Signed-off-by: Phuong Nguyen <[email protected]>
---------
Signed-off-by: Phuong Nguyen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Further relax constraints to cuDNN 9.13 for disabling fused attn for kv caching (NVIDIA#2121)
Signed-off-by: Kshitij Lakhani <[email protected]>
Temporarily remove comm_gemm tests (NVIDIA#2133)
Signed-off-by: Vladimir Cherepanov <[email protected]>