
Conversation

vacu9708
Contributor

@vacu9708 vacu9708 commented May 18, 2025

Summary

This PR resolves issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends.

In this issue, the divergence traces back to the inverse trigonometric functions and the MaxPool ops.

Changes

Two root causes were identified and addressed:

  1. Missing asin domain check
    The Taylor-series approximation of asin did not validate its input domain, allowing values outside [-1, 1] to silently produce numeric results.
    • Fix: Updated tir.asin to return a quiet NaN if the input is outside [-1, 1].
  2. The LLVM and CUDA backends follow different NaN policies for float max/min
    • LLVM: propagates NaNs (if any operand is NaN, the result is NaN).
    • CUDA: treats NaN as missing data, as described in the CUDA API manual:
      • If one operand is NaN and the other is a number, choose the numeric value.
      • If both operands are NaN, return NaN.
    • Fix: Updated the max/min ops on LLVM to match CUDA’s behavior.
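The two NaN policies described above can be illustrated in plain Python. This is a minimal sketch of the semantics, not TVM code; the names `fmax_propagate` and `fmax_cuda_like` are illustrative:

```python
import math

NAN = float("nan")

def fmax_propagate(a, b):
    # LLVM-style policy: if any operand is NaN, the result is NaN.
    if math.isnan(a) or math.isnan(b):
        return NAN
    return a if a > b else b

def fmax_cuda_like(a, b):
    # CUDA fmaxf-style policy: a NaN operand is treated as missing data.
    if math.isnan(a):
        return b  # still NaN if both operands are NaN
    if math.isnan(b):
        return a
    return a if a > b else b

print(fmax_propagate(NAN, 1.0))   # nan
print(fmax_cuda_like(NAN, 1.0))   # 1.0
print(fmax_cuda_like(NAN, NAN))   # nan
```

The practical consequence for pooling: once an out-of-domain asin produces a NaN, a MaxPool lowered with the first policy poisons the whole window, while one lowered with the second policy ignores the NaN unless every element is NaN, which is exactly how identical models can diverge across the two backends.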

Notes

  • I also tried aligning the CUDA behavior with LLVM's, but CUDA does not appear to support the propagating-NaN behavior.
    • Regarding this, I'd appreciate maintainers' feedback on whether to keep these different NaN policies.
  • The lint check seems to have a bug: it accepts the following inconsistently spaced code as correct:
PrimExpr out_range = Or(x<lower, x> upper);

However, the correct forms would be either:

PrimExpr out_range = Or(x < lower, x > upper);

or

PrimExpr out_range = Or(x<lower, x>upper);
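For readers outside the TVM codebase, the domain guard this expression computes can be mirrored in plain Python. This is a hedged sketch of the described behavior, not the actual TIR implementation; `asin_with_domain_check` is an illustrative name, and `math.asin` stands in for the Taylor-series approximation:

```python
import math

def asin_with_domain_check(x, lower=-1.0, upper=1.0):
    # Mirrors the TIR expression out_range = Or(x < lower, x > upper),
    # then selects a quiet NaN when out_range holds.
    out_range = (x < lower) or (x > upper)
    if out_range:
        return float("nan")
    return math.asin(x)  # stand-in for the series approximation

print(asin_with_domain_check(1.5))  # nan
print(asin_with_domain_check(0.5))  # ~0.5236 (pi/6)
```

Without the guard, a polynomial approximation evaluated at x = 1.5 would still return some finite number, which is the silent wrong answer the fix eliminates.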

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch 20 times, most recently from 3bb7ef7 to 1856d9e Compare May 19, 2025 12:18
@vacu9708 vacu9708 changed the title from "[Codegen] Add asin domain check and align NaN‐handling in pooling with CUDA semantics" to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." May 20, 2025
@tqchen
Member

tqchen commented May 23, 2025

For this particular case, it should be an issue with the input itself? I think it is fine for different platforms to have different NaN policies as long as they are efficient, given that efficiency outweighs other concerns here. The main thing we want to ensure is that the in-domain behavior is correct and reasonable.

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 1856d9e to 39bf52c Compare May 24, 2025 13:26
@vacu9708
Contributor Author

@tqchen Thanks for your feedback.
I removed the changes that aligned the NaN policies across the two backends and kept the changes for the domain-limit check.

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 39bf52c to 6513b9c Compare May 24, 2025 13:42
- Update `tir.asin` to return quiet NaN if the input is outside of [-1, 1]
@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 6513b9c to be88aea Compare May 24, 2025 13:49
@vacu9708 vacu9708 changed the title from "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends" May 24, 2025
@vacu9708
Contributor Author

@tvm-bot rerun

@tqchen tqchen merged commit 8c9026d into apache:main Jun 2, 2025
11 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
…fferent outputs on the LLVM (CPU) and CUDA (GPU) backends (apache#17985)

[Codegen] Add asin domain check
- Update `tir.asin` to return quiet NaN if the input is outside of [-1, 1]