
Conversation

vacu9708
Contributor

@vacu9708 vacu9708 commented May 18, 2025

Summary

This PR resolves issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends.

In this issue, the divergence traces back to the inverse trigonometric functions and the MaxPool ops.

Changes

Two root causes were identified and addressed:

  1. Missing asin domain check
    The Taylor-series approximation of asin did not validate its input domain, allowing values outside [-1, 1] to silently produce numeric results.
    • Fix: Updated tir.asin to return a quiet NaN if the input is outside [-1, 1].
  2. The LLVM and CUDA backends follow different NaN policies for float max/min
    • LLVM: propagates NaNs (if any operand is NaN, the result is NaN).
    • CUDA: treats NaN as missing data, as described in the CUDA API manual:
      • If one operand is NaN and the other is a number, choose the numeric value.
      • If both operands are NaN, return NaN.
    • Fix: Updated the max/min ops on LLVM to match CUDA’s behavior.
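The two NaN policies described above can be illustrated in plain Python. This is a minimal sketch of the semantics, not TVM code; the names `fmax_propagate` and `fmax_cuda_like` are illustrative:

```python
import math

NAN = float("nan")

def fmax_propagate(a, b):
    # LLVM-style policy: if any operand is NaN, the result is NaN.
    if math.isnan(a) or math.isnan(b):
        return NAN
    return a if a > b else b

def fmax_cuda_like(a, b):
    # CUDA fmaxf-style policy: a NaN operand is treated as missing data.
    if math.isnan(a):
        return b  # still NaN if both operands are NaN
    if math.isnan(b):
        return a
    return a if a > b else b

print(fmax_propagate(NAN, 1.0))   # nan
print(fmax_cuda_like(NAN, 1.0))   # 1.0
print(fmax_cuda_like(NAN, NAN))   # nan
```

The practical consequence for pooling: once an out-of-domain asin produces a NaN, a MaxPool lowered with the first policy poisons the whole window, while one lowered with the second policy ignores the NaN unless every element is NaN, which is exactly how identical models can diverge across the two backends.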

Notes

  • I also tried aligning the CUDA behavior with LLVM's, but CUDA does not appear to support the propagating-NaN behavior.
    • Regarding this, I'd appreciate maintainers' feedback on whether to keep these different NaN policies.
  • The lint check seems to have a bug: it accepts the following inconsistently spaced code as correct:
PrimExpr out_range = Or(x<lower, x> upper);

However, the correct forms would be either:

PrimExpr out_range = Or(x < lower, x > upper);

or

PrimExpr out_range = Or(x<lower, x>upper);
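For readers outside the TVM codebase, the domain guard this expression computes can be mirrored in plain Python. This is a hedged sketch of the described behavior, not the actual TIR implementation; `asin_with_domain_check` is an illustrative name, and `math.asin` stands in for the Taylor-series approximation:

```python
import math

def asin_with_domain_check(x, lower=-1.0, upper=1.0):
    # Mirrors the TIR expression out_range = Or(x < lower, x > upper),
    # then selects a quiet NaN when out_range holds.
    out_range = (x < lower) or (x > upper)
    if out_range:
        return float("nan")
    return math.asin(x)  # stand-in for the series approximation

print(asin_with_domain_check(1.5))  # nan
print(asin_with_domain_check(0.5))  # ~0.5236 (pi/6)
```

Without the guard, a polynomial approximation evaluated at x = 1.5 would still return some finite number, which is the silent wrong answer the fix eliminates.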

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch 20 times, most recently from 3bb7ef7 to 1856d9e Compare May 19, 2025 12:18
@vacu9708 vacu9708 changed the title from "[Codegen] Add asin domain check and align NaN‐handling in pooling with CUDA semantics" to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." May 20, 2025
@tqchen
Member

tqchen commented May 23, 2025

For this particular case, it should be an issue with the input itself? I think it is fine for different platforms to have different NaN policies as long as they are efficient, given that efficiency outweighs other concerns here. The main thing we want to ensure is that the in-domain behavior is correct and reasonable.

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 1856d9e to 39bf52c Compare May 24, 2025 13:26
@vacu9708
Contributor Author

@tqchen Thanks for your feedback.
I removed the changes that aligned the NaN policies across the two backends and kept the changes for the domain-limit check.

@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 39bf52c to 6513b9c Compare May 24, 2025 13:42
- Update `tir.asin` to return quiet NaN if the input is outside of [-1, 1]
@vacu9708 vacu9708 force-pushed the asin_domain/pool_nan branch from 6513b9c to be88aea Compare May 24, 2025 13:49
@vacu9708 vacu9708 changed the title from "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends." to "[Codegen] Resolve issue #17965 where the same model produces different outputs on the LLVM (CPU) and CUDA (GPU) backends" May 24, 2025
@vacu9708
Contributor Author

@tvm-bot rerun

@tqchen tqchen merged commit 8c9026d into apache:main Jun 2, 2025
11 checks passed
ShiboXing pushed a commit to ShiboXing/tvm that referenced this pull request Aug 10, 2025
…fferent outputs on the LLVM (CPU) and CUDA (GPU) backends (apache#17985)

[Codegen] Add asin domain check
- Update `tir.asin` to return quiet NaN if the input is outside of [-1, 1]