[SCEV] Try to prove no-wrap for AddRecs via BTC. #131538

fhahn · 2025-03-16T19:30:23Z

#131281 exposed a case where
SCEV is not able to infer NSW for an AddRec, but constant folding in
SCEVExpander is able to determine that the runtime check is always false
(i.e. no signed wrap).

This is caught by an assertion in LV, where we expand a runtime check
and the trip count expression, but the runtime check gets folded away.

For AddRecs with a step of 1, if Start + BTC >= Start, the AddRec is
treated as having NUW/NSW and no wrap predicate is added.
https://alive2.llvm.org/ce/z/VnWwEN
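
As an illustration of the step-1 reasoning, here is a minimal sketch that checks the unsigned case exhaustively for 8-bit values. It assumes the BTC has the same type as the AddRec; the helper names (wrapsUnsigned, endValueCheck, checkMatchesForAllInputs) are hypothetical and not part of the patch.

```cpp
#include <cstdint>

// Hypothetical helper: does the 8-bit recurrence {Start,+,1} wrap (unsigned)
// at any point while taking BTC backedges?
bool wrapsUnsigned(uint8_t Start, uint8_t BTC) {
  uint8_t V = Start;
  for (unsigned I = 0; I < BTC; ++I) {
    if (V == UINT8_MAX)
      return true; // the next increment would wrap to 0
    ++V;
  }
  return false;
}

// The check from the description, Start + BTC >= Start, evaluated in the
// recurrence's own 8-bit type.
bool endValueCheck(uint8_t Start, uint8_t BTC) {
  uint8_t End = static_cast<uint8_t>(Start + BTC); // reduced modulo 256
  return End >= Start;
}

// Compare both helpers over all 8-bit inputs: the check should hold exactly
// when the recurrence does not wrap.
bool checkMatchesForAllInputs() {
  for (unsigned S = 0; S <= UINT8_MAX; ++S)
    for (unsigned B = 0; B <= UINT8_MAX; ++B) {
      uint8_t Start = static_cast<uint8_t>(S);
      uint8_t BTC = static_cast<uint8_t>(B);
      if (endValueCheck(Start, BTC) == wrapsUnsigned(Start, BTC))
        return false;
    }
  return true;
}
```

Because the step is 1, the values are monotone between Start and Start + BTC, so checking only the end value is enough in this model.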

This check can help determine NSW/NUW in a few more cases, but doing so
for all AddRecs has a noticeable compile time impact:
https://llvm-compile-time-tracker.com/compare.php?from=215c0d2b651dc757378209a3edaff1a130338dd8&to=cdd1c1d32c598d77b73a57bcc05c1383786b3ac4&stat=instructions:u

llvmbot · 2025-03-16T19:30:55Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

I am not sure whether there is a good general place where we could try to
refine wrap flags in SCEV with logic like that in the patch.

Fixes #131281.

Full diff: https://github.com/llvm/llvm-project/pull/131538.diff

2 Files Affected:

(modified) llvm/lib/Analysis/ScalarEvolution.cpp (+25)
(modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+81)

diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 8f74c1c398ced..6dbbcd008f59d 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -14775,6 +14775,29 @@ const SCEVPredicate *ScalarEvolution::getWrapPredicate(
 
 namespace {
 
+/// Return true if \p AR is known to not wrap via the loops backedge-taken count
+/// \p BTC.
+static bool proveNoWrapViaBTC(const SCEVAddRecExpr *AR,
+                              SCEVWrapPredicate::IncrementWrapFlags Pred,
+                              ScalarEvolution &SE) {
+  const Loop *L = AR->getLoop();
+  const SCEV *BTC = SE.getBackedgeTakenCount(L);
+  if (isa<SCEVCouldNotCompute>(BTC))
+    return false;
+  if (!match(AR->getStepRecurrence(SE), m_scev_One()) ||
+      AR->getType() != BTC->getType())
+    return false;
+  // AR has a step of 1, it is NSSW/NUSW if Start + BTC >= Start.
+  auto *Add = SE.getAddExpr(AR->getStart(), BTC);
+  assert((Pred == SCEVWrapPredicate::IncrementNSSW ||
+          Pred == SCEVWrapPredicate::IncrementNUSW) &&
+         "Unexpected predicate");
+  return SE.isKnownPredicate(Pred == SCEVWrapPredicate::IncrementNSSW
+                                 ? CmpInst::ICMP_SGE
+                                 : CmpInst::ICMP_UGE,
+                             Add, AR->getStart());
+}
+
 class SCEVPredicateRewriter : public SCEVRewriteVisitor<SCEVPredicateRewriter> {
 public:
 
@@ -14860,6 +14883,8 @@ class SCEVPredicateRewriter : public SCEVRewriteVisitor<SCEVPredicateRewriter> {
 
   bool addOverflowAssumption(const SCEVAddRecExpr *AR,
                              SCEVWrapPredicate::IncrementWrapFlags AddedFlags) {
+    if (proveNoWrapViaBTC(AR, AddedFlags, SE))
+      return true;
     auto *A = SE.getWrapPredicate(AR, AddedFlags);
     return addOverflowAssumption(A);
   }
diff --git a/llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll b/llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll
index 590cdd73e55f3..40c752bbaf4c8 100644
--- a/llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll
+++ b/llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll
@@ -241,3 +241,84 @@ loop:
 exit:
   ret void
 }
+
+declare i1 @cond()
+
+; Test case for https://github.com/llvm/llvm-project/issues/131281.
+; %add2 is known to not wrap via BTC.
+define void @no_signed_wrap_iv_via_btc(ptr %dst, i32 %N) mustprogress {
+; CHECK-LABEL: define void @no_signed_wrap_iv_via_btc
+; CHECK-SAME: (ptr [[DST:%.*]], i32 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[N]], -100
+; CHECK-NEXT:    [[SUB4:%.*]] = add i32 [[N]], -99
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[N]], 1
+; CHECK-NEXT:    [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[SUB4]], i32 [[TMP0]])
+; CHECK-NEXT:    [[TMP1:%.*]] = add i32 [[SMAX]], 100
+; CHECK-NEXT:    [[TMP2:%.*]] = sub i32 [[TMP1]], [[N]]
+; CHECK-NEXT:    br label [[OUTER:%.*]]
+; CHECK:       outer.loopexit:
+; CHECK-NEXT:    br label [[OUTER]]
+; CHECK:       outer:
+; CHECK-NEXT:    [[C:%.*]] = call i1 @cond()
+; CHECK-NEXT:    br i1 [[C]], label [[LOOP_PREHEADER:%.*]], label [[EXIT:%.*]]
+; CHECK:       loop.preheader:
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP2]], 4
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK:       vector.ph:
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i32 [[TMP2]], 4
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i32 [[TMP2]], [[N_MOD_VF]]
+; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
+; CHECK:       vector.body:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP3:%.*]] = add i32 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP4:%.*]] = add i32 [[SUB4]], [[TMP3]]
+; CHECK-NEXT:    [[TMP5:%.*]] = sext i32 [[TMP4]] to i64
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr i32, ptr [[DST]], i64 [[TMP5]]
+; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr i32, ptr [[TMP6]], i32 0
+; CHECK-NEXT:    store <4 x i32> zeroinitializer, ptr [[TMP7]], align 4
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 4
+; CHECK-NEXT:    [[TMP8:%.*]] = icmp eq i32 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK:       middle.block:
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i32 [[TMP2]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label [[OUTER_LOOPEXIT:%.*]], label [[SCALAR_PH]]
+; CHECK:       scalar.ph:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[LOOP_PREHEADER]] ]
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[INC:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
+; CHECK-NEXT:    [[ADD2:%.*]] = add i32 [[SUB4]], [[IV]]
+; CHECK-NEXT:    [[ADD_EXT:%.*]] = sext i32 [[ADD2]] to i64
+; CHECK-NEXT:    [[GEP_DST:%.*]] = getelementptr i32, ptr [[DST]], i64 [[ADD_EXT]]
+; CHECK-NEXT:    store i32 0, ptr [[GEP_DST]], align 4
+; CHECK-NEXT:    [[INC]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[ADD:%.*]] = add i32 [[SUB]], [[INC]]
+; CHECK-NEXT:    [[EC:%.*]] = icmp sgt i32 [[ADD]], [[N]]
+; CHECK-NEXT:    br i1 [[EC]], label [[OUTER_LOOPEXIT]], label [[LOOP]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+entry:
+  %sub = add i32 %N, -100
+  %sub4 = add i32 %N, -99
+  br label %outer
+
+outer:
+  %c = call i1 @cond()
+  br i1 %c, label %loop, label %exit
+
+loop:
+  %iv = phi i32 [ 0, %outer ], [ %inc, %loop ]
+  %add2 = add i32 %sub4, %iv
+  %add.ext = sext i32 %add2 to i64
+  %gep.dst = getelementptr i32, ptr %dst, i64 %add.ext
+  store i32 0, ptr %gep.dst, align 4
+  %inc = add i32 %iv, 1
+  %add = add i32 %sub, %inc
+  %ec = icmp sgt i32 %add, %N
+  br i1 %ec, label %outer, label %loop
+
+exit:
+  ret void
+}

nikic

I don't think this is the correct way to fix the assertion failure. Generally speaking, we cannot assume that two different analyses will end up producing the same result. Adding extra checks for specific cases doesn't change that. We should make sure this kind of mismatch is always handled gracefully.

fhahn · 2025-03-17T21:32:51Z

Fair point. I pushed a simple fix that cleans up earlier so we no longer run into the assert. That said, stricter assertions can sometimes help surface interesting gaps that we can then improve.

I'll see if this change can still be tested.

fhahn · 2025-03-30T20:51:41Z

Repurposed the PR to use the same logic and apply it at the same place as proveNoWrapViaConstantRanges.

Compile-time impact looks to be in the noise: https://llvm-compile-time-tracker.com/compare.php?from=fa5025b76034bcdd65d3a96eb29ae1edc18b876e&to=33e2e7b0f1afd69c2b92f976fb23c1146ad0302a&stat=instructions:u

ping :)

preames

You need to update the failing Polly test.

There's a roughly analogous piece of code already in computeExitLimitFromICmp which is guarded by ControllingFiniteLoop. Unfortunately, I think that's needed. Imagine an induction variable which isn't used to control a loop exit. Take something simple like an i8 which wraps around repeatedly in a loop controlled by a separate i32 induction. I believe we can sometimes prove the BTC of the small induction variable (from the larger one), but that doesn't prevent the inner one from wrapping repeatedly. I think this case might be guarded by your widen type check, but if so, that needs to be pretty explicitly spelled out.
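
The scenario described above can be reproduced outside of SCEV. The following is a minimal sketch, assuming an 8-bit counter that is not part of the exit condition and a 32-bit induction variable that controls the loop; countNarrowWraps is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Count how often an 8-bit counter {0,+,1} wraps in a loop whose exit is
// controlled by a separate 32-bit induction variable.
unsigned countNarrowWraps(uint32_t TripCount) {
  uint8_t Narrow = 0;
  unsigned Wraps = 0;
  for (uint32_t Wide = 0; Wide < TripCount; ++Wide) {
    ++Narrow; // not used in the exit condition
    if (Narrow == 0)
      ++Wraps; // wrapped from 255 back to 0
  }
  return Wraps;
}
```

The loop has a well-defined backedge-taken count derived from the wide variable, yet for a trip count of 256 or more the narrow counter wraps at least once, so a known BTC alone does not imply no-wrap for every AddRec in the loop.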

preames · 2025-04-01T16:26:05Z

llvm/lib/Analysis/ScalarEvolution.cpp

+  const Loop *L = AR->getLoop();
+  const SCEV *BTC = SE.getBackedgeTakenCount(L);
+  if (isa<SCEVCouldNotCompute>(BTC) ||
+      !match(AR->getStepRecurrence(SE), m_scev_One()))


You can use isOne here.

preames · 2025-04-01T16:31:20Z

llvm/lib/Analysis/ScalarEvolution.cpp

+    return Result;
+
+  // AR has a step of 1, it is NUW/NSW if Start + BTC >= Start.
+  auto *Add = SE.getAddExpr(AR->getStart(), SE.getNoopOrZeroExtend(BTC, WTy));


AR->evaluateAtIteration(BTC, SE)?

fhahn

Thanks, there should be no polly test failures in the latest version.

In the latest version, I added checks that Start + BTC won't overflow (via willNotOverflow) to correctly handle the case you mentioned, while still allowing the logic to apply to cases where the AddRec isn't controlling the loop exit.

llvm/test/Analysis/ScalarEvolution/different-loops-recs.ll contained a number of cases where the original version of the patch incorrectly inferred NUW even though Start+BTC could wrap.

fhahn · 2025-07-21T07:42:15Z

llvm/lib/Analysis/ScalarEvolution.cpp

+  const Loop *L = AR->getLoop();
+  const SCEV *BTC = SE.getBackedgeTakenCount(L);
+  if (isa<SCEVCouldNotCompute>(BTC) ||
+      !match(AR->getStepRecurrence(SE), m_scev_One()))


Thanks, updated

fhahn · 2025-07-21T07:44:23Z

llvm/lib/Analysis/ScalarEvolution.cpp

+    return Result;
+
+  // AR has a step of 1, it is NUW/NSW if Start + BTC >= Start.
+  auto *Add = SE.getAddExpr(AR->getStart(), SE.getNoopOrZeroExtend(BTC, WTy));


I left it as-is for now, in combination with a check that Start + BTC will not overflow (via willNotOverflow). For the more general evaluateAtIteration, I don't think we can easily check for overflow unfortunately.

fhahn

ping :)

nikic · 2025-07-22T15:18:35Z

llvm/lib/Analysis/ScalarEvolution.cpp

I don't really understand this condition. Doesn't willNotOverflow by itself already imply nuw? Why do we need an additional check that Start + BTC >= Start?

Yep, with step 1 the check isn't needed, removed.

There are a number of cases for which SCEV may not be able to prove a predicate will always be true/false, which may be simplified to a constant during expansion (see discussion in #131538). Bail out early if runtime checks are known to always fail, as the vector loop generated later will never execute.

nikic

Looking at the test diffs, it seems like the case where this helps in practice is if we have a -1 max const btc.

I feel like this case should really be handled by something more fundamental. The proveNoWrapViaConstantRanges() logic doesn't catch this because it checks whether adding the step to the range will not overflow -- in this case the range is full, but the code doesn't know that it won't actually wrap. Of course, the code that actually calculated that range (getRangeForAffineAR) does know that, but we lose the distinction between a full range because the addrec can hit all values and a full range because the calculation overflowed...
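
To make the ambiguity concrete, here is a minimal sketch in 8 bits showing two step-1 recurrences that both produce every value, where only one of them wraps. It is a model for illustration only and does not reflect how getRangeForAffineAR is implemented; RangeInfo and analyze are hypothetical names.

```cpp
#include <cstdint>
#include <set>

struct RangeInfo {
  bool CoversAllValues; // every 8-bit value is produced at least once
  bool Wraps;           // the end value does not fit in 8 bits
};

// Hypothetical model of {Start,+,1} over NumBackedges backedges in 8 bits.
// The wrap is detected by redoing the end-value arithmetic in a wider type.
RangeInfo analyze(uint8_t Start, unsigned NumBackedges) {
  std::set<uint8_t> Seen;
  for (unsigned I = 0; I <= NumBackedges; ++I)
    Seen.insert(static_cast<uint8_t>(Start + I));
  unsigned WideEnd = static_cast<unsigned>(Start) + NumBackedges;
  return {Seen.size() == 256, WideEnd > UINT8_MAX};
}
```

With Start = 0 and 255 backedges (an all-ones BTC in 8 bits) the range is full and nothing wraps; with Start = 1 and the same count the range is also full, but the last increment wraps. A range alone cannot tell these apart, while the wider computation can.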

nikic · 2025-07-30T10:59:47Z

llvm/lib/Analysis/ScalarEvolution.cpp

Same here, shouldn't need the separate SGE check?

There's an issue here with willNotOverflow. When passing IsSigned=true, both input values will be sign-extended, so if the BTC is -1 the check reports no wrap, even though it should report one.

I think what would need to happen is for willNotOverflow to zero-extend the passed BTC. I removed handling for signed for now. I'll put up a follow-up with a test case if it is actually useful in practice.
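
The effect of the extension kind can be shown with a small model. This is a sketch under the assumption of 8-bit values widened to 32 bits; it is not LLVM's willNotOverflow, and fitsInSigned8 is a hypothetical helper.

```cpp
#include <cstdint>

// Does Start + BTC fit in a signed 8-bit value when computed in a wider type?
// ExtendBTCSigned selects how the 8-bit BTC bit pattern is widened.
bool fitsInSigned8(int8_t Start, uint8_t BTCBits, bool ExtendBTCSigned) {
  int32_t WideStart = Start; // the start value is sign-extended
  // Reinterpreting 0xFF as int8_t yields -1 on two's complement targets
  // (guaranteed since C++20).
  int32_t WideBTC = ExtendBTCSigned
                        ? static_cast<int32_t>(static_cast<int8_t>(BTCBits))
                        : static_cast<int32_t>(BTCBits);
  int32_t Sum = WideStart + WideBTC;
  return Sum >= INT8_MIN && Sum <= INT8_MAX;
}
```

For Start = 0 and an all-ones BTC, sign extension computes 0 + (-1) and reports that the sum fits, although a recurrence taking 255 backedges from 0 leaves the signed 8-bit range. Zero-extending the BTC computes 0 + 255 and reports the overflow.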

Try to widen integer AddRecs to a type one bit wider to distinguish between the AddRec wrapping or just hitting all possible values. Alternative to llvm#131538. Note that now we can end up in the awkward situation that we fail to compute an unpredicated BTC on the first try, but succeed on the second try, because we now have an accurate max BTC. For now, I updated getPredicatedBackedgeTakenCount to remove the cached BTC if that happens, but perhaps there's a better solution?

fhahn

Yeah that's right. I tried to differentiate the cases by trying to evaluate the AddRec in a one-bit wider type: #151966

Not sure if I missed anything there

Similarly to llvm#131538, we can also try and check if a predicate is known to wrap given the backedge taken count. For now, this just checks directly when we try to create predicated AddRecs. This both helps to avoid spending compile-time on optimizations where we know the predicate is false, and can also help to allow additional vectorization (e.g. by deciding to scalarize memory accesses when otherwise we would try to create a predicated AddRec with a predicate that's always false). The initial version is quite restricted, but can be extended in follow-ups to cover more cases.

fhahn requested a review from preames March 16, 2025 19:30

fhahn requested a review from nikic as a code owner March 16, 2025 19:30

llvmbot added llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Mar 16, 2025

fhahn mentioned this pull request Mar 16, 2025

[clang] Crash at -O2: Assertion `all_of(I->users()... "during expansion"' failed. #131281

Closed

nikic requested changes Mar 16, 2025

fhahn mentioned this pull request Mar 18, 2025

Task submission dtcxzyw/llvm-opt-benchmark#1312

Open

dtcxzyw mentioned this pull request Mar 19, 2025

pre-commit: PR131538 dtcxzyw/llvm-opt-benchmark#2214

Closed

fhahn changed the title ~~[SCEV] Check if AddRec doesn't wrap via BTC before adding predicate.~~ [SCEV] Try to prove no-wrap for AddRecs via BTC. Mar 30, 2025

fhahn force-pushed the scev-remove-unneccessary-predicate branch from a6f59ee to 3c2cee8 Compare March 30, 2025 20:50

fhahn force-pushed the scev-remove-unneccessary-predicate branch from 3c2cee8 to ae224f4 Compare March 30, 2025 21:01

preames reviewed Apr 1, 2025

fhahn force-pushed the scev-remove-unneccessary-predicate branch from ae224f4 to 8577415 Compare July 21, 2025 06:30

fhahn commented Jul 21, 2025

fhahn requested a review from efriedma-quic July 22, 2025 11:48

fhahn commented Jul 22, 2025

nikic reviewed Jul 22, 2025

fhahn force-pushed the scev-remove-unneccessary-predicate branch from 8577415 to dac87d4 Compare July 23, 2025 11:37

fhahn mentioned this pull request Jul 29, 2025

[SCEV] Check if predicate is known false for predicated AddRecs. #151134

Merged

nikic reviewed Jul 30, 2025

fhahn added 2 commits July 30, 2025 15:23

[SCEV] Try to prove no-wrap for AddRecs via BTC

d52fbb7

!fixup address latest comments, thanks

8676bce

fhahn added 2 commits July 30, 2025 15:23

!fixup remove check

126ac2d

!fixup remove signed case for now

5628fe4

fhahn force-pushed the scev-remove-unneccessary-predicate branch from dac87d4 to 5628fe4 Compare July 30, 2025 14:55

!fixup remove unused Add

db07e4a

fhahn mentioned this pull request Aug 4, 2025

[SCEV] Distinguish between full and wrapping AddRec in proveNoWrapViaCR. #151966

Open

fhahn commented Aug 4, 2025
