-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[LV] Keep duplicate recipes in VPExpressionRecipe #156976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-vectorizers Author: Sam Tebbs (SamTebbs33) ChangesThe VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set. When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set. Fixes #156464 Full diff: https://github.com/llvm/llvm-project/pull/156976.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index bd1bee3a88887..f71cc09e3f990 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -29,6 +29,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Twine.h"
#include "llvm/ADT/ilist.h"
@@ -3003,8 +3004,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
{Ext0, Ext1, Mul, Red}) {}
~VPExpressionRecipe() override {
- for (auto *R : reverse(ExpressionRecipes))
- delete R;
+ SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
+ for (auto *R : reverse(ExpressionRecipes)) {
+ if (ExpressionRecipesSeen.insert(R).second)
+ delete R;
+ }
for (VPValue *T : LiveInPlaceholders)
delete T;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5f3503d0ce57a..1c743703a1a85 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2740,9 +2740,8 @@ VPExpressionRecipe::VPExpressionRecipe(
ExpressionTypes ExpressionType,
ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
: VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
- ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
- ExpressionRecipes.begin(), ExpressionRecipes.end())
- .takeVector()),
+ ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
+ ExpressionRecipes.begin(), ExpressionRecipes.end())),
ExpressionType(ExpressionType) {
assert(!ExpressionRecipes.empty() && "Nothing to combine?");
assert(
@@ -2776,25 +2775,43 @@ VPExpressionRecipe::VPExpressionRecipe(
R->removeFromParent();
}
+ // Keep track of how many instances of each recipe occur in the recipe list
+ SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
+ for (auto *R : ExpressionRecipes) {
+ auto *F = ExpressionRecipeCounts.find(R);
+ if (F == ExpressionRecipeCounts.end())
+ ExpressionRecipeCounts.insert(std::make_pair(R, 1));
+ else
+ F->second++;
+ }
+
// Internalize all external operands to the expression recipes. To do so,
// create new temporary VPValues for all operands defined by a recipe outside
// the expression. The original operands are added as operands of the
// VPExpressionRecipe itself.
for (auto *R : ExpressionRecipes) {
+ auto *F = ExpressionRecipeCounts.find(R);
+ F->second--;
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Def = Op->getDefiningRecipe();
if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
continue;
addOperand(Op);
- LiveInPlaceholders.push_back(new VPValue());
- R->setOperand(Idx, LiveInPlaceholders.back());
+ auto *Tmp = new VPValue();
+ Tmp->setUnderlyingValue(Op->getUnderlyingValue());
+ LiveInPlaceholders.push_back(Tmp);
+ // Only modify this recipe's operands if it's the last time it occurs in
+ // the recipe list
+ if (F->second == 0)
+ R->setOperand(Idx, Tmp);
}
}
}
void VPExpressionRecipe::decompose() {
for (auto *R : ExpressionRecipes)
- R->insertBefore(this);
+ if (!R->getParent())
+ R->insertBefore(this);
for (const auto &[Idx, Op] : enumerate(operands()))
LiveInPlaceholders[Idx]->replaceAllUsesWith(Op);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 2ffb8203d49dd..29167324c7a09 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -651,3 +651,50 @@ exit:
%r.0.lcssa = phi i64 [ %rdx.next, %loop ]
ret i64 %r.0.lcssa
}
+
+define i64 @print_mulacc_duplicate_extends(ptr nocapture readonly %x, ptr nocapture readonly %y, i32 %n) {
+; CHECK-LABEL: 'print_mulacc_duplicate_extends'
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<%n> = original trip-count
+; CHECK-EMPTY:
+; CHECK: vector.ph:
+; CHECK-NEXT: EMIT vp<[[RDX_START:%.+]]> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<[[IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[IV_NEXT:%.+]]>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<[[RDX:%.+]]> = phi vp<[[RDX_START]]>, vp<[[RDX_NEXT:%.+]]>
+; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[IV]]>, ir<1>
+; CHECK-NEXT: CLONE ir<[[ARRAYIDX0:%.+]]> = getelementptr inbounds ir<%x>, vp<[[STEPS]]>
+; CHECK-NEXT: vp<[[ADDR0:%.+]]> = vector-pointer ir<[[ARRAYIDX0]]>
+; CHECK-NEXT: WIDEN ir<[[LOAD0:%.+]]> = load vp<[[ADDR0]]>
+; CHECK-NEXT: EXPRESSION vp<[[RDX_NEXT:%.+]]> = ir<[[RDX]]> + reduce.sub (mul nsw (ir<[[LOAD0]]> sext to i64), (ir<[[LOAD0]]> sext to i64))
+; CHECK-NEXT: EMIT vp<[[IV_NEXT]]> = add nuw vp<[[IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT: EMIT branch-on-count vp<[[IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ %iv.next, %loop ], [ 0, %entry ]
+ %rdx = phi i64 [ %rdx.next, %loop ], [ 0, %entry ]
+ %arrayidx = getelementptr inbounds i16, ptr %x, i32 %iv
+ %load0 = load i16, ptr %arrayidx, align 4
+ %conv0 = sext i16 %load0 to i32
+ %mul = mul nsw i32 %conv0, %conv0
+ %conv = sext i32 %mul to i64
+ %rdx.next = sub nsw i64 %rdx, %conv
+ %iv.next = add nuw nsw i32 %iv, 1
+ %exitcond = icmp eq i32 %iv.next, %n
+ br i1 %exitcond, label %exit, label %loop
+
+exit:
+ %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
+ ret i64 %r.0.lcssa
+}
|
@llvm/pr-subscribers-llvm-transforms Author: Sam Tebbs (SamTebbs33) ChangesThe VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set. When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set. Fixes #156464 Full diff: https://github.com/llvm/llvm-project/pull/156976.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index bd1bee3a88887..f71cc09e3f990 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -29,6 +29,7 @@
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Twine.h"
#include "llvm/ADT/ilist.h"
@@ -3003,8 +3004,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe {
{Ext0, Ext1, Mul, Red}) {}
~VPExpressionRecipe() override {
- for (auto *R : reverse(ExpressionRecipes))
- delete R;
+ SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen;
+ for (auto *R : reverse(ExpressionRecipes)) {
+ if (ExpressionRecipesSeen.insert(R).second)
+ delete R;
+ }
for (VPValue *T : LiveInPlaceholders)
delete T;
}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 5f3503d0ce57a..1c743703a1a85 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2740,9 +2740,8 @@ VPExpressionRecipe::VPExpressionRecipe(
ExpressionTypes ExpressionType,
ArrayRef<VPSingleDefRecipe *> ExpressionRecipes)
: VPSingleDefRecipe(VPDef::VPExpressionSC, {}, {}),
- ExpressionRecipes(SetVector<VPSingleDefRecipe *>(
- ExpressionRecipes.begin(), ExpressionRecipes.end())
- .takeVector()),
+ ExpressionRecipes(SmallVector<VPSingleDefRecipe *>(
+ ExpressionRecipes.begin(), ExpressionRecipes.end())),
ExpressionType(ExpressionType) {
assert(!ExpressionRecipes.empty() && "Nothing to combine?");
assert(
@@ -2776,25 +2775,43 @@ VPExpressionRecipe::VPExpressionRecipe(
R->removeFromParent();
}
+ // Keep track of how many instances of each recipe occur in the recipe list
+ SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts;
+ for (auto *R : ExpressionRecipes) {
+ auto *F = ExpressionRecipeCounts.find(R);
+ if (F == ExpressionRecipeCounts.end())
+ ExpressionRecipeCounts.insert(std::make_pair(R, 1));
+ else
+ F->second++;
+ }
+
// Internalize all external operands to the expression recipes. To do so,
// create new temporary VPValues for all operands defined by a recipe outside
// the expression. The original operands are added as operands of the
// VPExpressionRecipe itself.
for (auto *R : ExpressionRecipes) {
+ auto *F = ExpressionRecipeCounts.find(R);
+ F->second--;
for (const auto &[Idx, Op] : enumerate(R->operands())) {
auto *Def = Op->getDefiningRecipe();
if (Def && ExpressionRecipesAsSetOfUsers.contains(Def))
continue;
addOperand(Op);
- LiveInPlaceholders.push_back(new VPValue());
- R->setOperand(Idx, LiveInPlaceholders.back());
+ auto *Tmp = new VPValue();
+ Tmp->setUnderlyingValue(Op->getUnderlyingValue());
+ LiveInPlaceholders.push_back(Tmp);
+ // Only modify this recipe's operands if it's the last time it occurs in
+ // the recipe list
+ if (F->second == 0)
+ R->setOperand(Idx, Tmp);
}
}
}
void VPExpressionRecipe::decompose() {
for (auto *R : ExpressionRecipes)
- R->insertBefore(this);
+ if (!R->getParent())
+ R->insertBefore(this);
for (const auto &[Idx, Op] : enumerate(operands()))
LiveInPlaceholders[Idx]->replaceAllUsesWith(Op);
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
index 2ffb8203d49dd..29167324c7a09 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-reductions.ll
@@ -651,3 +651,50 @@ exit:
%r.0.lcssa = phi i64 [ %rdx.next, %loop ]
ret i64 %r.0.lcssa
}
+
+define i64 @print_mulacc_duplicate_extends(ptr nocapture readonly %x, ptr nocapture readonly %y, i32 %n) {
+; CHECK-LABEL: 'print_mulacc_duplicate_extends'
+; CHECK: VPlan 'Initial VPlan for VF={4},UF>=1' {
+; CHECK-NEXT: Live-in vp<[[VF:%.+]]> = VF
+; CHECK-NEXT: Live-in vp<[[VFxUF:%.+]]> = VF * UF
+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<%n> = original trip-count
+; CHECK-EMPTY:
+; CHECK: vector.ph:
+; CHECK-NEXT: EMIT vp<[[RDX_START:%.+]]> = reduction-start-vector ir<0>, ir<0>, ir<1>
+; CHECK-NEXT: Successor(s): vector loop
+; CHECK-EMPTY:
+; CHECK-NEXT: <x1> vector loop: {
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT: EMIT vp<[[IV:%.+]]> = CANONICAL-INDUCTION ir<0>, vp<[[IV_NEXT:%.+]]>
+; CHECK-NEXT: WIDEN-REDUCTION-PHI ir<[[RDX:%.+]]> = phi vp<[[RDX_START]]>, vp<[[RDX_NEXT:%.+]]>
+; CHECK-NEXT: vp<[[STEPS:%.+]]> = SCALAR-STEPS vp<[[IV]]>, ir<1>
+; CHECK-NEXT: CLONE ir<[[ARRAYIDX0:%.+]]> = getelementptr inbounds ir<%x>, vp<[[STEPS]]>
+; CHECK-NEXT: vp<[[ADDR0:%.+]]> = vector-pointer ir<[[ARRAYIDX0]]>
+; CHECK-NEXT: WIDEN ir<[[LOAD0:%.+]]> = load vp<[[ADDR0]]>
+; CHECK-NEXT: EXPRESSION vp<[[RDX_NEXT:%.+]]> = ir<[[RDX]]> + reduce.sub (mul nsw (ir<[[LOAD0]]> sext to i64), (ir<[[LOAD0]]> sext to i64))
+; CHECK-NEXT: EMIT vp<[[IV_NEXT]]> = add nuw vp<[[IV]]>, vp<[[VFxUF]]>
+; CHECK-NEXT: EMIT branch-on-count vp<[[IV_NEXT]]>, vp<[[VTC]]>
+; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i32 [ %iv.next, %loop ], [ 0, %entry ]
+ %rdx = phi i64 [ %rdx.next, %loop ], [ 0, %entry ]
+ %arrayidx = getelementptr inbounds i16, ptr %x, i32 %iv
+ %load0 = load i16, ptr %arrayidx, align 4
+ %conv0 = sext i16 %load0 to i32
+ %mul = mul nsw i32 %conv0, %conv0
+ %conv = sext i32 %mul to i64
+ %rdx.next = sub nsw i64 %rdx, %conv
+ %iv.next = add nuw nsw i32 %iv, 1
+ %exitcond = icmp eq i32 %iv.next, %n
+ br i1 %exitcond, label %exit, label %loop
+
+exit:
+ %r.0.lcssa = phi i64 [ %rdx.next, %loop ]
+ ret i64 %r.0.lcssa
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if ExpressionRecipes[1]
is an extend or widen recipe?
auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
: ExpressionRecipes[0]);
Tmp->setUnderlyingValue(Op->getUnderlyingValue()); | ||
LiveInPlaceholders.push_back(Tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to set the underlying value now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was needed to fix a crash early on but I don't think it's needed now so I've removed it, thanks.
I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence. |
e6c89c3
to
2300506
Compare
} | ||
|
||
// Keep track of how many instances of each recipe occur in the recipe list | ||
SmallMapVector<VPSingleDefRecipe *, unsigned, 4> ExpressionRecipeCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about a slightly different case, where the same (external) input operand is shared by different expression recipes. I don't think there would be an easy way to test that at the moment, but I can see that happening with other kinds of expressions in the future.
if (F->second == 0) | ||
R->setOperand(Idx, Tmp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you were to keep a map of Op -> new VPValue()
nodes, then you can just do:
LiveInPlaceholders.push_back(new VPValue());
MyMap[Op] = LiveInPlaceholders.back();
and then have a loop at the end to replace all values of Op
by their LiveInPlaceholder (i.e. new VPValue()
) values for each R
in ExpressionRecipes
.
That would also fix the potential issue I mentioned above, where same input operand is shared by multiple recipes passed to the expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's nice. It also means we don't have to keep track of the recipe counts. Done.
delete R; | ||
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen; | ||
for (auto *R : reverse(ExpressionRecipes)) { | ||
if (ExpressionRecipesSeen.insert(R).second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the comment for SmallVector<VPSingleDefRecipe *> ExpressionRecipes
(line 2973) that this vector may contain duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did add it but must not have staged it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this only impacts printing, right? It seems like it would be simpler to just handle the case where 1 extend is shared when picking the multiply recipe, by checking if
ExpressionRecipes[1]
is an extend or widen recipe?auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2] : ExpressionRecipes[0]);
I think it's best to get the vector of recipes correct from the start so that we don't need to account for unexpectedness in other places. This could theoretically happen for future bundle types too and we don't want lots of checks when we can just get it right from the start of the bundle's existence.
But it looks like handling duplicates requires quite a bit of extra complexity spread throughout; ideally there would be no need to look up the individual entries in the expression list.
At the moment this just impacts printing. IMO it would be better to just have a generic print function that does not inspect the entries directly, but instead prints something like
EXPRESSION vp<%x> = ExtendedReduction operands {
bundled recipes...
}
Sorry, it doesn't just impact printing, but anything that wants to analyse the recipes at all, such as bundled partial reductions: https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878 I don't quite see why it's bad to get the recipe list correct from the very beginning? Surely that's worth a little bit of complexity in the constructor? |
At the moment only the destructor needs to know that |
This PR adds the ExtNegatedMulAccReduction expression type for VPExpressionRecipe so that extend-multiply-accumulate reductions with a negated multiply can be bundled. Stacked PRs: 1. llvm#156976 2. -> This 3. llvm#147302
But can't this also be avoided naturally, by just getting the needed extend from the reduction/result recipe which is the root of the pattern, instead of relying on a specific order of operands? In the future, the operands to a reduction pattern could also contain live-ins (e.g. constant), and in that case there would be be no recipe at all.
I guess it depends on what 'correct' means here. Currently it is simply the list of recipes that are bundled together.
Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler. |
The problem with checking the root instruction is that you'll have to repeat the pattern matching that was already done in the transform pass. Whereas with this solution you can rely on them being in the recipe list so can fetch them from there, without re-doing any work. With Sander's placeholder caching suggestion, the code has become a lot simpler so I hope that's acceptable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the descructor, decomposition and construction also need extra complexity. Granted it is not that much, but if looking up the required information from the root instruction works as well that may allow us to keep the generic code simpler.
I guess if you're reasoning from the idea that we'll only ever have this limited list of VPexpressions, then I guess this adds (marginally) more logic. But if we'd add a new expression that takes 4 external operands where expression recipes 0 and 1 could be the same, 1 and 2 could be the same, or 2 or 3 could be the same, or any other combination, then it would all get very confusing to figure out what the meaning of the values in ExpressionRecipe
is. To me it therefore makes more sense for a VPExpression operation to always have the expected number of external operands and to take a little bit of care when needing to map that back to the expressions when decomposing in case there was a duplicate.
delete R; | ||
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen; | ||
for (auto *R : reverse(ExpressionRecipes)) { | ||
if (ExpressionRecipesSeen.insert(R).second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not done?
for (auto *R : ExpressionRecipes) { | ||
for (const auto &[Idx, Op] : enumerate(R->operands())) { | ||
auto *Def = Op->getDefiningRecipe(); | ||
if (Def && ExpressionRecipesAsSetOfUsers.contains(Def)) | ||
continue; | ||
addOperand(Op); | ||
LiveInPlaceholders.push_back(new VPValue()); | ||
R->setOperand(Idx, LiveInPlaceholders.back()); | ||
if (OperandPlaceholders.find(Op) == OperandPlaceholders.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to check for duplicates, you can just do:
LiveInPlaceholders.push_back(new VPValue());
OperandPlaceholders[Op] = LiveInPlaceholders.back();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but I've made a variable for the placeholder so we don't have to call back()
.
for (auto *R : ExpressionRecipes) { | ||
for (const auto &[Idx, Op] : enumerate(R->operands())) { | ||
auto *Entry = OperandPlaceholders.find(Op); | ||
if (Entry != OperandPlaceholders.end()) | ||
R->setOperand(Idx, Entry->second); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be:
for (auto *R : ExpressionRecipes) { | |
for (const auto &[Idx, Op] : enumerate(R->operands())) { | |
auto *Entry = OperandPlaceholders.find(Op); | |
if (Entry != OperandPlaceholders.end()) | |
R->setOperand(Idx, Entry->second); | |
} | |
} | |
for (auto *R : ExpressionRecipes) | |
for (auto &KV : OperandPlaceholders) | |
R->replaceUsesOfWith(KV.first, KV.second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better, thanks.
} | ||
SmallSet<VPValue *, 4> PlaceholdersSeen; | ||
for (VPValue *T : LiveInPlaceholders) { | ||
if (PlaceholdersSeen.insert(T).second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to check for duplicates, there is a placeholder value for each operand, regardless of whether they're duplicate or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I'm caching the placeholder for each operand so that duplicate operands share the same placeholder, so we do need to care about duplicates here to avoid deleting placeholders that have already been deleted. But I've changed it so that the same operands don't share a placeholder.
} | ||
} | ||
} | ||
|
||
void VPExpressionRecipe::decompose() { | ||
for (auto *R : ExpressionRecipes) | ||
R->insertBefore(this); | ||
if (!R->getParent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment explaining why the !R->getParent()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
I would like to make sure the code we end up with is flexible enough to handle future use cases w/o many changes. One case I am wondering about is how we will handle cases where an interesting operand can be a recipe or a constant/live-in. At the moment, it looks like all extends must be recipes, but they could also be constants that can be treated as extends in some case. If we want to support them, would be back to needing to check the root instruction? E.g. scaled reductions could have constant operands (#161092) and I would assume the same also holds for some of the other reduction patterns? |
I agree with making the recipe flexible for future cases, but I don't see anything in this PR that would prevent it working with non-recipe extends. I think that's more of an issue with the constructor interface, so if we do end up making changes to accommodate for non-recipe extends then I don't think this PR will be making that work any harder. Pattern-matching the root instruction would certainly be a way to make the recipe very generic and flexible but it would of course introduce a lot of extra logic that we get for free right now. EDIT: I tried rebasing on top of #161092 and all of the LV tests still pass. |
This PR adds the ExtNegatedMulAccReduction expression type for VPExpressionRecipe so that extend-multiply-accumulate reductions with a negated multiply can be bundled. Stacked PRs: 1. llvm#156976 2. -> This 3. llvm#147302
If we'd add a new VPExpression kind for that case, then there's no need to change the VPExpression class or change the rest of the mechanism. That case could be supported quite easily with the help of this PR (I did an experiment with this locally to handle the case in #161092). A new constructor for that case is needed anyway and having it as a new expression kind makes it clear to handle these slightly differently (e.g. there's no need to query the opcode of the constant/live-in value). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with making the recipe flexible for future cases, but I don't see anything in this PR that would prevent it working with non-recipe extends. I think that's more of an issue with the constructor interface, so if we do end up making changes to accommodate for non-recipe extends then I don't think this PR will be making that work any harder. Pattern-matching the root instruction would certainly be a way to make the recipe very generic and flexible but it would of course introduce a lot of extra logic that we get for free right now.
I am not sure the pattern matching would introduce a lot of extra logic. Here's what I was thinking: we can get the multiply directly from the reduction recipe (since we need to look it up anyway). We know which operand it is and we can then get the casts from its operands. In this case, it work be even simpler, as we don't need to check IsExtended
to get the multiply.
@@ -2900,14 +2900,13 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent,
<< " (";
O << "mul";
bool IsExtended = ExpressionType == ExpressionTypes::ExtMulAccReduction;
- auto *Mul = cast<VPWidenRecipe>(IsExtended ? ExpressionRecipes[2]
- : ExpressionRecipes[0]);
+ auto *Mul = cast<VPWidenRecipe>(ExpressionRecipes.back()->getOperand(1));
Mul->printFlags(O);
if (IsExtended)
O << "(";
getOperand(0)->printAsOperand(O, SlotTracker);
if (IsExtended) {
- auto *Ext0 = cast<VPWidenCastRecipe>(ExpressionRecipes[0]);
+ auto *Ext0 = cast<VPWidenCastRecipe>(Mul->getOperand(0));
O << " " << Instruction::getOpcodeName(Ext0->getOpcode()) << " to "
<< *Ext0->getResultType() << "), (";
} else {
@@ -2915,7 +2914,7 @@ void VPExpressionRecipe::print(raw_ostream &O, const Twine &Indent,
}
getOperand(1)->printAsOperand(O, SlotTracker);
if (IsExtended) {
- auto *Ext1 = cast<VPWidenCastRecipe>(ExpressionRecipes[1]);
+ auto *Ext1 = cast<VPWidenCastRecipe>(Mul->getOperand(1));
O << " " << Instruction::getOpcodeName(Ext1->getOpcode()) << " to "
<< *Ext1->getResultType() << ")";
}
The above doesn't quite work as expected at the moment unfortunately, because we currently end up with inconsistent operands for the VPExpressionRecipe if there are duplicates, which gets fixed by doing the replacement separately as in this patch. This change is quite subtle and the mapping of external operands to LiveInPlaceholders should probably be clarified/documented somewhere.
For the case in https://github.com/llvm/llvm-project/pull/147302/files#diff-34abe4c3cd34aa7a9664bbd204834248455635ba80b8a9ba9506d8c3e6b94d95R2878 we would need an extra check if the expression is negated, and depending on that use either the second-to-last or third-to-last recipe in the list.
As we need the operand remapping change in any case, the remaining changes seem minor. I still think the only reason they make some cases slighty more convenient is because we are missing support for extended reductions with constant operands, and once we support those we will need to look up the multiply w/o relying on the number of cast recipes anyways. But if you feel that it makes life easier, keeping duplicates is fine with me for now.
EDIT: I tried rebasing on top of #161092 and all of the LV tests still pass.
That's because we don't create VPExpressionRecipes for partial reductions yet I think.
} | ||
} | ||
|
||
for (auto *R : ExpressionRecipes) | ||
for (auto &Entry : OperandPlaceholders) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need the map vector here? Can you do something like
for (auto &Entry : OperandPlaceholders) | |
for (const auto &[LiveIn, Tmp] : zip(operands(), LiveInPlaceholders)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we do actually, and I've added a test to test against the case Sander was worried about here.
} | ||
} | ||
|
||
for (auto *R : ExpressionRecipes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This subtly changes the behavior; we could have the same external operand used multiple times, and we will have multiple entries for them in LiveInPlaceholders, but only use the first of them.
I'm not sure if there's a way around this inconsistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are two placeholders for a value where only one of them is used when decomposing the expression, what is the practical issue with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a practical issue at the moment, but as mentioned in #156976 (review) it should at least be mentioned/documented.
Or perhaps even better also store a map of live-ins and their tmp values, instead of a vector and creating unused temporary values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commit has actually eliminated this behavioural change so I don't think we need to document anything, but I can do that as a follow-up if that changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doing replace-uses-with will update all uses of the external operand with the first temporary value we created for it, leaving the other tmp values created for the same external operand unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks, I missed that. I've added a comment as part of a rebase.
@@ -3013,8 +3015,11 @@ class VPExpressionRecipe : public VPSingleDefRecipe { | |||
{Ext0, Ext1, Mul, Red}) {} | |||
|
|||
~VPExpressionRecipe() override { | |||
for (auto *R : reverse(ExpressionRecipes)) | |||
delete R; | |||
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SmallSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen; | |
SmallSetPtrSet<VPSingleDefRecipe *, 4> ExpressionRecipesSeen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This PR adds the ExtNegatedMulAccReduction expression type for VPExpressionRecipe so that extend-multiply-accumulate reductions with a negated multiply can be bundled. Stacked PRs: 1. llvm/llvm-project#156976 2. -> llvm/llvm-project#160154 3. llvm/llvm-project#147302
The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set. When printing a reduce.add(mul(ext, ext)) bundle, if the extends are the same then the 3rd element of the set will the the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set. Fixes llvm#156464
e3a2221
to
1d6e918
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with inline comments, thanks
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>( | ||
ExpressionRecipes.begin(), ExpressionRecipes.end())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>( | |
ExpressionRecipes.begin(), ExpressionRecipes.end())), | |
ExpressionRecipes(ExpressionRecipes), |
Does this now work? I think the explicit constructor was only needed to de-duplicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thank you.
; CHECK: [[MIDDLE_BLOCK]]: | ||
; CHECK-NEXT: br [[DOT_CRIT_EDGE:label %.*]] | ||
; CHECK: [[SCALAR_PH:.*]]: | ||
; CHECK: [[_LR_PH:.*]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you strip the unrelated changes from the latest diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
; CHECK: [[MIDDLE_BLOCK]]: | ||
; CHECK-NEXT: br [[DOT_CRIT_EDGE:label %.*]] | ||
; CHECK: [[SCALAR_PH:.*]]: | ||
; CHECK: [[_LR_PH:.*]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
ExpressionRecipes(SmallVector<VPSingleDefRecipe *>( | ||
ExpressionRecipes.begin(), ExpressionRecipes.end())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thank you.
This PR adds the ExtNegatedMulAccReduction expression type for VPExpressionRecipe so that extend-multiply-accumulate reductions with a negated multiply can be bundled. Stacked PRs: 1. llvm#156976 2. -> llvm#160154 3. llvm#147302
The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set. When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set. Fixes llvm#156464
The VPExpressionRecipe class uses a set to store its bundled recipes. If repeated recipes are bundled then the duplicates will be lost, causing the following recipes to not be at the expected place in the set.
When printing a reduce.add(mul(ext, ext)) bundle, for example, if the extends are the same then the 3rd element of the set will be the reduction, rather than the expected mul, causing a cast error. With this change, the recipes are at the expected index in the set.
Fixes #156464