[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 #480
This pull review modifies files outside of the |
This is a cherry-pick from upstream head. I'm assuming we don't need to adhere to the |
This looks the same as the upstream version, and should be pretty safe as it only alters tuning features. LGTM.
Looks good, and we have DavidG's reassurance. You may bypass the rules and merge this (Release manager hat on). MarkM
On our branch, it's still a downstream change that needs tracking. Also, a reminder:
I think this is okay for ATfL as this is an upstream backport and doesn't impact the cores we benchmark against.
Downstream issue: arm#482

With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
ld1b    { z0.b }, p0/z, [x0, x10]
ld1b    { z2.b }, p0/z, [x1, x10]
add     x12, x0, x10
ldr     z1, [x12, #1, mul vl]
add     x12, x1, x10
ldr     z3, [x12, #1, mul vl]
fadd    z0.s, z2.s, z0.s
add     x12, x2, x10
fadd    z1.s, z3.s, z1.s
dech    x11
st1b    { z0.b }, p0, [x2, x10]
incb    x10, all, mul #2
str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
ldp     q0, q1, [x12, #-16]
ldp     q2, q3, [x11, #-16]
subs    x13, x13, #8
fadd    v0.4s, v2.4s, v0.4s
fadd    v1.4s, v3.4s, v1.4s
add     x11, x11, #32
add     x12, x12, #32
stp     q0, q1, [x10, #-16]
add     x10, x10, #32
...
```

This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510 and A520 from the CPU features to the tune features.
926af2e to 63631f7
Ok, thanks for the clarification. Done.
LGTM, the only changes since the previous commit are additional comments which are non-functional so I think the previous approvals should still apply.