Skip to content

Conversation

stuij
Copy link
Contributor

@stuij stuij commented Aug 11, 2025

Downstream issue: #482

With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal.

So when we have:

void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

When compiling without the feature enabled, we get:

...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...

When compiling with, we get:

...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, #8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, #32
	add	    x12, x12, #32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, #32

...

This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510 and A520 from the CPU features to the tune features.

@stuij stuij requested review from a team as code owners August 11, 2025 14:44
@stuij stuij requested a review from john-brawn-arm August 11, 2025 14:44
@github-actions github-actions bot added the downstream-change Downstream change to LLVM tree label Aug 11, 2025
Copy link

This pull review modifies files outside of the arm-software directory, so please ensure it follows the Downstream Patch Policy.
An automated check will test if the tagging requirements have been met. Please wait for approving reviews from both Arm Toolchain for Embedded and Arm Toolchain for Linux teams before merging.

@stuij
Copy link
Contributor Author

stuij commented Aug 11, 2025

This is a cherry-pick from upstream head. I'm assuming we don't need to adhere to the Requirements for a downstream patch paragraphs in the contributing guide?

davemgreen
davemgreen previously approved these changes Aug 11, 2025
Copy link
Contributor

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks the same as the upstream version, and should be pretty safe as it only alters tuning features. LGTM.

MarkMurrayARM
MarkMurrayARM previously approved these changes Aug 11, 2025
Copy link
Contributor

@MarkMurrayARM MarkMurrayARM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and we have DavidG's reassurance. You may bypass the rules and merge this (Release manager hat on). MarkM

@dcandler
Copy link
Contributor

dcandler commented Aug 11, 2025

This is a cherry-pick from upstream head. I'm assuming we don't need to adhere to the Requirements for a downstream patch paragraphs in the contributing guide?

On our branch, it's still a downstream change that needs tracking.

Also, a reminder:

Please wait for approving reviews from both Arm Toolchain for Embedded and Arm Toolchain for Linux teams before merging.

tblah
tblah previously approved these changes Aug 11, 2025
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay for ATfL as this is an upstream backport and doesn't impact the cores we benchmark against.

Downstream issue: arm#482

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(#132246), which reaps the same code generation benefits of preferring fixed
over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, arm#1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, arm#1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul arm#2
    str     z1, [x12, arm#1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, arm#8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, arm#32
	add	    x12, x12, arm#32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, arm#32

...
```

This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510 and A520
from the CPU features to the tune features.
@stuij stuij dismissed stale reviews from tblah, MarkMurrayARM, and davemgreen via 63631f7 August 12, 2025 09:16
@stuij stuij force-pushed the fixed-over-scalable branch from 926af2e to 63631f7 Compare August 12, 2025 09:16
@stuij
Copy link
Contributor Author

stuij commented Aug 12, 2025

On our branch, it's still a downstream change that needs tracking.

Ok, thanks for the clarification. Done.

Copy link
Contributor

@dcandler dcandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the only changes since the previous commit are additional comments which are non-functional so I think the previous approvals should still apply.

@stuij stuij merged commit bb670eb into arm:release/arm-software/21.x Aug 12, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
downstream-change Downstream change to LLVM tree
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants