AlexandreEichenberger
Collaborator

Cedric mentioned that 4k page alignment for the fp32 data is beneficial for performance.

This PR adds 4k alignment for data used by zlow.stick when stick is not performed on the CPU. For unstick, this was already the case in all but one instance, which this PR adds.

The option is on by default for z17 and off for z16 (where stick is always done in software).

Signed-off-by: Alexandre Eichenberger <[email protected]>
Collaborator

tungld left a comment

LGTM!

assert(allocOfXOp && "unstick output should always be allocated");
auto alignmentAttr = allocOfXOp.getAlignment();
int64_t intAlign = alignmentAttr ? alignmentAttr.value() : 1;
if (intAlign >= gAlignment) {
Collaborator

Should it be (intAlign != 0 && intAlign % gAlignment == 0)?
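To make the difference between the two conditions concrete, here is a minimal standalone sketch. It assumes the alignment target is 4096 bytes; `kPage4k`, `atLeastPage`, `multipleOfPage`, and `checkExamples` are hypothetical names for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for gAlignment, assumed here to be a 4k page.
constexpr int64_t kPage4k = 4096;

// Condition as written in the quoted hunk: any alignment of at least 4k passes.
constexpr bool atLeastPage(int64_t align) { return align >= kPage4k; }

// Condition suggested in the review: only nonzero multiples of 4k pass.
constexpr bool multipleOfPage(int64_t align) {
  return align != 0 && align % kPage4k == 0;
}

// A few values where the two conditions agree or differ.
inline void checkExamples() {
  assert(atLeastPage(4096) && multipleOfPage(4096));  // exactly one page
  assert(atLeastPage(8192) && multipleOfPage(8192));  // two pages
  assert(atLeastPage(7168) && !multipleOfPage(7168)); // 7k: the checks differ
  assert(!atLeastPage(64) && !multipleOfPage(64));    // small alignment
}
```

The two checks only disagree for alignments above 4k that are not multiples of 4k, such as 7k.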

Collaborator Author

AlexandreEichenberger Aug 4, 2025

Ok, that would protect us if for some reason we were to align buffers to a multiple of 4k. But then what happens if we want an alignment of 7k... should we bump it to 8k?

That is what I implemented: the next 4k page boundary, given the current alignment of the alloc.
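One possible reading of "next 4k page given current alignment" is a round-up to the nearest multiple of 4096. The sketch below illustrates that reading only; `kPageBytes` and `roundUpToPage` are hypothetical names, and the actual code in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size of 4k bytes (hypothetical constant, not from the PR).
constexpr int64_t kPageBytes = 4096;

// Round an existing alignment up to the next multiple of a 4k page.
// Alignments at or below one page become exactly one page.
inline int64_t roundUpToPage(int64_t align) {
  assert(align >= 0 && "alignment must be non-negative");
  if (align <= kPageBytes)
    return kPageBytes;
  return ((align + kPageBytes - 1) / kPageBytes) * kPageBytes;
}
```

Under this reading, an unaligned alloc (alignment 1) and a 4k-aligned alloc both end up at 4096, while the 7k case from the comment is bumped to 8192.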

Collaborator

Thanks!

@AlexandreEichenberger
Collaborator Author

@jenkins-droid test this please
