Stick/unstick fp32 data alloc at 4k pages #3250
base: main
Signed-off-by: Alexandre Eichenberger <[email protected]>
LGTM!
```cpp
assert(allocOfXOp && "unstick output should always be allocated");
auto alignmentAttr = allocOfXOp.getAlignment();
int64_t intAlign = alignmentAttr ? alignmentAttr.value() : 1;
if (intAlign >= gAlignment) {
  // ...
}
```
Should it be `(intAlign != 0 && intAlign % gAlignment == 0)`?
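To make the difference between the two checks concrete, here is a small standalone sketch. It assumes `gAlignment` is 4096 (the 4k page size discussed in this PR; its actual definition is not shown in the excerpt) and treats `intAlign` as a plain integer rather than an MLIR attribute. For power-of-two alignments the two predicates agree; they only diverge for values such as 7k.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: gAlignment is the 4k page size (4096 bytes).
constexpr int64_t gAlignment = 4096;

// Check as written in the diff: any alignment at least as large as a page
// is treated as already page aligned.
bool isPageAlignedCurrent(int64_t intAlign) {
  return intAlign >= gAlignment;
}

// Check suggested in review: the alignment must be a non-zero multiple of
// the page size, so every alignment boundary is also a page boundary.
bool isPageAlignedSuggested(int64_t intAlign) {
  return intAlign != 0 && intAlign % gAlignment == 0;
}
```

For example, an alignment of 7168 (7k) passes the current check but fails the suggested one, while 4096 and 8192 pass both.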
OK, that would protect us if, for some reason, we were to align buffers to a multiple of 4k. But then what happens if we want an alignment of 7k... should we bump it to 8k?
That is what I implemented: the next 4k page boundary, given the current alignment of the alloc.
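A minimal sketch of the rounding the author describes (bumping an alignment up to the next multiple of 4k, so 7k becomes 8k) might look like the following. The helper name `roundUpToPage` and the constant `kPageSize` are hypothetical, and a missing or zero alignment is treated as 1, mirroring the default used in the excerpt above; the PR's actual implementation is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: the 4k page size referred to in the PR.
constexpr int64_t kPageSize = 4096;

// Hypothetical helper: returns the smallest multiple of kPageSize that is
// >= currentAlign. Values below 1 (no alignment set) are treated as 1.
int64_t roundUpToPage(int64_t currentAlign) {
  if (currentAlign < 1)
    currentAlign = 1;
  return ((currentAlign + kPageSize - 1) / kPageSize) * kPageSize;
}
```

With this rule, a 16-byte alignment becomes 4096, a 4096-byte alignment is left unchanged, and a 7k request becomes 8192.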
Thanks!
@jenkins-droid test this please
Cedric mentioned that aligning the fp32 data to 4k pages is beneficial for performance.
This PR adds 4k alignment for the data used by `zlow.stick` when stick is not implemented on the CPU. For unstick, this was already the case everywhere except one situation, which this PR now covers. The option is on by default for z17 and off for z16 (as stick is always done in software on z16).
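As a rough plain-C++ analogy of what 4k page alignment means for an fp32 buffer (the PR itself sets the alignment on the MLIR allocation used by `zlow.stick`, not through the C library), the sketch below allocates a page-aligned float buffer with `std::aligned_alloc` and checks its starting address. The helper names and the 4096-byte page size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed page size ("4k" in the PR description).
constexpr std::size_t kPageBytes = 4096;

// Hypothetical helper: allocates room for `count` fp32 values starting on a
// 4k page boundary. std::aligned_alloc expects the size to be a multiple of
// the alignment, so the byte count is rounded up to whole pages.
float *allocPageAlignedFloats(std::size_t count) {
  std::size_t bytes = count * sizeof(float);
  std::size_t rounded = ((bytes + kPageBytes - 1) / kPageBytes) * kPageBytes;
  if (rounded == 0)
    rounded = kPageBytes;
  return static_cast<float *>(std::aligned_alloc(kPageBytes, rounded));
}

// True when the pointer lies exactly on a 4k boundary.
bool isPageAligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % kPageBytes == 0;
}
```

Memory returned by `std::aligned_alloc` is released with `std::free`.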