-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[OpenMP][Clang] Use ATTACH
map-type for list-items with base-pointers.
#153683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[OpenMP][Clang] Use ATTACH
map-type for list-items with base-pointers.
#153683
Conversation
For the following: ```c int *p; \#pragma omp target map(p[0]) // (A) (void)p; \#pragma omp target map(p) // (B) (void)p; \#pragma omp target map(p, p[0]) // (C) (void)p; \#pragma omp target map(p[0], p) // (D) (void)p; ``` For (A), the pointer `p` is predetermined `firstprivate`, so it should be (and is) captured by-copy. However, for (B), (C), and (D), since `p` is already listed in a `map` clause, it's not predetermined `firstprivate`, and hence, should be captured by-reference, like any other mapped variable. To ensure the correct handling of (C) and (D), the following changes were made: 1. In SemaOpenMP, we now ensure that `p` is marked to be captured by-reference in these cases. 2. We no longer ignore `map(p)` during codegen of `target` constructs, even if there's another map like `map(p[0])` that would have been mapped using a PTR_AND_OBJ map. 3. For cases like (D), we now handle `map(p)` before `map(p[0])`, so the former gets the TARGET_PARAM flag and sets the kernel argument.
… before their derefs.
The output of the compile-and-run tests is incorrect. These will be used for reference in future commits that resolve the issues. Also updated the existing clang LIT test, target_map_both_pointer_pointee_codegen.cpp, with more regions and more narrowed-down update_cc_test_checks filters.
…mapped-ptrs-by-ref
…ion-using-attach-maptype
…ion-using-attach-maptype
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!
IsValidBase = true; | ||
|
||
if (!IsValidBase) { | ||
SemaRef.Diag(ELoc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// - For pointers mapped by value that have either an implicit map or an | ||
// array section, the runtime library may pass the NULL value to the | ||
// device instead of the value passed to it by the compiler. | ||
// - If both a pointer an a dereference of it are mapped, then the pointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #145454 for chages/tests specific to the change in variable capturing. If preferred, the change can be merged before this, but that would lead to some regressions.
/// TODO: Handle cases for target-update, where the list-item is a | ||
/// non-contiguous array-section that still has a base-pointer. | ||
static std::pair<const Expr *, std::optional<size_t>> | ||
findAttachPtrExpr(MappableExprComponentListRef Components, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper/utility functions can be reviewed separately as part of this PR, if that is preferable: #155625.
// &p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM | PTR_AND_OBJ | ||
// in unified shared memory mode or for local pointers | ||
// p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM | ||
// p, &p[1], 24*sizeof(float), TARGET_PARAM | TO | FROM // map pointee |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// is same as the use_device_ptr operand. e.g. | ||
// map expr | use_device_ptr expr | current behavior | ||
// ---------|---------------------|----------------- | ||
// p[1] | p | match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See clang/test/OpenMP/target_data_use_device_ptr_codegen.cpp for tests for when we match and when we don't match against an existing map. We also have compile+run tests (for non-USM cases), that are passing with this PR. See offload/test/mapping/use_device_ptr.
// map expr | use_device_addr expr | current | possible restrictive/ | ||
// | | behavior | safer behavior | ||
// ---------|----------------------|-----------|----------------------- | ||
// p | p | match | match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See clang/test/OpenMP/target_data_use_device_addr_codegen.cpp for codegen tests, and offload/test/mapping/use_device_addr for compile+run tests (that now pass with this PR).
// &(ps->ps->ps), &(ps->ps->ps->ps), sizeof(S2*), PTR_AND_OBJ | TO | FROM | ||
// ps, &ps[0], 0, TARGET_PARAM | IMPLICIT // (+) | ||
// &(ps->ps->ps[0]), &(ps->ps->ps->ps), sizeof(S2*), FROM | ||
// &(ps->ps->ps), &(ps->ps->ps->ps), sizeof(void*), ATTACH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the tests at the bottom of clang/test/OpenMP/target_data_codegen.cpp for examples for deeply nested member-of expressions.
// FIXME: This is now redundant as we are not populating DeferredInfo | ||
// anymore. Remove unless we find a legitimate need of populating | ||
// using DefferedInfo during the review process. | ||
auto CI = DeferredInfo.find(Data.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexey-bataev, do you remember any specific case which required us to use DeferredInfo? With the current codegen, it's not being populated, and all use_device_ptr/addr unit tests are passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexey-bataev, did you get a chance to look at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember anything related to this, check the commits history
} | ||
|
||
// &f[0], &f[0], 0, PARAM | IMPLICIT | ||
// &f[index][0], &f[index][index+1], (index+1)*4 < 4? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ddpagan, it's not related to this PR, but do you know why we are not unconditionally using 0 as the size for assumed-size maps with pointer bases? For example line 14 uses 4, while line 21 uses 0. And here, we are trying to compute some size at runtime.
…-using-attach-maptype
…-using-attach-maptype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many changes, would be good split into separate patches and commit step-by-step
…on `target, when applicable. On a target construct, if there's an implicit map on a struct, or that of this[:], and an explicit map with a member of that struct/class as the base-pointer, we need to make sure that base-pointer is implicitly mapped, to make sure we don't map the full struct/class. For example: ```cpp struct S { int dummy[10000]; int *p; void f1() { #pragma omp target map(p[0:1]) (void)this; } }; S s; void f2() { #pragma omp target map(s.p[0:10]) (void)s; } ``` Only `this-p` and `s.p` should be mapped in the two cases above. If we were to implicitly map the full struct `s`, or `this[0:1]`, it would map the `dummy` field as well. This was pulled out of llvm#153683 to make that PR smaller. it also pulls out one other utility, and an NFC changes to the AttachPtrExpr comparator from that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pulled out another NFC subset out of this into #161294. Breaking this down further is going to be difficult practically, except maybe pulling out the error-checking for use_device_addr
operands.
I had originally thought about breaking this up based on whether the construct is target/non-target, but that doesn't work because they share the calls to the underlying generateAllInfo
and generateInfoForComponentList
.
Aside from that, leaving out the use_device_ptr/addr changes would cause stability regressions, and the same is true for leaving out the capturing code (however, not as widespread, so if preferable, #145454 can be merged first).
… attach base-ptrs. (llvm#155625) These have been pulled out of the codegen PR llvm#153683, to reduce the size of that PR.
…rom pointer to reference. Also adds an instance of `AttachPtrExprComparator` to the `MappableExprHandler` class, so that it can be reused for multiple comparisons. This was extracted out of llvm#153683 to make that PR more focused on the functional changes.
AttachPtrExprComparator
from pointer to reference.
#161785
…on-using-attach-maptype
This adds support for using
ATTACH
map-type for proper pointer-attachment when mapping list-items that have base-pointers.For example, for the following:
The following maps are now emitted by clang:
Previously, the two possible maps emitted by clang were:
(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.
With this change, we are using ATTACH-style maps, like
(A)
, for cases where the expression has a base-pointer. For example:Why a large PR
While it's unfortunate that this PR has gotten large and difficult to review, the issue is that all the functional changes have to be made together, to prevent regressions from partially implemented changes.
For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.
We extracted the utilities and functions, like those for finding attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't call those functions, just adds them (#155625), and another NFC PR for adding an implicit clause on attach-ptrs, when applicable, is #161294. Maybe the change that adds a new error message for use_device_addr on array-sections with non-var base-pointers can be extracted out too (but that will have to be a follow-up change in that case, and we may get comp-fails with this PR when the erroneous case is not caught/diagnosed).
Grouping of maps based on attach base-pointers
We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:
We first map
spp
, thenspp[0]
thenspp[0][0]
andspp[0][0].a
.This allows us to also group "struct" allocation based on their attach pointers. This resolves the issues of us always mapping everything from the beginning of the symbol
spp
. Each group is mapped independently, and at the same level, likespp[0][0]
and its memberspp[0][0].a
, we still get map them together as part of the same contiguous structspp[0][0]
. This resolves issue #141042.use_device_ptr/addr fixes
The handling of
use_device_ptr/addr
was updated to use the attach-ptr information, and works for many cases that were failing before. It has to be done as part of this series because otherwise, the switch from ptr_to_obj to attach-style mapping would have caused regressions in existing use_device_ptr/addr tests.Handling of attach-pointers that are members of implicitly mapped structs:
p
below, is a base-pointer in amap
clause on a target construct (likemap(p[0:1])
, and the base of that struct is either thethis
pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicitmap(p)
so that we don't implicitly map the full struct.Scope for improvement:
findAttachPtrExpr
is fairly simple, and fast.Needs future work:
ref_ptr/ref_ptee
, andattach
map-type-modifier on them.