[SYCL] Implement coverage instrumentation for device code #20206
base: sycl
Signed-off-by: Michael Aziz <[email protected]>
clang-offload-extract
clang-offload-packager
clang-linker-wrapper
compiler-rt
Do other projects enable profiling unconditionally in the build too?
This is the only remaining question from me; the rest is 🔥 👍
Would it make sense to add a high-level design document here as well (unless one already exists)? Also, what about kernels using optional features? Are we merging info collected by different device images somehow? That needs to be explained either in a doc or in the PR description.
I think this makes sense. I'll add a document to
I'm not sure I understand the question. If a kernel uses some features that prevent it from being submitted to a device, I would expect the corresponding profiling counters to contain zero values and not change the resulting coverage report.
The profiling counters collected from any device image are added to the host's profiling counters. If there are multiple device images that share device code, each one will increase the same profiling counters on the host. The assumption is that the host and device compilers instrument the same functions, so it's possible to copy the counter values from the device to the host. The resulting profile output contains the values of the host profiling counters at the end of the program's execution. I'll make this clear in the design document.
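A minimal C++ sketch of the merge described above, assuming (hypothetically) that each counter array is kept in a table keyed by its unique ID, such as `__profc_<fn_hash>`. `CounterTable` and `mergeDeviceCounters` are illustrative names, not the runtime's API, and the size check is an added guard for functions that the host and device compilers instrument differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: counter arrays keyed by their unique ID (e.g. "__profc_<fn_hash>").
using CounterTable =
    std::unordered_map<std::string, std::vector<std::uint64_t>>;

// Adds the counter values copied back from one device image to the host's
// counters, element by element. Returns how many counter arrays were merged.
std::size_t mergeDeviceCounters(CounterTable &host, const CounterTable &device) {
  std::size_t merged = 0;
  for (const auto &[id, deviceValues] : device) {
    auto it = host.find(id);
    // Skip counters the host did not instrument, or whose layout differs,
    // because the host/device correlation assumption does not hold for them.
    if (it == host.end() || it->second.size() != deviceValues.size())
      continue;
    for (std::size_t i = 0; i < deviceValues.size(); ++i)
      it->second[i] += deviceValues[i];
    ++merged;
  }
  return merged;
}
```

Calling it once per device image mirrors the behavior described: images that share device code each add to the same host counters, and the host table holds the final totals.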
Driver changes LGTM
LGTM, but ping me when the design doc is ready for another round :)
Force-pushed from 28ef68e to 9408601.
Thanks for the reviews. I've added
Profiling counters for code coverage are lowered by the compiler as device globals. Specifically, the `InstrProfilingLoweringPass` is modified so that, when targeting SPIR-V, coverage counters are represented as pointers to USM buffers, matching the representation of other SYCL device globals. This indirection makes the counters relocatable and lets them be managed consistently with other device-side global variables.
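To make the indirection concrete, here is a plain C++ model (no SYCL headers) of a counter lowered to a device global that holds a pointer to a separately allocated buffer. `CounterDeviceGlobal` and `incrementRegion` are hypothetical names, and the `std::vector` merely stands in for a USM allocation; this is a sketch of the idea, not the code the pass emits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a lowered counter: the device global does not hold
// the counter array itself, only a pointer to a buffer allocated elsewhere
// (standing in for a USM allocation).
struct CounterDeviceGlobal {
  std::uint64_t *counters = nullptr;
};

// What instrumented code conceptually does when it enters region `region`:
// load the pointer stored in the device global, then increment through it.
void incrementRegion(const CounterDeviceGlobal &global, std::size_t region) {
  global.counters[region] += 1;
}
```

Because instrumented code only reaches the counters through the stored pointer, the runtime is free to place them in any allocation and repoint the device global, which is what makes them relocatable.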
Each counter is annotated with a unique identifier (`sycl-unique-id`) of the form `__profc_<fn_hash>`, where `<fn_hash>` is a 64-bit unsigned integer uniquely identifying the instrumented function. The counter's size is also recorded via the `sycl-device-global-size` attribute. These attributes ensure that counters are discoverable and manageable by the SYCL runtime and integration headers/footers.
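A small sketch of the naming scheme, assuming the hash is written in decimal (the text above only says it is a 64-bit unsigned integer). `makeCounterUniqueId` and `parseCounterUniqueId` are hypothetical helpers showing how a consumer could map a `sycl-unique-id` back to the function hash; they are not part of the SYCL runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Prefix of coverage counter IDs, following the `__profc_<fn_hash>` format.
constexpr const char *kCounterPrefix = "__profc_";

// Builds the unique ID for the counters of the function with hash `fnHash`
// (assumption: the hash is printed in decimal).
std::string makeCounterUniqueId(std::uint64_t fnHash) {
  return kCounterPrefix + std::to_string(fnHash);
}

// Recovers the function hash from a unique ID, or std::nullopt if the ID does
// not match `__profc_<fn_hash>` or the number does not fit in 64 bits.
std::optional<std::uint64_t> parseCounterUniqueId(const std::string &id) {
  const std::string prefix = kCounterPrefix;
  if (id.compare(0, prefix.size(), prefix) != 0 || id.size() == prefix.size())
    return std::nullopt;
  std::uint64_t hash = 0;
  for (std::size_t i = prefix.size(); i < id.size(); ++i) {
    const char c = id[i];
    if (c < '0' || c > '9')
      return std::nullopt;
    const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    // Reject values that would overflow a 64-bit unsigned integer.
    if (hash > (std::numeric_limits<std::uint64_t>::max() - digit) / 10)
      return std::nullopt;
    hash = hash * 10 + digit;
  }
  return hash;
}
```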
How is the counter's size determined?
Can't we use one size (e.g. 8-byte integer) for all counters?
The counter variable is actually an array of integers, each of which is eight bytes. The number of elements in this array is equal to the number of regions in the function being instrumented. For the kernel in the coverage test case, there are two regions since there are two code branches, so the resulting device global variable has a size of `2 * sizeof(std::uint64_t)` (16) bytes. I'll update the document to try to make this clearer.
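The size arithmetic from this reply, written as a compile-time check. `counterArraySizeBytes` is a hypothetical helper; the only assumption is the one stated in the reply, that each region gets one `std::uint64_t` counter.

```cpp
#include <cstddef>
#include <cstdint>

// One 8-byte counter per instrumented region.
constexpr std::size_t counterArraySizeBytes(std::size_t numRegions) {
  return numRegions * sizeof(std::uint64_t);
}

// The kernel in the coverage test has two branches, hence two counters.
static_assert(counterArraySizeBytes(2) == 16, "2 * sizeof(std::uint64_t)");
```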
This PR extends Clang's source-based code coverage to work with SYCL device code. It includes the following changes:
- `InstrProfilingLoweringPass` was updated to lower profiling counters to SYCL device globals.

This feature may not work correctly for functions that differ between host and device due to the use of the `__SYCL_DEVICE_ONLY__` macro. In such cases, it may not be possible to correlate the profiling counters from the device to the host. Resolves #7803.
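A hypothetical example of the kind of function this caveat refers to. A host compiler sees only the `#else` branch, while the device compilation sees an extra `if`, so the two compilers would likely produce counter arrays that do not line up element by element. `clampScale` is an illustrative name, not code from this PR.

```cpp
// Hypothetical function whose body differs between host and device builds.
int clampScale(int x) {
#ifdef __SYCL_DEVICE_ONLY__
  // Device build: an extra branch, so the device compiler would likely
  // allocate more counters for this function than the host compiler does.
  if (x < 0)
    return 0;
  return x * 2;
#else
  // Host build: straight-line code, a single region.
  return x * 2;
#endif
}
```

Compiled for the host, `clampScale(-1)` returns `-2`; the device build would return `0` and may execute a region that the host counters have no slot for.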