@0x12CC 0x12CC commented Sep 26, 2025

This PR extends Clang's source-based code coverage to work with SYCL device code. It includes the following changes:

  1. The InstrProfilingLoweringPass was updated to lower profiling counters to SYCL device globals.
  2. A new function was added to the compiler runtime to increment the host-side profiling counters.
  3. The SYCL runtime was updated to send the contents of the device-side profiling counters to the new compiler runtime function when their device global map entries are being freed.

This feature may not work correctly for functions that differ between host and device due to the use of the `__SYCL_DEVICE_ONLY__` macro. In such cases, it may not be possible to correlate the profiling counters from the device to the host.

Resolves #7803.
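As an illustration of that limitation, here is a minimal sketch of a function whose body differs between the two compilation passes. The function `scale` is hypothetical and does not come from this PR; it only shows how the host and device compilers could end up with different sets of coverage regions for the same function.

```cpp
#include <cassert>

// Hypothetical function, not taken from the PR.
inline int scale(int x) {
#ifdef __SYCL_DEVICE_ONLY__
  // Device pass: straight-line body with no branch.
  return x * 2;
#else
  // Host pass: the extra branch adds a region that the device
  // compiler never sees, so the counters may not line up.
  if (x < 0)
    return 0;
  return x * 2;
#endif
}

// Usage on the host, where __SYCL_DEVICE_ONLY__ is not defined.
inline void example_usage() {
  assert(scale(3) == 6);
  assert(scale(-1) == 0);
}
```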

Signed-off-by: Michael Aziz <[email protected]>
clang-offload-extract
clang-offload-packager
clang-linker-wrapper
compiler-rt

Do other projects enable profiling unconditionally in the build too?


This is the only remaining question from me, the rest is 🔥 👍

@aelovikov-intel

Would it make sense to add a high-level design document here as well (unless one exists already)? Also, what about kernels using optional features? Are we merging info collected by different device images somehow? That needs to be explained either in some doc or in the PR title.

@0x12CC

0x12CC commented Sep 26, 2025

> Would it make sense to add a high-level design document here as well (unless one exists already)?

I think this makes sense. I'll add a document to sycl/doc/design.

> Also, what about kernels using optional features?

I'm not sure I understand the question. If a kernel uses some features that prevent it from being submitted to a device, I would expect the corresponding profiling counters to contain zero values and not change the resulting coverage report.

> Are we merging info collected by different device images somehow? That needs to be explained either in some doc or in the PR title.

The profiling counters collected from any device image are added to the host's profiling counters. If multiple device images share device code, each one increments the same profiling counters on the host. The assumption is that the host and device compilers instrument the same functions, so the counter values can be copied from the device to the host. The resulting profile output contains the values of the host profiling counters at the end of the program's execution. I'll make this clear in the design document.
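The accumulation described above can be sketched roughly as follows. This is not the runtime function added by the PR: the name `add_device_counters` and the use of `std::vector` are assumptions made purely for illustration, and the size check encodes the stated assumption that host and device instrument the same regions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: add one device image's counter values to the
// host's counters for the same instrumented function.
inline void add_device_counters(std::vector<std::uint64_t> &host,
                                const std::vector<std::uint64_t> &device) {
  // Assumption from the discussion: both sides have the same regions.
  assert(host.size() == device.size());
  for (std::size_t i = 0; i < host.size(); ++i)
    host[i] += device[i];
}

// Two device images sharing the same device code both feed the same
// host counters, so their contributions simply add up.
inline void example_usage() {
  std::vector<std::uint64_t> host{1, 0};
  add_device_counters(host, {4, 2}); // first device image
  add_device_counters(host, {3, 1}); // second device image
  assert(host[0] == 8 && host[1] == 3);
}
```

A kernel that never ran on a device would contribute all-zero counters under this scheme, which leaves the host totals unchanged.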

@srividya-sundaram srividya-sundaram left a comment

Driver changes LGTM

@maarquitos14 maarquitos14 left a comment

LGTM, but ping me when the design doc is ready for another round :)

Signed-off-by: Michael Aziz <[email protected]>
Signed-off-by: Michael Aziz <[email protected]>
@0x12CC

0x12CC commented Sep 26, 2025

Thanks for the reviews. I've added sycl/doc/design/DeviceCodeCoverage.md in 8201ffc.

@0x12CC 0x12CC requested a review from a team as a code owner September 26, 2025 20:05

Profiling counters for code coverage are lowered by the compiler as device globals. Specifically, the `InstrProfilingLoweringPass` is modified so that, when targeting SPIR-V, coverage counters are represented as pointers to USM buffers, matching the representation of other SYCL device globals. This indirection allows counters to be relocatable and managed consistently with other device-side global variables.

Each counter is annotated with a unique identifier (`sycl-unique-id`) of the form `__profc_<fn_hash>`, where `<fn_hash>` is a 64-bit unsigned integer uniquely identifying the instrumented function. The counter's size is also recorded via the `sycl-device-global-size` attribute. These attributes ensure that counters are discoverable and manageable by the SYCL runtime and integration headers/footers.

How is the counter's size determined?
Can't we use one size (e.g. 8-byte integer) for all counters?


The counter variable is actually an array of integers, each of which is eight bytes. The number of elements in this array is equal to the number of regions in the function being instrumented. For the kernel in the coverage test case, there are two regions since there are two code branches. The resulting device global variable will have a size of `2 * sizeof(std::uint64_t)` bytes. I'll update the document to make this clearer.
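The size calculation above amounts to the following arithmetic. The helper name `counter_size_bytes` is hypothetical and only restates the rule from this comment: one eight-byte counter per region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: size in bytes of the counter array for a
// function with `num_regions` instrumented regions.
constexpr std::size_t counter_size_bytes(std::size_t num_regions) {
  return num_regions * sizeof(std::uint64_t);
}

// The kernel in the coverage test case has two regions.
static_assert(counter_size_bytes(2) == 2 * sizeof(std::uint64_t),
              "two regions need two 64-bit counters");

inline void example_usage() {
  assert(counter_size_bytes(2) == 16); // 2 counters * 8 bytes each
}
```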


Successfully merging this pull request may close these issues.

Feature request: Support for code coverage