| 1 | +# Design for Device-side Code Coverage |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document describes the design and implementation of device-side code coverage for SYCL, extending Clang's source-based code coverage to support device code. The approach leverages the existing SYCL device global infrastructure, as detailed in the [DeviceGlobal.md](DeviceGlobal.md) design document, to enable collection and aggregation of coverage data from device kernels. |
| 6 | + |
| 7 | +## Design Details |
| 8 | + |
| 9 | +### Profiling Counter Representation |
| 10 | + |
| 11 | +Profiling counters for code coverage are lowered by the compiler as device globals. Specifically, the `InstrProfilingLoweringPass` is modified so that, when targeting SPIR-V, coverage counters are represented as pointers to USM buffers, matching the representation of other SYCL device globals. This indirection allows counters to be relocatable and managed consistently with other device-side global variables. |
| 12 | + |
| 13 | +Each counter is annotated with a unique identifier (`sycl-unique-id`) of the form `__profc_<fn_hash>`, where `<fn_hash>` is a 64-bit unsigned integer uniquely identifying the instrumented function. The counter's size is also recorded via the `sycl-device-global-size` attribute. These attributes ensure that counters are discoverable and manageable by the SYCL runtime and integration headers/footers. |
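
As a rough illustration of the two attributes described above, the sketch below builds the identifier string and the recorded size for one instrumented function. The helper names (`makeCounterUniqueId`, `describeCounter`, `CounterAttributes`) are hypothetical and are not part of the compiler. The sketch also assumes that the hash is rendered in decimal and that each region counter occupies 64 bits; the design text only states that the hash is a 64-bit unsigned integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: builds the `sycl-unique-id` value for a counter.
// Assumption: the hash is printed in decimal.
std::string makeCounterUniqueId(std::uint64_t fnHash) {
  return "__profc_" + std::to_string(fnHash);
}

// Hypothetical record holding the two attributes attached to a counter.
struct CounterAttributes {
  std::string uniqueId;   // value of `sycl-unique-id`
  std::size_t sizeBytes;  // value of `sycl-device-global-size`
};

// Assumption: one 64-bit counter per region, so the size is
// numRegions * sizeof(uint64_t).
CounterAttributes describeCounter(std::uint64_t fnHash, std::size_t numRegions) {
  assert(numRegions > 0 && "an instrumented function has at least one region");
  return {makeCounterUniqueId(fnHash), numRegions * sizeof(std::uint64_t)};
}
```

Under these assumptions, `describeCounter(42, 3)` yields the identifier `__profc_42` and a size of 24 bytes.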
| 14 | + |
| 15 | +### Integration with Device Global Infrastructure |
| 16 | + |
| 17 | +The device global infrastructure, as described in [DeviceGlobal.md](DeviceGlobal.md), provides mechanisms for mapping host and device instances of global variables, managing their lifetimes, and facilitating data transfer. Device-side coverage counters are treated as a special class of device globals: |
| 18 | + |
| 19 | +- They use the shared allocation type rather than the device allocation type for the underlying USM memory. |
| 20 | +- They do not have corresponding `device_global` declarations in host code. |
| 21 | +- Their lifetime and cleanup are managed via the device global map, with integration footer code ensuring registration and deregistration. |
| 22 | + |
| 23 | +### Runtime Handling and Data Aggregation |
| 24 | + |
| 25 | +When a device global entry corresponding to a coverage counter is released (e.g., when a device image is unloaded), the SYCL runtime adds the values from the device-side counter to the equivalent host-side counter. Two counters are equivalent when both the `<fn_hash>` and the number of counter regions match. If no matching host-side counter exists, which typically happens when host and device code differ because of the `__SYCL_DEVICE_ONLY__` macro, the device-side counter values are discarded. |
| 26 | + |
| 27 | +The aggregation is performed by invoking a new function in the compiler runtime, `__sycl_increment_profile_counters`. The function is weakly linked so that the SYCL runtime still works when the profiling runtime is not present. It accepts the `<fn_hash>`, the number of regions, and the increment values, and updates the host-side counters accordingly. At program exit, the final profile data reflects the sum of host and device coverage counters. |
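
The matching and accumulation rule can be sketched as follows. This is a minimal model, not the actual implementation of `__sycl_increment_profile_counters`, whose signature is not given here. The `HostCounterTable` type and the `incrementProfileCounters` function are hypothetical names, and the host-side counters are modeled as a simple map from function hash to a vector of 64-bit region counters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical host-side table: function hash -> one counter per region.
using HostCounterTable = std::map<std::uint64_t, std::vector<std::uint64_t>>;

// Adds the device-side increments to the matching host-side counters.
// Returns false when no host entry has the same hash and region count,
// in which case the device values are simply dropped.
bool incrementProfileCounters(HostCounterTable &host, std::uint64_t fnHash,
                              std::size_t numRegions,
                              const std::uint64_t *increments) {
  assert(increments != nullptr || numRegions == 0);
  auto it = host.find(fnHash);
  if (it == host.end() || it->second.size() != numRegions)
    return false;  // no equivalent host counter: discard
  for (std::size_t i = 0; i < numRegions; ++i)
    it->second[i] += increments[i];
  return true;
}
```

In this model, a host entry `{1, 2}` for some hash combined with device increments `{10, 20}` becomes `{11, 22}`, while a call with an unknown hash or a different region count leaves the table unchanged.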
| 28 | + |
| 29 | +### Compiler and Runtime Changes |
| 30 | + |
| 31 | +#### Compiler Frontend |
| 32 | + |
| 33 | +- The lowering pass for coverage counters is updated to emit device globals with the appropriate attributes and indirection. |
| 34 | +- Integration headers and footers are updated to register device global counters with the runtime, using the unique identifier and size. |
| 35 | + |
| 36 | +#### SYCL Runtime |
| 37 | + |
| 38 | +- Device globals with IDs matching the `__profc_<fn_hash>` pattern are recognized as coverage counters. |
| 39 | +- USM allocation and management for counters are handled as for other device globals, but without host-side declarations. |
| 40 | +- Upon cleanup, device-side counter values are aggregated into host-side counters via the runtime API. |
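
The first step in the list above, recognizing a coverage counter by its identifier, might look like the sketch below. The function name `parseCoverageCounterId` is hypothetical, and the sketch assumes the hash is encoded as a decimal number after the `__profc_` prefix, which the design text does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: returns true and stores the function hash when `id`
// has the form `__profc_<fn_hash>`; returns false for any other identifier.
// Assumption: <fn_hash> is written in decimal and fits in 64 bits.
bool parseCoverageCounterId(const std::string &id, std::uint64_t &hashOut) {
  static const std::string prefix = "__profc_";
  if (id.size() <= prefix.size() || id.compare(0, prefix.size(), prefix) != 0)
    return false;  // ordinary device global, or prefix with no hash

  const std::uint64_t maxValue = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t value = 0;
  for (std::size_t i = prefix.size(); i < id.size(); ++i) {
    const char c = id[i];
    if (c < '0' || c > '9')
      return false;  // non-digit character after the prefix
    const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    if (value > (maxValue - digit) / 10)
      return false;  // value would not fit in 64 bits
    value = value * 10 + digit;
  }
  hashOut = value;
  return true;
}
```

A runtime following this scheme would route identifiers accepted by the parser to the coverage path and treat everything else as a regular device global.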
| 41 | + |
| 42 | +#### Compiler Runtime |
| 43 | + |
| 44 | +- The new function `__sycl_increment_profile_counters` is introduced to update host-side counters. |
| 45 | +- The function is weakly linked to allow for optional inclusion. |
| 46 | + |
| 47 | +### Limitations and Considerations |
| 48 | + |
| 49 | +- The feature is currently implemented only for SPIR-V targets; CUDA and HIP backends are not supported. |
| 50 | +- Devices lacking support for device globals cannot utilize device-side code coverage. |
| 51 | +- Differences in code between host and device (e.g., due to `__SYCL_DEVICE_ONLY__`) may prevent aggregation of coverage data for some functions. |
| 52 | +- The design relies on the robustness of the device global infrastructure for correct mapping and lifetime management. |
| 53 | + |
| 54 | +## Relationship to Device Global Design |
| 55 | + |
| 56 | +This feature is built upon the mechanisms described in [DeviceGlobal.md](DeviceGlobal.md), including: |
| 57 | + |
| 58 | +- Use of unique string identifiers (`sycl-unique-id`) for mapping and management. |
| 59 | +- USM-based allocation and zero-initialization of device-side storage. |
| 60 | +- Integration header/footer registration for host-device correlation. |
| 61 | +- Runtime database for device global management and lookup. |
| 62 | + |
| 63 | +The code coverage counters are a specialized use case of device globals, with additional logic for aggregation and profile generation. |
| 64 | + |
| 65 | +## References |
| 66 | + |
| 67 | +- [Implementation design for SYCL device globals](DeviceGlobal.md) |
| 68 | +- [Clang Source-based Code Coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) |
| 69 | +- [SYCL Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html) |