Commit 8201ffc

Add a design doc
Signed-off-by: Michael Aziz <[email protected]>
1 parent 9408601 commit 8201ffc

1 file changed

+69
-0
lines changed

@@ -0,0 +1,69 @@
1+
# Design for Device-side Code Coverage
2+
3+
## Overview
4+
5+
This document describes the design and implementation of device-side code coverage for SYCL, extending Clang's source-based code coverage to support device code. The approach leverages the existing SYCL device global infrastructure, as detailed in the [DeviceGlobal.md](DeviceGlobal.md) design document, to enable collection and aggregation of coverage data from device kernels.
6+
7+
## Design Details
8+
9+
### Profiling Counter Representation
10+
11+
Profiling counters for code coverage are lowered by the compiler as device globals. Specifically, the `InstrProfilingLoweringPass` is modified so that, when targeting SPIR-V, coverage counters are represented as pointers to USM buffers, matching the representation of other SYCL device globals. This indirection allows counters to be relocatable and managed consistently with other device-side global variables.
12+
13+
Each counter is annotated with a unique identifier (`sycl-unique-id`) of the form `__profc_<fn_hash>`, where `<fn_hash>` is a 64-bit unsigned integer uniquely identifying the instrumented function. The counter's size is also recorded via the `sycl-device-global-size` attribute. These attributes ensure that counters are discoverable and manageable by the SYCL runtime and integration headers/footers.
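
As an illustration of the naming scheme above, the following minimal sketch builds the two attribute values for a counter. The helper names are hypothetical, and two details are assumptions rather than statements from this design: the hash is rendered in decimal, and each region occupies one 64-bit counter.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the `sycl-unique-id` string for a counter.
// Assumption: the hash is rendered in decimal; the design only states that
// <fn_hash> is a 64-bit unsigned integer.
std::string makeCounterUniqueId(std::uint64_t fnHash) {
  return "__profc_" + std::to_string(fnHash);
}

// Hypothetical helper: value for the `sycl-device-global-size` attribute.
// Assumption: one 64-bit counter per coverage region.
std::uint64_t counterSizeInBytes(std::uint64_t numRegions) {
  return numRegions * sizeof(std::uint64_t);
}
```

For example, a function with hash 42 and three regions would, under these assumptions, get the ID `__profc_42` and a recorded size of 24 bytes.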
14+
15+
### Integration with Device Global Infrastructure
16+
17+
The device global infrastructure, as described in [DeviceGlobal.md](DeviceGlobal.md), provides mechanisms for mapping host and device instances of global variables, managing their lifetimes, and facilitating data transfer. Device-side coverage counters are treated as a special class of device globals:
18+
19+
- They use the shared allocation type rather than the device allocation type for the underlying USM memory.
20+
- They do not have corresponding `device_global` declarations in host code.
21+
- Their lifetime and cleanup are managed via the device global map, with integration footer code ensuring registration and deregistration.
22+
23+
### Runtime Handling and Data Aggregation
24+
25+
When a device global entry corresponding to a coverage counter is released (e.g., when a device image is unloaded), the SYCL runtime aggregates the values from the device-side counter into the equivalent host-side counter. Equivalence is determined by matching both the `<fn_hash>` and the number of counter regions. If no matching host-side counter exists, typically because the `__SYCL_DEVICE_ONLY__` macro makes the host and device code differ, the device-side counter values are discarded.
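
The matching rule described above can be sketched as follows. This is a simplified model, not the runtime's actual data structure: `CounterKey`, `HostCounters`, and `aggregateDeviceCounters` are hypothetical names, and the host-side counters are assumed to be reachable through a map keyed by the function hash and the region count.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical host-side registry: (fn_hash, number of regions) -> counters.
// Each stored vector is assumed to hold exactly as many entries as the
// region count in its key.
using CounterKey = std::pair<std::uint64_t, std::size_t>;
using HostCounters = std::map<CounterKey, std::vector<std::uint64_t>>;

// Adds device counter values into the equivalent host counters.
// Returns false, discarding the device values, when no host counter has
// both the same function hash and the same number of regions.
bool aggregateDeviceCounters(HostCounters &host, std::uint64_t fnHash,
                             const std::vector<std::uint64_t> &device) {
  auto it = host.find(CounterKey(fnHash, device.size()));
  if (it == host.end())
    return false;
  for (std::size_t i = 0; i < device.size(); ++i)
    it->second[i] += device[i];
  return true;
}
```

Keying on both values means a device function whose region count differs from its host counterpart is treated as having no match, even when the hashes agree.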
26+
27+
The aggregation is performed by invoking a new function in the compiler runtime, `__sycl_increment_profile_counters`, which is weakly linked to accommodate optional runtime availability. This function accepts the `<fn_hash>`, the number of regions, and the increment values, and updates the host-side counters accordingly. At program exit, the final profile data reflects the sum of host and device coverage counters.
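
A weakly linked entry point is typically guarded by a null check before it is called. The sketch below shows that pattern with a hypothetical stand-in function; its name and parameter types are assumptions, since this document lists only what the real function accepts. It relies on the GCC/Clang `weak` attribute, and the null-check behaviour shown applies to ELF platforms such as Linux.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the compiler-runtime entry point. The name and
// parameter types are assumptions. The weak attribute (GCC/Clang) lets the
// program link even when no definition is available, in which case the
// symbol's address is null.
extern "C" void hypothetical_increment_profile_counters(
    std::uint64_t fnHash, std::size_t numRegions,
    const std::uint64_t *increments) __attribute__((weak));

// Forwards device counter values to the compiler runtime if it is linked in.
// Returns false when the weak symbol is unresolved; the values are dropped.
bool tryIncrementHostCounters(std::uint64_t fnHash, std::size_t numRegions,
                              const std::uint64_t *increments) {
  if (!hypothetical_increment_profile_counters)
    return false;
  hypothetical_increment_profile_counters(fnHash, numRegions, increments);
  return true;
}
```

With this guard, a program built without the profiling runtime still links and runs; the coverage values are simply not recorded.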
28+
29+
### Compiler and Runtime Changes
30+
31+
#### Compiler Frontend
32+
33+
- The lowering pass for coverage counters is updated to emit device globals with the appropriate attributes and indirection.
34+
- Integration headers and footers are updated to register device global counters with the runtime, using the unique identifier and size.
35+
36+
#### SYCL Runtime
37+
38+
- Device globals with IDs matching the `__profc_<fn_hash>` pattern are recognized as coverage counters.
39+
- USM allocation and management for counters are handled as for other device globals, but without host-side declarations.
40+
- Upon cleanup, device-side counter values are aggregated into host-side counters via the runtime API.
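
Recognizing a coverage counter by its ID could look like the sketch below. `parseCoverageCounterId` is a hypothetical helper, and the decimal encoding of `<fn_hash>` is an assumption; the design specifies only the `__profc_` prefix and that the hash is a 64-bit unsigned integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true and stores the function hash when `id`
// has the form __profc_<fn_hash>. Assumption: <fn_hash> is written in
// decimal. Any other ID is treated as an ordinary device global.
bool parseCoverageCounterId(const std::string &id, std::uint64_t &fnHash) {
  const std::string prefix = "__profc_";
  if (id.size() <= prefix.size() || id.compare(0, prefix.size(), prefix) != 0)
    return false;
  std::uint64_t value = 0;
  for (std::size_t i = prefix.size(); i < id.size(); ++i) {
    const char c = id[i];
    if (c < '0' || c > '9')
      return false; // not a decimal digit
    const std::uint64_t digit = static_cast<std::uint64_t>(c - '0');
    if (value > (UINT64_MAX - digit) / 10)
      return false; // would overflow 64 bits
    value = value * 10 + digit;
  }
  fnHash = value;
  return true;
}
```

Rejecting malformed suffixes keeps a user-defined device global whose name merely starts with `__profc_` from being mistaken for a counter.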
41+
42+
#### Compiler Runtime
43+
44+
- The new function `__sycl_increment_profile_counters` is introduced to update host-side counters.
45+
- The function is weakly linked to allow for optional inclusion.
46+
47+
### Limitations and Considerations
48+
49+
- The feature is currently implemented only for SPIR-V targets; CUDA and HIP backends are not supported.
50+
- Devices lacking support for device globals cannot use device-side code coverage.
51+
- Differences in code between host and device (e.g., due to `__SYCL_DEVICE_ONLY__`) may prevent aggregation of coverage data for some functions.
52+
- The design relies on the robustness of the device global infrastructure for correct mapping and lifetime management.
53+
54+
## Relationship to Device Global Design
55+
56+
This feature is built upon the mechanisms described in [DeviceGlobal.md](DeviceGlobal.md), including:
57+
58+
- Use of unique string identifiers (`sycl-unique-id`) for mapping and management.
59+
- USM-based allocation and zero-initialization of device-side storage.
60+
- Integration header/footer registration for host-device correlation.
61+
- Runtime database for device global management and lookup.
62+
63+
The code coverage counters are a specialized use case of device globals, with additional logic for aggregation and profile generation.
64+
65+
## References
66+
67+
- [Implementation design for SYCL device globals](DeviceGlobal.md)
68+
- [Clang Source-based Code Coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html)
69+
- [SYCL Specification](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html)