[SYCL] Support per-object file compilation (#7595)
This change adds per-object compilation support for SYCL, also called
non-relocatable device code mode. This is already supported in clang for
HIP and CUDA.
It adds a new option -f[no-]sycl-rdc. The default is -fsycl-rdc, which
compiles code as today. Passing -fno-sycl-rdc activates the new mode.
This is just an alias to the existing flag used by HIP/CUDA,
-f[no-]gpu-rdc.
The main implication is that we no longer link all device code together
into one big module before post link.
Instead, we execute all jobs after device linking on a per-object file
basis.
This means sycl-post-link and the later jobs execute multiple times,
since we no longer have one big module.
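
The difference between the two modes can be illustrated with a toy model.
This is only a sketch, not the driver's actual implementation: the function
post_link_inputs and the kernel names are hypothetical, and each "module" is
just a list of kernel names. It shows how many times a post-link step would
run and how large its biggest input would be in each mode.

```python
from typing import List


def post_link_inputs(objects: List[List[str]], rdc: bool) -> List[List[str]]:
    """Return one entry per post-link invocation (toy model, hypothetical)."""
    if rdc:
        # -fsycl-rdc (default): all device code is linked into one big
        # module, so the post-link step runs once on everything.
        merged = [kernel for obj in objects for kernel in obj]
        return [merged]
    # -fno-sycl-rdc: no cross-object device link, so the post-link step
    # runs once per object file on that object's device code only.
    return [list(obj) for obj in objects]


# Three object files with made-up kernel names.
objects = [["kernel_a"], ["kernel_b", "kernel_c"], ["kernel_d"]]

rdc_jobs = post_link_inputs(objects, rdc=True)
no_rdc_jobs = post_link_inputs(objects, rdc=False)

# Number of post-link invocations and size of the largest input module.
print(len(rdc_jobs), max(len(m) for m in rdc_jobs))        # 1 4
print(len(no_rdc_jobs), max(len(m) for m in no_rdc_jobs))  # 3 2
```

In this model the per-object mode runs more jobs, but each job sees a
smaller module, which is where the peak memory savings would come from.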
This can result in large improvements in compiler runtime and memory
usage. For QUDA with -g, max memory usage drops from over 250GB to 4GB,
and compiler runtime improves significantly as well.
Error cases:
1) Cross-object dependencies. Since we don't link device code together,
each object file must be independent. I added a diagnostic in Sema that
errors if the user passes -fno-sycl-rdc and has cross-object dependencies.
2) Invalid architecture in fat object. We currently warn gracefully
about this. In per-object-file mode, llvm-foreach throws an error
customers won't understand, so we error out in that case instead of
warning.
Signed-off-by: Sarnie, Nick <[email protected]>