Dear All,
I ran into an interesting issue just now, which I thought would be worth discussing a bit...
I realised that if I build the project with its CUDA backend enabled, I can only use the compiler on a machine that has an NVidia GPU installed. (Or at least one that has the necessary driver installed, even if it doesn't have a GPU.) This is disappointing, as I was hoping that I could share a single installation between a number of my machines, some of which have both NVidia and Intel devices in them, and some of which only have Intel ones.
But when I use the compiler on a machine without an NVidia driver installed (because it doesn't have an NVidia card), I run into the following:
[bash][atlas]:build > ldd -r /opt/intel-clang/12.0.0/x86_64-centos8/lib/libsycl.so
linux-vdso.so.1 (0x00007ffe7f2de000)
libOpenCL.so.1 => /opt/intel-clang/12.0.0/x86_64-centos8/lib/libOpenCL.so.1 (0x00007ff00841b000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ff008217000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ff007ff7000)
libpi_cuda.so => /opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so (0x00007ff007dbc000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007ff007ba4000)
libcuda.so.1 => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007ff00780f000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff00748d000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff0070cb000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff008b2f000)
libcuda.so.1 => not found
undefined symbol: cuSurfObjectDestroy (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuGetErrorName (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxDestroy_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxSetCurrent (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuArray3DCreate_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuEventCreate (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyDtoDAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpy2DAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDeviceGet (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuLinkCreate_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemHostUnregister (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemFreeHost (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuStreamCreate (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemAlloc_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyAtoHAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuArrayGetDescriptor_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuEventElapsedTime (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemAllocHost_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemFree_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDeviceTotalMem_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyHtoAAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyHtoA_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyAtoA_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemAllocManaged (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyDtoHAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemsetD8Async (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDriverGetVersion (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuArray3DGetDescriptor_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDeviceGetName (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpy3D_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemHostGetDevicePointer_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxPopCurrent_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuLinkDestroy (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuLinkComplete (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyHtoDAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxGetCurrent (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuGetErrorString (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuLaunchKernel (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuSurfObjectCreate (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuModuleUnload (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemsetD2D32Async (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuStreamWaitEvent (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxSynchronize (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuModuleLoadDataEx (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDeviceGetAttribute (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemPrefetchAsync (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDeviceGetCount (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyHtoD_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpyAsync (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuFuncGetAttribute (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuInit (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuEventSynchronize (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuArrayDestroy (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuEventRecord (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxCreate_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpy3DAsync_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuModuleGetFunction (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuStreamSynchronize (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuCtxPushCurrent_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuLinkAddData_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDevicePrimaryCtxRetain (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuPointerGetAttribute (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemsetD32Async (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemsetD16Async (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuMemcpy2D_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuStreamDestroy_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuEventDestroy_v2 (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
undefined symbol: cuDevicePrimaryCtxRelease (/opt/intel-clang/12.0.0/x86_64-centos8/lib/libpi_cuda.so)
[bash][atlas]:build >
(In this case this is actually happening inside of a Docker image and not with an installation on a network file-system, but I was also thinking about installing the compiler on a network drive.)
The issue is clearly that in this setup `libsycl.so` always needs `libpi_cuda.so`. But wouldn't it be possible to make it use `libpi_cuda.so` more as a "plugin"? So that if it can't be loaded/used (for whatever reason), the compiler would still be able to compile code at least for platforms that don't need `libpi_cuda.so`.
I guess at that point it would be better to handle every backend in the same way, so that if the OpenCL-based backend has issues but the CUDA-based one does not, the compiler could still function... (Even though the OpenCL-based one is less of an issue, as you can very easily install `libOpenCL.so` without a compatible piece of hardware.)
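A minimal sketch of the plugin-style loading described above, written in Python purely for illustration (the runtime itself is C++ and would presumably call `dlopen` directly). It shows a loader that tries every backend library in the same way and skips any that fail to load. The function name `load_available_plugins` and the candidate list are hypothetical; `libpi_opencl.so` in particular is an assumed name for an OpenCL plugin, not something confirmed by the output above.

```python
import ctypes

# Hypothetical candidate list, for illustration only. "libpi_cuda.so" appears
# in the ldd output above; "libpi_opencl.so" is an assumed name.
CANDIDATE_PLUGINS = ["libpi_opencl.so", "libpi_cuda.so"]


def load_available_plugins(candidates):
    """Try to load each backend library; skip the ones that cannot be loaded.

    Returns a pair (loaded, skipped): `loaded` maps library names to ctypes
    handles, `skipped` maps library names to the loader's error message.
    """
    loaded = {}
    skipped = {}
    for name in candidates:
        try:
            # ctypes.CDLL uses dlopen() on Linux and raises OSError when the
            # library, or one of its dependencies such as libcuda.so.1,
            # cannot be found.
            loaded[name] = ctypes.CDLL(name)
        except OSError as err:
            # A missing backend is not fatal: record why and carry on.
            skipped[name] = str(err)
    return loaded, skipped


if __name__ == "__main__":
    loaded, skipped = load_available_plugins(CANDIDATE_PLUGINS)
    print("usable backends:", sorted(loaded))
    for name, reason in skipped.items():
        print(f"skipped {name}: {reason}")
```

With a scheme like this, a machine without the NVidia driver would simply report the CUDA plugin as skipped, and the remaining backends would stay usable.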
What do you think? 🤔
Cheers,
Attila