libnvidia-ml.so -> libnvidia-ml.so.1 symlink not created in containers #292

@brinnjoyce

We're hitting a linker error when running builds and integration tests in containers on Kubernetes with the NVIDIA GPU Operator:

/usr/bin/ld: cannot find -lnvidia-ml: No such file or directory

The error can be reproduced inside the container with gcc -lnvidia-ml.

On investigation, the cause is that libnvidia-container makes the NVIDIA driver libraries available in the container but does not create the libnvidia-ml.so -> libnvidia-ml.so.1 symlink. For example:

shared_ci_bot@runner-nagp1soyw-project-9373-concurrent-0-pyktetzc:/$ ls -la /usr/lib/x86_64-linux-gnu/libnvidia-ml*
lrwxrwxrwx 1 root root      26 Jan 17 12:27 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 -> libnvidia-ml.so.550.127.05
-rwxr-xr-x 1 root root 2078360 Jan 16 12:37 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.127.05
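
To check for the same gap in another container, something like the following could be used. It is only a sketch: the function name is hypothetical, and the default directory is the one from the listing above and is an assumption for other distributions or architectures. It prints every libnvidia-*.so name that the linker's -l search would look for but that is missing next to the corresponding .so.1 link.

```shell
# Hypothetical helper (not part of libnvidia-container): print each
# libnvidia-*.so.1 in a directory that has no matching unversioned .so
# link. The linker's -l<name> search looks for lib<name>.so (or .a),
# never for lib<name>.so.1.
list_missing_dev_symlinks() {
    # Assumed default: the library directory shown in the listing above.
    libdir="${1:-/usr/lib/x86_64-linux-gnu}"
    for versioned in "$libdir"/libnvidia-*.so.1; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$versioned" ] || continue
        # Strip the trailing ".1" to get the unversioned name.
        unversioned="${versioned%.1}"
        if [ ! -e "$unversioned" ] && [ ! -L "$unversioned" ]; then
            printf '%s\n' "$unversioned"
        fi
    done
}
```

Calling list_missing_dev_symlinks with no argument inspects the default directory. Pass a different directory as the first argument if the libraries are mounted elsewhere.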

Creating the symlink manually resolves the issue.
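
For anyone needing the same workaround, here is a minimal sketch of the manual fix. The function name is hypothetical, the default directory is taken from the listing above, and it has to run as a user that can write to that directory (typically root) before the build starts.

```shell
# Hypothetical workaround helper (not part of libnvidia-container):
# create the unversioned libnvidia-ml.so link next to libnvidia-ml.so.1
# if it is missing.
ensure_nvml_dev_symlink() {
    # Assumed default: the library directory shown in the listing above.
    libdir="${1:-/usr/lib/x86_64-linux-gnu}"
    # Nothing to link against if the driver library was not injected.
    [ -e "$libdir/libnvidia-ml.so.1" ] || return 1
    # Leave an existing file or link alone.
    if [ -e "$libdir/libnvidia-ml.so" ] || [ -L "$libdir/libnvidia-ml.so" ]; then
        return 0
    fi
    # Relative target, mirroring how libnvidia-ml.so.1 points at the
    # versioned library in the same directory.
    ln -s libnvidia-ml.so.1 "$libdir/libnvidia-ml.so"
}
```

Running ensure_nvml_dev_symlink as an early step in the CI job should let gcc -lnvidia-ml resolve the library. The link lives in the container filesystem, so it has to be recreated in every new container.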

I see that https://github.com/NVIDIA/libnvidia-container/blob/main/src/nvc_mount.c contains a workaround that creates symlinks for libcuda.so and a few others. Can the same be done for libnvidia-ml.so?

See https://docs.nvidia.com/deploy/pdf/NVML_API_Reference_Guide.pdf, Chapter 1, Page 2, which states that NVML should be linked this way:

On Linux the NVML library is named "libnvidia-ml.so" and can be found on the standard library path. To link against the NVML library add the -lnvidia-ml flag to your linker command.
