ci : add AMD runners and workflows #16249
That's certainly much, much slower than this GPU should be.
Yes, I think the GPU virtualization that these VMs use is massively degrading the performance. Either that, or I misconfigured something. I'm open to suggestions and opinions on whether having these runners would be useful. On one hand, I guess it's better than nothing. On the other hand, 50 minutes per workflow will likely result in an ever-growing queue of jobs. In any case, this is the best I can do using the Azure cloud. If people have ideas for provisioning AMD hardware in an alternative way, I'm open to suggestions.
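To put the queueing concern in numbers, here is a rough back-of-the-envelope sketch. The 2 runners and the 50 minutes per workflow come from this thread; the arrival rate of 5 workflows per hour is a purely hypothetical figure chosen for illustration, and the function names are made up for this example.

```python
def runner_capacity_per_hour(num_runners: int, minutes_per_job: float) -> float:
    """Jobs per hour the runner pool can finish when every runner is busy."""
    if num_runners <= 0 or minutes_per_job <= 0:
        raise ValueError("num_runners and minutes_per_job must be positive")
    return num_runners * 60.0 / minutes_per_job


def backlog_after(hours: float, arrivals_per_hour: float,
                  num_runners: int, minutes_per_job: float) -> float:
    """Approximate number of queued jobs after `hours`, starting from an empty queue.

    Assumes a steady arrival rate; the backlog only grows when arrivals
    exceed the pool's capacity.
    """
    capacity = runner_capacity_per_hour(num_runners, minutes_per_job)
    return max(0.0, (arrivals_per_hour - capacity) * hours)


if __name__ == "__main__":
    # 2 runners at 50 minutes per workflow (figures from the thread).
    print(runner_capacity_per_hour(2, 50))
    # Hypothetical load: 5 workflows per hour over an 8-hour day.
    print(backlog_after(8, 5, 2, 50))
```

Under these assumptions the pool clears about 2.4 workflows per hour, so any sustained arrival rate above that leaves a backlog that keeps growing.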
AMD previously offered us time on MI300 machines on DigitalOcean (https://www.amd.com/en/developer/resources/cloud-access/amd-developer-cloud.html) in our collaboration meeting; maybe they can spare the container hours for CI. I can attest that these containers are fast.
Yeah, this isn't right. I skimmed through the install guide and it looks like it tells you to install the proprietary AMD driver using I'd expect the virtualization to have some effect, but this is ridiculously slow. Another thing you can do is check the GPU utilization to see if it's actually being used properly.
This might be a possibility if you have a colo or office space for physical machines (does ggml even have an office?). Lemonade uses llama.cpp as their backend and they might be willing to provide us with support.
@netrunnereve While running perplexity with the ROCm build with the 270M Gemma, the
ggml@ggml-7-x86-amd-v710:~$ amd-smi monitor
GPU POWER GPU_T MEM_T GFX_CLK GFX% MEM% ENC% DEC% VRAM_USAGE
0 N/A N/A N/A N/A N/A N/A N/A N/A 1.9/ 4.3 GB
Here are some dumps:
ggml@ggml-7-x86-amd-v710:~/work/llama.cpp/build-rocm$ dpkg -l | grep mesa
ii amdgpu-multimedia 1:6.4.60403-2194681.24.04 amd64 Meta package to install mesa multimedia components.
ii libegl-mesa0:amd64 25.0.7-0ubuntu0.24.04.2 amd64 free implementation of the EGL API -- Mesa vendor library
ii libegl1-amdgpu-mesa:amd64 1:25.0.0.60403-2194681.24.04 amd64 free implementation of the EGL API -- Mesa vendor library
ii libegl1-amdgpu-mesa-drivers:amd64 1:25.0.0.60403-2194681.24.04 amd64 free implementation of the EGL API -- hardware drivers
ii libgl1-amdgpu-mesa-dri:amd64 1:25.0.0.60403-2194681.24.04 amd64 free implementation of the OpenGL API -- DRI modules
ii libgl1-amdgpu-mesa-glx:amd64 1:25.0.0.60403-2194681.24.04 amd64 free implementation of the OpenGL API -- GLX runtime
ii libgl1-mesa-dri:amd64 25.0.7-0ubuntu0.24.04.2 amd64 free implementation of the OpenGL API -- DRI modules
ii libglx-mesa0:amd64 25.0.7-0ubuntu0.24.04.2 amd64 free implementation of the OpenGL API -- GLX vendor library
ii mesa-amdgpu-libgallium:amd64 1:25.0.0.60403-2194681.24.04 amd64 shared infrastructure for Mesa drivers
ii mesa-amdgpu-va-drivers:amd64 1:25.0.0.60403-2194681.24.04 amd64 Mesa VA-API video acceleration drivers
ii mesa-amdgpu-vdpau-drivers:amd64 1:25.0.0.60403-2194681.24.04 amd64 Mesa VDPAU video acceleration drivers
ii mesa-common-dev:amd64 25.0.7-0ubuntu0.24.04.2 amd64 Developer documentation for Mesa
ii mesa-libgallium:amd64 25.0.7-0ubuntu0.24.04.2 amd64 shared infrastructure for Mesa drivers
ii mesa-va-drivers:amd64 25.0.7-0ubuntu0.24.04.2 amd64 Mesa VA-API video acceleration drivers
ii mesa-vdpau-drivers:amd64 25.0.7-0ubuntu0.24.04.2 amd64 Mesa VDPAU video acceleration drivers
ii mesa-vulkan-drivers:amd64 25.0.7-0ubuntu0.24.04.2 amd64 Mesa Vulkan graphics drivers
ggml@ggml-7-x86-amd-v710:~/work/llama.cpp/build-rocm$ modinfo amdgpu | grep version
version: 6.12.12
srcversion: AC5C22E22EEDC97831DD74B
vermagic: 6.11.0-1018-azure SMP mod_unload modversions
parm: hws_gws_support:Assume MEC2 FW supports GWS barriers (false = rely on FW version check (Default), true = force supported) (bool)
ggml@ggml-7-x86-amd-v710:~/work/llama.cpp/build-rocm$ vulkaninfo --summary
'DISPLAY' environment variable not set... skipping surface info
==========
VULKANINFO
==========
Vulkan Instance Version: 1.4.321
Instance Extensions: count = 24
-------------------------------
VK_EXT_acquire_drm_display : extension revision 1
VK_EXT_acquire_xlib_display : extension revision 1
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_EXT_direct_mode_display : extension revision 1
VK_EXT_display_surface_counter : extension revision 1
VK_EXT_headless_surface : extension revision 1
VK_EXT_surface_maintenance1 : extension revision 1
VK_EXT_swapchain_colorspace : extension revision 5
VK_KHR_device_group_creation : extension revision 1
VK_KHR_display : extension revision 23
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_display_properties2 : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_wayland_surface : extension revision 6
VK_KHR_xcb_surface : extension revision 6
VK_KHR_xlib_surface : extension revision 6
VK_LUNARG_direct_driver_loading : extension revision 1
Instance Layers: count = 13
---------------------------
VK_LAYER_AMD_switchable_graphics_64 AMD switchable graphics layer 1.4.308 version 1
VK_LAYER_INTEL_nullhw INTEL NULL HW 1.1.73 version 1
VK_LAYER_KHRONOS_profiles Khronos Profiles layer 1.4.321 version 1
VK_LAYER_KHRONOS_shader_object Khronos Shader object layer 1.4.321 version 1
VK_LAYER_KHRONOS_synchronization2 Khronos Synchronization2 layer 1.4.321 version 1
VK_LAYER_KHRONOS_validation Khronos Validation Layer 1.4.321 version 1
VK_LAYER_LUNARG_api_dump LunarG API dump layer 1.4.321 version 2
VK_LAYER_LUNARG_crash_diagnostic Crash Diagnostic Layer is a crash/hang debugging tool that helps determines GPU progress in a Vulkan application. 1.4.321 version 1
VK_LAYER_LUNARG_gfxreconstruct GFXReconstruct Capture Layer Version 1.0.5 1.4.321 version 4194309
VK_LAYER_LUNARG_monitor Execution Monitoring Layer 1.4.321 version 1
VK_LAYER_LUNARG_screenshot LunarG image capture layer 1.4.321 version 1
VK_LAYER_MESA_device_select Linux device selection layer 1.4.303 version 1
VK_LAYER_MESA_overlay Mesa Overlay layer 1.4.303 version 1
Devices:
========
GPU0:
apiVersion = 1.4.308
driverVersion = 2.0.342
vendorID = 0x1002
deviceID = 0x7461
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon Pro V710 MxGPU
driverID = DRIVER_ID_AMD_PROPRIETARY
driverName = AMD proprietary driver
driverInfo = (LLPC)
conformanceVersion = 1.4.0.0
deviceUUID = 02000000-0000-0000-0000-000000000000
driverUUID = 414d442d-4c49-4e55-582d-445256000000
I'm not very sure which commands to run, so if you have any specific ones in mind, let me know.
Is the GPU exclusive to this VM? The V710 supports 12-way partitioning; if it's configured like that, it may simply be loaded by the other VMs.
Yes, the 2 runners that I deployed are of type
Looking at some of the old CI runs it looks like the v710 was doing fine then, taking less than 20 minutes per run which is around the same time the v100 machine took. I wonder if the vm got messed up or if the host it's running on has some problems. https://github.com/ggml-org/llama.cpp/actions/runs/17938107299/job/51008109600
Maybe rocm-smi doesn't work on vms, I don't know. You can also try with radeontop or amdgpu-top.
The Mesa and amdgpu versions are fine, but the vulkaninfo output shows that you're on the proprietary driver, and I wonder if the backend is mistakenly showing RADV. This doesn't explain why ROCm is so slow, but let's deal with one thing at a time. Personally I would just get rid of the amdgpu-install stuff and first try the default Ubuntu driver packages, but if you want to use amdgpu-install then remove it with
The gtt usage is strange as it shouldn't be using that much on such a small model.
Try memtest-vulkan, it'll give you an idea of what your memory bandwidth is. It won't hit the memory bandwidth limit but the write test gets within 75% of it on my card.
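One way to check the driver mismatch raised above is to compare the `driverID` field of `vulkaninfo --summary` with what the backend prints. The sketch below parses the `key = value` lines of the device section shown earlier in this thread. The helper names are hypothetical, and the parser assumes the summary keeps the layout of that dump; `DRIVER_ID_MESA_RADV` is the identifier Vulkan uses for RADV.

```python
import re
from typing import Dict, List


def parse_vulkaninfo_devices(summary: str) -> List[Dict[str, str]]:
    """Collect the `key = value` pairs listed under each `GPUn:` heading."""
    devices: List[Dict[str, str]] = []
    current = None
    for raw in summary.splitlines():
        line = raw.strip()
        if re.fullmatch(r"GPU\d+:", line):
            # Start of a new device block.
            current = {}
            devices.append(current)
            continue
        if current is not None and " = " in line:
            key, value = line.split(" = ", 1)
            current[key.strip()] = value.strip()
    return devices


def is_radv(device: Dict[str, str]) -> bool:
    """True when the device is driven by Mesa's RADV driver."""
    return device.get("driverID") == "DRIVER_ID_MESA_RADV"


if __name__ == "__main__":
    # Excerpt copied from the vulkaninfo dump in this thread.
    sample = """
    Devices:
    ========
    GPU0:
    deviceName = AMD Radeon Pro V710 MxGPU
    driverID = DRIVER_ID_AMD_PROPRIETARY
    driverName = AMD proprietary driver
    """
    for dev in parse_vulkaninfo_devices(sample):
        print(dev["deviceName"], dev["driverID"], "RADV" if is_radv(dev) else "not RADV")
```

If the summary reports the proprietary driver while llama-bench labels the device as `radv`, the two tools are not seeing the same Vulkan driver, which is worth resolving before benchmarking further.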
Here are the results from
Wow, that's some atrocious memory bandwidth, which explains the slow runs. I'm pretty sure either the host or the GPU is broken.
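As a rough illustration of why low memory bandwidth translates directly into slow token generation: if each generated token has to read all model weights from memory once, bandwidth divided by model size gives an upper bound on tokens per second. This is a simplified rule of thumb, not something measured in this thread, and the numbers in the example are hypothetical placeholders rather than the actual memtest-vulkan results.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on generated tokens per second for a memory-bound decode.

    Assumes every token requires one full pass over the weights and that
    nothing else (compute, launch overhead, caches) limits throughput.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    healthy = max_decode_tokens_per_second(bandwidth_gb_s=400.0, model_size_gb=0.5)
    degraded = max_decode_tokens_per_second(bandwidth_gb_s=4.0, model_size_gb=0.5)
    print(f"healthy bound:  {healthy:.0f} tokens/s")
    print(f"degraded bound: {degraded:.0f} tokens/s")
```

The bound scales linearly with bandwidth, so a card delivering a small fraction of its expected bandwidth will generate tokens at a correspondingly small fraction of its expected speed.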
Yeah, something is wrong. I tried redeploying the instances multiple times on different operating systems - always the same result.
$ amd-smi monitor
GPU POWER GPU_T MEM_T GFX_CLK GFX% MEM% ENC% DEC% VRAM_USAGE
0 N/A N/A N/A N/A N/A N/A N/A N/A 0.2/ 4.3 GB
I'm out of ideas. If you think of something to try, let me know. Otherwise I will probably retry in a few months. I can also open SSH access on a fresh VM if you or someone else wants to give this a try.
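Both `amd-smi monitor` snapshots in this thread report `N/A` for every metric except VRAM usage. A small sketch like the following could flag that automatically when collecting diagnostics from a runner. It assumes the whitespace-separated layout shown above, with the multi-word `VRAM_USAGE` value in the last column; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_amd_smi_monitor(output: str) -> List[Dict[str, Optional[str]]]:
    """Parse the header and rows of `amd-smi monitor`, mapping N/A to None."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    header = lines[0].split()
    rows: List[Dict[str, Optional[str]]] = []
    for ln in lines[1:]:
        # Limit the split so the last column keeps its inner spaces ("1.9/ 4.3 GB").
        fields = ln.split(None, len(header) - 1)
        rows.append({
            name: (None if value.strip() == "N/A" else value.strip())
            for name, value in zip(header, fields)
        })
    return rows


def missing_metrics(row: Dict[str, Optional[str]]) -> List[str]:
    """Names of the columns the tool could not report for this GPU."""
    return [name for name, value in row.items() if value is None]


if __name__ == "__main__":
    # Output copied from the thread.
    sample = (
        "GPU POWER GPU_T MEM_T GFX_CLK GFX% MEM% ENC% DEC% VRAM_USAGE\n"
        "0 N/A N/A N/A N/A N/A N/A N/A N/A 1.9/ 4.3 GB\n"
    )
    for row in parse_amd_smi_monitor(sample):
        print("GPU", row["GPU"], "VRAM", row["VRAM_USAGE"], "missing:", missing_metrics(row))
```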
I'm out of ideas too; you've pretty much tried what I would've done myself.
Apart from being massively slow, the workflows seem to work fine: https://github.com/ggml-org/llama.cpp/actions/runs/18087388576 This PR enables the AMD runs only for commits on
I deployed 2x runners with AMD V710 GPUs to run CI workflows. However, they are extremely slow. Here are some benches for gemma 3 270M:
./bin/llama-bench -m ~/.cache/llama.cpp/ggml-org_gemma-3-270m-GGUF_gemma-3-270m-Q8_0.gguf -n 32
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Pro V710 MxGPU (RADV NAVI32) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
build: aa3ee0e (6582)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Pro V710 MxGPU, gfx1101 (0x1101), VMM: no, Wave Size: 32
build: aa3ee0e (6582)
Does anyone know if this is expected? I installed ROCm driver per the following instructions:
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/azure-n-series-amd-gpu-driver-linux-installation-guide
Is there some extra configuration needed to make AMD run faster? Currently, computation on the GPU (with either ROCm/HIP or Vulkan) is multiple times slower than CPU-only, which does not seem normal. So I guess I have misconfigured something, but I'm not sure what.
cc @IMbackK @netrunnereve