@ai-fonsi ai-fonsi commented Sep 28, 2025

CUDA devices flagged as "integrated" appear to be buggy and produce incorrect output in specific cases. Since disabling the integrated flag seems to affect neither performance nor memory usage on Jetson, I propose disabling the option until the underlying issue is fixed.

Fixes #15034 and probably also #15923.

@ai-fonsi ai-fonsi requested a review from slaren as a code owner September 28, 2025 15:07
@github-actions github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) Sep 28, 2025
slaren (Member) commented Sep 28, 2025

This is probably the same synchronization issue with the scheduler that @ggerganov found when making the Metal backend async. To confirm this, can you check whether it works (without this change) when launched with the environment variable CUDA_LAUNCH_BLOCKING=1, which effectively disables async compute?
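
A minimal sketch of that check, assuming a POSIX shell. The helper name `run_blocking` and the model path in the commented example are hypothetical placeholders, not part of llama.cpp; the helper only sets CUDA_LAUNCH_BLOCKING=1 for the one command it runs.

```shell
# Hypothetical helper: run a single command with synchronous CUDA kernel
# launches, without exporting the variable into the rest of the session.
run_blocking() {
    env CUDA_LAUNCH_BLOCKING=1 "$@"
}

# Example invocation (placeholder model path, adjust to your setup):
# run_blocking llama-server -m /path/to/model.gguf
```

If the output is still wrong with launches forced to be synchronous, async scheduling is less likely to be the cause.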

@ai-fonsi (Author)

I tried starting llama-server with CUDA_LAUNCH_BLOCKING=1, but it didn't fix the issue.

Development

Successfully merging this pull request may close these issues.

Eval bug: Broken/no Gemma 3n output on CUDA (Nvidia Jetson Orin Nano)