Contributor

@anmyachev anmyachev commented Sep 12, 2025

zeKernelCreate returns the error ZE_RESULT_ERROR_INVALID_KERNEL_NAME, but this is only a consequence of a compilation problem in zeModuleCreate, which returns ZE_RESULT_SUCCESS even though the resulting module contains no kernels (this can be checked with printModuleKernelName).
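A minimal sketch of that check, assuming the kernel names have already been read from the module (for example via zeModuleGetKernelNames). EmptyModuleError and ensure_kernel_present are hypothetical names used only for illustration; they are not part of the driver or of this PR.

```python
from typing import Sequence


class EmptyModuleError(RuntimeError):
    """Hypothetical error: the module built 'successfully' but lacks the kernel."""


def ensure_kernel_present(kernel_names: Sequence[str], wanted: str) -> None:
    """Raise early if `wanted` is missing from the module's kernel list.

    `kernel_names` is assumed to be the list reported by the module itself.
    An empty list is the symptom described above: module creation reported
    success, but compilation actually produced no kernels.
    """
    if wanted in kernel_names:
        return
    if not kernel_names:
        reason = "module contains no kernels (compilation likely failed)"
    else:
        reason = f"module only contains: {', '.join(kernel_names)}"
    raise EmptyModuleError(f"kernel '{wanted}' not found; {reason}")
```

Failing at this point keeps the misleading invalid-kernel-name error from being the first thing the user sees.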

The zeModuleCreate logs contain messages such as "we couldn't compile without exceeding max permitted PTSS, drop SIMD" (these started to appear after intel/intel-graphics-compiler@2a9efc9). I suppose that in this case we could throw an exception that the Triton Autotuner can handle:

class OutOfResources(TritonError):
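A rough sketch of how the log message could be mapped to such an exception. It assumes the build log text has already been fetched as a string (for instance with zeModuleBuildLogGetString). The two classes are local stand-ins so the snippet runs without Triton; their constructors do not mirror Triton's real ones. check_build_log and PTSS_MARKER are hypothetical names.

```python
# Substring taken from the IGC message quoted above.
PTSS_MARKER = "exceeding max permitted PTSS"


class TritonError(Exception):
    """Stand-in for Triton's base error class (assumption, not the real one)."""


class OutOfResources(TritonError):
    """Stand-in for the exception the autotuner is expected to handle."""


def check_build_log(build_log: str, kernel_name: str) -> None:
    """Raise OutOfResources if the build log reports the PTSS limit was hit."""
    for line in build_log.splitlines():
        if PTSS_MARKER in line:
            raise OutOfResources(
                f"{kernel_name}: compiler could not fit the kernel: {line.strip()}"
            )
```

Matching on a substring of the log is fragile, so this would only be a stopgap until module creation itself reports the failure.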

The main problem is that when a compiled module gets a cache hit in IGC, messages such as "we couldn't compile without exceeding max permitted PTSS, drop SIMD" are no longer generated. This can be bypassed with the NEO_CACHE_PERSISTENT=0 environment variable, but obviously we can't rely on that in production.

In situations where the kernel is simply launched (without autotuning), we could get more obvious error messages:

image

This change is inspired by #4838 (comment), namely the situation where the autotuner breaks on a certain set of parameters (for example, num_warps=2) even though num_warps=4 or num_warps=8 could work correctly.
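To illustrate the intended behaviour, here is a simplified sketch of a tuning loop that skips configurations failing with an out-of-resources error instead of aborting. It is not Triton's autotuner code: OutOfResources is a local stand-in, and pick_best and bench are hypothetical names.

```python
import math
from typing import Callable, Dict, Iterable


class OutOfResources(Exception):
    """Stand-in for the exception raised when a config cannot be compiled."""


def pick_best(num_warps_options: Iterable[int],
              bench: Callable[[int], float]) -> int:
    """Return the num_warps value with the lowest measured time.

    Configs that raise OutOfResources get an infinite time, so they can
    never win but also do not stop the search.
    """
    timings: Dict[int, float] = {}
    for num_warps in num_warps_options:
        try:
            timings[num_warps] = bench(num_warps)
        except OutOfResources:
            timings[num_warps] = math.inf
    if not timings or all(math.isinf(t) for t in timings.values()):
        raise RuntimeError("no configuration could be compiled")
    return min(timings, key=timings.get)
```

With a bench function that raises for num_warps=2 and returns timings for 4 and 8, the loop picks the faster of the two working configurations.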

I would like to hear the following from reviewers:

  • do you think this change is useful?
  • any ideas on what can be done with the IGC cache to get the same result over multiple runs?

More links:

anmyachev and others added 5 commits September 11, 2025 16:28
Signed-off-by: Anatoly Myachev <[email protected]>
Contributor

@kurapov-peter kurapov-peter left a comment

IMO module creation should just fail in that case. In other words, it shouldn't return a success status if the compilation failed.
