
[PyTorch upstream] Feature request: Save SPIR-V build flag to CompiledKernel metadata for Inductor. #5153

@etaf

Description

Describe the bug

Hi team,

We received an end-to-end performance issue report from Llama3.1 users, who observed a performance drop when using the Inductor C++ wrapper (AOTInductor) compared to the Python wrapper.

The root cause is that, with the C++ wrapper, Inductor must launch the kernel directly, since Triton is not required in AOTInductor deployment mode. Launching a SPIR-V kernel compiled by Triton requires passing a build_flag to the Level Zero API zeModuleCreate to indicate whether large GRF mode is enabled. However, Inductor currently has no visibility into this flag, which is determined by Triton at compile time. As a result, Inductor cannot pass the correct build_flag to Level Zero, and the binary kernel it loads differs from the one Triton would build.
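For context, here is a minimal sketch of the launch path in question, assuming a valid Level Zero context/device and a Triton-produced SPIR-V blob. The helper name `buildModule` and the example flag value are illustrative, not actual Inductor code:

```cpp
#include <level_zero/ze_api.h>

#include <cstdint>
#include <cstdio>
#include <vector>

// Build a Level Zero module from a Triton-compiled SPIR-V blob.
// The crux of the issue: pBuildFlags must match what Triton used at compile
// time (e.g. "-ze-opt-large-register-file" when large GRF mode is enabled),
// otherwise zeModuleCreate produces a different binary than Triton's.
ze_module_handle_t buildModule(ze_context_handle_t ctx,
                               ze_device_handle_t dev,
                               const std::vector<uint8_t>& spirv,
                               const char* buildFlags) {
  ze_module_desc_t desc = {};
  desc.stype = ZE_STRUCTURE_TYPE_MODULE_DESC;
  desc.format = ZE_MODULE_FORMAT_IL_SPIRV;
  desc.inputSize = spirv.size();
  desc.pInputModule = spirv.data();
  desc.pBuildFlags = buildFlags;  // "" when large GRF mode is off

  ze_module_handle_t module = nullptr;
  ze_module_build_log_handle_t log = nullptr;
  ze_result_t rc = zeModuleCreate(ctx, dev, &desc, &module, &log);
  if (rc != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "zeModuleCreate failed: 0x%x\n",
                 static_cast<unsigned>(rc));
  }
  if (log != nullptr) {
    zeModuleBuildLogDestroy(log);
  }
  return module;
}
```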

To address this, I suggest that Triton store the build_flag in the metadata of the CompiledKernel object returned by triton.compile(). Inductor could then retrieve the flag from the metadata and propagate it to Level Zero, as sketched below.
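Continuing the sketch above, the generated C++ wrapper could embed the flag recorded in the proposed metadata field at codegen time and forward it when loading the module. The metadata key "build_flag", the constant `kTritonBuildFlag`, and the helper `loadKernelModule` are hypothetical names, not existing API:

```cpp
// Hypothetical consumption side: at codegen time, AOTInductor would read the
// proposed "build_flag" entry from CompiledKernel metadata and emit it as a
// constant next to the embedded SPIR-V. `buildModule` is the helper sketched
// above.
static constexpr const char* kTritonBuildFlag =
    "-ze-opt-large-register-file";  // baked in from kernel metadata

ze_module_handle_t loadKernelModule(ze_context_handle_t ctx,
                                    ze_device_handle_t dev,
                                    const std::vector<uint8_t>& spirv) {
  // Forward the exact flag Triton used, so Level Zero reproduces the same
  // binary that Triton built during compilation.
  return buildModule(ctx, dev, spirv, kTritonBuildFlag);
}
```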

This is a critical performance issue for AOTInductor users and others relying on the C++ wrapper to reduce host overhead. We would greatly appreciate it if this feature request could be included in PyTorch 2.10.

Thanks

Environment details

None
