-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Allow configuration template to disable some SIMD. #20215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@microsoft-github-policy-service agree company="Ubisoft" |
|
||
#if defined(MLAS_TARGET_AMD64) | ||
|
||
#if !defined(ORT_DISABLE_AVX2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do these macros come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to follow the way enabling/disabling certain things was done in the code (for instance: ORT_ENABLE_STREAM). Then the macros should be defined with a '-D CMAKE_CXX_FLAGS=ORT_DISABLE_AVX2'.
Would there be a preferred way of doing this? Should it be done through cmake option() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to make them CMake options. They should be documented too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? I don't want to overcrowd the cmake options. That would be 5 more options.
If you're positive, I will add the options and documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they should be documented. it's fine to add additional CMake options. these are additional build configuration options so it seems like a reasonable place for them.
The instruction set will be chosen dynamically at runtime. What is the reason to add those macros? Could you please explain more about it? |
Some SDKs just don't have the include files needed for the intrinsic. So making it optional would allow more platforms to be able to use the library. |
Which SDKs don't have the intrinsic files? |
There is one of our gaming platform that links inclusion of the files containing the symbols for the intrinsics (__m512 for instance) with the compiler actually generating code with the instruction set (-mavx512 for instance). |
Have you had the chance to review with my last comments? |
Bump. Just to make sure we don't forget about this one. |
Any news on this? @yufenglee ? |
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline |
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline |
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline |
Azure Pipelines successfully started running 2 pipeline(s). |
Azure Pipelines successfully started running 9 pipeline(s). |
Azure Pipelines successfully started running 10 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/azp run Linux Android Emulator QNN CI Pipeline |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Python format |
No pipelines are associated with this pull request. |
@jslap-ubi, you need to sync to latest main branch to trigger the Python format CI. |
2f67b8f
to
46f8996
Compare
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline |
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline |
Azure Pipelines successfully started running 9 pipeline(s). |
Azure Pipelines successfully started running 10 pipeline(s). |
/azp run Linux Android Emulator QNN CI Pipeline |
Azure Pipelines successfully started running 1 pipeline(s). |
Thanks for the approval @yufenglee , what is the next step for this? |
Is there anything else I need to do to go forward with the merge? @yufenglee |
46f8996
to
d0aada7
Compare
I just rebase on main since there was now a file conflicting. @yufenglee : I think we need to renew the approval. Also, What would then be missing for having this MR merged? |
d0aada7
to
7ca7306
Compare
The MR has been rebased, it would be ready to merge. @yufenglee @yihonglyu : Could it be possible to re-approve this? |
|
||
#if defined(MLAS_TARGET_AMD64) | ||
|
||
#if !defined(ORT_DISABLE_AVX2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to make them CMake options. They should be documented too.
} | ||
|
||
#ifndef __APPLE__ | ||
#if !defined(__APPLE__) && !defined(ORT_DISABLE_AMX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This !defined(ORT_DISABLE_AMX)
block also contains this AVX kernel assignment.
Why is this guarded by the ORT_DISABLE_AMX
macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I think it happened when I rebased. I will fix it.
Do you want to skip compilation of certain optimized kernels? It looks like this change only prevents them from being selected at runtime. Would you mind elaborating on your use case? |
On some gaming platform, compiling the kernels that target those architecture will just fail bacause the intrinsics are not present at all. So we need to skip the compilation of these specific kernel, and to skip where these function pointers are used. |
I see. Does this change work as-is? I would expect that you'd also need to disable the actual compilation of the kernel source files elsewhere, like in onnxruntime_mlas.cmake. |
Yes, the actual kernel is located in the file qgemm_kernel_amx.cpp. (we do exclude this file from the build on our side), So I think the exclusion of the files should be integrated in the cmake files, right? |
Yes, that seems reasonable. |
7908f8b
to
4568d96
Compare
I did the fixes in the cmake files, and I rebased. |
cmake/onnxruntime_mlas.cmake
Outdated
set_source_files_properties(${mlas_platform_srcs_sse2} PROPERTIES COMPILE_FLAGS "-msse2") | ||
|
||
set(mlas_platform_srcs_sse41 | ||
${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we adding this file here?
FYI, there's a comment saying it's only for Windows:
onnxruntime/onnxruntime/core/mlas/lib/qgemm_kernel_sse41.cpp
Lines 20 to 21 in 1c9a388
// N.B. MSVC does not require turning on SSE 4.1 intrinsics and the current use | |
// for this code is Windows only, so restrict this kernel to that environment. |
and also this ifdef checking for _MSC_VER
, which as far as I know implies Windows:
onnxruntime/onnxruntime/core/mlas/lib/platform.cpp
Lines 316 to 329 in 1c9a388
#if defined(_MSC_VER) | |
// | |
// Check if the processor supports SSE 4.1 instructions. | |
// | |
#ifndef FORCE_GENERIC_ALGORITHMS | |
if ((Cpuid1[2] & 0x80000) != 0) { | |
#else // FORCE_GENERIC_ALGORITHMS | |
if (false) { | |
#endif // FORCE_GENERIC_ALGORITHMS | |
this->GemmU8S8Dispatch = &MlasGemmU8S8DispatchSse41; | |
} | |
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag is necessary for compiling with clang on windows. clang on windows defines _MSC_VER but won't have the necessary sse41 symbols for the gemm op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see, I didn't realize you all were using Clang on Windows. I'm not sure how much we've tested with that.
one question - would a check with check_cxx_compiler_flag (e.g., check_cxx_compiler_flag("-msse4.1" HAS_INTEL_SSE_4_1)
) be sufficient to determine whether the file can be compiled in your particular case? that could be a more automatic way to exclude files requiring specific instruction support which may be unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this particular change would also enable compilation of qgemm_kernel_sse41.cpp on non-Windows platforms. that may be fine, but it would be good to verify that first. for this PR, it may be simpler to only include qgemm_kernel_sse41.cpp in setup_mlas_source_for_windows()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the compiler flag might work, but I agree it's probably simpler for this PR just to limit the inclusion to the setup for windows -- that's what I've done in the most recent version.
) | ||
|
||
if (NOT onnxruntime_ORT_MINIMAL_BUILD) | ||
if (NOT onnxruntime_DISABLE_SSE4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the only place which would need to be modified. see also setup_mlas_source_for_windows()
:
onnxruntime/cmake/onnxruntime_mlas.cmake
Line 71 in 1c9a388
function(setup_mlas_source_for_windows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've updated all locations where these checks need to be done.
cmake/CMakeLists.txt
Outdated
endif() | ||
|
||
if (onnxruntime_DISABLE_SSE4) | ||
add_compile_definitions(DISABLE_SSE4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we limit the definition's scope to the onnxruntime_mlas target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmake/CMakeLists.txt
Outdated
endif() | ||
|
||
if (onnxruntime_DISABLE_AVX) | ||
add_compile_definitions(DISABLE_AVX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these defines don't match up with the macros in the code. e.g., ORT_DISABLE_AVX
. please verify that they work as expected. we don't have CI builds which test these options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Sorry. I tested it this morning. Will fix it.
Hi! Just a heads up, I will be taking over for @jslap-ubi on the PR. I'm looking into your comments and getting caught up, I hope to address them early next week. |
cmake/onnxruntime_mlas.cmake
Outdated
# Usually, SIMD instructions in kernel compile fine, and we detect at run-time if they are supported | ||
# But on some platforms, even compiling these SIMD specific instructions could fail. | ||
# So we provide options to disable compiling these SIMD instructions. | ||
option(onnxruntime_DISABLE_SSE4 "Disable compiling kernel with SSE4 instructions" OFF) | ||
option(onnxruntime_DISABLE_AVX "Disable compiling kernel with AVX instructions" OFF) | ||
option(onnxruntime_DISABLE_AVX2 "Disable compiling kernel with AVX2 instructions" OFF) | ||
option(onnxruntime_DISABLE_AVX512 "Disable compiling kernel with AVX512 instructions" OFF) | ||
option(onnxruntime_DISABLE_AMX "Disable compiling kernel with AMX instructions" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we keep these option
definitions next to the others in cmake/CMakeLists.txt
?
cmake/onnxruntime_mlas.cmake
Outdated
option(onnxruntime_DISABLE_AMX "Disable compiling kernel with AMX instructions" OFF) | ||
|
||
if (onnxruntime_DISABLE_SSE4) | ||
add_compile_definitions(ORT_DISABLE_SSE4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please limit the definitions to the onnxruntime_mlas
target, e.g., with target_compile_definitions(onnxruntime_mlas PRIVATE ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
if (NOT onnxruntime_DISABLE_SSE4) | ||
target_sources(onnxruntime_mlas PRIVATE | ||
${mlas_platform_srcs_sse41} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will mlas_platform_srcs_sse41
have been set to anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it wouldn't have been set to anything. Fixed.
cmake/onnxruntime_mlas.cmake
Outdated
set_source_files_properties(${mlas_platform_srcs_sse2} PROPERTIES COMPILE_FLAGS "-msse2") | ||
|
||
set(mlas_platform_srcs_sse41 | ||
${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see, I didn't realize you all were using Clang on Windows. I'm not sure how much we've tested with that.
one question - would a check with check_cxx_compiler_flag (e.g., check_cxx_compiler_flag("-msse4.1" HAS_INTEL_SSE_4_1)
) be sufficient to determine whether the file can be compiled in your particular case? that could be a more automatic way to exclude files requiring specific instruction support which may be unavailable.
cmake/onnxruntime_mlas.cmake
Outdated
set_source_files_properties(${mlas_platform_srcs_sse2} PROPERTIES COMPILE_FLAGS "-msse2") | ||
|
||
set(mlas_platform_srcs_sse41 | ||
${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this particular change would also enable compilation of qgemm_kernel_sse41.cpp on non-Windows platforms. that may be fine, but it would be good to verify that first. for this PR, it may be simpler to only include qgemm_kernel_sse41.cpp in setup_mlas_source_for_windows()
.
#endif // !defined(ORT_DISABLE_AVX2) | ||
|
||
#endif // MLAS_TARGET_AMD64 | ||
#endif // !defined(ORT_DISABLE_AVX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently there is a hierarchy of the code disabled by these macros. e.g., ORT_DISABLE_AVX -> ORT_DISABLE_AVX2 -> ORT_DISABLE_AVX512, where if ORT_DISABLE_AVX
is defined, the code controlled by ORT_DISABLE_AVX512
will also be disabled.
this hierarchy is not reflected in the CMake logic that specifies what files are compiled. could it be made consistent?
e.g., flatten the ifdefs here so that they only surround the assignments of kernels that require the corresponding instruction support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Also, only add the qgemm sse4 file when on windows.
07bc7cf
to
74adb0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you elaborate a bit more on your use case(s)? what compilers are you using and what errors do you encounter?
I wonder if we can improve our automatic configuration instead of needing to add options to disable things manually. but it's not yet obvious to me what we're trying to fix here.
cmake/onnxruntime_mlas.cmake
Outdated
set(mlas_platform_srcs_sse41 | ||
${MLAS_SRC_DIR}/qgemm_kernel_sse41.cpp | ||
) | ||
set_source_files_properties(${mlas_platform_srcs_sse41} PROPERTIES COMPILE_FLAGS "-msse4.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-msse4.1
won't work for MSVC, right? that is the compiler for which setup_mlas_source_for_windows()
will be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this, you're right this doesn't make sense for MSVC.
|
||
if (((Cpuid1[2] & 0x1000) != 0) && ((Cpuid7[1] & 0x20) != 0)) { | ||
|
||
#if !defined(ORT_DISABLE_AVX2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ifdef block also includes FMA3 kernels. should those also be controlled by the macro to disable AVX2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I've re-grouped the FMA3 kernels together and moved them outside the block.
I apologies for the lack of clarity (and delay, I was on holiday). We have a case where we need to compile the project using clang on windows with only SSE2 instructions and not SSE4 (this is to support older machines). |
Description
Add compile flags to disable some SIMD optimizations
Motivation and Context
There is a mechanism to detect at runtime if some SIMD optimizations are available, and skip them if it is not the case, But on some platform, some compiler SIMD optimization are not available, even at the compilation step.