Skip to content

Conversation

r-devulap
Copy link
Member

This pull request improves cross-platform support and CPU feature detection for the x86 SIMD sort library, primarily targeting compatibility with Microsoft's Visual C++ (MSVC) compiler. The changes ensure that both build configuration and runtime CPU feature checks work correctly on MSVC, while maintaining compatibility with other compilers.

Build system compatibility:

  • Updated lib/meson.build to use MSVC-specific compiler flags (/arch:AVX2 and /arch:AVX512) instead of the GCC/Clang flags (-march=...), ensuring correct SIMD instruction set selection on Windows/MSVC. [1] [2] [3] [4]

CPU feature detection abstraction:

  • Added new header lib/x86simdsortcpuid.h to abstract CPU feature detection, using MSVC intrinsics for runtime checks and caching results, while falling back to __builtin_cpu_supports for other compilers.
  • Replaced direct calls to __builtin_cpu_supports and __builtin_cpu_init in lib/x86simdsort.cpp with the new xss_cpu_supports and xss_cpu_init functions/macros, ensuring consistent feature detection across platforms. [1] [2] [3]

Symbol visibility and constructor attribute compatibility:

  • Defined XSS_EXPORT_SYMBOL, XSS_HIDE_SYMBOL, and XSS_ATTRIBUTE_CONSTRUCTOR macros in lib/x86simdsort.h and lib/x86simdsort.cpp to handle symbol visibility and static initialization attributes for both MSVC and non-MSVC compilers. [1] [2]

Code style and correctness:

  • Improved static assertions and formatting in object_qsort for clarity and correctness.

Miscellaneous:

  • Added a blank line at the top of meson.build for style consistency.

@fabiocannizzo
Copy link

fabiocannizzo commented Sep 14, 2025

Using as HEAD: cfe57c4

I fix a few typos:

-#if defined(__INTEL_COMPILER) and !defined(__SANITIZE_ADDRESS__)
+#if defined(__INTEL_COMPILER) && !defined(__SANITIZE_ADDRESS__)
-#elif __GNUC__ >= 8 and !defined(__SANITIZE_ADDRESS__)
+#elif __GNUC__ >= 8 && !defined(__SANITIZE_ADDRESS__)

Then I open the x64 Native Tools Command Prompt for VS2022 and run the following commands from the lib folder:

cl /nologo /c /arch:AVX2 /I../src /std:c++17 x86simdsort-avx2.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-skx.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-icl.cpp
cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-spr.cpp

The first 2 compile, but the last 2 do not. See compilation logs attached:
x86simdsort-icl.cpp.log
x86simdsort-spr.cpp.log

This is the reason why I fell back to clang with mingw and introduced the C-api in #211.

The cl compiler version I use is: Microsoft (R) C/C++ Optimizing Compiler Version 19.44.35211 for x64

here you mention you could build the library with MSVC. What compiler flags and compiler version do you use?

Hope this helps.

@r-devulap
Copy link
Member Author

cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-icl.cpp

4dc9609 fixes this error.

cl /nologo /c /arch:AVX512 /I../src /std:c++17 x86simdsort-spr.cpp

This is specific to _Float16 dispatch, if MSVC doesn't support it, then nothing much we can do about it. I would recommend using meson to build the library though, _Float16 is an optional requirement, meson build will auto detect if your compiler supports it and skips it if it doesn't.

#211 (comment) you mention you could build the library with MSVC. What compiler flags and compiler version do you use?

This is what worked for me:

meson setup  --warnlevel 2 --buildtype release builddir
cd builddir
ninja

@r-devulap
Copy link
Member Author

@fabiocannizzo c1994cc adds a CI job to build with MSVC and that runs successfully. There are a few warnings that we can work on fixing (see #222) but I don't think that's gating the build itself. Does this address your concerns of building with MSVC?

@r-devulap r-devulap requested a review from Copilot September 17, 2025 03:41
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds Microsoft Visual C++ (MSVC) compiler support for the x86 SIMD sort library on Windows platforms. The changes enable cross-platform compatibility by providing MSVC-specific compiler flags and CPU feature detection while maintaining compatibility with existing GCC/Clang builds.

  • Introduced MSVC-specific compiler flags (/arch:AVX2, /arch:AVX512) to replace GCC/Clang march flags
  • Added cross-platform CPU feature detection abstraction using MSVC intrinsics or GCC builtins
  • Updated build system to conditionally apply compiler-specific configurations

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
meson.build Added compiler argument validation for GCC/Clang and formatting improvements
lib/x86simdsortcpuid.h New header providing cross-platform CPU feature detection abstraction
lib/x86simdsort.h Added MSVC-compatible symbol visibility macros and improved code formatting
lib/x86simdsort.cpp Updated to use abstracted CPU detection functions and MSVC-compatible constructor attributes
lib/x86simdsort-icl.cpp Added MSVC-specific include guard for AVX512 headers
lib/meson.build Replaced conditional compilation with MSVC/GCC compiler flag selection
.github/workflows/c-cpp.yml Added Windows MSVC CI build job and simplified existing test configurations

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@fabiocannizzo
Copy link

fabiocannizzo commented Sep 17, 2025

@r-devulap : thank you for this work.
I can build with meson using c1994cc
but I cannot compile an executable linking with the import lib of the DLL.
dumpbin /exports x86simdsort.lib
shows no C++ exported symbols.

@r-devulap
Copy link
Member Author

but I cannot compile an executable linking with the import lib of the DLL.
dumpbin /exports x86simdsort.lib
shows no C++ exported symbols.

oh? not sure why. MSVC isn’t my usual environment, but I’ll try to replicate that.

@fabiocannizzo
Copy link

fabiocannizzo commented Sep 24, 2025

oh? not sure why.

Just a guess here: _declspec(dllexport) is explicitly needed for each template specialization to be exported.

if we have

template <class T> void foo() {...}

When we force specialisation for int we need:

template void _declspec(dllexport) foo<int>();

@r-devulap
Copy link
Member Author

Just a guess here: _declspec(dllexport) is explicitly needed for each template specialization to be exported.

@fabiocannizzo that was it. Let me know if it works now.

@fabiocannizzo
Copy link

fabiocannizzo commented Sep 26, 2025

@fabiocannizzo that was it. Let me know if it works now.

This is making progress. Now I can see exported symbols from the import library and the DLL. I can also create a simple executable linking with the library. However, when I try and run it, it segfaults.

int x[]= {4,3,2,1};
x86simdsort::qsort(x, 4, false, false); // this segfaults

With the debugger I can see it jumps to some invalid memory address when entering the function.

The problem is that the dynamic pointers are not initialized. For example, when the function below is invoked, the pointer internal_qsortint32_t has not been initialized and it is still nullptr. This means the function resolve_qsortint32_t is not called when the DLL is loaded.

static void (*internal_qsortint32_t)(int32_t *, size_t, bool, bool) = 0;
template <>
void __declspec(dllexport)
qsort(int32_t *arr, size_t arrsize, bool hasnan, bool descending)
{
    (*internal_qsortint32_t)(arr, arrsize, hasnan, descending);
}

It is not clear to me what is the mechanism which trigger execution of resolve_qsortint32_t in the Linux case. Anyway, to fix it, there are various options.

  1. Add a dllMain function in x86simdsort.cpp, and explicitly invoke the resolve_* functions.
#include <windows.h>
BOOL APIENTRY DllMain( HMODULE hModule, DWORD  ul_reason_for_call, LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
        case DLL_PROCESS_ATTACH:
            // invoke here all resolvers
            resolve_qsortint32_t
            // ...
            break;

        default:
            break;
    }
    return TRUE; // Return TRUE for successful initialization, FALSE otherwise (especially for DLL_PROCESS_ATTACH)
}
  1. Add global scope initializers (in my opinion this is the simplest, given the structure of the library).
static void resolve_qsortint32_t(void) { ... };
struct Init_qsortint32_t
{
    Init_qsortint32_t() { resolve_qsortint32_t(); }
} init_qsortint32_t; // constructor for this global instance is called at startup.

Appending the following to the dispatch macros solve the issue (see patch.txt):

#define DISPATCH(func, TYPE, ISA) \
    DECLARE_INTERNAL_##func(TYPE) static XSS_ATTRIBUTE_CONSTRUCTOR void CAT( \
            CAT(resolve_, func), TYPE)(void) \
    { \
        ...
    } \
    struct CAT(CAT(Init_, func), TYPE) { \
        CAT(CAT(Init_, func), TYPE)() { CAT(CAT(resolve_, func), TYPE)(); } \
    } CAT(CAT(initializer_, func), TYPE);
   #define DISPATCH_KV_FUNC(func, TYPE1, TYPE2, ISA) \
    static XSS_ATTRIBUTE_CONSTRUCTOR void CAT( \
            CAT(CAT(CAT(resolve_, func), _), TYPE1), TYPE2)(void) \
    { \
         ...
    } \
    struct CAT(CAT(CAT(CAT(Init_, func), _), TYPE1), TYPE2) { \
        CAT(CAT(CAT(CAT(Init_, func), _), TYPE1), TYPE2)() \
        { CAT(CAT(CAT(CAT(resolve_, func), _), TYPE1), TYPE2)();} \
    } CAT(CAT(CAT(CAT(initializer_, func), _), TYPE1), TYPE2);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants