diff --git a/README.md b/README.md
index a28838a9..b56769f7 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,55 @@
 # x86-simd-sort
 
 C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
-sorting on x86 processors. Source header files are available in src directory.
-We currently only have AVX-512 based implementation of quicksort. This
-repository also includes a test suite which can be built and run to test the
-sorting algorithms for correctness. It also has benchmarking code to compare
-its performance relative to std::sort.
+sorting algorithms on x86 processors. Source header files are available in src
+directory. We currently only have AVX-512 based implementation of quicksort,
+argsort, quickselect, paritalsort and key-value sort. This repository also
+includes a test suite which can be built and run to test the sorting algorithms
+for correctness. It also has benchmarking code to compare its performance
+relative to std::sort. The following API's are currently supported:
+
+#### Quicksort
+
+```
+void avx512_qsort(T* arr, int64_t arrsize)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t and double`
+
+#### Argsort
+
+```
+std::vector arg = avx512_argsort(T* arr, int64_t arrsize)
+void avx512_argsort(T* arr, int64_t *arg, int64_t arrsize)
+```
+Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
+The algorithm resorts to scalar `std::sort` if the array contains NAN.
+
+#### Quickselect
+
+```
+void avx512_qselect(T* arr, int64_t arrsize)
+void avx512_qselect(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+#### Partialsort
+
+```
+void avx512_partialsort(T* arr, int64_t arrsize)
+void avx512_partialsort(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+#### Key-value sort
+```
+void avx512_qsort_kv(T* key, uint64_t* value , int64_t arrsize)
+```
+Supported datatypes: `uint64_t, int64_t and double`
 
 ## Algorithm details
 
@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
 `avx512_qsort(T*, int64_t)` are modified versions of avx2 quicksort
 presented in the paper [2] and source code associated with that paper [3].
 
-## Handling NAN in float and double arrays
+## A note on NAN in float and double arrays
 
 If you expect your array to contain NANs, please be aware that the these
-routines **do not preserve your NANs as you pass them**. The
-`avx512_qsort()` routine will put all your NAN's at the end of the sorted
-array and replace them with `std::nan("1")`. Please take a look at
-`avx512_qsort()` and `avx512_qsort()` functions for details.
+routines **do not preserve your NANs as you pass them**. The quicksort,
+quickselect, partialsort and key-value sorting routines will sort NAN's to the
+end of the array and replace them with `std::nan("1")`. `avx512_argsort`
+routines will also resort to a scalar argsort that uses `std::sort` to sort array
+that contains NAN.
 
 ## Example to include and build this in a C++ code
 
@@ -36,7 +81,7 @@ array and replace them with `std::nan("1")`. Please take a look at
 #include "src/avx512-32bit-qsort.hpp"
 
 int main() {
-    const int ARRSIZE = 10;
+    const int ARRSIZE = 1000;
     std::vector arr;
 
     /* Initialize elements is reverse order */
@@ -45,7 +90,7 @@ int main() {
     }
 
     /* call avx512 quicksort */
-    avx512_qsort(arr.data(), ARRSIZE);
+    avx512_qsort(arr.data(), ARRSIZE);
     return 0;
 }
 
@@ -54,7 +99,7 @@ int main() {
 ### Build using gcc
 
 ```
-gcc main.cpp -mavx512f -mavx512dq -O3
+g++ main.cpp -mavx512f -mavx512dq -O3
 ```
 
 This is a header file only library and we do not provide any compile time and
@@ -75,9 +120,24 @@ compiler to build.
 gcc >= 8.x
 ```
 
+### Build using Meson
+
+meson is the recommended build system to build the test and benchmark suite.
+
+```
+meson setup builddir && cd builddir && ninja
+```
+
+It build two executables:
+
+- `testexe`: runs a bunch of tests written in ./tests directory.
+- `benchexe`: measures performance of these algorithms for various data types.
+
+
 ### Build using Make
 
-`make` command builds two executables:
+Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
+will require g++-12. `make` command builds two executables:
 - `testexe`: runs a bunch of tests written in ./tests directory.
 - `benchexe`: measures performance of these algorithms for various data types
   and compares them to std::sort.
@@ -85,15 +145,6 @@ gcc >= 8.x
 You can use `make test` and `make bench` to build just the `testexe` and
 `benchexe` respectively.
 
-### Build using Meson
-
-You can also build `testexe` and `benchexe` using Meson/Ninja with the following
-command:
-
-```
-meson setup builddir && cd builddir && ninja
-```
-
 ## Requirements and dependencies
 
 The sorting routines relies only on the C++ Standard Library and requires a
@@ -101,7 +152,8 @@ relatively modern compiler to build (gcc 8.x and above). Since they use the
 AVX-512 instruction set, they can only run on processors that have AVX-512.
 Specifically, the 32-bit and 64-bit require AVX-512F and AVX-512DQ instruction
 set. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VMBI2
-instruction set. The test suite is written using the Google test framework.
+instruction set. The test suite is written using the Google test framework. The
+benchmark is written using the google benchmark framework.
 
 ## References