From e511eff719b6253b5bf613280859728d73a107f0 Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli
Date: Mon, 27 Mar 2023 11:16:15 -0700
Subject: [PATCH 1/2] Update README

---
 README.md | 98 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 75 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index a28838a9..ed810375 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,55 @@
 # x86-simd-sort
 
 C++ header file library for SIMD based 16-bit, 32-bit and 64-bit data type
-sorting on x86 processors. Source header files are available in src directory.
-We currently only have AVX-512 based implementation of quicksort. This
-repository also includes a test suite which can be built and run to test the
-sorting algorithms for correctness. It also has benchmarking code to compare
-its performance relative to std::sort.
+sorting algorithms on x86 processors. Source header files are available in src
+directory. We currently only have AVX-512 based implementation of quicksort,
+argsort, quickselect, paritalsort and key-value sort. This repository also
+includes a test suite which can be built and run to test the sorting algorithms
+for correctness. It also has benchmarking code to compare its performance
+relative to std::sort. The following API's are currently supported:
+
+### Quicksort
+
+```
+avx512_qsort(T* arr, int64_t arrsize)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
+uint64_t, int64_t and double`
+
+### Argsort
+
+```
+std::vector arg = avx512_argsort(T* arr, int64_t arrsize)
+void avx512_argsort(T* arr, int64_t *arg, int64_t arrsize)
+```
+Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
+The algorithm resorts to scalar std::sort if the array contains NAN.
+
+### Quickselect
+
+```
+avx512_qselect(T* arr, int64_t arrsize)
+avx512_qselect(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+### Partialsort
+
+```
+avx512_partialsort(T* arr, int64_t arrsize)
+avx512_partialsort(T* arr, int64_t arrsize, bool hasnan)
+```
+Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
+uint64_t, int64_t and double`. Use an additional optional argument `bool
+hasnan` if you expect your arrays to contain nan.
+
+### Key-value sort
+```
+avx512_qsort_kv(T* key, uint64_t* value , int64_t arrsize)
+```
+Supported datatypes: `uint64_t, int64_t and double`
 
 ## Algorithm details
 
@@ -20,13 +64,14 @@ network. The core implementations of the vectorized qsort functions
 `avx512_qsort(T*, int64_t)` are modified versions of avx2 quicksort
 presented in the paper [2] and source code associated with that paper [3].
 
-## Handling NAN in float and double arrays
+## A note on NAN in float and double arrays
 
 If you expect your array to contain NANs, please be aware that the these
-routines **do not preserve your NANs as you pass them**. The
-`avx512_qsort()` routine will put all your NAN's at the end of the sorted
-array and replace them with `std::nan("1")`. Please take a look at
-`avx512_qsort()` and `avx512_qsort()` functions for details.
+routines **do not preserve your NANs as you pass them**. The quicksort,
+quickselect, partialsort and key-value sorting routines will sort NAN's to the
+end of the array and replace them with `std::nan("1")`. `avx512_argsort`
+routines will also resort to a scalar argsort that uses std::sort to sort array
+that contains NAN.
 
 ## Example to include and build this in a C++ code
 
@@ -45,7 +90,7 @@ int main() {
     }
 
     /* call avx512 quicksort */
-    avx512_qsort(arr.data(), ARRSIZE);
+    avx512_qsort(arr.data(), ARRSIZE);
     return 0;
 }
 
@@ -54,7 +99,7 @@ int main() {
 ### Build using gcc
 
 ```
-gcc main.cpp -mavx512f -mavx512dq -O3
+g++ main.cpp -mavx512f -mavx512dq -O3
 ```
 
 This is a header file only library and we do not provide any compile time and
@@ -75,9 +120,24 @@ compiler to build.
 gcc >= 8.x
 ```
 
+### Build using Meson
+
+meson is the recommended build system to build the test and benchmark suite.
+
+```
+meson setup builddir && cd builddir && ninja
+```
+
+It build two executables:
+
+- `testexe`: runs a bunch of tests written in ./tests directory.
+- `benchexe`: measures performance of these algorithms for various data types.
+
+
 ### Build using Make
 
-`make` command builds two executables:
+Makefile uses `-march=sapphirerapids` as a global compile flag and hence it
+will require g++-12. `make` command builds two executables:
 - `testexe`: runs a bunch of tests written in ./tests directory.
 - `benchexe`: measures performance of these algorithms for various data types
   and compares them to std::sort.
@@ -85,15 +145,6 @@ gcc >= 8.x
 You can use `make test` and `make bench` to build just the `testexe` and
 `benchexe` respectively.
 
-### Build using Meson
-
-You can also build `testexe` and `benchexe` using Meson/Ninja with the following
-command:
-
-```
-meson setup builddir && cd builddir && ninja
-```
-
 ## Requirements and dependencies
 
 The sorting routines relies only on the C++ Standard Library and requires a
@@ -101,7 +152,8 @@ relatively modern compiler to build (gcc 8.x and above). Since they use the
 AVX-512 instruction set, they can only run on processors that have AVX-512.
 Specifically, the 32-bit and 64-bit require AVX-512F and AVX-512DQ instruction
 set. The 16-bit sorting requires the AVX-512F, AVX-512BW and AVX-512 VMBI2
-instruction set. The test suite is written using the Google test framework.
+instruction set. The test suite is written using the Google test framework. The
+benchmark is written using the google benchmark framework.
 
 ## References
 
From b2e482fbf129aa01383fdc152ee98c92c2bdf960 Mon Sep 17 00:00:00 2001
From: Raghuveer Devulapalli
Date: Thu, 22 Jun 2023 11:03:28 -0700
Subject: [PATCH 2/2] Fix function signatures

---
 README.md | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index ed810375..b56769f7 100644
--- a/README.md
+++ b/README.md
@@ -8,46 +8,46 @@ includes a test suite which can be built and run to test the sorting algorithms
 for correctness. It also has benchmarking code to compare its performance
 relative to std::sort. The following API's are currently supported:
 
-### Quicksort
+#### Quicksort
 
 ```
-avx512_qsort(T* arr, int64_t arrsize)
+void avx512_qsort(T* arr, int64_t arrsize)
 ```
 Supported datatypes: `uint16_t, int16_t, _Float16, uint32_t, int32_t, float,
 uint64_t, int64_t and double`
 
-### Argsort
+#### Argsort
 
 ```
-std::vector arg = avx512_argsort(T* arr, int64_t arrsize)
-void avx512_argsort(T* arr, int64_t *arg, int64_t arrsize)
+std::vector arg = avx512_argsort(T* arr, int64_t arrsize)
+void avx512_argsort(T* arr, int64_t *arg, int64_t arrsize)
 ```
 Supported datatypes: `uint32_t, int32_t, float, uint64_t, int64_t and double`.
-The algorithm resorts to scalar std::sort if the array contains NAN.
+The algorithm resorts to scalar `std::sort` if the array contains NAN.
 
-### Quickselect
+#### Quickselect
 
 ```
-avx512_qselect(T* arr, int64_t arrsize)
-avx512_qselect(T* arr, int64_t arrsize, bool hasnan)
+void avx512_qselect(T* arr, int64_t arrsize)
+void avx512_qselect(T* arr, int64_t arrsize, bool hasnan)
 ```
 Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
 uint64_t, int64_t and double`. Use an additional optional argument `bool
 hasnan` if you expect your arrays to contain nan.
 
-### Partialsort
+#### Partialsort
 
 ```
-avx512_partialsort(T* arr, int64_t arrsize)
-avx512_partialsort(T* arr, int64_t arrsize, bool hasnan)
+void avx512_partialsort(T* arr, int64_t arrsize)
+void avx512_partialsort(T* arr, int64_t arrsize, bool hasnan)
 ```
 Supported datatypes: `uint16_t, int16_t, _Float16 ,uint32_t, int32_t, float,
 uint64_t, int64_t and double`. Use an additional optional argument `bool
 hasnan` if you expect your arrays to contain nan.
 
-### Key-value sort
+#### Key-value sort
 ```
-avx512_qsort_kv(T* key, uint64_t* value , int64_t arrsize)
+void avx512_qsort_kv(T* key, uint64_t* value , int64_t arrsize)
 ```
 Supported datatypes: `uint64_t, int64_t and double`
 
@@ -70,7 +70,7 @@ If you expect your array to contain NANs, please be aware that the these
 routines **do not preserve your NANs as you pass them**. The quicksort,
 quickselect, partialsort and key-value sorting routines will sort NAN's to the
 end of the array and replace them with `std::nan("1")`. `avx512_argsort`
-routines will also resort to a scalar argsort that uses std::sort to sort array
+routines will also resort to a scalar argsort that uses `std::sort` to sort array
 that contains NAN.
 
 ## Example to include and build this in a C++ code
@@ -81,7 +81,7 @@ that contains NAN.
 #include "src/avx512-32bit-qsort.hpp"
 
 int main() {
-    const int ARRSIZE = 10;
+    const int ARRSIZE = 1000;
     std::vector arr;
 
     /* Initialize elements is reverse order */