sterrettm2
Contributor

This patch adds support for descending kv-sort, as well as ascending and descending kv-select and kv-partial_sort.
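To make the three operations concrete, here is a minimal pure-Python sketch of the semantics only. The function names `kv_partial_sort` and `kv_select` are hypothetical, chosen for illustration; they are not the library's API, and the sketch says nothing about how the SIMD implementation works internally.

```python
import heapq


def kv_partial_sort(keys, values, k, descending=False):
    """Return the k smallest (or largest) keys with their values, in sorted order.

    Hypothetical reference helper: illustrates a "sorted topk" on key-value pairs.
    """
    pairs = zip(keys, values)
    pick = heapq.nlargest if descending else heapq.nsmallest
    # Only the key (first element of each pair) takes part in the comparison.
    return pick(k, pairs, key=lambda kv: kv[0])


def kv_select(keys, values, k, descending=False):
    """Return the same k pairs as kv_partial_sort, with no ordering guarantee.

    Hypothetical reference helper: illustrates an "unsorted topk". A real
    selection algorithm can skip the final ordering step; this reference
    version simply reuses the sorted result, which is one valid answer.
    """
    return kv_partial_sort(keys, values, k, descending)


keys = [5.0, 1.0, 4.0, 2.0, 3.0]
values = [50, 10, 40, 20, 30]

smallest_two = kv_partial_sort(keys, values, 2)
largest_two = kv_partial_sort(keys, values, 2, descending=True)
print(smallest_two)
print(largest_two)
```

Callers of a select operation should compare results as sets (or sort them first), since only membership in the top k is guaranteed, not the order.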
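For reference, some benchmarks comparing against PyTorch's scalar implementation are provided below. The speedup columns give the percent change relative to the default implementation, so negative values mean the SIMD version is faster: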

With normally distributed float32:

Partial Sort (sorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %
16             4.180374384    4.0826931      4.202754736    -2.336663537   0.53536717     
128            5.362578869    6.11398387     5.229052782    14.01200838    -2.489960336   
1024           14.10357333    14.0685565     9.240571122    -0.248283326   -34.48063899   
10000          322.740033     89.83533496    48.29404964    -72.1647996    -85.03623824   
100000         3757.874566    1168.889193    721.571655     -68.89493856   -80.79841032   
1000000        45415.52301    11909.35397    8130.106271    -73.77690891   -82.09839779   
10000000       539107.6565    217536.211     171638.298     -59.64883667   -68.16251894   
100000000      6410786.629    2307972.193    1874841.69     -63.99861161   -70.75488862   
                                                                                          
Select (unsorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %                                      
16             4.046328783    4.075162649    4.120069742    0.712593258    1.822416396    
128            4.912573099    5.651566267    4.852572441    15.04289408    -1.221369267   
1024           9.33410556     7.444588822    6.450861102    -20.24314731   -30.88934917   
10000          131.0603339    30.31828351    21.59984892    -76.8669264    -83.5191562    
100000         1273.479035    461.2905271    368.7236242    -63.77714006   -71.04596038   
1000000        12720.43527    4615.330229    6321.014143    -63.71719891   -50.30819302   
10000000       253137.2547    130307.7698    125634.7179    -48.52287945   -50.36893401   
100000000      2394076.347    1226446.152    1257464.17     -48.77163575   -47.47602048   

With uniformly distributed (from 0 to 10e9) int32:

Partial Sort (sorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %
16             4.159518957    4.193886042    4.062754393    0.82622738     -2.326340269   
128            4.946233034    6.055155277    5.076540947    22.41953089    2.634487941    
1024           10.11158738    12.40083982    8.094132173    22.63989176    -19.95191391   
10000          235.6421399    81.19698569    47.26006824    -65.54224736   -79.94413552   
100000         2726.637111    1325.142324    692.2358897    -51.40012146   -74.61210049   
1000000        32515.39025    13960.33343    7689.116048    -57.06545939   -76.35237963   
10000000       461327.1713    221366.8823    176379.2038    -52.01520828   -61.76700295   
100000000      5129393.339    2428071.022    1890691.996    -52.66358297   -63.14004658   
                                                                                          
Select (unsorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %                             
16             7.28207159     7.344153643    7.334493876    0.852532847    0.719881485    
128            7.539570808    9.854883432    8.218926668    30.70881198    9.010537563    
1024           14.03857871    19.15253601    10.045604      36.427885      -28.4428701    
10000          146.630404     48.96809058    29.91605115    -66.60440861   -79.59764801   
100000         1304.309364    585.7699348    807.1453417    -55.08964737   -38.11703236   
1000000        16298.37354    7968.290179    8426.867279    -51.10990578   -48.29626859   
10000000       276746.4638    175631.3324    161258.0299    -36.5370997    -41.73077129   
100000000      3139316.559    1701539.516    1615918.875    -45.79904624   -48.5264119  
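The speedup columns in the tables above appear to be computed as the percent change of the SIMD measurement relative to the default one. The sketch below checks that reading against one row (size 10000, normally distributed float32, partial sort). The helper names `percent_change` and `speedup_factor` are illustrative and not part of the patch.

```python
def percent_change(default, simd):
    """Percent change of the SIMD measurement relative to the default.

    Negative results mean the SIMD measurement is smaller than the default.
    """
    return (simd - default) / default * 100.0


def speedup_factor(default, simd):
    """How many times smaller the SIMD measurement is than the default."""
    return default / simd


# Row "10000" from the float32 partial sort table.
default, avx2, avx512 = 322.740033, 89.83533496, 48.29404964

avx2_change = percent_change(default, avx2)
avx512_change = percent_change(default, avx512)
print(f"AVX2 change:   {avx2_change:.4f} %")
print(f"AVX512 change: {avx512_change:.4f} %")
print(f"AVX2 factor:   {speedup_factor(default, avx2):.2f}x")
```

Expressed as a ratio instead of a percent change, the same row corresponds to the AVX2 path taking roughly a quarter to a third of the default measurement.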

Member

r-devulap left a comment

I did some minor reformatting to avoid duplicate dispatch functions for the keyvalue methods. LGTM! Thanks @sterrettm2!

r-devulap merged commit 2315766 into numpy:main on May 20, 2024