Conversation

AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented May 25, 2024

It is both much simpler (no need to use another set of intrinsics) and better for codegen (the previous version works only on 64-bit values and does not take advantage of smaller parameter sizes).
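
For illustration only, and not the code from this PR: a minimal sketch of what "taking advantage of smaller parameter size" can look like. All function names here are hypothetical, and portable bit tricks stand in for whatever ARM64 intrinsics the real change uses, so the sketch compiles on any C++17 compiler. The idea is simply to pick the narrowest counting routine that fits the operand instead of always widening to 64 bits.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helpers for illustration; none of these names come from the PR.

// Portable SWAR popcount on a 64-bit value (the "always widen" path).
constexpr int popcount64_sketch(std::uint64_t v) {
    v = v - ((v >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);  // 4-bit sums
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
    return static_cast<int>((v * 0x0101010101010101ULL) >> 56);            // add the bytes
}

// The same technique on a 32-bit value, with narrower constants and shifts.
constexpr int popcount32_sketch(std::uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((v * 0x01010101u) >> 24);
}

// Dispatch on operand width at compile time: types of up to 32 bits never
// touch the 64-bit path.
template <class T>
constexpr int popcount_by_width_sketch(T value) {
    static_assert(std::is_unsigned_v<T>, "unsigned integer types only");
    if constexpr (std::numeric_limits<T>::digits <= 32) {
        return popcount32_sketch(static_cast<std::uint32_t>(value));
    } else {
        return popcount64_sketch(static_cast<std::uint64_t>(value));
    }
}
```

Because the width is chosen with `if constexpr`, each instantiation contains only one of the two paths, which is the kind of codegen difference the description above refers to.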

I have not tested on actual ARM64 and ARM64EC.

Resolves #4683
Resolves #2129
Resolves llvm/llvm-project#50830

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner May 25, 2024 18:27
@AlexGuteniev
Contributor Author

Honestly, what I'm doing for ARM64EC here is an uneducated guess.
This needs to be checked by an ARM64EC expert.

@StephanTLavavej StephanTLavavej added the performance (Must go faster) and ARM64 (Related to the ARM64 architecture) labels May 28, 2024
@StephanTLavavej StephanTLavavej self-assigned this May 28, 2024
@StephanTLavavej StephanTLavavej removed their assignment Jun 13, 2024
@StephanTLavavej StephanTLavavej self-assigned this Jun 14, 2024
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 9fd47b5 into microsoft:main Jun 18, 2024
@StephanTLavavej
Member

Thanks for greatly simplifying this code! 😻 ✨ 🎉

Labels
ARM64: Related to the ARM64 architecture
performance: Must go faster
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

ARM64 __popcnt intrinsics
Add neon_cnt and neon_addv8 for 64-bit ARM for parity with MSVC
<intrin0.h>: needs some stuff for _M_ARM64
2 participants