Implement SVE2 ShiftLeftLogical Intrinsics #116380
…gned, ShiftLeftLogicalWideningEven, ShiftLeftLogicalWideningOdd
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
HARDWARE_INTRINSIC(Sve2, ShiftArithmeticSaturate, -1, -1, {INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturateUnsigned, -1, -1, {INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
sqshlu takes signed elements, so these instructions should be on TYP_BYTE, TYP_SHORT, etc., right?
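To make the signed-input point concrete, here is a minimal scalar sketch of what a single 16-bit SQSHLU lane computes: a signed input is shifted left and the result is saturated to the unsigned range. The function name `sqshluLane` is hypothetical and is not part of the JIT or the .NET API; it only models the arithmetic, not predication or vector length.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 16-bit SQSHLU lane (not JIT code).
// Input is SIGNED, result is UNSIGNED and saturated to [0, 65535].
inline std::uint16_t sqshluLane(std::int16_t value, unsigned shift)
{
    assert(shift < 16); // left-shift immediates for 16-bit lanes are 0..15

    // Negative inputs cannot be represented as unsigned: saturate to 0.
    if (value < 0)
    {
        return 0;
    }

    // Widen before shifting so no bits are lost (0x7FFF << 15 fits in 32 bits).
    const std::uint32_t wide = static_cast<std::uint32_t>(value) << shift;

    // Saturate to the maximum unsigned 16-bit value on overflow.
    return wide > 0xFFFFu ? static_cast<std::uint16_t>(0xFFFFu)
                          : static_cast<std::uint16_t>(wide);
}
```

Because the input type (short) and the result type (ushort) differ, the table row has to pick one of them as the column key, which is what the following comments discuss.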
I was thinking that these columns should match the return type of the intrinsic rather than the input types, when they don't match in type. And HW_Flag_BaseTypeFromFirstArg is used when you need to disambiguate between intrinsics with the same return type but different argument types. Is this the right approach?
I think ShiftLeftLogicalSaturate doesn't need HW_Flag_BaseTypeFromFirstArg and I've left this in by mistake. It just works by chance because the return type is the same as the type of the first argument.
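The column-selection rule described above can be sketched as follows. This is an illustrative model only: `IntrinsicRow`, `lookupIns`, and `shiftLeftLogicalSaturateUnsignedRow` are hypothetical names, the strings stand in for the `INS_sve_*` enumerators, and `nullptr` stands in for `INS_invalid`. It assumes the ten columns are ordered byte, ubyte, short, ushort, int, uint, long, ulong, float, double.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Base types in the assumed order of the ten instruction columns.
enum BaseType { Byte, UByte, Short, UShort, Int, UInt, Long, ULong, Float, Double };

// Hypothetical stand-in for one HARDWARE_INTRINSIC row (not the JIT's real type).
struct IntrinsicRow
{
    std::array<const char*, 10> ins; // nullptr models INS_invalid
    bool baseTypeFromFirstArg;       // models HW_Flag_BaseTypeFromFirstArg
};

// Select the column from the first argument's type when the flag is set,
// otherwise from the return type.
inline const char* lookupIns(const IntrinsicRow& row, BaseType retType, BaseType firstArgType)
{
    const BaseType key = row.baseTypeFromFirstArg ? firstArgType : retType;
    return row.ins[static_cast<std::size_t>(key)];
}

// The ShiftLeftLogicalSaturateUnsigned row as written in the diff:
// sqshlu sits in the unsigned columns and the flag is not set.
inline IntrinsicRow shiftLeftLogicalSaturateUnsignedRow()
{
    return {{nullptr, "sqshlu", nullptr, "sqshlu", nullptr, "sqshlu", nullptr, "sqshlu", nullptr, nullptr},
            false};
}
```

With this layout, a call taking Vector<short> and returning Vector<ushort> resolves to sqshlu through the ushort column. If the flag were set without moving the entries to the signed columns, the same call would land on an invalid entry, which is why the two choices have to be made together.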
Can you share some codegen examples of ShiftLeftLogicalSaturateUnsigned, ShiftLeftLogicalWideningEven and ShiftLeftLogicalWideningOdd for various method overloads?
Unrelated: looking at HW_Flag_BaseTypeFromFirstArg, I think there are a lot of instructions that don't need that flag but had it added anyway (possibly from copy/paste errors), e.g. ftssel, but that's a separate topic.
For example, the following function:
[MethodImpl(MethodImplOptions.NoInlining)]
static void Test()
{
    System.Console.WriteLine(
        Sve2.ShiftLeftLogicalSaturateUnsigned(
            Vector<short>.Zero,
            255
        )
    );
    var a = Vector<short>.Zero;
    System.Console.WriteLine(
        Sve2.ShiftLeftLogicalWideningEven(a, 3)
    );
    System.Console.WriteLine(
        Sve2.ShiftLeftLogicalWideningOdd(a, 3)
    );
}
produces this (I've removed code related to boxing the vectors and printing them):
; Total bytes of code 236, prolog size 12, PerfScore 56.00, instruction count 59, allocated bytes for code 236 (MethodHash=aac37465) for method JIT.HardwareIntrinsics.Arm.Program:TestDump() (Tier0)
; ============================================================
*************** After end code gen, before unwindEmit()
G_M35738_IG01: ; func=00, offs=0x000000, size=0x000C, bbWeight=1, PerfScore 2.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
IN0037: 000000 stp fp, lr, [sp, #-0x40]!
IN0038: 000004 mov fp, sp
IN0039: 000008 str xzr, [fp, #0x38] // [V01 tmp1]
G_M35738_IG02: ; offs=0x00000C, size=0x00D8, bbWeight=1, PerfScore 51.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref
...
IN0006: 000020 movi v0.4s, #0
IN0007: 000024 mov w0, #255
IN0008: 000028 movz x1, #0x51B8 // code for System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftLogicalSaturateUnsigned(System.Numerics.Vector`1[short],byte):System.Numerics.Vector`1[ushort]
IN0009: 00002C movk x1, #0x8710 LSL #16
IN000a: 000030 movk x1, #0xFA38 LSL #32
IN000b: 000034 ldr x1, [x1]
IN000c: 000038 blr x1
...
IN0015: 00005C movi v16.4s, #0
IN0016: 000060 sshllb z16.d, z16.s, #3
IN0017: 000064 str q16, [fp, #0x20] // [V02 tmp2]
...
IN0026: 0000A0 movi v16.4s, #0
IN0027: 0000A4 sshllt z16.d, z16.s, #3
IN0028: 0000A8 str q16, [fp, #0x10] // [V03 tmp3]
...
G_M35738_IG03: ; offs=0x0000E4, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend
IN003a: 0000E4 ldp fp, lr, [sp], #0x40
IN003b: 0000E8 ret lr
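Because the constant passed in (255) is out of range, the ShiftLeftLogicalSaturateUnsigned call above is not expanded inline; the caller instead calls the function below, which is what gets generated for ShiftLeftLogicalSaturateUnsigned: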
; Total bytes of code 272, prolog size 8, PerfScore 115.00, instruction count 68, allocated bytes for code 272 (MethodHash=8e6e654e) for method System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftLogicalSaturateUnsigned(System.Numerics.Vector`1[short],byte):System.Numerics.Vector`1[ushort] (Tier0)
; ============================================================
*************** After end code gen, before unwindEmit()
G_M39601_IG01: ; func=00, offs=0x000000, size=0x0010, bbWeight=1, PerfScore 3.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
IN003f: 000000 stp fp, lr, [sp, #-0x30]!
IN0040: 000004 mov fp, sp
IN0041: 000008 str q0, [fp, #0x20] // [V00 arg0]
IN0042: 00000C str w0, [fp, #0x1C] // [V01 arg1]
G_M39601_IG02: ; offs=0x000010, size=0x0030, bbWeight=1, PerfScore 14.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref, isz
IN0001: 000010 ldr w0, [fp, #0x1C] // [V01 arg1]
IN0002: 000014 uxtb w0, w0
IN0003: 000018 cmp w0, #16
IN0004: 00001C bhs G_M39601_IG21
IN0005: 000020 ldr w0, [fp, #0x1C] // [V01 arg1]
IN0006: 000024 uxtb w0, w0
IN0007: 000028 ldr q16, [fp, #0x20] // [V00 arg0]
IN0008: 00002C ptrue p0.h
IN0009: 000030 adr x1, [G_M39601_IG03]
IN000a: 000034 add x1, x1, x0, LSL #3
IN000b: 000038 add x1, x1, x0, LSL #2
IN000c: 00003C br x1
G_M39601_IG03: ; offs=0x000040, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend
IN000d: 000040 movprfx z16.h, p0/z, z16.h
IN000e: 000044 sqshlu z16.h, p0/m, z16.h, #0
IN000f: 000048 b G_M39601_IG19
G_M39601_IG04: ; offs=0x00004C, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend
IN0010: 00004C movprfx z16.h, p0/z, z16.h
IN0011: 000050 sqshlu z16.h, p0/m, z16.h, #1
IN0012: 000054 b G_M39601_IG19
G_M39601_IG05: ; offs=0x000058, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend
IN0013: 000058 movprfx z16.h, p0/z, z16.h
IN0014: 00005C sqshlu z16.h, p0/m, z16.h, #2
IN0015: 000060 b G_M39601_IG19
G_M39601_IG06: ; offs=0x000064, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend
IN0016: 000064 movprfx z16.h, p0/z, z16.h
IN0017: 000068 sqshlu z16.h, p0/m, z16.h, #3
IN0018: 00006C b G_M39601_IG19
...
G_M39601_IG19: ; offs=0x0000FC, size=0x0004, bbWeight=1, PerfScore 0.50, BB01 [0000], extend
IN003c: 0000FC mov v0.16b, v16.16b
G_M39601_IG20: ; offs=0x000100, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend
IN0043: 000100 ldp fp, lr, [sp], #0x30
IN0044: 000104 ret lr
G_M39601_IG21: ; offs=0x000108, size=0x0008, bbWeight=0, PerfScore 0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], gcvars, byref
IN003d: 000108 bl CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
IN003e: 00010C brk #0
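The dispatch sequence in G_M39601_IG02 can be read as a bounds check followed by a computed branch: each case block is three 4-byte instructions (movprfx, sqshlu, b), so the target is the first case plus 12 bytes per immediate value, formed as `(imm << 3) + (imm << 2)`. The sketch below reproduces that arithmetic with offsets taken from the dump; the function name `caseOffset` and the constants are hypothetical and only model the address calculation.

```cpp
#include <cassert>
#include <cstdint>

// Values read off the dump above; names are illustrative only.
constexpr std::uint32_t kFirstCaseOffset = 0x40; // G_M39601_IG03
constexpr std::uint32_t kLaneBits        = 16;   // 16-bit (.h) lanes

// Models: cmp w0, #16 / bhs IG21 / adr x1, IG03 /
//         add x1, x1, x0, LSL #3 / add x1, x1, x0, LSL #2 / br x1
// Returns false for the path that reaches the ArgumentOutOfRange helper.
inline bool caseOffset(std::uint8_t imm, std::uint32_t& target)
{
    if (imm >= kLaneBits)
    {
        return false; // unsigned "higher or same" than 16: throw path
    }

    const std::uint32_t i = imm;
    target = kFirstCaseOffset + (i << 3) + (i << 2); // base + 12 * imm
    return true;
}
```

For the immediates visible in the dump this gives 0x40, 0x4C, 0x58 and 0x64 for #0 through #3, matching G_M39601_IG03 through G_M39601_IG06, while the value 255 used in the test takes the throw path.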
> This is the function generated for ShiftLeftLogicalSaturateUnsigned, because the constant passed in is out of range.
Is it? You are passing short elements, so it should be OK to take up to 65535, unless I am missing something.
> sshllb z16.d, z16.s, #3
Since you are using public static Vector<int> ShiftLeftLogicalWideningEven(Vector<short> value, [ConstantExpected] byte count) in your example, I was hoping to see the input register element type be H and that of the result be S, so it should be: sshllb z16.s, z16.h, #3. Same with sshllt.
> Is it? You are passing short elements, so should be ok to take up to 65535, unless i am missing something.
It's a shift amount, so it is limited by the number of bits in the narrow type: 0-15 for short, which matches the cmp w0, #16 bounds check in the function above.
> Since you are using public static Vector<int> ShiftLeftLogicalWideningEven(Vector<short> value, [ConstantExpected] byte count) in your example, I was hoping to see input register element type to be H and that of result to be S, so should be: sshllb z16.s, z16.h, #3. Same with sshllt.
Yes, I missed that detail. I think the code is correct as the operation is passing, but I forgot to change the assembly printing part of the emitter, which I've corrected now. Is there a way to dump the code buffer next to the assembly in the JitDump so I can double check the opcode?
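As a reference for what the widening forms should compute, here is a minimal scalar sketch of SSHLLB (even lanes) and SSHLLT (odd lanes) for the Vector<short> to Vector<int> overload, that is, .h source lanes and .s result lanes. The function name `shiftLeftWidening` is hypothetical, and the vector is modelled as a plain `std::vector` rather than a scalable register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of SSHLLB (odd == false) and SSHLLT (odd == true)
// for 16-bit source lanes and 32-bit result lanes. Not JIT or runtime code.
inline std::vector<std::int32_t> shiftLeftWidening(const std::vector<std::int16_t>& src,
                                                   unsigned shift,
                                                   bool odd)
{
    assert(shift < 16); // immediate range for 16-bit source lanes

    std::vector<std::int32_t> dst;
    for (std::size_t i = odd ? 1 : 0; i < src.size(); i += 2)
    {
        // Sign-extend to 32 bits first, then scale by 2^shift.
        // Multiplication avoids left-shifting a negative value.
        dst.push_back(static_cast<std::int32_t>(src[i]) * (std::int32_t{1} << shift));
    }
    return dst;
}
```

Each result lane is twice as wide as the source lane it came from, so half as many results are produced; this is why the disassembly for this overload is expected to show an .s destination with an .h source.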
HARDWARE_INTRINSIC(Sve2, ShiftArithmeticRoundedSaturate, -1, -1, {INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftArithmeticSaturate, -1, -1, {INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturateUnsigned, -1, -1, {INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalWideningEven, -1, 2, {INS_invalid, INS_invalid, INS_sve_sshllb, INS_sve_ushllb, INS_sve_sshllb, INS_sve_ushllb, INS_sve_sshllb, INS_sve_ushllb, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand)
Also, I think ShiftLeftLogicalWideningEven and ShiftLeftLogicalWideningOdd need HW_Flag_BaseTypeFromFirstArg, I guess?
added some comments
LGTM. Thanks!
/ba-g failures seem unrelated
Includes:
Contributes towards #115479
@a74nh @kunalspathak