Conversation

snickolls-arm
Contributor

Includes:

  • ShiftLeftLogicalSaturate
  • ShiftLeftLogicalSaturateUnsigned
  • ShiftLeftLogicalWideningEven
  • ShiftLeftLogicalWideningOdd

Contributes towards #115479

@a74nh @kunalspathak

…gned, ShiftLeftLogicalWideningEven, ShiftLeftLogicalWideningOdd
@dotnet-policy-service dotnet-policy-service bot added the community-contribution label (indicates that the PR has been added by a community member) Jun 6, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

HARDWARE_INTRINSIC(Sve2, ShiftArithmeticSaturate, -1, -1, {INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturateUnsigned, -1, -1, {INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
Contributor

sqshlu takes signed elements, so these instructions should be on TYP_BYTE, TYP_SHORT, etc. right?

Contributor Author

I was thinking that these columns should match the return type of the intrinsic rather than the input types when the two differ, and that HW_Flag_BaseTypeFromFirstArg is used when you need to disambiguate between intrinsics with the same return type but different argument types. Is this the right approach?

I think ShiftLeftLogicalSaturate doesn't need HW_Flag_BaseTypeFromFirstArg and I've left this in by mistake. It just works by chance because the return type is the same as the type of the first argument.
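
To make the column question concrete, here is a minimal C++ sketch of a per-base-type instruction table. It is a toy model, not the JIT's actual data structures: `BaseType`, `Ins`, `kSqshluRow` and `lookup` are hypothetical names, and the column order (signed/unsigned pairs from byte to long, then float and double) is assumed from the rows quoted above. It shows that a row with entries only in the unsigned columns resolves when indexed by the return type (`ushort`) but not when indexed by the first argument type (`short`).

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the JIT's base types; the order is assumed
// from the table rows quoted in this thread.
enum class BaseType : std::size_t { Byte, UByte, Short, UShort, Int, UInt, Long, ULong, Float, Double };

// Hypothetical instruction ids used only by this sketch.
enum class Ins { Invalid, Sqshlu };

// Row shaped like the ShiftLeftLogicalSaturateUnsigned entry:
// only the unsigned columns carry an instruction.
constexpr std::array<Ins, 10> kSqshluRow = {
    Ins::Invalid, Ins::Sqshlu,  Ins::Invalid, Ins::Sqshlu,  Ins::Invalid,
    Ins::Sqshlu,  Ins::Invalid, Ins::Sqshlu,  Ins::Invalid, Ins::Invalid};

// Pick the instruction for whichever base type the caller decides to index by.
constexpr Ins lookup(const std::array<Ins, 10>& row, BaseType t) {
    return row[static_cast<std::size_t>(t)];
}
```

In this model, indexing by the first argument type would require moving the entries to the signed columns, which is the trade-off raised in the review comment.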

Contributor

Can you share some codegen examples of ShiftLeftLogicalSaturateUnsigned, ShiftLeftLogicalWideningEven and ShiftLeftLogicalWideningOdd for various method overloads?

Unrelated: looking at HW_Flag_BaseTypeFromFirstArg, I think there are a lot of instructions that don't need that flag but have it anyway (possibly from copy/paste errors), e.g. ftssel, but that's a separate topic.

Contributor Author

For example, the following function:

        [MethodImpl(MethodImplOptions.NoInlining)]
        static void Test()
        {
            System.Console.WriteLine(
                Sve2.ShiftLeftLogicalSaturateUnsigned(
                    Vector<short>.Zero,
                    255
                )
            );

            var a = Vector<short>.Zero;
            System.Console.WriteLine(
                Sve2.ShiftLeftLogicalWideningEven(a, 3)
            );

            System.Console.WriteLine(
                Sve2.ShiftLeftLogicalWideningOdd(a, 3)
            );
        }

produces this (I've removed code related to boxing the vectors and printing them):

; Total bytes of code 236, prolog size 12, PerfScore 56.00, instruction count 59, allocated bytes for code 236 (MethodHash=aac37465) for method JIT.HardwareIntrinsics.Arm.Program:TestDump() (Tier0)
; ============================================================

*************** After end code gen, before unwindEmit()
G_M35738_IG01:        ; func=00, offs=0x000000, size=0x000C, bbWeight=1, PerfScore 2.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN0037: 000000      stp     fp, lr, [sp, #-0x40]!
IN0038: 000004      mov     fp, sp
IN0039: 000008      str     xzr, [fp, #0x38]	// [V01 tmp1]

G_M35738_IG02:        ; offs=0x00000C, size=0x00D8, bbWeight=1, PerfScore 51.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref

...
IN0006: 000020      movi    v0.4s, #0
IN0007: 000024      mov     w0, #255
IN0008: 000028      movz    x1, #0x51B8      // code for System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftLogicalSaturateUnsigned(System.Numerics.Vector`1[short],byte):System.Numerics.Vector`1[ushort]
IN0009: 00002C      movk    x1, #0x8710 LSL #16
IN000a: 000030      movk    x1, #0xFA38 LSL #32
IN000b: 000034      ldr     x1, [x1]
IN000c: 000038      blr     x1
...
IN0015: 00005C      movi    v16.4s, #0
IN0016: 000060      sshllb  z16.d, z16.s, #3
IN0017: 000064      str     q16, [fp, #0x20]	// [V02 tmp2]
...
IN0026: 0000A0      movi    v16.4s, #0
IN0027: 0000A4      sshllt  z16.d, z16.s, #3
IN0028: 0000A8      str     q16, [fp, #0x10]	// [V03 tmp3]
...

G_M35738_IG03:        ; offs=0x0000E4, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN003a: 0000E4      ldp     fp, lr, [sp], #0x40
IN003b: 0000E8      ret     lr

This is the function generated for ShiftLeftLogicalSaturateUnsigned, because the constant passed in is out of range.

; Total bytes of code 272, prolog size 8, PerfScore 115.00, instruction count 68, allocated bytes for code 272 (MethodHash=8e6e654e) for method System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftLogicalSaturateUnsigned(System.Numerics.Vector`1[short],byte):System.Numerics.Vector`1[ushort] (Tier0)
; ============================================================

*************** After end code gen, before unwindEmit()
G_M39601_IG01:        ; func=00, offs=0x000000, size=0x0010, bbWeight=1, PerfScore 3.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN003f: 000000      stp     fp, lr, [sp, #-0x30]!
IN0040: 000004      mov     fp, sp
IN0041: 000008      str     q0, [fp, #0x20]	// [V00 arg0]
IN0042: 00000C      str     w0, [fp, #0x1C]	// [V01 arg1]

G_M39601_IG02:        ; offs=0x000010, size=0x0030, bbWeight=1, PerfScore 14.00, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref, isz

IN0001: 000010      ldr     w0, [fp, #0x1C]	// [V01 arg1]
IN0002: 000014      uxtb    w0, w0
IN0003: 000018      cmp     w0, #16
IN0004: 00001C      bhs     G_M39601_IG21
IN0005: 000020      ldr     w0, [fp, #0x1C]	// [V01 arg1]
IN0006: 000024      uxtb    w0, w0
IN0007: 000028      ldr     q16, [fp, #0x20]	// [V00 arg0]
IN0008: 00002C      ptrue   p0.h
IN0009: 000030      adr     x1, [G_M39601_IG03]
IN000a: 000034      add     x1, x1, x0,  LSL #3
IN000b: 000038      add     x1, x1, x0,  LSL #2
IN000c: 00003C      br      x1

G_M39601_IG03:        ; offs=0x000040, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend

IN000d: 000040      movprfx z16.h, p0/z, z16.h
IN000e: 000044      sqshlu  z16.h, p0/m, z16.h, #0
IN000f: 000048      b       G_M39601_IG19

G_M39601_IG04:        ; offs=0x00004C, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend

IN0010: 00004C      movprfx z16.h, p0/z, z16.h
IN0011: 000050      sqshlu  z16.h, p0/m, z16.h, #1
IN0012: 000054      b       G_M39601_IG19

G_M39601_IG05:        ; offs=0x000058, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend

IN0013: 000058      movprfx z16.h, p0/z, z16.h
IN0014: 00005C      sqshlu  z16.h, p0/m, z16.h, #2
IN0015: 000060      b       G_M39601_IG19

G_M39601_IG06:        ; offs=0x000064, size=0x000C, bbWeight=1, PerfScore 6.00, BB01 [0000], extend

IN0016: 000064      movprfx z16.h, p0/z, z16.h
IN0017: 000068      sqshlu  z16.h, p0/m, z16.h, #3
IN0018: 00006C      b       G_M39601_IG19

...

G_M39601_IG19:        ; offs=0x0000FC, size=0x0004, bbWeight=1, PerfScore 0.50, BB01 [0000], extend

IN003c: 0000FC      mov     v0.16b, v16.16b

G_M39601_IG20:        ; offs=0x000100, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN0043: 000100      ldp     fp, lr, [sp], #0x30
IN0044: 000104      ret     lr

G_M39601_IG21:        ; offs=0x000108, size=0x0008, bbWeight=0, PerfScore 0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], gcvars, byref

IN003d: 000108      bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
IN003e: 00010C      brk     #0
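
The dispatch in IG02 can be read as a computed branch: each shift amount gets its own 12-byte case (three 4-byte instructions: `movprfx`, `sqshlu`, `b`), and the two `add` instructions compute `base + imm * 8 + imm * 4`. A minimal C++ sketch of that arithmetic follows; `caseOffset` is a hypothetical helper name, and the base offset 0x40 used in the note below is taken from the dump above.

```cpp
#include <cstdint>

// Offset of the case block for shift amount `imm`, mirroring
//   add x1, x1, x0, LSL #3
//   add x1, x1, x0, LSL #2
constexpr std::uint32_t caseOffset(std::uint32_t base, std::uint32_t imm) {
    return base + (imm << 3) + (imm << 2); // base + imm * 12
}
```

With a base of 0x40, shift amounts 0 to 3 land on 0x40, 0x4C, 0x58 and 0x64, which are the offsets of IG03 to IG06 in the dump.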

Contributor

> This is the function generated for ShiftLeftLogicalSaturateUnsigned, because the constant passed in is out of range.

Is it? You are passing short elements, so it should be OK to take values up to 65535, unless I am missing something.

Contributor

> sshllb z16.d, z16.s, #3

Since you are using public static Vector<int> ShiftLeftLogicalWideningEven(Vector<short> value, [ConstantExpected] byte count) in your example, I was hoping to see the input register element type as H and the result element type as S, so it should be: sshllb z16.s, z16.h, #3. Same with sshllt.
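
For reference, a scalar C++ sketch of the widening behaviour under discussion: the "even" (bottom) form reads source elements 0, 2, 4, ... and the "odd" (top) form reads elements 1, 3, 5, ..., each sign-extended from 16 to 32 bits and then shifted. `shiftLeftWidening` is a hypothetical helper for illustration only; it models the `Vector<short>` to `Vector<int>` overload, assumes a shift amount of at most 15, and is not the runtime's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Widen every other 16-bit element to 32 bits and shift it left.
// odd == false models the "even"/bottom form, odd == true the "odd"/top form.
// Assumption for this sketch: shift <= 15, so the product always fits in 32 bits.
std::vector<std::int32_t> shiftLeftWidening(const std::vector<std::int16_t>& src,
                                            unsigned shift, bool odd) {
    std::vector<std::int32_t> out;
    for (std::size_t i = odd ? 1 : 0; i < src.size(); i += 2) {
        // Multiply rather than use << so negative inputs stay well defined before C++20.
        out.push_back(static_cast<std::int32_t>(src[i]) * (std::int32_t{1} << shift));
    }
    return out;
}
```

The result has half as many elements as the input, each twice as wide, which is why the reviewer expects `.h` on the source register and `.s` on the destination.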

Contributor Author

> Is it? You are passing short elements, so should be ok to take up to 65535, unless i am missing something.

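It's a shift amount, so it is bounded by the element width in bits rather than by the element's value range. For 16-bit elements the generated code above accepts 0-15 (it throws when the value is not below 16), so 255 is out of range.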
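
As a rough illustration of why the bound follows the element width, here is a scalar C++ model of one 16-bit lane of a signed-input, unsigned-saturating left shift, which is how the Arm architecture describes `sqshlu`. `sqshlu16` is a hypothetical helper for this sketch, not a runtime or JIT API; it rejects shifts of 16 or more, matching the `cmp w0, #16` check in the dump above.

```cpp
#include <cstdint>
#include <stdexcept>

// One 16-bit lane: signed input, shift left, saturate to the unsigned 16-bit range.
std::uint16_t sqshlu16(std::int16_t value, unsigned shift) {
    if (shift >= 16) {
        throw std::out_of_range("shift must be in 0-15 for 16-bit lanes");
    }
    if (value < 0) {
        return 0; // negative inputs saturate to the unsigned minimum
    }
    // The largest case, 0x7FFF << 15, still fits in 32 bits.
    std::uint32_t wide = static_cast<std::uint32_t>(value) << shift;
    return wide > 0xFFFF ? std::uint16_t{0xFFFF} : static_cast<std::uint16_t>(wide);
}
```

The shift count never needs to reach the element's value range: any non-zero positive input saturates well before the shift reaches 16.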

> Since you are using public static Vector<int> ShiftLeftLogicalWideningEven(Vector<short> value, [ConstantExpected] byte count) in your example, I was hoping to see the input register element type as H and the result element type as S, so it should be: sshllb z16.s, z16.h, #3. Same with sshllt.

Yes, I missed that detail. I think the generated code is correct, since the tests for the operation are passing, but I forgot to update the assembly-printing part of the emitter, which I've now corrected. Is there a way to dump the code buffer next to the assembly in the JitDump so I can double-check the opcode?

HARDWARE_INTRINSIC(Sve2, ShiftArithmeticRoundedSaturate, -1, -1, {INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_sve_sqrshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftArithmeticSaturate, -1, -1, {INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_sve_sqshl, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
Contributor

HW_Flag_BaseTypeFromFirstArg

I think HW_Flag_BaseTypeFromFirstArg needs to be on ShiftLeftLogicalSaturateUnsigned instead of ShiftLeftLogicalSaturate.

HARDWARE_INTRINSIC(Sve2, ShiftLeftAndInsert, -1, 3, {INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_sve_sli, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturate, -1, -1, {INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_sve_uqshl, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalSaturateUnsigned, -1, -1, {INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_sve_sqshlu, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics)
HARDWARE_INTRINSIC(Sve2, ShiftLeftLogicalWideningEven, -1, 2, {INS_invalid, INS_invalid, INS_sve_sshllb, INS_sve_ushllb, INS_sve_sshllb, INS_sve_ushllb, INS_sve_sshllb, INS_sve_ushllb, INS_invalid, INS_invalid}, HW_Category_ShiftLeftByImmediate, HW_Flag_Scalable|HW_Flag_HasImmediateOperand)
Contributor

Also, I think ShiftLeftLogicalWideningEven and ShiftLeftLogicalWideningOdd need HW_Flag_BaseTypeFromFirstArg, right?

Contributor

@kunalspathak kunalspathak left a comment

added some comments

Contributor

@kunalspathak kunalspathak left a comment

LGTM. Thanks!

@kunalspathak
Contributor

/ba-g failures seem unrelated

@kunalspathak kunalspathak merged commit ff8c934 into dotnet:main Jun 17, 2025
150 of 158 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 18, 2025
Labels
area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member

2 participants