@a4lg a4lg commented Sep 19, 2025

In rust-lang/stdarch#1765, it was pointed out that two RISC-V (64-bit only) intrinsics for AES key scheduling have the wrong target feature requirements.
The aes64ks1i and aes64ks2 instructions require either the Zkne (scalar cryptography: AES encryption) or the Zknd (scalar cryptography: AES decryption) extension (or both), but the corresponding Rust intrinsics (in core::arch::riscv64) required both Zkne and Zknd.

An excerpt from the original intrinsics:

#[target_feature(enable = "zkne", enable = "zknd")]

To fix that, we need to:

  1. Represent a condition where either Zkne or Zknd is available and
  2. Work around an issue: the llvm.riscv.aes64ks1i / llvm.riscv.aes64ks2 LLVM intrinsics require either the Zkne or the Zknd extension.

This PR attempts to resolve them by:

  1. Adding a perma-unstable RISC-V target feature: zkne_or_zknd (implied by each of zkne and zknd) and
  2. Using inline assembly to construct machine code directly (because zkne_or_zknd alone implies neither Zkne nor Zknd, we cannot use the LLVM intrinsics).
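
To illustrate the implication direction described in the list above, here is a minimal sketch. It is a toy model only: `implied_by` and `enabled_closure` are hypothetical names, and the real implication table lives inside rustc and is not reproduced here.

```rust
/// Hypothetical implication table: which features each feature implies.
/// Only the relation described in this PR is modelled.
fn implied_by(feature: &str) -> &'static [&'static str] {
    match feature {
        // Both "real" extensions imply the helper feature...
        "zkne" => &["zkne_or_zknd"],
        "zknd" => &["zkne_or_zknd"],
        // ...but the helper feature implies nothing in return.
        _ => &[],
    }
}

/// Compute the set of enabled features from the requested ones by
/// following implications until nothing new is added.
fn enabled_closure(requested: &[&'static str]) -> Vec<&'static str> {
    let mut enabled: Vec<&'static str> = Vec::new();
    let mut stack: Vec<&'static str> = requested.to_vec();
    while let Some(feature) = stack.pop() {
        if !enabled.contains(&feature) {
            enabled.push(feature);
            stack.extend_from_slice(implied_by(feature));
        }
    }
    enabled
}
```

In this model, requesting `zkne` or `zknd` enables `zkne_or_zknd`, while requesting `zkne_or_zknd` alone enables neither extension, which is why the LLVM intrinsics cannot be used behind it.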

The author confirmed that we can construct an AES key scheduling function with decent performance using the fixed aes64ks1i and aes64ks2 intrinsics (with optimization enabled).


Big thanks to @sayantn for the fundamental idea.

In this implementation, I used .option push, .option arch and .option pop. They can be used to temporarily change the architecture in a specific region of the code, and almost all architecture changes are temporary (the exceptions are ELF flags that are permanently set by using compressed instruction extensions and/or the Ztso extension).
Either .option arch, +zkne or .option arch, +zknd works here; I arbitrarily chose the Zkne extension.
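
The following is a rough sketch of how such a wrapper might look, not the code in this PR. The names `aes64ks2_asm_sketch` and `aes64ks2_model` are hypothetical, the wrapper is an `unsafe fn` instead of using the unstable `zkne_or_zknd` target feature, and the portable model is written from my reading of the scalar cryptography specification, so treat it as an assumption.

```rust
/// Portable model of `aes64ks2` (assumption: derived from the scalar
/// cryptography specification, not taken from this PR).
/// The result holds two new 32-bit key words: w1 in the upper half and
/// w0 in the lower half.
pub fn aes64ks2_model(rs1: u64, rs2: u64) -> u64 {
    let rs1_hi = (rs1 >> 32) as u32;
    let w0 = rs1_hi ^ (rs2 as u32);
    let w1 = w0 ^ ((rs2 >> 32) as u32);
    ((w1 as u64) << 32) | (w0 as u64)
}

/// Hypothetical inline-assembly wrapper, compiled only for RISC-V 64.
///
/// # Safety
/// The caller must ensure the CPU implements Zkne or Zknd.
#[cfg(target_arch = "riscv64")]
pub unsafe fn aes64ks2_asm_sketch(rs1: u64, rs2: u64) -> u64 {
    let rd: u64;
    unsafe {
        core::arch::asm!(
            // Save the assembler's current architecture settings.
            ".option push",
            // Temporarily allow Zkne instructions (Zknd would work too).
            ".option arch, +zkne",
            "aes64ks2 {rd}, {rs1}, {rs2}",
            // Restore the previous architecture settings.
            ".option pop",
            rd = lateout(reg) rd,
            rs1 = in(reg) rs1,
            rs2 = in(reg) rs2,
            // No memory access and no side effects, so the compiler may
            // reorder or deduplicate calls.
            options(pure, nomem, nostack),
        );
    }
    rd
}
```

On other targets only the model is compiled, for example `aes64ks2_model(0x1111_1111_2222_2222, 0x3333_3333_4444_4444)` yields `0x6666_6666_5555_5555`.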

r? @Amanieu
@rustbot label +O-riscv +A-target-feature

Because some AES key scheduling instructions require *either* Zkne or
Zknd extension, we must have a target feature to represent
`(Zkne || Zknd)`.

This commit adds a (perma-unstable) target feature to the RISC-V
architecture for this purpose: `zkne_or_zknd`.

Helped-by: sayantn <[email protected]>
…insics

Using inline assembly and the `zkne_or_zknd` target feature avoids the
current issues with intrinsics that should be available when either
Zkne or Zknd is available.

Before this commit, intrinsics `aes64ks1i` and `aes64ks2` required
both Zkne and Zknd extensions, not either Zkne or Zknd.

Closes: rust-lang/stdarch#1765
rustbot commented Sep 19, 2025

stdarch is developed in its own repository. If possible, consider making this change to rust-lang/stdarch instead.

cc @Amanieu, @folkertdev, @sayantn

rustbot added the A-LLVM, S-waiting-on-review, T-compiler, T-libs and O-riscv labels Sep 19, 2025
a4lg commented Sep 19, 2025

Example (AES-256 shared encryption/decryption)

#![no_std]
#![feature(riscv_ext_intrinsics)]

#![allow(clippy::identity_op)]

use core::arch::riscv64::{aes64ks1i, aes64ks2};

#[target_feature(enable = "zkne")]
// #[target_feature(enable = "zknd")]
pub fn aes256_key_schedule(key: &[u64; 4], scheduled_key: &mut [u64; 30]) {
    let mut rk0 = key[0];
    let mut rk1 = key[1];
    let mut rk2 = key[2];
    let mut rk3 = key[3];
    scheduled_key[0] = rk0;
    scheduled_key[1] = rk1;
    scheduled_key[2] = rk2;
    scheduled_key[3] = rk3;
    macro_rules! double_round {
        ($i: expr) => {
            let tmp = aes64ks1i::<$i>(rk3);
            rk0 = aes64ks2(tmp, rk0);
            rk1 = aes64ks2(rk0, rk1);
            // rnum = 10 applies SubWord only (no rotation, no round constant).
            let tmp = aes64ks1i::<10>(rk1);
            rk2 = aes64ks2(tmp, rk2);
            rk3 = aes64ks2(rk2, rk3);
            scheduled_key[4 * ($i + 1) + 0] = rk0;
            scheduled_key[4 * ($i + 1) + 1] = rk1;
            scheduled_key[4 * ($i + 1) + 2] = rk2;
            scheduled_key[4 * ($i + 1) + 3] = rk3;
        };
    }
    double_round!(0);
    double_round!(1);
    double_round!(2);
    double_round!(3);
    double_round!(4);
    double_round!(5);
    // Process tail
    let tmp = aes64ks1i::<6>(rk3);
    rk0 = aes64ks2(tmp, rk0);
    rk1 = aes64ks2(rk0, rk1);
    scheduled_key[4 * 7 + 0] = rk0;
    scheduled_key[4 * 7 + 1] = rk1;
}

Note that we can use aes64ks1i and aes64ks2 without the unsafe keyword (because the target features match) as long as either #[target_feature(enable = "zkne")] or #[target_feature(enable = "zknd")] is attached to the function.

Compiling this AES-256 key scheduling code with optimization enabled produces, for example:

0000000000000000 <a::aes256_key_schedule::h9f3a83c35f5e5426>:
   0:   6110            ld          a2,0(a0)
   2:   6514            ld          a3,8(a0)
   4:   6918            ld          a4,16(a0)
   6:   6d08            ld          a0,24(a0)
   8:   e190            sd          a2,0(a1)
   a:   e594            sd          a3,8(a1)
   c:   e998            sd          a4,16(a1)
   e:   ed88            sd          a0,24(a1)
  10:   31051793        aes64ks1i   a5,a0,0x0
  14:   7ec78633        aes64ks2    a2,a5,a2
  18:   7ed606b3        aes64ks2    a3,a2,a3
  1c:   31a69793        aes64ks1i   a5,a3,0xa
  20:   7ee78733        aes64ks2    a4,a5,a4
  24:   7ea70533        aes64ks2    a0,a4,a0
  28:   f190            sd          a2,32(a1)
  2a:   f594            sd          a3,40(a1)
  2c:   f998            sd          a4,48(a1)
  2e:   fd88            sd          a0,56(a1)
  30:   31151793        aes64ks1i   a5,a0,0x1
  34:   7ec78633        aes64ks2    a2,a5,a2
  38:   7ed606b3        aes64ks2    a3,a2,a3
  3c:   31a69793        aes64ks1i   a5,a3,0xa
  40:   7ee78733        aes64ks2    a4,a5,a4
  44:   7ea70533        aes64ks2    a0,a4,a0
  48:   e1b0            sd          a2,64(a1)
  4a:   e5b4            sd          a3,72(a1)
  4c:   e9b8            sd          a4,80(a1)
  4e:   eda8            sd          a0,88(a1)
  50:   31251793        aes64ks1i   a5,a0,0x2
  54:   7ec78633        aes64ks2    a2,a5,a2
  58:   7ed606b3        aes64ks2    a3,a2,a3
  5c:   31a69793        aes64ks1i   a5,a3,0xa
  60:   7ee78733        aes64ks2    a4,a5,a4
  64:   7ea70533        aes64ks2    a0,a4,a0
  68:   f1b0            sd          a2,96(a1)
  6a:   f5b4            sd          a3,104(a1)
  6c:   f9b8            sd          a4,112(a1)
  6e:   fda8            sd          a0,120(a1)
  70:   31351793        aes64ks1i   a5,a0,0x3
  74:   7ec78633        aes64ks2    a2,a5,a2
  78:   7ed606b3        aes64ks2    a3,a2,a3
  7c:   31a69793        aes64ks1i   a5,a3,0xa
  80:   7ee78733        aes64ks2    a4,a5,a4
  84:   7ea70533        aes64ks2    a0,a4,a0
  88:   e1d0            sd          a2,128(a1)
  8a:   e5d4            sd          a3,136(a1)
  8c:   e9d8            sd          a4,144(a1)
  8e:   edc8            sd          a0,152(a1)
  90:   31451793        aes64ks1i   a5,a0,0x4
  94:   7ec78633        aes64ks2    a2,a5,a2
  98:   7ed606b3        aes64ks2    a3,a2,a3
  9c:   31a69793        aes64ks1i   a5,a3,0xa
  a0:   7ee78733        aes64ks2    a4,a5,a4
  a4:   7ea70533        aes64ks2    a0,a4,a0
  a8:   f1d0            sd          a2,160(a1)
  aa:   f5d4            sd          a3,168(a1)
  ac:   f9d8            sd          a4,176(a1)
  ae:   fdc8            sd          a0,184(a1)
  b0:   31551793        aes64ks1i   a5,a0,0x5
  b4:   7ec78633        aes64ks2    a2,a5,a2
  b8:   7ed606b3        aes64ks2    a3,a2,a3
  bc:   31a69793        aes64ks1i   a5,a3,0xa
  c0:   7ee78733        aes64ks2    a4,a5,a4
  c4:   7ea70533        aes64ks2    a0,a4,a0
  c8:   e1f0            sd          a2,192(a1)
  ca:   e5f4            sd          a3,200(a1)
  cc:   e9f8            sd          a4,208(a1)
  ce:   ede8            sd          a0,216(a1)
  d0:   31651513        aes64ks1i   a0,a0,0x6
  d4:   7ec50533        aes64ks2    a0,a0,a2
  d8:   7ed50633        aes64ks2    a2,a0,a3
  dc:   f1e8            sd          a0,224(a1)
  de:   f5f0            sd          a2,232(a1)
  e0:   8082            ret
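
As a cross-check of the listing above, the instruction words can be rebuilt from their operands. This is a sketch only: the field layout is inferred from the disassembly shown here (and is consistent with the standard R-type and I-type layouts), and the function names are hypothetical.

```rust
/// Encode `aes64ks2 rd, rs1, rs2` from register numbers (x0..x31).
/// Layout inferred from the listing: funct7 = 0x3f, funct3 = 0, opcode = 0x33.
pub const fn encode_aes64ks2(rd: u32, rs1: u32, rs2: u32) -> u32 {
    (0x3f << 25) | ((rs2 & 0x1f) << 20) | ((rs1 & 0x1f) << 15) | ((rd & 0x1f) << 7) | 0x33
}

/// Encode `aes64ks1i rd, rs1, rnum` from register numbers and a round number.
/// Layout inferred from the listing: fixed upper bits 0x31, rnum in bits 23..20,
/// funct3 = 1, opcode = 0x13.
pub const fn encode_aes64ks1i(rd: u32, rs1: u32, rnum: u32) -> u32 {
    0x3100_0000
        | ((rnum & 0xf) << 20)
        | ((rs1 & 0x1f) << 15)
        | (0b001 << 12)
        | ((rd & 0x1f) << 7)
        | 0x13
}

// ABI register names used in the listing, as x-register numbers.
pub const A0: u32 = 10;
pub const A2: u32 = 12;
pub const A3: u32 = 13;
pub const A5: u32 = 15;
```

With these helpers, `encode_aes64ks1i(A5, A0, 0)` gives `0x31051793` and `encode_aes64ks2(A2, A5, A2)` gives `0x7ec78633`, matching the words at offsets 0x10 and 0x14.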

rustbot added the A-target-feature label Sep 19, 2025
a4lg commented Sep 20, 2025

Example (AES-256 decryption only)

Normally, we perform AES key scheduling and then convert the result for decryption. Let's see what happens when those two operations are folded together. Note that the inverse MixColumns operation (aes64im) is applied to scheduled_key[2..28].

#![no_std]
#![feature(riscv_ext_intrinsics)]

#![allow(clippy::identity_op)]

use core::arch::riscv64::{aes64im, aes64ks1i, aes64ks2};

#[target_feature(enable = "zknd")]
pub fn aes256_key_schedule_on_decryption(key: &[u64; 4], scheduled_key: &mut [u64; 30]) {
    let mut rk0 = key[0];
    let mut rk1 = key[1];
    let mut rk2 = key[2];
    let mut rk3 = key[3];
    scheduled_key[0] = rk0;
    scheduled_key[1] = rk1;
    scheduled_key[2] = aes64im(rk2);
    scheduled_key[3] = aes64im(rk3);
    macro_rules! double_round {
        ($i: expr) => {
            let tmp = aes64ks1i::<$i>(rk3);
            rk0 = aes64ks2(tmp, rk0);
            rk1 = aes64ks2(rk0, rk1);
            // rnum = 10 applies SubWord only (no rotation, no round constant).
            let tmp = aes64ks1i::<10>(rk1);
            rk2 = aes64ks2(tmp, rk2);
            rk3 = aes64ks2(rk2, rk3);
            scheduled_key[4 * ($i + 1) + 0] = aes64im(rk0);
            scheduled_key[4 * ($i + 1) + 1] = aes64im(rk1);
            scheduled_key[4 * ($i + 1) + 2] = aes64im(rk2);
            scheduled_key[4 * ($i + 1) + 3] = aes64im(rk3);
        };
    }
    double_round!(0);
    double_round!(1);
    double_round!(2);
    double_round!(3);
    double_round!(4);
    double_round!(5);
    // Process tail
    let tmp = aes64ks1i::<6>(rk3);
    rk0 = aes64ks2(tmp, rk0);
    rk1 = aes64ks2(rk0, rk1);
    scheduled_key[4 * 7 + 0] = rk0;
    scheduled_key[4 * 7 + 1] = rk1;
}

Since the inline assembly implementation is pure, the compiler is free to schedule the aes64ks1i and aes64ks2 instructions among the other instructions:

0000000000000000 <a::aes256_key_schedule_on_decryption::h2c42ad1fddb30384>:
   0:   01053803        ld          a6,16(a0)
   4:   6d14            ld          a3,24(a0)
   6:   6118            ld          a4,0(a0)
   8:   6508            ld          a0,8(a0)
   a:   30081893        aes64im     a7,a6
   e:   30069613        aes64im     a2,a3
  12:   31069793        aes64ks1i   a5,a3,0x0
  16:   e198            sd          a4,0(a1)
  18:   e588            sd          a0,8(a1)
  1a:   0115b823        sd          a7,16(a1)
  1e:   ed90            sd          a2,24(a1)
  20:   7ee78633        aes64ks2    a2,a5,a4
  24:   7ea60533        aes64ks2    a0,a2,a0
  28:   30061893        aes64im     a7,a2
  2c:   31a51793        aes64ks1i   a5,a0,0xa
  30:   30051293        aes64im     t0,a0
  34:   7f0787b3        aes64ks2    a5,a5,a6
  38:   7ed786b3        aes64ks2    a3,a5,a3
  3c:   30079813        aes64im     a6,a5
  40:   30069313        aes64im     t1,a3
  44:   31169713        aes64ks1i   a4,a3,0x1
  48:   0315b023        sd          a7,32(a1)
  4c:   0255b423        sd          t0,40(a1)
  50:   0305b823        sd          a6,48(a1)
  54:   0265bc23        sd          t1,56(a1)
  58:   7ec70633        aes64ks2    a2,a4,a2
  5c:   7ea60533        aes64ks2    a0,a2,a0
  60:   30061813        aes64im     a6,a2
  64:   31a51713        aes64ks1i   a4,a0,0xa
  68:   30051893        aes64im     a7,a0
  6c:   7ef70733        aes64ks2    a4,a4,a5
  70:   7ed706b3        aes64ks2    a3,a4,a3
  74:   30071293        aes64im     t0,a4
  78:   30069313        aes64im     t1,a3
  7c:   31269793        aes64ks1i   a5,a3,0x2
  80:   0505b023        sd          a6,64(a1)
  84:   0515b423        sd          a7,72(a1)
  88:   0455b823        sd          t0,80(a1)
  8c:   0465bc23        sd          t1,88(a1)
  90:   7ec78633        aes64ks2    a2,a5,a2
  94:   7ea60533        aes64ks2    a0,a2,a0
  98:   30061813        aes64im     a6,a2
  9c:   31a51793        aes64ks1i   a5,a0,0xa
  a0:   30051893        aes64im     a7,a0
  a4:   7ee78733        aes64ks2    a4,a5,a4
  a8:   7ed706b3        aes64ks2    a3,a4,a3
  ac:   30071293        aes64im     t0,a4
  b0:   30069313        aes64im     t1,a3
  b4:   31369793        aes64ks1i   a5,a3,0x3
  b8:   0705b023        sd          a6,96(a1)
  bc:   0715b423        sd          a7,104(a1)
  c0:   0655b823        sd          t0,112(a1)
  c4:   0665bc23        sd          t1,120(a1)
  c8:   7ec78633        aes64ks2    a2,a5,a2
  cc:   7ea60533        aes64ks2    a0,a2,a0
  d0:   30061813        aes64im     a6,a2
  d4:   31a51793        aes64ks1i   a5,a0,0xa
  d8:   30051893        aes64im     a7,a0
  dc:   7ee78733        aes64ks2    a4,a5,a4
  e0:   7ed706b3        aes64ks2    a3,a4,a3
  e4:   30071293        aes64im     t0,a4
  e8:   30069313        aes64im     t1,a3
  ec:   31469793        aes64ks1i   a5,a3,0x4
  f0:   0905b023        sd          a6,128(a1)
  f4:   0915b423        sd          a7,136(a1)
  f8:   0855b823        sd          t0,144(a1)
  fc:   0865bc23        sd          t1,152(a1)
 100:   7ec78633        aes64ks2    a2,a5,a2
 104:   7ea60533        aes64ks2    a0,a2,a0
 108:   30061813        aes64im     a6,a2
 10c:   31a51793        aes64ks1i   a5,a0,0xa
 110:   30051893        aes64im     a7,a0
 114:   7ee78733        aes64ks2    a4,a5,a4
 118:   7ed706b3        aes64ks2    a3,a4,a3
 11c:   30071293        aes64im     t0,a4
 120:   30069313        aes64im     t1,a3
 124:   31569793        aes64ks1i   a5,a3,0x5
 128:   0b05b023        sd          a6,160(a1)
 12c:   0b15b423        sd          a7,168(a1)
 130:   0a55b823        sd          t0,176(a1)
 134:   0a65bc23        sd          t1,184(a1)
 138:   7ec78633        aes64ks2    a2,a5,a2
 13c:   7ea60533        aes64ks2    a0,a2,a0
 140:   30061813        aes64im     a6,a2
 144:   31a51793        aes64ks1i   a5,a0,0xa
 148:   30051893        aes64im     a7,a0
 14c:   7ee78733        aes64ks2    a4,a5,a4
 150:   7ed706b3        aes64ks2    a3,a4,a3
 154:   30071713        aes64im     a4,a4
 158:   30069793        aes64im     a5,a3
 15c:   31669693        aes64ks1i   a3,a3,0x6
 160:   0d05b023        sd          a6,192(a1)
 164:   0d15b423        sd          a7,200(a1)
 168:   e9f8            sd          a4,208(a1)
 16a:   edfc            sd          a5,216(a1)
 16c:   7ec68633        aes64ks2    a2,a3,a2
 170:   7ea60533        aes64ks2    a0,a2,a0
 174:   f1f0            sd          a2,224(a1)
 176:   f5e8            sd          a0,232(a1)
 178:   8082            ret
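
The range scheduled_key[2..28] mentioned earlier in this comment can be derived from the AES-256 parameters. The sketch below assumes the usual equivalent inverse cipher convention, in which the first and last round keys are left untouched. The constant and function names are hypothetical.

```rust
use core::ops::Range;

/// AES-256 uses 14 rounds, hence 15 round keys.
pub const AES256_ROUNDS: usize = 14;
/// Each 128-bit round key occupies two u64 slots.
pub const SLOTS_PER_ROUND_KEY: usize = 2;
/// Total number of u64 slots in the scheduled key.
pub const SCHEDULED_KEY_SLOTS: usize = (AES256_ROUNDS + 1) * SLOTS_PER_ROUND_KEY;

/// Slots that receive the inverse MixColumns transform: every round key
/// except the first (round 0) and the last (round 14).
pub const fn inverse_mix_columns_slots() -> Range<usize> {
    SLOTS_PER_ROUND_KEY..(AES256_ROUNDS * SLOTS_PER_ROUND_KEY)
}
```

Here `SCHEDULED_KEY_SLOTS` is 30, matching the `[u64; 30]` buffer, and `inverse_mix_columns_slots()` is `2..28`, so slots 0, 1, 28 and 29 are stored without aes64im.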
