From 091d21cefc45682554e082f15b11e0e755dbda27 Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Wed, 2 Apr 2025 17:31:00 +0200 Subject: [PATCH 1/9] Change win-x86 configuration to build with FEATURE_EH_FUNCLETS --- src/coreclr/clr.featuredefines.props | 5 +---- src/coreclr/clrdefinitions.cmake | 4 +--- src/coreclr/jit/targetx86.h | 3 --- 3 files changed, 2 insertions(+), 10 deletions(-) diff --git a/src/coreclr/clr.featuredefines.props b/src/coreclr/clr.featuredefines.props index 883d0bda0f5574..1e4050c8d56e0a 100644 --- a/src/coreclr/clr.featuredefines.props +++ b/src/coreclr/clr.featuredefines.props @@ -2,6 +2,7 @@ true true + true true @@ -22,10 +23,6 @@ true - - true - - true diff --git a/src/coreclr/clrdefinitions.cmake b/src/coreclr/clrdefinitions.cmake index c7f6ae41d4d8aa..ee5fba323c9c6d 100644 --- a/src/coreclr/clrdefinitions.cmake +++ b/src/coreclr/clrdefinitions.cmake @@ -206,9 +206,7 @@ if(CLR_CMAKE_TARGET_WIN32) endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386) endif(CLR_CMAKE_TARGET_WIN32) -if (NOT CLR_CMAKE_TARGET_ARCH_I386 OR NOT CLR_CMAKE_TARGET_WIN32) - add_compile_definitions($<$>>:FEATURE_EH_FUNCLETS>) -endif (NOT CLR_CMAKE_TARGET_ARCH_I386 OR NOT CLR_CMAKE_TARGET_WIN32) +add_compile_definitions($<$>>:FEATURE_EH_FUNCLETS>) if (CLR_CMAKE_TARGET_WIN32 AND (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64)) add_definitions(-DFEATURE_SPECIAL_USER_MODE_APC) diff --git a/src/coreclr/jit/targetx86.h b/src/coreclr/jit/targetx86.h index e630a1ae842120..ac0130f5d274ce 100644 --- a/src/coreclr/jit/targetx86.h +++ b/src/coreclr/jit/targetx86.h @@ -52,9 +52,6 @@ // target #define FEATURE_EH 1 // To aid platform bring-up, eliminate exceptional EH clauses (catch, filter, // filter-handler, fault) and directly execute 'finally' clauses. -#if !defined(UNIX_X86_ABI) - #define FEATURE_EH_WINDOWS_X86 1 // Enable support for SEH regions -#endif #define ETW_EBP_FRAMED 1 // if 1 we cannot use EBP as a scratch register and must create EBP based // frames for most methods #define CSE_CONSTS 1 // Enable if we want to CSE constants From 0f0d750d1e1dc3df108de4b7b53ef6934fe37a1f Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sat, 24 May 2025 06:12:02 +0200 Subject: [PATCH 2/9] Update CLR ABI document --- docs/design/coreclr/botr/clr-abi.md | 102 +--------------------------- 1 file changed, 2 insertions(+), 100 deletions(-) diff --git a/docs/design/coreclr/botr/clr-abi.md b/docs/design/coreclr/botr/clr-abi.md index 32c89100205049..8d872db6bc188f 100644 --- a/docs/design/coreclr/botr/clr-abi.md +++ b/docs/design/coreclr/botr/clr-abi.md @@ -218,14 +218,10 @@ This section describes the conventions the JIT needs to follow when generating c ## Funclets -For all platforms except Windows/x86 on CoreCLR, all managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched. +For all platforms, managed EH handlers (finally, fault, filter, filter-handler, and catch) are extracted into their own 'funclets'. To the OS they are treated just like first class functions (separate PDATA and XDATA (`RUNTIME_FUNCTION` entry), etc.). The CLR currently treats them just like part of the parent function in many ways. The main function and all funclets must be allocated in a single code allocation (see hot cold splitting). They 'share' GC info. Only the main function prolog can be hot patched. The only way to enter a handler funclet is via a call. In the case of an exception, the call is from the VM's EH subsystem as part of exception dispatch/unwind. In the non-exceptional case, this is called local unwind or a non-local exit. In C# this is accomplished by simply falling-through/out of a try body or an explicit goto. In IL this is always accomplished via a LEAVE opcode, within a try body, targeting an IL offset outside the try body. In such cases the call is from the JITed code of the parent function. -For Windows/x86 on CoreCLR, all handlers are generated within the method body, typically in lexical order. A nested try/catch is generated completely within the EH region in which it is nested. These handlers are essentially "in-line funclets", but they do not look like normal functions: they do not have a normal prolog or epilog, although they do have special entry/exit and register conventions. Also, nested handlers are not un-nested as for funclets: the code for a nested handler is generated within the handler in which it is nested. - -For Windows/x86 on NativeAOT and Linux/x86, funclets are used just like on other platforms. - ## Cloned finallys RyuJIT attempts to speed the normal control flow by 'inlining' a called finally along the 'normal' control flow (i.e., leaving a try body in a non-exceptional manner via C# fall-through). This optimization is supported on all architectures. @@ -234,7 +230,7 @@ RyuJIT attempts to speed the normal control flow by 'inlining' a called finally In order to have proper forward progress and `Thread.Abort` semantics, there are restrictions on where a call-to-finally can be, and what the call site must look like. The return address can **NOT** be in the corresponding try body (otherwise the VM would think the finally protects itself). The return address **MUST** be within any outer protected region (so exceptions from the finally body are properly handled). -JIT64, and RyuJIT for non-x86, creates something similar to a jump island: a block of code outside the try body that calls the finally and then branches to the final target of the leave/non-local-exit. This jump island is then marked in the EH tables as if it were a cloned finally. The cloned finally clause prevents a Thread.Abort from firing before entering the handler. By having the return address outside of the try body we satisfy the other constraint. +RyuJIT creates something similar to a jump island: a block of code outside the try body that calls the finally and then branches to the final target of the leave/non-local-exit. This jump island is then marked in the EH tables as if it were a cloned finally. The cloned finally clause prevents a Thread.Abort from firing before entering the handler. By having the return address outside of the try body we satisfy the other constraint. ## ThreadAbortException considerations @@ -388,100 +384,6 @@ Any register value changes made in the funclet are lost. If a funclet wants to m Funclets are not required to preserve non-volatile registers. -## Windows/x86 EH considerations - -The Windows/x86 model is somewhat different than non-Windows/x86 model. Windows/X86-specific concerns are mentioned here. - -### catch / filter-handler regions - -When leaving a `catch` or `filter-handler` region, the JIT calls the helper `CORINFO_JIT_ENDCATCH` (implemented in the VM by the `JIT_EndCatch` function) before transferring control to the target location. The code to call to `CORINFO_JIT_ENDCATCH` is within the catch region itself. - -### finally / fault regions - -"finally" clauses are invoked in the non-exceptional code by the generated JIT code, and in the exceptional case by the VM. "fault" clauses are only executed in exceptional cases by the VM. - -On entry to the finally or fault, the top of the stack is the address that should be jumped to on exit from the finally, using a "pop eax; jmp eax" sequence. A simple 'ret' could be used, but we avoid it to avoid potentially creating an unbalanced processor call/ret buffer stack, and messing up call/ret prediction. - -There are no register or other stack arguments to a 'finally' or 'fault'. - -### ShadowSP slots - -X86 exception handlers (e.g., catch, finally) do not establish their own frames. They don't (really) have prologs and epilogs. However, they do use the stack, and need to restore the stack pointer of the enclosing exception handling region when the handler completes executing. - -To implement this requirement, for any function with EH, we create a frame-local variable to store a stack of "Shadow SP" values, or ShadowSP slots. In the JIT, the local var is called lvaShadowSPslotsVar, and in dumps it is called "EHSlots". The variable is created in lvaMarkLocalVars() and is sized as follows: -1. 1 slot is reserved for the VM (for ICodeManager::FixContext(ppEndRegion)). -2. 1 slot for each handler nesting level (total: ehMaxHndNestingCount). -3. 1 slot for a filter (we do this even if there aren't any filters; size optimization opportunity to not do this if there are no filters?) -4. 1 slot for zero termination - -Note that the since a slot on x86 is 4 bytes, the minimum size is 16 bytes. The idea is to have 1 slot for each handler that could be possibly be invoked at the same time. For example, for: - -```cs - try { - ... - } catch { - try { - ... - } catch { - ... - } - } -``` - -When the inner 'catch' is running, the outer 'catch' is also conceptually "on the stack", or in the middle of execution. So the maximum handler nesting count would be 2. - -The ShadowSP slots are filled in from the highest address downwards to the lowest address. The highest slot is reserved. The first address with a zero is a zero terminator. So, we always zero terminate by setting the second-to-highest slot to zero in the function prolog (if we didn't zero initialize all locals anyway). - -When calling a finally, we set the appropriate level to 0xFC (aka "finally call") and zero terminate the next-lower address. - -Thus, calling a finally from JIT generated code looks like: - -```asm - mov dword ptr [L_02+0x4 ebp-10H], 0 // This must happen before the 0xFC is written - mov dword ptr [L_02+0x8 ebp-0CH], 252 // 0xFC - push G_M52300_IG07 - jmp SHORT G_M52300_IG04 -``` - -In this case, `G_M52300_IG07` is not the address after the 'jmp', so a simple 'call' wouldn't work. - -The code this finally returns to looks like this: - -```asm - mov dword ptr [L_02+0x8 ebp-0CH], 0 - jmp SHORT G_M52300_IG05 -``` - -In this case, it zeros out the ShadowSP slot that it previously set to 0xFC, then jumps to the address that is the actual target of the leave from the finally. - -The JIT does this "end finally restore" by creating a GT_END_LFIN tree node, with the appropriate EH region ID as an operand, that generates this code. - -In the case of an exceptional 'finally' invocation, the VM sets up the 'return address' to whatever address it wants the JIT to return to. - -For catch handlers, the VM is completely in control of filling and reading the ShadowSP slots; the JIT just makes sure there is enough space. - -### ShadowSP slots frame location - -The ShadowSP slots are required to live in a very particular location, reported via the GC info header. Note that the GC info header does not contain an actual pointer or offset to the ShadowSP slots variable. Instead, the VM calculates the location from other data that does exist in the GC info header, as a negative offset from the EBP frame pointer (which must be established in functions with EH) using the function `GetFirstBaseSPslotPtr()` / `GetStartShadowSPSlotsOffset()`. The VM thus assumes the following frame layout: - -1. callee-saved registers <= EBP points to the top of this range -2. GS cookie -3. 1 slot if localloc is used (Saved localloc SP?) -4. 1 slot for CORINFO_GENERICS_CTXT_FROM_PARAMTYPEARG -- assumed for any function with EH, to avoid adding a flag to the GC info about whether it exists or not. -5. ShadowSP slots - -(note, these don't have to be in this order for this calculation, but they possibly do need to be in this order for other calculations.) See also `GetEndShadowSPSlotsOffset()`. - -The VM walks the ShadowSP slots in the function `GetHandlerFrameInfo()`, and sets it in various functions such as `EECodeManager::FixContext()`. - -### JIT implementation: finally - -An aside on the JIT implementation for x86. - -The JIT creates BBJ_CALLFINALLY/BBJ_ALWAYS pairs for calling the 'finally' clause. The BBJ_CALLFINALLY block will have a series of CORINFO_JIT_ENDCATCH calls appended at the end, if we need to "leave" a series of nested catches before calling the finally handler (due to a single 'leave' opcode attempting to leave multiple levels of different types of handlers). Then, a GT_END_LFIN statement with EH region ID as an argument is added to the step block where the finally returns to. This is used to generate code to zero out the appropriate level of the ShadowSP slot array after the finally has been executed and the final EH nesting depth is known. The BBJ_CALLFINALLY block itself generates the code to insert the 0xFC value into the ShadowSP slot array. If the 'finally' is invoked by the VM, in exceptional cases, then the VM itself updates the ShadowSP slot array before invoking the 'finally'. - -At the end of a finally or filter, a GT_RETFILT is inserted. For a finally, this is a TYP_VOID which is just a placeholder. For a filter, it takes an argument which evaluates to the return value from the filter. On legacy JIT, this tree triggers the generation of both the return value load (for filters) and the "funclet" exit sequence, which is either a "pop eax; jmp eax" for a finally, or a "ret" for a filter. When processing the BBJ_EHFINALLYRET or BBJ_EHFILTERRET block itself (at the end of code generation for the block), nothing is generated. In RyuJIT, the GT_RETFILT only loads up the return value (for filters) and does nothing for finally, and the block type processing after all the tree processing triggers the exit sequence to be generated. There is no real difference between these, except to centralize all "exit sequence" generation in the same place. - # EH Info, GC Info, and Hot & Cold Splitting All GC info offsets and EH info offsets treat the function and funclets as if it was one big method body. Thus all offsets are relative to the start of the main method. Funclets are assumed to always be at the end of (after) all of the main function code. Thus if the main function has any cold code, all funclets must be cold. Or conversely, if there is any hot funclet code, all of the main method must be hot. From 358970ff53017c22e2829696f0e4e4c85eb0cf46 Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sat, 24 May 2025 06:19:37 +0200 Subject: [PATCH 3/9] Bump R2R version --- src/coreclr/inc/readytorun.h | 11 +++++++++-- src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h | 4 ++-- .../tools/Common/Internal/Runtime/ModuleHeaders.cs | 4 ++-- 3 files changed, 13 insertions(+), 6 deletions(-) diff --git a/src/coreclr/inc/readytorun.h b/src/coreclr/inc/readytorun.h index a86b6ad7838b69..6aaa43e9175e96 100644 --- a/src/coreclr/inc/readytorun.h +++ b/src/coreclr/inc/readytorun.h @@ -19,10 +19,15 @@ // src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h // If you update this, ensure you run `git grep MINIMUM_READYTORUN_MAJOR_VERSION` // and handle pending work. -#define READYTORUN_MAJOR_VERSION 13 -#define READYTORUN_MINOR_VERSION 0x0001 +#define READYTORUN_MAJOR_VERSION 14 +#define READYTORUN_MINOR_VERSION 0x0000 +// Remove the x86 special case once the general minimum version is bumped +#ifdef TARGET_X86 +#define MINIMUM_READYTORUN_MAJOR_VERSION 14 +#else #define MINIMUM_READYTORUN_MAJOR_VERSION 13 +#endif // R2R Version 2.1 adds the InliningInfo section // R2R Version 2.2 adds the ProfileDataInfo section @@ -43,6 +48,8 @@ // R2R Version 13 removes usage of PSPSym, changes ABI for funclets to match NativeAOT, changes register for // exception parameter on AMD64, and redefines generics instance context stack slot in GCInfo v4 // to be SP/FP relative +// R2R Version 13.1 added long/ulong to float helper calls +// R2R Version 14 changed x86 code generation to use funclets struct READYTORUN_CORE_HEADER { diff --git a/src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h b/src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h index f8efbe166ff219..2b7464aa4ea97c 100644 --- a/src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h +++ b/src/coreclr/nativeaot/Runtime/inc/ModuleHeaders.h @@ -11,8 +11,8 @@ struct ReadyToRunHeaderConstants { static const uint32_t Signature = 0x00525452; // 'RTR' - static const uint32_t CurrentMajorVersion = 13; - static const uint32_t CurrentMinorVersion = 1; + static const uint32_t CurrentMajorVersion = 14; + static const uint32_t CurrentMinorVersion = 0; }; struct ReadyToRunHeader diff --git a/src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs b/src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs index a0d0370b253c33..e8fdf030893432 100644 --- a/src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs +++ b/src/coreclr/tools/Common/Internal/Runtime/ModuleHeaders.cs @@ -15,8 +15,8 @@ internal struct ReadyToRunHeaderConstants { public const uint Signature = 0x00525452; // 'RTR' - public const ushort CurrentMajorVersion = 13; - public const ushort CurrentMinorVersion = 1; + public const ushort CurrentMajorVersion = 14; + public const ushort CurrentMinorVersion = 0; } #if READYTORUN #pragma warning disable 0169 From 7a2bb27e709009fd16d372c560bb6b672d01b05f Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sat, 24 May 2025 06:21:41 +0200 Subject: [PATCH 4/9] Update JIT-EE interface GUID --- src/coreclr/inc/jiteeversionguid.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/coreclr/inc/jiteeversionguid.h b/src/coreclr/inc/jiteeversionguid.h index 5e5f8eb5637a4c..7f6aa4cdc66f1e 100644 --- a/src/coreclr/inc/jiteeversionguid.h +++ b/src/coreclr/inc/jiteeversionguid.h @@ -37,11 +37,11 @@ #include -constexpr GUID JITEEVersionIdentifier = { /* 7ce8764d-ac60-4e05-a6e4-448c1eb8cf35 */ - 0x7ce8764d, - 0xac60, - 0x4e05, - {0xa6, 0xe4, 0x44, 0x8c, 0x1e, 0xb8, 0xcf, 0x35} +constexpr GUID JITEEVersionIdentifier = { /* cb23fc5b-e31d-49cb-b4e0-b5666496b4fe */ + 0xcb23fc5b, + 0xe31d, + 0x49cb, + {0xb4, 0xe0, 0xb5, 0x66, 0x64, 0x96, 0xb4, 0xfe} }; #endif // JIT_EE_VERSIONING_GUID_H From cdbe9d20ac7bcec1c2b3387ddff6da28f90c88a1 Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sat, 24 May 2025 15:40:50 +0200 Subject: [PATCH 5/9] Report funclet prologs and epilogs as no-GC regions in x86 GC info --- src/coreclr/jit/emitinl.h | 6 +++--- src/coreclr/jit/emitpub.h | 2 +- src/coreclr/jit/gcencode.cpp | 2 +- src/coreclr/jit/gcinfo.cpp | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/coreclr/jit/emitinl.h b/src/coreclr/jit/emitinl.h index 02aab9d4ed910e..b3e5fd0246ccf6 100644 --- a/src/coreclr/jit/emitinl.h +++ b/src/coreclr/jit/emitinl.h @@ -585,15 +585,15 @@ inline bool insIsCMOV(instruction ins) * false. Returns the final result of the callback. */ template -bool emitter::emitGenNoGCLst(Callback& cb, bool skipAllPrologsAndEpilogs /* = false */) +bool emitter::emitGenNoGCLst(Callback& cb, bool skipMainPrologsAndEpilogs /* = false */) { for (insGroup* ig = emitIGlist; ig; ig = ig->igNext) { - if (skipAllPrologsAndEpilogs) + if (skipMainPrologsAndEpilogs) { if (ig == emitPrologIG) continue; - if (ig->igFlags & (IGF_EPILOG | IGF_FUNCLET_PROLOG | IGF_FUNCLET_EPILOG)) + if (ig->igFlags & IGF_EPILOG) continue; } if ((ig->igFlags & IGF_NOGCINTERRUPT) && ig->igSize > 0) diff --git a/src/coreclr/jit/emitpub.h b/src/coreclr/jit/emitpub.h index 5725c04e6f6352..037ea04bccf6ee 100644 --- a/src/coreclr/jit/emitpub.h +++ b/src/coreclr/jit/emitpub.h @@ -43,7 +43,7 @@ unsigned emitEndCodeGen(Compiler* comp, unsigned emitGetEpilogCnt(); template -bool emitGenNoGCLst(Callback& cb, bool skipAllPrologsAndEpilogs = false); +bool emitGenNoGCLst(Callback& cb, bool skipMainPrologsAndEpilogs = false); void emitBegProlog(); unsigned emitGetPrologOffsetEstimate(); diff --git a/src/coreclr/jit/gcencode.cpp b/src/coreclr/jit/gcencode.cpp index a07279ee98bb60..f1eb7cec6b2250 100644 --- a/src/coreclr/jit/gcencode.cpp +++ b/src/coreclr/jit/gcencode.cpp @@ -2223,7 +2223,7 @@ size_t GCInfo::gcMakeRegPtrTable(BYTE* dest, int mask, const InfoHdr& header, un if (header.noGCRegionCnt != 0) { NoGCRegionEncoder encoder(mask != 0 ? dest : NULL); - compiler->GetEmitter()->emitGenNoGCLst(encoder, /* skipAllPrologsAndEpilogs = */ true); + compiler->GetEmitter()->emitGenNoGCLst(encoder, /* skipMainPrologsAndEpilogs = */ true); totalSize += encoder.totalSize; if (mask != 0) dest += encoder.totalSize; diff --git a/src/coreclr/jit/gcinfo.cpp b/src/coreclr/jit/gcinfo.cpp index 3f6f5169d3950a..ee08413e714c83 100644 --- a/src/coreclr/jit/gcinfo.cpp +++ b/src/coreclr/jit/gcinfo.cpp @@ -585,7 +585,7 @@ void GCInfo::gcCountForHeader(UNALIGNED unsigned int* pUntrackedCount, if (compiler->codeGen->GetInterruptible()) { NoGCRegionCounter counter; - compiler->GetEmitter()->emitGenNoGCLst(counter, /* skipAllPrologsAndEpilogs = */ true); + compiler->GetEmitter()->emitGenNoGCLst(counter, /* skipMainPrologsAndEpilogs = */ true); noGCRegionCount = counter.noGCRegionCount; } From b9cd5acf528efb703963e45bc3de8b01450ce541 Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sun, 25 May 2025 06:41:41 +0200 Subject: [PATCH 6/9] Fix isFilterFunclet code path in EnumGcRefsX86 --- src/coreclr/vm/gc_unwind_x86.inl | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/coreclr/vm/gc_unwind_x86.inl b/src/coreclr/vm/gc_unwind_x86.inl index 62f18a693cf2c5..a5533019079d1d 100644 --- a/src/coreclr/vm/gc_unwind_x86.inl +++ b/src/coreclr/vm/gc_unwind_x86.inl @@ -3676,7 +3676,15 @@ bool EnumGcRefsX86(PREGDISPLAY pContext, // Filters are the only funclet that run during the 1st pass, and must have // both the leaf and the parent frame reported. In order to avoid double // reporting of the untracked variables, do not report them for the filter. - if (!isFilterFunclet) + if (isFilterFunclet) + { + count = info.untrackedCnt; + while (count-- > 0) + { + fastSkipSigned(table); + } + } + else #endif // FEATURE_EH_FUNCLETS { count = info.untrackedCnt; @@ -3734,7 +3742,6 @@ bool EnumGcRefsX86(PREGDISPLAY pContext, info.ebpFrame ? EBP - ptrAddr : ptrAddr - ESP, true))); } - } #if VERIFY_GC_TABLES From f21b6752fba33e3ac7ffae7a5d292d682c93303f Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sun, 25 May 2025 12:25:05 +0200 Subject: [PATCH 7/9] Implement ShouldParentToFuncletUseUnwindTargetLocationForGCReporting code path on x86 --- src/coreclr/inc/gc_unwind_x86.h | 5 ++++ src/coreclr/vm/gc_unwind_x86.inl | 45 ++++++++++++++++++++++++++++++ src/coreclr/vm/gcenv.ee.common.cpp | 30 ++++++++++++++++---- 3 files changed, 75 insertions(+), 5 deletions(-) diff --git a/src/coreclr/inc/gc_unwind_x86.h b/src/coreclr/inc/gc_unwind_x86.h index 46d44afec4f6c1..74fe4e753972a0 100644 --- a/src/coreclr/inc/gc_unwind_x86.h +++ b/src/coreclr/inc/gc_unwind_x86.h @@ -409,4 +409,9 @@ bool IsInNoGCRegion(hdrInfo * infoPtr, PTR_CBYTE table, unsigned curOffset); +unsigned FindFirstInterruptiblePoint(hdrInfo * infoPtr, + PTR_CBYTE table, + unsigned offs, + unsigned endOffs); + #endif // _UNWIND_X86_H diff --git a/src/coreclr/vm/gc_unwind_x86.inl b/src/coreclr/vm/gc_unwind_x86.inl index a5533019079d1d..c92986d358ab24 100644 --- a/src/coreclr/vm/gc_unwind_x86.inl +++ b/src/coreclr/vm/gc_unwind_x86.inl @@ -737,6 +737,51 @@ bool IsInNoGCRegion(hdrInfo * infoPtr, return false; } +/*****************************************************************************/ +unsigned FindFirstInterruptiblePoint(hdrInfo * infoPtr, + PTR_CBYTE table, + unsigned offs, + unsigned endOffs) +{ + if (!infoPtr->interruptible) + return -1; + + _ASSERTE(offs < endOffs); + + // NOTE: We do not check for prologs and epilogs of the main function body. Only + // explicit no-GC regions are considered. + + if (infoPtr->noGCRegionCnt == 0) + return offs; + +#if VERIFY_GC_TABLES + _ASSERTE(*castto(table, unsigned short *)++ == 0xBEEF); +#endif + + unsigned count = infoPtr->noGCRegionCnt; + while (count-- > 0) { + unsigned regionOffset = fastDecodeUnsigned(table); + if (offs < regionOffset) + { + // Offset is lower than the region start. Since the regions come in + // order we can assume that the offset is in GC safe region. + return offs; + } + + unsigned regionSize = fastDecodeUnsigned(table); + if (offs - regionOffset < regionSize) + { + // Offset is inside the no-GC region, so move it past the region and + // check if we still can match something in next iteration. + offs = regionOffset + regionSize; + if (offs >= endOffs) + return -1; + } + } + + return offs; +} + /*****************************************************************************/ static PTR_CBYTE skipToArgReg(const hdrInfo& info, PTR_CBYTE table) diff --git a/src/coreclr/vm/gcenv.ee.common.cpp b/src/coreclr/vm/gcenv.ee.common.cpp index 21723724cc5f02..2f88f2b501b656 100644 --- a/src/coreclr/vm/gcenv.ee.common.cpp +++ b/src/coreclr/vm/gcenv.ee.common.cpp @@ -5,7 +5,8 @@ #include "gcenv.h" #include -#if defined(FEATURE_EH_FUNCLETS) && defined(USE_GC_INFO_DECODER) +#if defined(FEATURE_EH_FUNCLETS) +#if defined(USE_GC_INFO_DECODER) struct FindFirstInterruptiblePointState { @@ -70,8 +71,22 @@ unsigned FindFirstInterruptiblePoint(CrawlFrame* pCF, unsigned offs, unsigned en return state.returnOffs; } +#else // USE_GC_INFO_DECODER -#endif // FEATURE_EH_FUNCLETS && USE_GC_INFO_DECODER +// Find the first interruptible point in the range [offs .. endOffs) (the beginning of the range is inclusive, +// the end is exclusive). Return -1 if no such point exists. +unsigned FindFirstInterruptiblePoint(CrawlFrame* pCF, unsigned offs, unsigned endOffs) +{ + EECodeInfo *codeInfo = pCF->GetCodeInfo(); + hdrInfo info = { 0 }; + PTR_CBYTE table = codeInfo->DecodeGCHdrInfo(&info, offs); + // NOTE: We make an assumption that we are not searching for the main function so we can ignore the + // prolog/epilog info. + return ::FindFirstInterruptiblePoint(&info, table, offs, endOffs); +} + +#endif +#endif // FEATURE_EH_FUNCLETS //----------------------------------------------------------------------------- // Determine whether we should report the generic parameter context @@ -317,16 +332,21 @@ StackWalkAction GcStackCrawlCallBack(CrawlFrame* pCF, VOID* pData) #endif // _DEBUG DWORD relOffsetOverride = NO_OVERRIDE_OFFSET; -#if defined(FEATURE_EH_FUNCLETS) && defined(USE_GC_INFO_DECODER) +#if defined(FEATURE_EH_FUNCLETS) if (pCF->ShouldParentToFuncletUseUnwindTargetLocationForGCReporting()) { + bool wantsReportOnlyLeaf = true; + +#if defined(USE_GC_INFO_DECODER) GCInfoToken gcInfoToken = pCF->GetGCInfoToken(); GcInfoDecoder _gcInfoDecoder( gcInfoToken, DECODE_CODE_LENGTH ); + wantsReportOnlyLeaf = _gcInfoDecoder.WantsReportOnlyLeaf(); +#endif // USE_GC_INFO_DECODER - if(_gcInfoDecoder.WantsReportOnlyLeaf()) + if (wantsReportOnlyLeaf) { // We're in a special case of unwinding from a funclet, and resuming execution in // another catch funclet associated with same parent function. We need to report roots. @@ -347,7 +367,7 @@ StackWalkAction GcStackCrawlCallBack(CrawlFrame* pCF, VOID* pData) } } -#endif // FEATURE_EH_FUNCLETS && USE_GC_INFO_DECODER +#endif // FEATURE_EH_FUNCLETS pCM->EnumGcRefs(pCF->GetRegisterSet(), pCF->GetCodeInfo(), From 58f1050079fa775873b8a425737f9155de275c7d Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sun, 25 May 2025 21:51:39 +0200 Subject: [PATCH 8/9] Fix hasReversePInvoke condition not to trigger when unwinding from funclet --- src/coreclr/vm/stackwalk.cpp | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/coreclr/vm/stackwalk.cpp b/src/coreclr/vm/stackwalk.cpp index 6e903da8dcae46..1226e99c162897 100644 --- a/src/coreclr/vm/stackwalk.cpp +++ b/src/coreclr/vm/stackwalk.cpp @@ -3153,9 +3153,19 @@ void StackFrameIterator::PostProcessingForManagedFrames(void) #endif // ELIMINATE_FEF #ifdef TARGET_X86 +#ifdef FEATURE_EH_FUNCLETS + bool hasReversePInvoke = false; + if (!m_crawl.codeInfo.IsFunclet()) + { + hdrInfo *gcHdrInfo; + m_crawl.codeInfo.DecodeGCHdrInfo(&gcHdrInfo); + hasReversePInvoke = gcHdrInfo->revPInvokeOffset != INVALID_REV_PINVOKE_OFFSET; + } +#else hdrInfo *gcHdrInfo; m_crawl.codeInfo.DecodeGCHdrInfo(&gcHdrInfo); bool hasReversePInvoke = gcHdrInfo->revPInvokeOffset != INVALID_REV_PINVOKE_OFFSET; +#endif // FEATURE_EH_FUNCLETS #endif // TARGET_X86 ProcessIp(GetControlPC(m_crawl.pRD)); From 03b9eeb92d96a4cb34ff5352c98bcdae2274be05 Mon Sep 17 00:00:00 2001 From: Filip Navara Date: Sun, 25 May 2025 22:05:02 +0200 Subject: [PATCH 9/9] Cleanup usages of WantsReportOnlyLeaf --- src/coreclr/vm/eetwain.cpp | 20 ------------- src/coreclr/vm/gcenv.ee.common.cpp | 47 ++++++++++-------------------- src/coreclr/vm/gcinfodecoder.cpp | 2 +- src/coreclr/vm/stackwalk.cpp | 25 ++++------------ 4 files changed, 22 insertions(+), 72 deletions(-) diff --git a/src/coreclr/vm/eetwain.cpp b/src/coreclr/vm/eetwain.cpp index 55a595e4b14e20..8933ebd37117ae 100644 --- a/src/coreclr/vm/eetwain.cpp +++ b/src/coreclr/vm/eetwain.cpp @@ -1290,16 +1290,6 @@ bool EECodeManager::EnumGcRefs( PREGDISPLAY pRD, if (relOffsetOverride != NO_OVERRIDE_OFFSET) { // We've been given an override offset for GC Info -#ifdef _DEBUG - GcInfoDecoder _gcInfoDecoder( - gcInfoToken, - DECODE_CODE_LENGTH - ); - - // We only use override offset for wantsReportOnlyLeaf - _ASSERTE(_gcInfoDecoder.WantsReportOnlyLeaf()); -#endif // _DEBUG - curOffs = relOffsetOverride; #ifdef TARGET_ARM @@ -2505,16 +2495,6 @@ bool InterpreterCodeManager::EnumGcRefs(PREGDISPLAY pContext, if (relOffsetOverride != NO_OVERRIDE_OFFSET) { // We've been given an override offset for GC Info -#ifdef _DEBUG - InterpreterGcInfoDecoder _gcInfoDecoder( - gcInfoToken, - DECODE_CODE_LENGTH - ); - - // We only use override offset for wantsReportOnlyLeaf - _ASSERTE(_gcInfoDecoder.WantsReportOnlyLeaf()); -#endif // _DEBUG - curOffs = relOffsetOverride; LOG((LF_GCINFO, LL_INFO1000, "Adjusted GC reporting offset to provided override offset. Now reporting GC refs for %s at offset %04x.\n", diff --git a/src/coreclr/vm/gcenv.ee.common.cpp b/src/coreclr/vm/gcenv.ee.common.cpp index 2f88f2b501b656..0df511d2eadcbe 100644 --- a/src/coreclr/vm/gcenv.ee.common.cpp +++ b/src/coreclr/vm/gcenv.ee.common.cpp @@ -335,37 +335,22 @@ StackWalkAction GcStackCrawlCallBack(CrawlFrame* pCF, VOID* pData) #if defined(FEATURE_EH_FUNCLETS) if (pCF->ShouldParentToFuncletUseUnwindTargetLocationForGCReporting()) { - bool wantsReportOnlyLeaf = true; - -#if defined(USE_GC_INFO_DECODER) - GCInfoToken gcInfoToken = pCF->GetGCInfoToken(); - GcInfoDecoder _gcInfoDecoder( - gcInfoToken, - DECODE_CODE_LENGTH - ); - wantsReportOnlyLeaf = _gcInfoDecoder.WantsReportOnlyLeaf(); -#endif // USE_GC_INFO_DECODER - - if (wantsReportOnlyLeaf) - { - // We're in a special case of unwinding from a funclet, and resuming execution in - // another catch funclet associated with same parent function. We need to report roots. - // Reporting at the original throw site gives incorrect liveness information. We choose to - // report the liveness information at the first interruptible instruction of the catch funclet - // that we are going to execute. We also only report stack slots, since no registers can be - // live at the first instruction of a handler, except the catch object, which the VM protects - // specially. If the catch funclet has not interruptible point, we fall back and just report - // what we used to: at the original throw instruction. This might lead to bad GC behavior - // if the liveness is not correct. - const EE_ILEXCEPTION_CLAUSE& ehClauseForCatch = pCF->GetEHClauseForCatch(); - relOffsetOverride = FindFirstInterruptiblePoint(pCF, ehClauseForCatch.HandlerStartPC, - ehClauseForCatch.HandlerEndPC); - _ASSERTE(relOffsetOverride != NO_OVERRIDE_OFFSET); - - STRESS_LOG3(LF_GCROOTS, LL_INFO1000, "Setting override offset = %u for method %pM ControlPC = %p\n", - relOffsetOverride, pMD, GetControlPC(pCF->GetRegisterSet())); - } - + // We're in a special case of unwinding from a funclet, and resuming execution in + // another catch funclet associated with same parent function. We need to report roots. + // Reporting at the original throw site gives incorrect liveness information. We choose to + // report the liveness information at the first interruptible instruction of the catch funclet + // that we are going to execute. We also only report stack slots, since no registers can be + // live at the first instruction of a handler, except the catch object, which the VM protects + // specially. If the catch funclet has not interruptible point, we fall back and just report + // what we used to: at the original throw instruction. This might lead to bad GC behavior + // if the liveness is not correct. + const EE_ILEXCEPTION_CLAUSE& ehClauseForCatch = pCF->GetEHClauseForCatch(); + relOffsetOverride = FindFirstInterruptiblePoint(pCF, ehClauseForCatch.HandlerStartPC, + ehClauseForCatch.HandlerEndPC); + _ASSERTE(relOffsetOverride != NO_OVERRIDE_OFFSET); + + STRESS_LOG3(LF_GCROOTS, LL_INFO1000, "Setting override offset = %u for method %pM ControlPC = %p\n", + relOffsetOverride, pMD, GetControlPC(pCF->GetRegisterSet())); } #endif // FEATURE_EH_FUNCLETS diff --git a/src/coreclr/vm/gcinfodecoder.cpp b/src/coreclr/vm/gcinfodecoder.cpp index f5e06efff64ef5..00bfa6b96f6d63 100644 --- a/src/coreclr/vm/gcinfodecoder.cpp +++ b/src/coreclr/vm/gcinfodecoder.cpp @@ -672,7 +672,7 @@ template bool TGcInfoDecoder::HasTailC template bool TGcInfoDecoder::WantsReportOnlyLeaf() { // Only AMD64 with JIT64 can return false here. -#ifdef TARGET_AMD64 +#if defined(TARGET_AMD64) && defined(DECODE_OLD_FORMATS) return ((m_headerFlags & GC_INFO_WANTS_REPORT_ONLY_LEAF) != 0); #else return true; diff --git a/src/coreclr/vm/stackwalk.cpp b/src/coreclr/vm/stackwalk.cpp index 1226e99c162897..9216e1470c345b 100644 --- a/src/coreclr/vm/stackwalk.cpp +++ b/src/coreclr/vm/stackwalk.cpp @@ -1684,29 +1684,15 @@ StackWalkAction StackFrameIterator::Filter(void) ProcessFuncletsForGCReporting: do { - // When enumerating GC references for "liveness" reporting, depending upon the architecture, - // the responsibility of who reports what varies: + // The funclet reports all references belonging to itself and its parent method. // - // 1) On ARM, ARM64, and X64 (using RyuJIT), the funclet reports all references belonging - // to itself and its parent method. This is indicated by the WantsReportOnlyLeaf flag being - // set in the GC information for a function. - // - // 2) X64 (using JIT64) has the reporting distributed between the funclets and the parent method. - // If some reference(s) get double reported, JIT64 can handle that by playing conservative. - // JIT64 does NOT set the WantsReportOnlyLeaf flag in the function GC information. - // - // 3) On ARM, the reporting is done by funclets (if present). Otherwise, the primary method - // does it. - // - // 4) x86 behaves like (1) - // - // For non-x86, the GcStackCrawlCallBack is invoked with a new flag indicating that - // the stackwalk is being done for GC reporting purposes - this flag is GC_FUNCLET_REFERENCE_REPORTING. + // The GcStackCrawlCallBack is invoked with a new flag indicating that the stackwalk is being done + // for GC reporting purposes - this flag is GC_FUNCLET_REFERENCE_REPORTING. // The presence of this flag influences how the stackwalker will enumerate frames; which frames will // result in the callback being invoked; etc. The idea is that we want to report only the // relevant frames via the callback that are active on the callstack. This removes the need to - // double report (even though JIT64 does it), reporting of dead frames, and makes the - // design of reference reporting more consistent (and easier to understand) across architectures. + // double report, reporting of dead frames, and makes the design of reference reporting more + // consistent (and easier to understand) across architectures. // // The algorithm is as follows (at a conceptual level): // @@ -1726,7 +1712,6 @@ StackWalkAction StackFrameIterator::Filter(void) // // Note: When a flag is passed to the callback indicating that the funclet for a parent frame has already // reported the references, RyuJIT will simply do nothing and return from the callback. - // JIT64, on the other hand, will ignore the flag and perform reporting (again). // // Note: For non-filter funclets there is a small window during unwind where we have conceptually unwound past a // funclet but have not yet reached the parent/handling frame. In this case we might need the parent to