-
Notifications
You must be signed in to change notification settings - Fork 5.2k
JIT: Make BB_UNITY_WEIGHT
1.0
#112151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
JIT: Make BB_UNITY_WEIGHT
1.0
#112151
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
/azp run runtime-coreclr libraries-pgo, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
These typically mean two different things, eg unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined. I recall being annoyed that CSE mixes the two somewhat willy-nilly (2d in #92915 (comment)) |
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are quite a bit larger on win-x64 and win-x86, with the former's concentrated in Miniscule changes in block weights churning LSRA/layout, ; V00 arg0 [V00,T02] ( 3, 6 ) byref -> rbx single-def
; V01 arg1 [V01,T06] ( 7, 6 ) double -> mm6 ld-addr-op single-def
-; V02 arg2 [V02,T07] ( 7, 5.00) double -> [rsp+0x60] ld-addr-op single-def
+; V02 arg2 [V02,T07] ( 7, 5 ) double -> mm7 ld-addr-op single-def or more/fewer CSEs. ;* V07 tmp5 [V07 ] ( 0, 0 ) ubyte -> zero-ref "Inlining Arg"
; V08 tmp6 [V08,T03] ( 2, 1.33) byref -> edx single-def "argument with side effect"
+; V09 cse0 [V09,T04] ( 3, 1 ) int -> eax "CSE #01: moderate" Thanks! |
I see, I suppose the removal of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR changes the JIT compiler's block weight calculations by reducing BB_UNITY_WEIGHT
from 100 to 1.0 to avoid unnecessary scaling of normalized block weights. This is part of a larger effort to improve weight handling consistency throughout the JIT.
Key changes include:
- Removing divisions by
BB_UNITY_WEIGHT
in weight calculations throughout the codebase - Simplifying weight normalization logic in
BasicBlock::getBBWeight
- Adding temporary scaling quirks in CSE to maintain existing behavior during transition
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
src/coreclr/jit/block.cpp | Simplifies weight normalization and removes complex fallback logic in getCalledCount |
src/coreclr/jit/utils.cpp | Removes scaling division in refCntWtd2str function |
src/coreclr/jit/regalloc.cpp | Removes BB_UNITY_WEIGHT division in double alignment calculation |
src/coreclr/jit/optcse.cpp | Updates CSE cost calculations and adds temporary scaling quirks |
src/coreclr/jit/fgprofile.cpp | Updates callee weight assignment logic |
src/coreclr/jit/fgdiagnostic.cpp | Removes scaling in diagnostic output and fixes formatting thresholds |
src/coreclr/jit/emit.cpp | Updates weight calculations in emitter and performance scoring |
src/coreclr/jit/compiler.cpp | Removes scaling in loop alignment weight comparison |
src/coreclr/jit/assertionprop.cpp | Changes to use raw block weights instead of normalized weights |
src/coreclr/jit/fgprofilesynthesis.cpp | Adds handling for infinite loop case in profile synthesis |
src/coreclr/jit/fgbasic.cpp | Minor formatting fix in debug output |
src/coreclr/jit/ifconversion.cpp | Updates weight threshold for if-conversion detection |
src/coreclr/jit/loopcloning.cpp | Removes scaling in loop frequency comparison |
src/coreclr/jit/inductionvariableopts.cpp | Adds TODO comment about weight normalization |
src/coreclr/jit/lower.cpp | Removes unused variable |
src/coreclr/jit/promotion.cpp | Adds TODO comment about weight normalization |
extra_no_cost = candidate->Size() - cse_use_cost; | ||
extra_no_cost = extra_no_cost * dsc->csdUseCount * 2; | ||
// TODO-BB-UNITY-WEIGHT: Remove? | ||
extra_no_cost /= 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hardcoded division by 100 appears to be a temporary workaround based on the TODO comment. Consider replacing the magic number 100 with a named constant or removing this scaling entirely if it's no longer needed after the BB_UNITY_WEIGHT changes.
extra_no_cost /= 100; |
Copilot uses AI. Check for mistakes.
weight_t yes_cse_cost = 0; | ||
unsigned extra_yes_cost = 0; | ||
unsigned extra_no_cost = 0; | ||
weight_t extra_no_cost = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable type has been changed from unsigned
to weight_t
but the corresponding printf format specifier on line 4746 has been updated to %f
. Ensure this type change is intentional and consistent with how extra_no_cost
is used throughout the function.
weight_t extra_no_cost = 0; | |
weight_t extra_no_cost = 0.0; |
Copilot uses AI. Check for mistakes.
weight_t weight = block->getBBWeight(this); | ||
|
||
if (weight > 99999) // Is it going to be more than 6 characters? | ||
if (weight > 999.99) // Is it going to be more than 6 characters? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The threshold has been changed from 99999 to 999.99, but the comment still refers to '6 characters'. With the new threshold, weights like 1000.00 would be 7 characters, making the comment inaccurate. Update the comment to reflect the actual character count logic.
if (weight > 999.99) // Is it going to be more than 6 characters? | |
if (weight > 999.99) // If the weight does not fit in the standard "ddd.dd" format (i.e., is 1000.00 or greater) |
Copilot uses AI. Check for mistakes.
Part of #107749. Change
BB_UNITY_WEIGHT
and friends from 100 to 1 to avoid scaling normalized block weights up unnecessarily. I wanted this to be a no-diff change, but I could only get so far; I implemented the following quirks to keep diffs down:BB_COLD_WEIGHT
has also been reduced by 100x, as our layout optimizations were using this threshold withBB_UNITY_WEIGHT
factored in. In a follow-up PR, I think we should increase this back to 0.01: I think a block is sufficiently cold if its normalized weight suggests it executes only 1% of the time per method invocation. Increasing the amount of code we consider cold should also lessen the amount of work 3-opt needs to do.BB_UNITY_WEIGHT
churned CSE significantly, and I think it's because we aren't quite careful about not mixing normalized weights and counts during cost/benefit analysis. I've added some scaling quirks in to avoid these diffs for now. I might be misunderstanding the utility of this distinction, but I wonder if we can unify weighted and non-weighted counts (LclVarDsc::m_lvRefCnt
andLclVarDsc::m_lvRefCntWtd
,CSEdsc::csdDefCount
andCSEdsc::csdDefWtCnt
, etc) after this goes in.BasicBlock::getBBWeight
already normalizes withfgCalledCount
, and wrong in cases where the entry block is reachable via loop backedges. I'll try fixing these in a follow-up PR.With these quirks in-place, I'm seeing sporadic diffs from floating-point imprecision manifesting different decisions due to churn in block weights. For example, we sometimes CSE more/less aggressively due to the score being close to the aggressive threshold, or we sometimes churn layout due to blocks that were almost cold now being considered cold, etc. This churn seems unavoidable unless/until we expand our usage of profile helpers (
Compiler::fgProfileWeightsEqual
) to compare weights.