Skip to content

Conversation

amanasifkhalid
Copy link
Contributor

Part of #107749. Change BB_UNITY_WEIGHT and friends from 100 to 1 to avoid scaling normalized block weights up unnecessarily. I wanted this to be a no-diff change, but I could only get so far; I implemented the following quirks to keep diffs down:

  • BB_COLD_WEIGHT has also been reduced by 100x, as our layout optimizations were using this threshold with BB_UNITY_WEIGHT factored in. In a follow-up PR, I think we should increase this back to 0.01: I think a block is sufficiently cold if its normalized weight suggests it executes only 1% of the time per method invocation. Increasing the amount of code we consider cold should also lessen the amount of work 3-opt needs to do.
  • Changing BB_UNITY_WEIGHT churned CSE significantly, and I think it's because we aren't quite careful about not mixing normalized weights and counts during cost/benefit analysis. I've added some scaling quirks in to avoid these diffs for now. I might be misunderstanding the utility of this distinction, but I wonder if we can unify weighted and non-weighted counts (LclVarDsc::m_lvRefCnt and LclVarDsc::m_lvRefCntWtd, CSEdsc::csdDefCount and CSEdsc::csdDefWtCnt, etc) after this goes in.
  • On an unrelated note, I noticed some places where we try to normalize a summed weight by dividing it by the entry block's weight. This should be unnecessary since BasicBlock::getBBWeight already normalizes with fgCalledCount, and wrong in cases where the entry block is reachable via loop backedges. I'll try fixing these in a follow-up PR.

With these quirks in-place, I'm seeing sporadic diffs from floating-point imprecision manifesting different decisions due to churn in block weights. For example, we sometimes CSE more/less aggressively due to the score being close to the aggressive threshold, or we sometimes churn layout due to blocks that were almost cold now being considered cold, etc. This churn seems unavoidable unless/until we expand our usage of profile helpers (Compiler::fgProfileWeightsEqual) to compare weights.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 4, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid amanasifkhalid marked this pull request as ready for review February 4, 2025 21:40
@amanasifkhalid
Copy link
Contributor Author

/azp run runtime-coreclr libraries-pgo, runtime-coreclr libraries-jitstress

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@AndyAyersMS
Copy link
Member

I wonder if we can unify weighted and non-weighted counts

These typically mean two different things, eg unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined.

I recall being annoyed that CSE mixes the two somewhat willy-nilly (2d in #92915 (comment))

@amanasifkhalid
Copy link
Contributor Author

amanasifkhalid commented Feb 5, 2025

libraries-pgo failures are #112196 and #111922. libraries-jitstress failure looks unrelated. Build Analysis is blocked by build timeouts.

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are quite a bit larger on win-x64 and win-x86, with the former's concentrated in libraries.pmi and the latter's in libraries.pmi and benchmarks.run_pgo. The jit-analyze summaries of these collections reveal some duplicate methods with diffs that might be inflating the total churn. Looking at the example diffs, I'm seeing the same patterns I noted above:

Miniscule changes in block weights churning LSRA/layout,

 ;  V00 arg0         [V00,T02] (  3,  6   )   byref  ->  rbx         single-def
 ;  V01 arg1         [V01,T06] (  7,  6   )  double  ->  mm6         ld-addr-op single-def
-;  V02 arg2         [V02,T07] (  7,  5.00)  double  ->  [rsp+0x60]  ld-addr-op single-def
+;  V02 arg2         [V02,T07] (  7,  5   )  double  ->  mm7         ld-addr-op single-def

or more/fewer CSEs.

 ;* V07 tmp5         [V07    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
 ;  V08 tmp6         [V08,T03] (  2,  1.33)   byref  ->  edx         single-def "argument with side effect"
+;  V09 cse0         [V09,T04] (  3,  1   )     int  ->  eax         "CSE #01: moderate"

Thanks!

@amanasifkhalid
Copy link
Contributor Author

These typically mean two different things, eg unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined.

I see, I suppose the removal of BB_UNITY_WEIGHT scaling has made it easier to spot where we mix these two up. I see that we switch between using weighted/unweighted counts to drive heuristics depending on our optimization goals -- if we're prioritizing size over speed, then we ought to chase CSEs with the most uses rather than the hottest ones -- so we'd be losing something tangible by using just one.

@Copilot Copilot AI review requested due to automatic review settings August 18, 2025 20:57
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR changes the JIT compiler's block weight calculations by reducing BB_UNITY_WEIGHT from 100 to 1.0 to avoid unnecessary scaling of normalized block weights. This is part of a larger effort to improve weight handling consistency throughout the JIT.

Key changes include:

  • Removing divisions by BB_UNITY_WEIGHT in weight calculations throughout the codebase
  • Simplifying weight normalization logic in BasicBlock::getBBWeight
  • Adding temporary scaling quirks in CSE to maintain existing behavior during transition

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/jit/block.cpp Simplifies weight normalization and removes complex fallback logic in getCalledCount
src/coreclr/jit/utils.cpp Removes scaling division in refCntWtd2str function
src/coreclr/jit/regalloc.cpp Removes BB_UNITY_WEIGHT division in double alignment calculation
src/coreclr/jit/optcse.cpp Updates CSE cost calculations and adds temporary scaling quirks
src/coreclr/jit/fgprofile.cpp Updates callee weight assignment logic
src/coreclr/jit/fgdiagnostic.cpp Removes scaling in diagnostic output and fixes formatting thresholds
src/coreclr/jit/emit.cpp Updates weight calculations in emitter and performance scoring
src/coreclr/jit/compiler.cpp Removes scaling in loop alignment weight comparison
src/coreclr/jit/assertionprop.cpp Changes to use raw block weights instead of normalized weights
src/coreclr/jit/fgprofilesynthesis.cpp Adds handling for infinite loop case in profile synthesis
src/coreclr/jit/fgbasic.cpp Minor formatting fix in debug output
src/coreclr/jit/ifconversion.cpp Updates weight threshold for if-conversion detection
src/coreclr/jit/loopcloning.cpp Removes scaling in loop frequency comparison
src/coreclr/jit/inductionvariableopts.cpp Adds TODO comment about weight normalization
src/coreclr/jit/lower.cpp Removes unused variable
src/coreclr/jit/promotion.cpp Adds TODO comment about weight normalization

extra_no_cost = candidate->Size() - cse_use_cost;
extra_no_cost = extra_no_cost * dsc->csdUseCount * 2;
// TODO-BB-UNITY-WEIGHT: Remove?
extra_no_cost /= 100;
Copy link
Preview

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hardcoded division by 100 appears to be a temporary workaround based on the TODO comment. Consider replacing the magic number 100 with a named constant or removing this scaling entirely if it's no longer needed after the BB_UNITY_WEIGHT changes.

Suggested change
extra_no_cost /= 100;

Copilot uses AI. Check for mistakes.

weight_t yes_cse_cost = 0;
unsigned extra_yes_cost = 0;
unsigned extra_no_cost = 0;
weight_t extra_no_cost = 0;
Copy link
Preview

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable type has been changed from unsigned to weight_t but the corresponding printf format specifier on line 4746 has been updated to %f. Ensure this type change is intentional and consistent with how extra_no_cost is used throughout the function.

Suggested change
weight_t extra_no_cost = 0;
weight_t extra_no_cost = 0.0;

Copilot uses AI. Check for mistakes.

weight_t weight = block->getBBWeight(this);

if (weight > 99999) // Is it going to be more than 6 characters?
if (weight > 999.99) // Is it going to be more than 6 characters?
Copy link
Preview

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threshold has been changed from 99999 to 999.99, but the comment still refers to '6 characters'. With the new threshold, weights like 1000.00 would be 7 characters, making the comment inaccurate. Update the comment to reflect the actual character count logic.

Suggested change
if (weight > 999.99) // Is it going to be more than 6 characters?
if (weight > 999.99) // If the weight does not fit in the standard "ddd.dd" format (i.e., is 1000.00 or greater)

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants