JIT: Make `BB_UNITY_WEIGHT` 1.0 #112151

amanasifkhalid · 2025-02-04T20:47:19Z

Part of #107749. Change BB_UNITY_WEIGHT and friends from 100 to 1 to avoid scaling normalized block weights up unnecessarily. I wanted this to be a no-diff change, but I could only get so far; I implemented the following quirks to keep diffs down:

BB_COLD_WEIGHT has also been reduced by 100x, as our layout optimizations were using this threshold with BB_UNITY_WEIGHT factored in. In a follow-up PR, I think we should increase this back to 0.01: I think a block is sufficiently cold if its normalized weight suggests it executes only 1% of the time per method invocation. Increasing the amount of code we consider cold should also lessen the amount of work 3-opt needs to do.
Changing BB_UNITY_WEIGHT churned CSE significantly, and I think it's because we aren't quite careful about not mixing normalized weights and counts during cost/benefit analysis. I've added some scaling quirks in to avoid these diffs for now. I might be misunderstanding the utility of this distinction, but I wonder if we can unify weighted and non-weighted counts (LclVarDsc::m_lvRefCnt and LclVarDsc::m_lvRefCntWtd, CSEdsc::csdDefCount and CSEdsc::csdDefWtCnt, etc) after this goes in.
On an unrelated note, I noticed some places where we try to normalize a summed weight by dividing it by the entry block's weight. This should be unnecessary since BasicBlock::getBBWeight already normalizes with fgCalledCount, and wrong in cases where the entry block is reachable via loop backedges. I'll try fixing these in a follow-up PR.

With these quirks in-place, I'm seeing sporadic diffs from floating-point imprecision manifesting different decisions due to churn in block weights. For example, we sometimes CSE more/less aggressively due to the score being close to the aggressive threshold, or we sometimes churn layout due to blocks that were almost cold now being considered cold, etc. This churn seems unavoidable unless/until we expand our usage of profile helpers (Compiler::fgProfileWeightsEqual) to compare weights.

dotnet-policy-service · 2025-02-04T20:48:16Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid · 2025-02-04T21:40:26Z

/azp run runtime-coreclr libraries-pgo, runtime-coreclr libraries-jitstress

azure-pipelines · 2025-02-04T21:40:44Z

Azure Pipelines successfully started running 2 pipeline(s).

AndyAyersMS · 2025-02-05T01:18:00Z

I wonder if we can unify weighted and non-weighted counts

These typically mean two different things, eg unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined.

I recall being annoyed that CSE mixes the two somewhat willy-nilly (2d in #92915 (comment))

amanasifkhalid · 2025-02-05T17:01:06Z

libraries-pgo failures are #112196 and #111922. libraries-jitstress failure looks unrelated. Build Analysis is blocked by build timeouts.

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs are quite a bit larger on win-x64 and win-x86, with the former's concentrated in libraries.pmi and the latter's in libraries.pmi and benchmarks.run_pgo. The jit-analyze summaries of these collections reveal some duplicate methods with diffs that might be inflating the total churn. Looking at the example diffs, I'm seeing the same patterns I noted above:

Miniscule changes in block weights churning LSRA/layout,

 ;  V00 arg0         [V00,T02] (  3,  6   )   byref  ->  rbx         single-def
 ;  V01 arg1         [V01,T06] (  7,  6   )  double  ->  mm6         ld-addr-op single-def
-;  V02 arg2         [V02,T07] (  7,  5.00)  double  ->  [rsp+0x60]  ld-addr-op single-def
+;  V02 arg2         [V02,T07] (  7,  5   )  double  ->  mm7         ld-addr-op single-def

or more/fewer CSEs.

 ;* V07 tmp5         [V07    ] (  0,  0   )   ubyte  ->  zero-ref    "Inlining Arg"
 ;  V08 tmp6         [V08,T03] (  2,  1.33)   byref  ->  edx         single-def "argument with side effect"
+;  V09 cse0         [V09,T04] (  3,  1   )     int  ->  eax         "CSE #01: moderate"

Thanks!

amanasifkhalid · 2025-02-05T17:13:35Z

These typically mean two different things, eg unweighted means "how many" and weighted means "how much" or "how frequent". There really should be some kind of explicit scale factor when they're combined.

I see, I suppose the removal of BB_UNITY_WEIGHT scaling has made it easier to spot where we mix these two up. I see that we switch between using weighted/unweighted counts to drive heuristics depending on our optimization goals -- if we're prioritizing size over speed, then we ought to chase CSEs with the most uses rather than the hottest ones -- so we'd be losing something tangible by using just one.

Copilot

Pull Request Overview

This PR changes the JIT compiler's block weight calculations by reducing BB_UNITY_WEIGHT from 100 to 1.0 to avoid unnecessary scaling of normalized block weights. This is part of a larger effort to improve weight handling consistency throughout the JIT.

Key changes include:

Removing divisions by BB_UNITY_WEIGHT in weight calculations throughout the codebase
Simplifying weight normalization logic in BasicBlock::getBBWeight
Adding temporary scaling quirks in CSE to maintain existing behavior during transition

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/coreclr/jit/block.cpp	Simplifies weight normalization and removes complex fallback logic in `getCalledCount`
src/coreclr/jit/utils.cpp	Removes scaling division in `refCntWtd2str` function
src/coreclr/jit/regalloc.cpp	Removes `BB_UNITY_WEIGHT` division in double alignment calculation
src/coreclr/jit/optcse.cpp	Updates CSE cost calculations and adds temporary scaling quirks
src/coreclr/jit/fgprofile.cpp	Updates callee weight assignment logic
src/coreclr/jit/fgdiagnostic.cpp	Removes scaling in diagnostic output and fixes formatting thresholds
src/coreclr/jit/emit.cpp	Updates weight calculations in emitter and performance scoring
src/coreclr/jit/compiler.cpp	Removes scaling in loop alignment weight comparison
src/coreclr/jit/assertionprop.cpp	Changes to use raw block weights instead of normalized weights
src/coreclr/jit/fgprofilesynthesis.cpp	Adds handling for infinite loop case in profile synthesis
src/coreclr/jit/fgbasic.cpp	Minor formatting fix in debug output
src/coreclr/jit/ifconversion.cpp	Updates weight threshold for if-conversion detection
src/coreclr/jit/loopcloning.cpp	Removes scaling in loop frequency comparison
src/coreclr/jit/inductionvariableopts.cpp	Adds TODO comment about weight normalization
src/coreclr/jit/lower.cpp	Removes unused variable
src/coreclr/jit/promotion.cpp	Adds TODO comment about weight normalization

Copilot · 2025-08-18T20:59:01Z

src/coreclr/jit/optcse.cpp

        extra_no_cost = candidate->Size() - cse_use_cost;
        extra_no_cost = extra_no_cost * dsc->csdUseCount * 2;
+        // TODO-BB-UNITY-WEIGHT: Remove?
+        extra_no_cost /= 100;


This hardcoded division by 100 appears to be a temporary workaround based on the TODO comment. Consider replacing the magic number 100 with a named constant or removing this scaling entirely if it's no longer needed after the BB_UNITY_WEIGHT changes.

Suggested change

extra_no_cost /= 100;

Copilot · 2025-08-18T20:59:02Z

src/coreclr/jit/optcse.cpp

    weight_t yes_cse_cost   = 0;
    unsigned extra_yes_cost = 0;
-    unsigned extra_no_cost  = 0;
+    weight_t extra_no_cost  = 0;


The variable type has been changed from unsigned to weight_t but the corresponding printf format specifier on line 4746 has been updated to %f. Ensure this type change is intentional and consistent with how extra_no_cost is used throughout the function.

Suggested change

weight_t extra_no_cost = 0;

weight_t extra_no_cost = 0.0;

Copilot · 2025-08-18T20:59:02Z

src/coreclr/jit/fgdiagnostic.cpp

        weight_t weight = block->getBBWeight(this);

-        if (weight > 99999) // Is it going to be more than 6 characters?
+        if (weight > 999.99) // Is it going to be more than 6 characters?


The threshold has been changed from 99999 to 999.99, but the comment still refers to '6 characters'. With the new threshold, weights like 1000.00 would be 7 characters, making the comment inaccurate. Update the comment to reflect the actual character count logic.

Suggested change

if (weight > 999.99) // Is it going to be more than 6 characters?

if (weight > 999.99) // If the weight does not fit in the standard "ddd.dd" format (i.e., is 1000.00 or greater)

amanasifkhalid added 4 commits February 4, 2025 09:40

Clean up getBBWeight

c00ed34

Change most getBBWeight call sites to not use BB_UNITY_WEIGHT

d8932db

Make BB_UNITY_WEIGHT 1.0, and add quirks

16abfce

Remove some useless BB_UNITY_WEIGHT references

abc2160

ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 4, 2025

dotnet-policy-service bot assigned amanasifkhalid Feb 4, 2025

amanasifkhalid marked this pull request as ready for review February 4, 2025 21:40

build-analysis bot mentioned this pull request Feb 5, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

This was referenced Feb 5, 2025

System.Data.Common.Tests Assert failure on Linx x64 CI test run #108070

Closed

JIT: Assertion failed '!"Inconsistent profile data"' in System.Text.Json.Tests with random GDV #112196

Closed

This was referenced Mar 24, 2025

[Perf] Windows/x64: BenchStone Regressions on 1/29/2025 7:05:56 PM +00:00 #112136

Closed

Benchmark Regressions from Profile Maintenance and Block Layout Changes #113108

Closed

Merge from main

156252d

Copilot AI review requested due to automatic review settings August 18, 2025 20:57

Copilot AI reviewed Aug 18, 2025

View reviewed changes

build-analysis bot mentioned this pull request Aug 19, 2025

System.Data.OleDb.Tests timeout in net48 x86 Release leg #87783

Closed

JulieLeeMSFT unassigned amanasifkhalid Oct 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

JIT: Make `BB_UNITY_WEIGHT` 1.0 #112151

JIT: Make `BB_UNITY_WEIGHT` 1.0 #112151

amanasifkhalid commented Feb 4, 2025

Uh oh!

dotnet-policy-service bot commented Feb 4, 2025

Uh oh!

amanasifkhalid commented Feb 4, 2025

Uh oh!

azure-pipelines bot commented Feb 4, 2025

Uh oh!

AndyAyersMS commented Feb 5, 2025

Uh oh!

amanasifkhalid commented Feb 5, 2025 •

edited

Loading

Uh oh!

amanasifkhalid commented Feb 5, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Aug 18, 2025

Uh oh!

Copilot AI Aug 18, 2025

Uh oh!

Copilot AI Aug 18, 2025

Uh oh!

Uh oh!

	if (weight > 999.99) // Is it going to be more than 6 characters?
	if (weight > 999.99) // If the weight does not fit in the standard "ddd.dd" format (i.e., is 1000.00 or greater)

JIT: Make BB_UNITY_WEIGHT 1.0 #112151

Are you sure you want to change the base?

JIT: Make BB_UNITY_WEIGHT 1.0 #112151

Conversation

amanasifkhalid commented Feb 4, 2025

Uh oh!

dotnet-policy-service bot commented Feb 4, 2025

Uh oh!

amanasifkhalid commented Feb 4, 2025

Uh oh!

azure-pipelines bot commented Feb 4, 2025

Uh oh!

AndyAyersMS commented Feb 5, 2025

Uh oh!

amanasifkhalid commented Feb 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amanasifkhalid commented Feb 5, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

JIT: Make `BB_UNITY_WEIGHT` 1.0 #112151

JIT: Make `BB_UNITY_WEIGHT` 1.0 #112151

amanasifkhalid commented Feb 5, 2025 •

edited

Loading