
Conversation

felix314159
Collaborator

🗒️ Description

I wrote a function for the filler plugin that, while filling, detects when a non-slow test takes more than 0.5 s during any of its phases (setup, call, teardown). It then warns that the test should be marked as slow. I locally filled all non-slow tests and then manually added the slow marker to the offenders. Some of these take more than 3 s, so this PR should substantially speed up our non-slow filling. LMK whether you think the standalone function for detecting such scenarios should also be merged or not; it's just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.
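As a rough illustration only (the threshold, message text, and helper function here are assumptions, not necessarily the exact code in this PR), such a phase-duration check could be factored like this:

```python
# Illustrative sketch of a per-phase duration check; threshold and
# wording are assumptions, not the PR's actual implementation.
SLOW_THRESHOLD_SECONDS = 0.5

def check_phase_duration(nodeid, when, duration, keywords,
                         threshold=SLOW_THRESHOLD_SECONDS):
    """Return a warning string if a single phase of a test not marked
    `slow` exceeds the threshold, else None."""
    if "slow" in keywords:
        return None  # already marked as slow; nothing to report
    if duration <= threshold:
        return None
    return (
        f"{nodeid}: {when} phase took {duration:.2f}s; "
        "consider marking this test as slow"
    )

def pytest_runtest_logreport(report):
    """Pytest hook: called once per test phase (setup, call, teardown)."""
    message = check_phase_duration(
        report.nodeid, report.when, report.duration, report.keywords
    )
    if message:
        print(message)
```

Because the hook fires once per phase, each of setup, call, and teardown is checked in isolation rather than as a summed duration.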

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@danceratopz
Member

Oooh, that's quite a lot of tests. Wdyt about having a nightly that runs all tests on main and/or run them all upon merge so that we catch any errors that we'd otherwise potentially miss? Can be a separate pr of course.

LMK whether you think the standalone function for detecting such scenarios should also be merged or not, its just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.

Btw, I think you could achieve the same using a pytest built-in:

  --durations-min=N     Minimal duration in seconds for inclusion in slowest list. Default: 0.005 (or 0.0 if -vv is given).
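For example (illustrative values; this relies only on pytest's built-in duration reporting):

```shell
# Report the 20 slowest test phases, filtering out anything under 0.5 s:
pytest --durations=20 --durations-min=0.5
```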

@felix314159
Collaborator Author

felix314159 commented Sep 29, 2025

Oooh, that's quite a lot of tests. Wdyt about having a nightly that runs all tests on main and/or run them all upon merge so that we catch any errors that we'd otherwise potentially miss? Can be a separate pr of course.

LMK whether you think the standalone function for detecting such scenarios should also be merged or not, its just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.

Btw, I think you could achieve the same using a pytest built-in:

  --durations-min=N     Minimal duration in seconds for inclusion in slowest list. Default: 0.005 (or 0.0 if -vv is given).

I don't know what you mean by "so that we catch any errors that we'd otherwise potentially miss". Before and after this PR, none of this qualifies as an error. I don't think we should add an extra CI run just for this; we can just do it manually twice a year or so. Thanks for mentioning the pytest built-in, it seems useful. It does use slightly different logic (duration there seems to mean setup+call+teardown, whereas I looked at each of these three phases in isolation, not that it matters). What would concern me with CI runs for this is that:

  • hardware could change without us noticing (e.g. a test taking 1 s would be useless info without guarantees about the hardware being used)
  • other stuff runs in parallel (e.g. could we get lots of false positives because the hardware running that test at the time was overloaded by something completely unrelated?)

So all in all, I would prefer to just merge this and revisit it when 'not slow' fills start to feel slow.

@danceratopz
Member

danceratopz commented Sep 29, 2025

I don't think we should add an extra CI run just for this, we can just do it manually twice a year or so.

I consider filling the tests an important check of the framework source and the test source code. Post-Weld, it will also be relevant for testing changes in the specs. Otherwise, why would we fill the tests in the CI at all?

Anything that's not run in regular CI can come back to bite us when we least expect it. For example, when we want to push out a release.

@felix314159
Collaborator Author

Ah, does this mean that whenever we push any commit, the CI run actually skips everything marked as slow? In that case I agree with you that we should fill slow tests nightly.

@spencer-tb
Contributor

I agree with Dan on the CI side. I think we can/should refine this further post-weld. What if, for now, we make the limit 5 s (marking tests over 5 s as slow) just to get this merged?

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

@marioevz
Member

marioevz commented Oct 2, 2025

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

I really like this idea: have a pytest.mark.sanity marker to mark the minimum set of tests required for full coverage of our framework.

I think @SamWilsn did something like this on EELS where he found the minimal set of tests that provided full coverage of EELS, we could do the same and mark the minimum amount of tests that verify our whole EEST framework and then only run those sanity tests for PR merging.
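A marker like the proposed pytest.mark.sanity could be registered in a conftest.py along these lines (hypothetical sketch; EEST's actual plugin setup may register markers differently):

```python
# Hypothetical conftest.py snippet for a `sanity` marker; EEST's real
# marker registration may differ.
def pytest_configure(config):
    """Register a custom `sanity` marker so that `pytest -m sanity`
    selects the minimal coverage subset."""
    config.addinivalue_line(
        "markers",
        "sanity: minimal subset of tests covering the whole framework",
    )
```

CI could then run pytest -m sanity on every PR, with the full suite reserved for a nightly job.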

@danceratopz
Member

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

I really like this idea, have a pytest.mark.sanity marker to mark the minimum amount of tests that are required to have full coverage of our framework.

Sounds great! But filling the tests is also a test of the tests, not just a test of the fw. I.e., you are testing that, if you change the fw, you didn't break any tests.

Also if you change tests/cancun/eip4788_beacon_root/conftest.py you want to run all tests under tests/cancun/eip4788_beacon_root/ in CI for verification.

tldr; it'll require some logic to pick a good subset. I would prefer to optimize filling.

One other good reason to run all (reasonable) tests, is that we should run hasher in CI by default to get the hash. Then it's trivial to verify fixture changes against main for pure refactors.
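In spirit, the hash comparison described above could look like the following standalone sketch (hypothetical; this is not EEST's actual hasher, just an illustration of how a branch's filled fixtures could be diffed against main):

```python
# Hypothetical sketch: compute a stable digest over filled JSON fixtures
# so two fills (e.g. a refactor branch vs. main) can be compared cheaply.
import hashlib
import json
from pathlib import Path

def fixture_hash(fixture_dir):
    """Return a hex digest over all JSON fixtures under fixture_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(fixture_dir).rglob("*.json")):
        data = json.loads(path.read_text())
        digest.update(str(path.relative_to(fixture_dir)).encode())
        # sort_keys makes the hash independent of key order and whitespace,
        # so only semantic fixture changes alter the digest.
        digest.update(json.dumps(data, sort_keys=True).encode())
    return digest.hexdigest()
```

For a pure refactor, the digest produced from the branch's fixtures should match the one produced from main's.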

I think @SamWilsn did something like this on EELS where he found the minimal set of tests that provided full coverage of EELS, we could do the same and mark the minimum amount of tests that verify our whole EEST framework and then only run those sanity tests for PR merging.

https://github.com/SamWilsn/minimize-tests

Member

@marioevz marioevz left a comment


I'm inclined to merge now for a couple of reasons:

  • It'll speed up merging PRs
  • We can revisit slow marked tests once we are welded.
  • Once welded, we can run all slow tests at least once to mark the weld as successful.

I can open an issue to track this post-weld.

Member

@marioevz marioevz left a comment


Removed marks for some of the tests which have no other tests in the suite, in order to not remove coverage of those functions completely.

@marioevz marioevz merged commit 322fd26 into ethereum:main Oct 3, 2025
16 checks passed