
Conversation

felix314159
Collaborator

🗒️ Description

I wrote a function for the filler plugin that, while filling, detects when a non-slow test takes more than 0.5 s during any of its phases (setup, call, teardown). It then warns that the test should be marked as slow. I locally filled all non-slow tests and then manually added the slow marker to the offenders. Some of these take more than 3 s, so this PR should substantially speed up our non-slow filling. LMK whether you think the standalone function for detecting such scenarios should also be merged or not; it's just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.
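As a rough illustration only (the threshold, message text, and helper function here are assumptions, not necessarily the exact code in this PR), such a phase-duration check could be factored like this:

```python
# Illustrative sketch of a per-phase duration check; threshold and
# wording are assumptions, not the PR's actual implementation.
SLOW_THRESHOLD_SECONDS = 0.5

def check_phase_duration(nodeid, when, duration, keywords,
                         threshold=SLOW_THRESHOLD_SECONDS):
    """Return a warning string if a single phase of a test not marked
    `slow` exceeds the threshold, else None."""
    if "slow" in keywords:
        return None  # already marked as slow; nothing to report
    if duration <= threshold:
        return None
    return (
        f"{nodeid}: {when} phase took {duration:.2f}s; "
        "consider marking this test as slow"
    )

def pytest_runtest_logreport(report):
    """Pytest hook: called once per test phase (setup, call, teardown)."""
    message = check_phase_duration(
        report.nodeid, report.when, report.duration, report.keywords
    )
    if message:
        print(message)
```

Because the hook fires once per phase, each of setup, call, and teardown is checked in isolation rather than as a summed duration.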

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@danceratopz
Member

Oooh, that's quite a lot of tests. Wdyt about having a nightly that runs all tests on main and/or run them all upon merge so that we catch any errors that we'd otherwise potentially miss? Can be a separate pr of course.

LMK whether you think the standalone function for detecting such scenarios should also be merged or not, its just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.

Btw, I think you could achieve the same using a pytest built-in:

  --durations-min=N     Minimal duration in seconds for inclusion in slowest list. Default: 0.005 (or 0.0 if -vv is given).
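For example (illustrative values; this relies only on pytest's built-in duration reporting):

```shell
# Report the 20 slowest test phases, filtering out anything under 0.5 s:
pytest --durations=20 --durations-min=0.5
```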

@felix314159
Collaborator Author

felix314159 commented Sep 29, 2025

Oooh, that's quite a lot of tests. Wdyt about having a nightly that runs all tests on main and/or run them all upon merge so that we catch any errors that we'd otherwise potentially miss? Can be a separate pr of course.

LMK whether you think the standalone function for detecting such scenarios should also be merged or not, its just a def pytest_runtest_logreport(report) with some logic in ./src/pytest_plugins/filler/filler.py.

Btw, I think you could achieve the same using a pytest built-in:

  --durations-min=N     Minimal duration in seconds for inclusion in slowest list. Default: 0.005 (or 0.0 if -vv is given).

I don't know what you mean by "so that we catch any errors that we'd otherwise potentially miss". Before and after this PR, none of this qualifies as an error. I don't think we should add an extra CI run just for this; we can just do it manually twice a year or so. Thanks for mentioning the pytest built-in, it seems useful. It does use slightly different logic (duration there seems to mean setup+call+teardown, whereas I looked at each of these three phases in isolation, not that it matters). What would concern me with CI runs for this is that:

  • hardware could change without us noticing (e.g. a test taking 1 s would be useless info without guarantees about the hardware being used)
  • other stuff runs in parallel (e.g. could we get lots of false positives because the hardware running that test at the time was overloaded by something completely unrelated?)

So all in all, I would prefer to just merge this and revisit it when 'not slow' fills start to feel slow.

@danceratopz
Member

danceratopz commented Sep 29, 2025

I don't think we should add an extra CI run just for this, we can just do it manually twice a year or so.

I consider filling the tests an important check of the framework source and the test source code. Post-Weld, it will also be relevant for testing changes in the specs. Otherwise, why would we fill the tests in the CI at all?

Anything that's not run in regular CI can come back to bite us when we least expect it. For example, when we want to push out a release.

@felix314159
Collaborator Author

Ah, does this mean that whenever we push any commit, the CI run actually skips everything marked as slow? In that case I agree with you that we should fill slow tests nightly.

@spencer-tb
Contributor

I agree with Dan on the CI side. I think we can/should refine this further post-weld. What if, for now, we make the limit 5 s (marking tests over 5 s as slow) just to get this merged?

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

@marioevz
Member

marioevz commented Oct 2, 2025

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

I really like this idea: have a pytest.mark.sanity marker to mark the minimum set of tests required for full coverage of our framework.

I think @SamWilsn did something like this on EELS where he found the minimal set of tests that provided full coverage of EELS, we could do the same and mark the minimum amount of tests that verify our whole EEST framework and then only run those sanity tests for PR merging.
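A marker like the proposed pytest.mark.sanity could be registered in a conftest.py along these lines (hypothetical sketch; EEST's actual plugin setup may register markers differently):

```python
# Hypothetical conftest.py snippet for a `sanity` marker; EEST's real
# marker registration may differ.
def pytest_configure(config):
    """Register a custom `sanity` marker so that `pytest -m sanity`
    selects the minimal coverage subset."""
    config.addinivalue_line(
        "markers",
        "sanity: minimal subset of tests covering the whole framework",
    )
```

CI could then run pytest -m sanity on every PR, with the full suite reserved for a nightly job.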

@danceratopz
Member

We could instead (in the future) add a marker for fast filling tests of each EIP. So we'd have some tests from each EIP with this marker. Then in CI only fill these tests for all forks to test basic functionality. Our nightly would then fill all tests to check we haven't broken anything further as a sanity check.

I really like this idea, have a pytest.mark.sanity marker to mark the minimum amount of tests that are required to have full coverage of our framework.

Sounds great! But filling the tests is also a test of the tests, not just a test of the fw. I.e., you are testing that, if you change the fw, you didn't break any tests.

Also if you change tests/cancun/eip4788_beacon_root/conftest.py you want to run all tests under tests/cancun/eip4788_beacon_root/ in CI for verification.

tldr; it'll require some logic to pick a good subset. I would prefer to optimize filling.

One other good reason to run all (reasonable) tests, is that we should run hasher in CI by default to get the hash. Then it's trivial to verify fixture changes against main for pure refactors.
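In spirit, the hash comparison described above could look like the following standalone sketch (hypothetical; this is not EEST's actual hasher, just an illustration of how a branch's filled fixtures could be diffed against main):

```python
# Hypothetical sketch: compute a stable digest over filled JSON fixtures
# so two fills (e.g. a refactor branch vs. main) can be compared cheaply.
import hashlib
import json
from pathlib import Path

def fixture_hash(fixture_dir):
    """Return a hex digest over all JSON fixtures under fixture_dir."""
    digest = hashlib.sha256()
    for path in sorted(Path(fixture_dir).rglob("*.json")):
        data = json.loads(path.read_text())
        digest.update(str(path.relative_to(fixture_dir)).encode())
        # sort_keys makes the hash independent of key order and whitespace,
        # so only semantic fixture changes alter the digest.
        digest.update(json.dumps(data, sort_keys=True).encode())
    return digest.hexdigest()
```

For a pure refactor, the digest produced from the branch's fixtures should match the one produced from main's.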

I think @SamWilsn did something like this on EELS where he found the minimal set of tests that provided full coverage of EELS, we could do the same and mark the minimum amount of tests that verify our whole EEST framework and then only run those sanity tests for PR merging.

https://github.com/SamWilsn/minimize-tests

Member

@marioevz marioevz left a comment


I'm inclined to merge now for a couple of reasons:

  • It'll speed up merging PRs
  • We can revisit slow marked tests once we are welded.
  • Once welded, we can run all slow tests at least once to mark the weld as successful.

I can open an issue to track this post-weld.

Member

@marioevz marioevz left a comment


Removed marks for some of the tests which have no other tests in the suite, in order to not remove coverage of those functions completely.

@marioevz marioevz merged commit 322fd26 into ethereum:main Oct 3, 2025
16 checks passed