Skip to content

Conversation

u-kai
Copy link
Contributor

@u-kai u-kai commented Jun 11, 2025

What does it do ?

Remove flaky log assertions from TestPodSource to fix race conditions that caused intermittent test failures with "Expected no debug messages" errors.

Example of the race condition failure: : link

Motivation

The TestPodSource test was experiencing flaky failures due to race conditions when running in parallel.
The issue occurred because:

  1. Global logger state conflicts: Multiple tests with t.Parallel() were simultaneously
    modifying the global logrus logger through LogsUnderTestWithLogLevel()
  2. Implementation detail testing: Testing log output (an implementation detail) rather than
    focusing on the core functionality

Solution: Remove log testing entirely and focus on endpoint generation correctness, which is
the actual functionality being tested.

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly

@k8s-ci-robot k8s-ci-robot requested a review from szuecs June 11, 2025 01:03
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @u-kai. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 11, 2025
@mloiseleur
Copy link
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 11, 2025
@ivankatliarchuk
Copy link
Contributor

If you removing this capability from tests, we need a compensation. Testing similar capabilities but without paralel execution

@u-kai
Copy link
Contributor Author

u-kai commented Jun 11, 2025

@ivankatliarchuk
Thank you for review!

I personally don't think the log assertions in this test are essential, especially since it's just Debug logging.
I prioritized enabling t.Parallel() to speed up the test suite, and I believe removing the log assertions makes the test more robust since log output tends to change frequently.
Could you clarify why you think it's important to keep testing the logs in this case?

@ivankatliarchuk
Copy link
Contributor

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2025
@ivankatliarchuk
Copy link
Contributor

compensation for removed testing capability is required.

@ivankatliarchuk
Copy link
Contributor

as well as static analysis for correct use of test.Parallel() is needed

@u-kai
Copy link
Contributor Author

u-kai commented Jun 11, 2025

@ivankatliarchuk

compensation for removed testing capability is required.

The only part that is no longer being tested after my change is the log output.
As I understand it, your suggestion for a compensation would mean explicitly verifying that the expected log messages are produced — is that correct?
To my knowledge, very few of the existing tests currently verify that the log output is correct, so I just want to make sure I am interpreting your request properly.
Could you please confirm whether this understanding is correct?

@ivankatliarchuk
Copy link
Contributor

Yes, there is no need to validate same logs for each test, so only few tests have logging valided. We could have separate test that just verify log messages. Agree that it does single log level testing, not efficient.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 11, 2025
@u-kai
Copy link
Contributor Author

u-kai commented Jun 11, 2025

@ivankatliarchuk
Thanks for the helpful advice! I've added a separate test for log verification as you suggested.
It would be great if you could take another look when you have a chance.

Copy link
Contributor

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

}

func TestPodSourceLogs(t *testing.T) {
t.Parallel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need Parallel for that? Just a comment, not necessary need to action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's needed here, because as discussed below, using t.Parallel at this level is safe and allows us to parallelize the tests.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 11, 2025
@ivankatliarchuk
Copy link
Contributor

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 11, 2025
Copy link
Collaborator

@mloiseleur mloiseleur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my suggestion: t.Parallel() should be inside the for loop.
apart from that, LGTM. Thanks for your help on this 👍

@u-kai
Copy link
Contributor Author

u-kai commented Jun 12, 2025

@mloiseleur
Thanks for your review.
I applied your suggestion and ran the tests multiple times. However, I noticed that the test sometimes fails like this:

--- FAIL: TestPodSourceLogs (0.00s)
    --- FAIL: TestPodSourceLogs/when_ignoreNonHostNetworkPods=false,_no_skip_logs_should_be_generated (0.10s)
        pod_test.go:895:
                Error Trace:    /Users/kai/external-dns/internal/testutils/log.go:91
                Error:          Should be false
                Test:           TestPodSourceLogs/when_ignoreNonHostNetworkPods=false,_no_skip_logs_should_be_generated
                Messages:       Expected log message found when should not: skipping pod my-pod1-65699. hostNetwork=false
FAIL

This happens because the parallel tests share state and have conflicting expectations about whether certain log messages should or should not appear.
While this can be avoided by making sure test data (like variable names) don’t overlap between tests, this requires extra care and could easily introduce mistakes in the future.

For this reason, I personally prefer keeping the current approach without using t.Parallel() here — what do you think?

@mloiseleur
Copy link
Collaborator

For this reason, I personally prefer keeping the current approach without using t.Parallel() here — what do you think?

Make sense to me. We want the CI to be as reliable as we can 👍

@u-kai
Copy link
Contributor Author

u-kai commented Jun 12, 2025

Make sense to me. We want the CI to be as reliable as we can 👍

Got it, I'll leave it as is then.

@mloiseleur
Copy link
Collaborator

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025
@ivankatliarchuk
Copy link
Contributor

/test pull-external-dns-unit-test

@ivankatliarchuk
Copy link
Contributor

/test all

Copy link
Contributor

@ivankatliarchuk ivankatliarchuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ivankatliarchuk, mloiseleur

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ivankatliarchuk,mloiseleur]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit f5a3667 into kubernetes-sigs:master Jun 13, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants