Skip to content

Conversation

corylanou
Copy link
Collaborator

Summary

Fixes a race condition in TestStore_Integration that was causing spurious test failures at exactly 10.01s.

Root Cause

The test has a goroutine race condition where the insert goroutine calls t.Errorf() after the main test has completed:

  1. Main test goroutine: Exits immediately when 10s timer closes the done channel
  2. Insert goroutine: May still be mid-operation using t.Context()
  3. Race condition: When main test exits, t.Context() gets cancelled
  4. Spurious failure: Insert goroutine gets "database locked" error and calls t.Errorf() after test completion

In Go 1.14+, calling testing methods from a goroutine after the test has finished causes the test to fail.

Evidence

Solution

  • Error channel: Capture insert errors from the goroutine
  • Shutdown awareness: Check if done channel is closed before treating errors as failures
  • Defer cleanup: Use defer to check error channel on any test exit path
  • Expected behavior: Errors during shutdown (after done closes) are ignored as expected

Test Results

The fix eliminates spurious failures while still catching legitimate insert errors that occur during normal test execution.

# Before: Intermittent failures at 10.01s
--- FAIL: TestStore_Integration (10.01s)

# After: Reliable passes  
--- PASS: TestStore_Integration (10.18s)

Changes

  • Add insertErr channel to capture errors from insert goroutine
  • Add shutdown-aware error handling: only report errors if not shutting down
  • Use defer to ensure error checking happens on any test exit
  • Maintain original clean loop structure with return statements

🤖 Generated with Claude Code

Fixes a race condition in TestStore_Integration that was causing spurious
test failures at exactly 10.01s. The issue occurred when the insert
goroutine would call t.Errorf() after the main test had already completed,
which Go's test framework treats as a failure.

Root cause:
- Main test exits immediately when the 10s timer closes the 'done' channel
- Insert goroutine may still be mid-operation using t.Context()
- When main test exits, t.Context() gets cancelled
- Insert goroutine gets "database locked" or context cancelled error
- Calling t.Errorf() from goroutine after test completion causes failure

Solution:
- Add error channel to capture insert errors from goroutine
- Check if 'done' channel is closed before treating errors as failures
- Use defer to check error channel, ensuring it runs on any test exit
- Errors during shutdown (after 'done' closes) are ignored as expected

This eliminates the spurious failures while still catching legitimate
insert errors that occur during normal test execution.

Fixes spurious CI failure: https://github.com/benbjohnson/litestream/actions/runs/17583602203/job/49945685558

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@corylanou corylanou merged commit ffb6264 into main Sep 10, 2025
9 checks passed
@corylanou corylanou deleted the fix-flaky-store-integration-test-clean branch September 10, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants