corylanou

Summary

This PR introduces a comprehensive testing harness (litestream-test) for validating Litestream's behavior with large databases, various write patterns, and edge cases. This is the first implementation based on discussions with @benbjohnson about the need for thorough testing, particularly for databases larger than 1GB and various operational scenarios.

Background

Based on requirements outlined in [internal planning documents], we need comprehensive testing for:

  • Databases larger than 1GB (critical SQLite lock page edge case at 0x40000000)
  • Various write patterns and frequencies
  • Database shrinking scenarios (grow then delete data)
  • Interruption and recovery scenarios (checkpoint during Litestream downtime)
  • Multi-level compaction validation
  • LTX file continuity checking
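The lock page arithmetic behind the first bullet can be sketched as follows. SQLite reserves the page containing byte offset 0x40000000, so that page's number depends on the page size. This is a minimal illustration, not the harness's code; the names `pendingByte` and `lockPgno` are hypothetical.

```go
package main

import "fmt"

// pendingByte is the file offset of SQLite's locking region (0x40000000, 1 GiB).
const pendingByte = 0x40000000

// lockPgno returns the 1-based number of the page that contains the lock
// byte for the given page size. The name is illustrative only.
func lockPgno(pageSize uint32) uint32 {
	return pendingByte/pageSize + 1
}

func main() {
	for _, ps := range []uint32{4096, 8192, 65536} {
		fmt.Printf("page size %5d -> lock page %d\n", ps, lockPgno(ps))
	}
}
```

A database only reaches this page once it grows past 1GB, which is why the smaller test databases cannot exercise the edge case.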

Implementation

New Binary: /cmd/litestream-test/

Four primary commands implemented:

  1. populate - Quickly create test databases to target sizes

    • Configurable page sizes (critical for 1GB lock page testing)
    • Multiple tables with indexes
    • Batch inserts for efficiency
    • Support for databases from MB to GB scale
  2. load - Generate continuous workload on databases

    • Write patterns: constant, burst, random, wave
    • Configurable read/write ratios
    • Multiple concurrent workers
    • Real-time statistics reporting
  3. validate - Verify replication integrity

    • Quick check, integrity check, checksum comparison
    • LTX file continuity checking
    • Full data validation between source and restored DBs
    • Integration with existing Litestream restore
  4. shrink - Test database shrinking scenarios

    • Configurable delete percentage
    • Optional VACUUM and checkpoint operations
    • All SQLite checkpoint modes supported (PASSIVE, FULL, RESTART, TRUNCATE)
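As a rough illustration of what an LTX continuity check might look like, the sketch below assumes each LTX file at a given level covers an inclusive transaction ID range, and that consecutive files should cover adjacent ranges. The `txRange` type and `checkContinuity` function are hypothetical and are not taken from the harness.

```go
package main

import (
	"fmt"
	"sort"
)

// txRange is a hypothetical stand-in for the inclusive transaction ID range
// covered by a single LTX file.
type txRange struct {
	Min, Max uint64
}

// checkContinuity sorts a copy of the ranges by Min and reports every place
// where a file does not start exactly one transaction after the previous
// file ends.
func checkContinuity(files []txRange) []string {
	sorted := make([]txRange, len(files))
	copy(sorted, files)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Min < sorted[j].Min })

	var problems []string
	for i := 1; i < len(sorted); i++ {
		prev, cur := sorted[i-1], sorted[i]
		switch {
		case cur.Min > prev.Max+1:
			problems = append(problems,
				fmt.Sprintf("gap: transactions %d-%d missing", prev.Max+1, cur.Min-1))
		case cur.Min <= prev.Max:
			problems = append(problems,
				fmt.Sprintf("overlap: %d-%d overlaps %d-%d", cur.Min, cur.Max, prev.Min, prev.Max))
		}
	}
	return problems
}

func main() {
	files := []txRange{{1, 10}, {11, 20}, {25, 30}}
	for _, p := range checkContinuity(files) {
		fmt.Println(p)
	}
}
```

A real check would first derive the ranges from the replica's file listing, which this sketch leaves out.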

Testing

Comprehensive testing was performed using the Test Harness Demo Script (Gist).

Test Results

Database Creation

  • Created databases from 5MB to 50MB successfully
  • Verified correct page size configuration (4KB, 8KB tested)
  • Tables with proper indexes created

Write Patterns

  • Burst pattern: Correctly generated bursts (205 writes in 5s, then 0, then 204)
  • Random pattern: Generated variable rates as expected
  • Wave pattern: Smooth oscillating pattern confirmed
  • Load generation successfully added 600+ rows during tests
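One way to think about the four patterns is as a function from elapsed time to a target write rate. The sketch below is an assumption about how such patterns could be modeled, not the harness's actual logic; `targetRate`, the 5-second burst window, and the 30-second wave period are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// targetRate returns a hypothetical writes-per-second target for a pattern.
// base is the configured rate, elapsedSec is the time since the run started,
// and jitter is a caller-supplied random value in [0, 1) used by "random".
func targetRate(pattern string, base, elapsedSec, jitter float64) (float64, error) {
	const burstWindowSec = 5.0 // assumed length of each on/off window
	const wavePeriodSec = 30.0 // assumed oscillation period

	switch pattern {
	case "constant":
		return base, nil
	case "burst":
		// Even-numbered windows write at the base rate; odd windows are idle.
		if int(elapsedSec/burstWindowSec)%2 == 0 {
			return base, nil
		}
		return 0, nil
	case "random":
		// Uniform in [0, 2*base), so the long-run mean is base.
		return 2 * base * jitter, nil
	case "wave":
		// Sine wave between 0 and 2*base, centered on base.
		return base * (1 + math.Sin(2*math.Pi*elapsedSec/wavePeriodSec)), nil
	default:
		return 0, fmt.Errorf("unknown pattern %q", pattern)
	}
}

func main() {
	for _, p := range []string{"constant", "burst", "random", "wave"} {
		for _, t := range []float64{0, 5, 10} {
			r, err := targetRate(p, 100, t, rand.Float64())
			if err != nil {
				fmt.Println("error:", err)
				continue
			}
			fmt.Printf("%-8s t=%2.0fs rate=%.1f/s\n", p, t, r)
		}
	}
}
```

Under this model, a burst run would alternate between active and idle intervals, which is the shape reported in the burst result above.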

Page Size Configuration

  • Successfully created database with 8KB pages (verified: 8192 bytes)
  • Ready for 1GB lock page boundary testing

Shrink Operations

  • Deleted 40% of data (10,239 rows)
  • FULL checkpoint executed successfully
  • Database size reduced from 66MB to 45MB
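For reference, SQLite exposes the four checkpoint modes through `PRAGMA wal_checkpoint(MODE)`. The sketch below shows one way to validate a mode name and build that statement; `checkpointSQL` is a hypothetical helper, and how the shrink command actually issues its checkpoint is not shown in this PR description.

```go
package main

import (
	"fmt"
	"strings"
)

// checkpointSQL validates a checkpoint mode name (case-insensitive) and
// returns the matching PRAGMA statement. The helper name is illustrative.
func checkpointSQL(mode string) (string, error) {
	m := strings.ToUpper(strings.TrimSpace(mode))
	switch m {
	case "PASSIVE", "FULL", "RESTART", "TRUNCATE":
		return "PRAGMA wal_checkpoint(" + m + ");", nil
	}
	return "", fmt.Errorf("unsupported checkpoint mode %q", mode)
}

func main() {
	for _, mode := range []string{"passive", "FULL", "restart", "truncate", "bogus"} {
		stmt, err := checkpointSQL(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(stmt)
	}
}
```

Restricting the mode to a fixed set before building the string avoids passing arbitrary input into the SQL text.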

Example Usage

# Create a 1GB database for lock page testing
./bin/litestream-test populate -db /tmp/test.db -target-size 1GB -page-size 8192

# Generate continuous load
./bin/litestream-test load -db /tmp/test.db -write-rate 100 -duration 1m -pattern burst

# Test shrinking
./bin/litestream-test shrink -db /tmp/test.db -delete-percentage 50 -checkpoint

# Validate replication
./bin/litestream-test validate -source-db /tmp/test.db -replica-url s3://bucket/test

Key Features

  • ✅ Handles SQLite lock page at 1GB boundary (page calculation included)
  • ✅ Supports interruption/recovery test scenarios
  • ✅ Uses crypto/rand for secure random data generation
  • ✅ Comprehensive error handling and structured logging
  • ✅ Compatible with existing Litestream binaries

Next Steps

This is the first pass of the testing framework. Future enhancements could include:

  • Automated test suites for CI/CD
  • Multi-database concurrent testing
  • Network failure simulation
  • Performance regression detection
  • VFS testing capabilities

Notes

  • All code follows existing Litestream patterns and conventions
  • Uses same dependencies (go-sqlite3, slog)
  • Passes all pre-commit hooks (goimports, go-vet, staticcheck)
  • No changes to existing Litestream code

cc: @benbjohnson @corylanou

corylanou and others added 5 commits September 24, 2025 14:42
- Implement populate command for quickly creating test databases
- Add load command for generating continuous read/write workloads
- Create validate command for integrity and checksum verification
- Add shrink command for testing database shrinking scenarios
- Support configurable page sizes, write patterns, and validation modes
- Handle databases crossing 1GB boundary (SQLite lock page edge case)

Part of comprehensive testing framework for validating Litestream behavior
with large databases, various write patterns, and edge cases.
- Move all test scripts to cmd/litestream-test/scripts/ directory
- Create comprehensive README.md documenting all test scripts
- Move test results documentation to .local/test-results/ (gitignored)
- Clean up root directory test artifacts

Test scripts include:
- reproduce-critical-bug.sh: Reproduces checkpoint during downtime bug
- test-1gb-boundary.sh: Tests SQLite 1GB lock page handling
- test-fresh-start.sh: Tests fresh database creation workflow
- test-rapid-checkpoints.sh: Stress tests rapid checkpoint cycling
- test-wal-growth.sh: Tests 100MB+ WAL file handling
- test-concurrent-operations.sh: Tests 5 concurrent database replications
- verify-test-setup.sh: Ensures local builds are used

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…entation

- Add S3 LTX file retention cleanup testing scripts:
  - test-s3-retention-small-db.sh: Tests 50MB database with 2min retention
  - test-s3-retention-large-db.sh: Tests 1.5GB database crossing SQLite lock page
  - test-s3-retention-comprehensive.sh: Master script with comparative analysis
  - S3-RETENTION-TESTING.md: Complete documentation and usage guide

- Scripts use local Python S3 mock for isolated testing
- Validate Ben's concern about LTX file cleanup after retention period
- Include critical SQLite 1GB lock page boundary testing

- Update scripts/README.md with new S3 retention test documentation
- Clean up temporary markdown files and analysis artifacts

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Disable MD031 (blanks-around-fences) and MD032 (blanks-around-lists)
- Fix README.md heading style to use consistent setext format
- Convert bold text to proper headings in S3-RETENTION-TESTING.md
- Add required blank lines around headings

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
corylanou force-pushed the feat/litestream-test-harness branch from a92760a to b817367 on September 24, 2025 19:43
corylanou and others added 3 commits September 25, 2025 07:48
Add four new testing scripts for automated validation:
- analyze-test-results.sh: Parse and analyze test output logs
- test-overnight-s3.sh: Extended S3 replication stress testing
- test-overnight.sh: General overnight stress testing suite
- test-quick-validation.sh: Fast validation for common scenarios

These scripts provide comprehensive test coverage for replication,
retention, and recovery scenarios across different storage backends.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Populate database before starting litestream to ensure data exists
- Adjust aggressive test parameters for shorter test duration
- Improve monitoring with WAL size tracking and replica metrics
- Fix table detection for accurate row counting
- Better error filtering to exclude non-critical warnings
- Enhanced summary with clearer success/failure indicators

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
corylanou merged commit ee36d3e into main on Sep 25, 2025
9 checks passed
corylanou deleted the feat/litestream-test-harness branch on September 25, 2025 21:51