@macneale4 macneale4 commented Jul 9, 2025

Currently, two access patterns need to iterate over every chunk in a storage file: Archive/Unarchive and FSCK. In both cases we load one chunk at a time, so disk IO becomes the bottleneck. This change updates the iteration methods to walk the data blocks of table files and archive files, so that data is loaded in larger batches.

There are no new tests for this; the existing archive and fsck tests should cover it.

Testing on a 500 MB database: fsck took 8.7s before the change and 5.0s after.

@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
25aba8f ok 5937457
version total_tests
25aba8f 5937457
correctness_percentage
100.0

@coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
fe25b3d ok 5937457
version total_tests
fe25b3d 5937457
correctness_percentage
100.0

@macneale4 force-pushed the macneale4-claude/storage-iterate branch from 77c3efd to 447df97 on July 9, 2025 18:29
@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
1193ca1 ok 5937457
version total_tests
1193ca1 5937457
correctness_percentage
100.0

@coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2e9851a ok 5937457
version total_tests
2e9851a 5937457
correctness_percentage
100.0

@macneale4 force-pushed the macneale4-claude/storage-iterate branch from 2e9851a to 535c415 on July 9, 2025 19:44
@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
535c415 ok 5937457
version total_tests
535c415 5937457
correctness_percentage
100.0

@coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
365656d ok 5937457
version total_tests
365656d 5937457
correctness_percentage
100.0

@macneale4 macneale4 marked this pull request as ready for review July 9, 2025 21:54
@macneale4 changed the title from "Faster chunk iteration" to "Faster storage chunk iteration" on Jul 9, 2025
@macneale4 macneale4 requested a review from reltuk July 9, 2025 21:56
@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
389748b ok 5937457
version total_tests
389748b 5937457
correctness_percentage
100.0

@reltuk reltuk left a comment

Given the structure of these files, I think it makes more sense to put the chunk readers through bufio than to try to roll our own with the outer loop and tracking offset, etc.

#9515

@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
e683779 ok 5937457
version total_tests
e683779 5937457
correctness_percentage
100.0

@macneale4 force-pushed the macneale4-claude/storage-iterate branch from e683779 to e04c46c on July 28, 2025 18:59
@coffeegoddd

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
e04c46c ok 5937457
version total_tests
e04c46c 5937457
correctness_percentage
100.0
