Faster storage chunk iteration #9477

macneale4 · 2025-07-09T16:49:49Z

Currently we have two access patterns where we need to iterate over all chunks in a storage file: Archive/Unarchive and FSCK. In both cases, we load one chunk at a time, and disk I/O becomes the bottleneck. This change updates the iteration methods to walk the data blocks of table files and archive files so that we can load data in larger batches.

There are no new tests for this. The existing archive and fsck tests should cover it.

Testing on a 500MB database: fsck took 8.7s before the change and 5.0s after.
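To make the batching idea above concrete, here is a minimal sketch of windowed iteration. It is not the code in this PR. It assumes a hypothetical record layout (a 4-byte big-endian length followed by the payload), which is not Dolt's table file or archive format, and the names `iterateWindowed`, `collectWindowed`, and `errRecordTooLarge` are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// errRecordTooLarge is a hypothetical error for this sketch only.
var errRecordTooLarge = errors.New("record does not fit in window")

// iterateWindowed loads src in windows of windowSize bytes and calls cb once
// per record. Assumed layout (NOT Dolt's format): 4-byte big-endian length,
// then payload. The slice passed to cb is only valid during the call.
func iterateWindowed(src io.ReaderAt, size int64, windowSize int, cb func(payload []byte) error) error {
	if windowSize < 4 {
		return errors.New("window must hold at least a length prefix")
	}
	buf := make([]byte, windowSize)
	filled := 0   // bytes of buf that hold unread data
	var off int64 // next file offset to load
	for {
		// Top up the window with one large read instead of one read per record.
		if off < size && filled < len(buf) {
			want := len(buf) - filled
			if rem := size - off; rem < int64(want) {
				want = int(rem)
			}
			n, err := src.ReadAt(buf[filled:filled+want], off)
			if err != nil && err != io.EOF {
				return err
			}
			if n == 0 {
				return io.ErrUnexpectedEOF
			}
			filled += n
			off += int64(n)
		}
		// Hand out every complete record currently in the window.
		pos := 0
		for filled-pos >= 4 {
			l := int(binary.BigEndian.Uint32(buf[pos:]))
			if l > len(buf)-4 {
				return errRecordTooLarge
			}
			if filled-pos < 4+l {
				break // record straddles the window edge; wait for more data
			}
			if err := cb(buf[pos+4 : pos+4+l]); err != nil {
				return err
			}
			pos += 4 + l
		}
		// Slide any partial record to the front of the window.
		copy(buf, buf[pos:filled])
		filled -= pos
		if off >= size {
			if filled != 0 {
				return io.ErrUnexpectedEOF // trailing bytes form no full record
			}
			return nil
		}
	}
}

// collectWindowed is a small usage helper: it copies each payload into a string.
func collectWindowed(data []byte, windowSize int) ([]string, error) {
	var out []string
	err := iterateWindowed(bytes.NewReader(data), int64(len(data)), windowSize, func(p []byte) error {
		out = append(out, string(p))
		return nil
	})
	return out, err
}
```

The window sizes in this sketch are arbitrary. The point is only that the number of reads scales with file size divided by window size rather than with the number of records.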

coffeegoddd · 2025-07-09T17:43:15Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`25aba8f`	ok	5937457

version	total_tests
`25aba8f`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T17:56:44Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`fe25b3d`	ok	5937457

version	total_tests
`fe25b3d`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T19:08:46Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`1193ca1`	ok	5937457

version	total_tests
`1193ca1`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T19:17:19Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`2e9851a`	ok	5937457

version	total_tests
`2e9851a`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T20:17:57Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`535c415`	ok	5937457

version	total_tests
`535c415`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T20:26:37Z

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`365656d`	ok	5937457

version	total_tests
`365656d`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-09T22:12:27Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`389748b`	ok	5937457

version	total_tests
`389748b`	5937457

correctness_percentage
100.0

reltuk

Given the structure of these files, I think it makes more sense to put the chunk readers through bufio than to try to roll our own with the outer loop and tracking offset, etc.

#9515
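As a rough illustration of the suggestion above, the sketch below puts a sequential record reader behind `bufio.Reader`, so the buffering and offset tracking are left to the standard library. It is not the code from #9515. It assumes a hypothetical record layout (a 4-byte big-endian length followed by the payload), which is not Dolt's on-disk format, and the names `iterateBuffered` and `collectBuffered` are made up for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"io"
)

// iterateBuffered reads length-prefixed records sequentially through a
// bufio.Reader with a bufSize-byte buffer. Assumed layout (NOT Dolt's
// format): 4-byte big-endian length, then payload. The slice passed to cb
// is reused between calls. A real reader should bound the length prefix
// before allocating; this sketch trusts its input.
func iterateBuffered(src io.Reader, bufSize int, cb func(payload []byte) error) error {
	rd := bufio.NewReaderSize(src, bufSize)
	var hdr [4]byte
	var payload []byte
	for {
		if _, err := io.ReadFull(rd, hdr[:]); err != nil {
			if err == io.EOF {
				return nil // clean end of input between records
			}
			return err // io.ErrUnexpectedEOF on a truncated header
		}
		l := int(binary.BigEndian.Uint32(hdr[:]))
		if cap(payload) < l {
			payload = make([]byte, l)
		}
		payload = payload[:l]
		if _, err := io.ReadFull(rd, payload); err != nil {
			if err == io.EOF {
				return io.ErrUnexpectedEOF // header present, payload missing
			}
			return err
		}
		if err := cb(payload); err != nil {
			return err
		}
	}
}

// collectBuffered is a small usage helper: it copies each payload into a string.
func collectBuffered(data []byte, bufSize int) ([]string, error) {
	var out []string
	err := iterateBuffered(bytes.NewReader(data), bufSize, func(p []byte) error {
		out = append(out, string(p))
		return nil
	})
	return out, err
}
```

Compared with a hand-rolled window, there is no outer loop, no offset bookkeeping, and no special case for a record that straddles a buffer boundary. `io.ReadFull` keeps pulling from the buffered reader until the record is complete.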

coffeegoddd · 2025-07-15T23:52:20Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`e683779`	ok	5937457

version	total_tests
`e683779`	5937457

correctness_percentage
100.0

coffeegoddd · 2025-07-28T19:34:04Z

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
`e04c46c`	ok	5937457

version	total_tests
`e04c46c`	5937457

correctness_percentage
100.0

coffeegoddd added the correctness_approved label Jul 9, 2025

macneale4 force-pushed the macneale4-claude/storage-iterate branch from 77c3efd to 447df97 Compare July 9, 2025 18:29

macneale4 force-pushed the macneale4-claude/storage-iterate branch from 2e9851a to 535c415 Compare July 9, 2025 19:44

macneale4 marked this pull request as ready for review July 9, 2025 21:54

macneale4 changed the title ~~Faster chunk iteration~~ Faster storage chunk iteration Jul 9, 2025

macneale4 requested a review from reltuk July 9, 2025 21:56

reltuk reviewed Jul 14, 2025

macneale4 and others added 9 commits July 28, 2025 11:53

First pass on faster chunk iteration

96a3ccb

Iterating over TableFile chunks faster

56db2a9

Remove unnecessary alloc

6052693

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

a822735

…te.sh

Use 4Mb window. And clean up so many comments

7f47693

[ga-format-pr] Run go/utils/repofmt/format_repo.sh and go/Godeps/upda…

657e157

…te.sh

One more meaningless comment removed

1a27947

go/store/nbs: Use bufio in iterateAll implementations. (#9515)

c579c5c

Update iterateAllChunks docs to mention duplicates

e04c46c

macneale4 force-pushed the macneale4-claude/storage-iterate branch from e683779 to e04c46c Compare July 28, 2025 18:59
