feat: Adds statistics measurement for compact-throughput #26754
Conversation
This PR is a work in progress that calculates the compact-throughput limiter usage and returns it as a percentage.

Regarding the algorithm to be used, the expected output is the following:

```
❯ curl -s -XPOST 'http://localhost:8086/debug/vars' | grep "compact-throughput-usage"
"stats": {"compact-throughput-usage":42.1},
```

where `42.1` is the percentage in use.
cmd/influxd/run/server.go
Outdated
```go
		return fmt.Errorf("open points writer: %s", err)
	}

	s.Monitor.WithLimiter(s.TSDBStore.EngineOptions.CompactionThroughputLimiter)
```
I think I'm going to change this to `WithCompactThroughputLimiter` or something, thinking about whether in the future we want to add other limiters to the metrics collection for `/debug/vars`. 🤔
Good idea
This seems only to cover Burst and not the long term limit.
```go
	m.RegisterDiagnosticsClient("network", &network{})
	m.RegisterDiagnosticsClient("system", &system{})

	if m.Limiter != nil {
```
Do we need a lock around the accesses to `m.limiter`?
Or are we assuming `Open` is safely single-threaded?
I believe `Open` should be single-threaded. It's called during `Server.Open`:

```go
for _, service := range s.Services {
	if err := service.Open(); err != nil {
		return fmt.Errorf("open service: %s", err)
	}
}
```
influxdb/cmd/influxd/run/server.go, line 425 in d64fc9e:

```go
func (s *Server) Open() error {
```

which is called from within `Main.Run` (lines 81 to 83 in d64fc9e):

```go
if err := cmd.Run(args...); err != nil {
	return fmt.Errorf("run: %s", err)
}
```
```go
	m.DeregisterDiagnosticsClient("network")
	m.DeregisterDiagnosticsClient("system")
	m.DeregisterDiagnosticsClient("stats")
	m.DeregisterDiagnosticsClient("config")
```
Was this missing from a previous PR?
Yes
```go
// available = current tokens in the rate limiter bucket (can be negative when in debt)
// burst = maximum tokens the bucket can hold
// usage percentage = ((burst - available) / burst) * 100
func (s *stats) CompactThroughputUsage() float64 {
```
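The commented formula can be sketched as a standalone function. In the PR, "available" would come from `rate.Limiter.Tokens()` and "burst" from `rate.Limiter.Burst()`; plain `float64` inputs are used here so the sketch runs without `golang.org/x/time/rate`, and `burstUsagePercent` is a hypothetical name, not one from the PR.

```go
package main

import "fmt"

// burstUsagePercent mirrors the formula in the code comment:
//   usage = ((burst - available) / burst) * 100
// available can be negative when the bucket is in token debt,
// which pushes the result above 100%.
func burstUsagePercent(available, burst float64) float64 {
	return ((burst - available) / burst) * 100
}

func main() {
	fmt.Println(burstUsagePercent(5, 10))  // half the bucket consumed → 50
	fmt.Println(burstUsagePercent(-2, 10)) // token debt → 120
}
```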
This only calculates the burst percentage (maximum tokens that can be requested). But we also need the overall percentage used of the limit over time (Limiter.Limit), don't we?
I can likely generate two fields:

- `compact-throughput-burst-usage`
- `compact-throughput-usage`

and return the usage of our burst limit and our standard limit?
I think the burst limiter is much less interesting; it's going to be highly variable moment to moment, and what I think Support wants is to answer the question: are we, on average, running up against our compaction throughput limit? Check with the FR author on the requirements.
Added a comment on the FR.
The complex part of this would be tracking a time window that's meaningful for grabbing our bytes per second. Running curl and getting this data at any moment in time would likely be meaningless without a moving average as far as I'm aware.
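The moving average described above could be sketched with a ring buffer of per-interval byte counts, so a `/debug/vars` scrape sees average bytes per second over a window rather than a meaningless instantaneous value. This is a hypothetical design sketch; none of these names (`movingRate`, `tick`, `bytesPerSec`) exist in the PR.

```go
package main

import "fmt"

// movingRate keeps bytes written per 1-second interval in a ring
// buffer, so bytesPerSec reports an average over the whole window.
type movingRate struct {
	buckets []float64 // bytes written in each interval
	next    int       // index of the interval to record next
}

func newMovingRate(windowSecs int) *movingRate {
	return &movingRate{buckets: make([]float64, windowSecs)}
}

// tick records the bytes written during the interval that just ended
// and advances to the next slot, overwriting the oldest entry.
func (m *movingRate) tick(bytes float64) {
	m.buckets[m.next] = bytes
	m.next = (m.next + 1) % len(m.buckets)
}

// bytesPerSec returns the average rate across the window.
func (m *movingRate) bytesPerSec() float64 {
	var sum float64
	for _, b := range m.buckets {
		sum += b
	}
	return sum / float64(len(m.buckets))
}

func main() {
	r := newMovingRate(4)
	for _, b := range []float64{100, 300, 100, 300} {
		r.tick(b)
	}
	fmt.Println(r.bytesPerSec()) // (100+300+100+300)/4 → 200
}
```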
Will see what Andrew's input is
I believe the correct computation on this is:

```go
100 * (1 - rate.Limit(l.Tokens())/l.Limit())
```

See this Go Playground for an example, and how token debt is handled.
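The suggested computation can be demonstrated with plain `float64` stand-ins for `l.Tokens()` and `l.Limit()`, so the sketch runs without `golang.org/x/time/rate`; `usagePercent` is an illustrative name, not one from the PR.

```go
package main

import "fmt"

// usagePercent mirrors the suggested computation
//   100 * (1 - rate.Limit(l.Tokens())/l.Limit())
// Negative tokens (debt) make the ratio negative, so the
// result exceeds 100%, as discussed in the thread.
func usagePercent(tokens, limit float64) float64 {
	return 100 * (1 - tokens/limit)
}

func main() {
	fmt.Println(usagePercent(10, 10)) // full bucket, nothing used → 0
	fmt.Println(usagePercent(0, 10))  // bucket drained → 100
	fmt.Println(usagePercent(-5, 10)) // token debt → 150
}
```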
Are we wanting the percentage to go over 100%, or be limited to 100%? The limiter should keep the actual usage more or less at or under 100%.
I think it would be okay to go over 100%.
I'm confused about how going above 100% should be interpreted. Is there an EAR or feature request that drove this feature? Maybe if I see that it will make more sense to me.
LGTM
The truncation to 2 decimal places could be improved. Also have a question about going over 100% on the usage.
monitor/stats.go
Outdated
```go
	i := fmt.Sprintf("%.2f", compactThroughputUsage)
	compactThroughputUsageTrunc, err := strconv.ParseFloat(i, 2)
	if err != nil {
		return nil, err
	}
```
Rounding the percentage to 2 decimal places is an excellent idea! The less significant digits of a floating-point number are usually noise. This is not the best way to round it, though. Rounding to 2 decimal places would be:

```go
compactThroughputUsageTrunc := math.Round(compactThroughputUsage * 100.0) / 100.0
```

This is less work and memory allocation than converting to a string and back. It also allows more control over how we round than using `fmt.Sprintf`. For instance, `math.Round` could be changed to `math.Ceil` or `math.Floor` if we wanted to change the rounding direction.
LGTM. The floating point calculations could be further refined, but it's overkill for what we're doing here.
This PR adds the following as an available `/debug/vars` field. This will show the current compaction throughput usage with respect to the available limiter tokens, as defined by the Go rate limiter:

https://pkg.go.dev/golang.org/x/time/rate#Limiter.Limit
https://pkg.go.dev/golang.org/x/time/rate#Limiter.Tokens

The algorithm for finding our usage is the following: