
Conversation


@devanbenz devanbenz commented Aug 27, 2025

This PR adds the following as an available /debug/vars field.

"stats" : { "compact-throughput-usage-percentage" : <percentage used> }

This will show the current compaction throughput usage with respect to the available limiter tokens, as defined by the golang.org/x/time/rate limiter.

https://pkg.go.dev/golang.org/x/time/rate#Limiter.Limit
https://pkg.go.dev/golang.org/x/time/rate#Limiter.Tokens

The algorithm for finding our usage is the following:

percentage = 100 * (1 - tokens / limit)
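
As a rough sketch only (this assumes the golang.org/x/time/rate API and a made-up helper name; it is not the PR's actual implementation):

package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// compactThroughputUsage is a hypothetical helper: it reports how much of the
// limiter's capacity is in use as a percentage, per the formula above.
func compactThroughputUsage(l *rate.Limiter) float64 {
	if l == nil || l.Limit() == rate.Inf || l.Limit() == 0 {
		return 0
	}
	// Tokens() can go negative when the bucket is in debt, which pushes the
	// result above 100%.
	return 100 * (1 - l.Tokens()/float64(l.Limit()))
}

func main() {
	l := rate.NewLimiter(rate.Limit(1000), 1000) // assumed: 1000 bytes/s limit, burst 1000
	l.ReserveN(time.Now(), 400)                  // spend some tokens
	fmt.Printf("%.2f%%\n", compactThroughputUsage(l)) // roughly 40.00%
}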

This PR is a work in progress that will calculate and return the compact-throughput limiter usage as a percentage.
@devanbenz devanbenz self-assigned this Aug 27, 2025
Regarding the algorithm to be used, the expected output is the following:

❯ curl -s -XPOST 'http://localhost:8086/debug/vars' | grep "compact-throughput-usage"
"stats": {"compact-throughput-usage":42.1},

where 42.1 is the usage expressed as a percentage.
@devanbenz devanbenz marked this pull request as ready for review August 28, 2025 19:11
return fmt.Errorf("open points writer: %s", err)
}

s.Monitor.WithLimiter(s.TSDBStore.EngineOptions.CompactionThroughputLimiter)
Author

I think I'm going to change this to WithCompactThroughputLimiter or something similar. I'm thinking about whether, in the future, we want to add other limiters to the metrics collection for /debug/vars. 🤔

Contributor

Good idea

Contributor
@davidby-influx davidby-influx left a comment

This seems only to cover Burst and not the long-term limit.

m.RegisterDiagnosticsClient("network", &network{})
m.RegisterDiagnosticsClient("system", &system{})

if m.Limiter != nil {
Contributor

Do we need a lock around the accesses to m.limiter?

Contributor

Or are we assuming Open is safely single-threaded?

Author

I believe Open should be single-threaded. It's called during Server.Open:

	for _, service := range s.Services {
		if err := service.Open(); err != nil {
			return fmt.Errorf("open service: %s", err)
		}
	}

func (s *Server) Open() error {

Server.Open, in turn, is called from within Main.Run:

if err := cmd.Run(args...); err != nil {
return fmt.Errorf("run: %s", err)
}

m.DeregisterDiagnosticsClient("network")
m.DeregisterDiagnosticsClient("system")
m.DeregisterDiagnosticsClient("stats")
m.DeregisterDiagnosticsClient("config")
Contributor
@davidby-influx davidby-influx Aug 28, 2025

Was this missing from a previous PR?

Author

Yes

// available = current tokens in the rate limiter bucket (can be negative when in debt)
// burst = maximum tokens the bucket can hold
// usage percentage = ((burst - available) / burst) * 100
func (s *stats) CompactThroughputUsage() float64 {
Contributor

This only calculates the burst percentage (maximum tokens that can be requested). But we also need the overall percentage used of the limit over time (Limiter.Limit), don't we?

Author

I can likely generate two fields:

compact-throughput-burst-usage

compact-throughput-usage

and return the usage of our burst limit and our standard limit?

Contributor

I think the burst limiter is much less interesting; it's going to be highly variable moment to moment, and what I think Support wants is to answer the question: are we, on average, running up against our compaction throughput limit? Check with the FR author on the requirements.

Author
@devanbenz devanbenz Aug 28, 2025

Added a comment on the FR.

The complex part of this would be tracking a time window that's meaningful for measuring our bytes per second. Running curl and getting this data at a single moment in time would likely be meaningless without a moving average, as far as I'm aware.
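
Purely as an illustration of what a windowed measurement could look like (nothing below is part of this PR and every name is made up), an exponentially weighted moving average over observed throughput samples might be:

package main

import (
	"fmt"
	"sync"
)

// throughputEWMA smooths instantaneous bytes-per-second samples so that a
// single scrape of /debug/vars reflects recent history rather than one instant.
type throughputEWMA struct {
	mu    sync.Mutex
	alpha float64 // smoothing factor in (0,1]; higher reacts faster
	value float64 // smoothed bytes/sec
	seen  bool
}

func (e *throughputEWMA) Observe(bytesPerSec float64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if !e.seen {
		e.value, e.seen = bytesPerSec, true
		return
	}
	e.value = e.alpha*bytesPerSec + (1-e.alpha)*e.value
}

func (e *throughputEWMA) Value() float64 {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.value
}

func main() {
	ewma := &throughputEWMA{alpha: 0.2}
	for _, sample := range []float64{800, 1200, 950, 400} {
		ewma.Observe(sample)
	}
	fmt.Printf("smoothed throughput: %.1f bytes/sec\n", ewma.Value())
}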

Author

I'll see what Andrew's input is.

Contributor

I believe the correct computation for this is:

100*(1-rate.Limit(l.Tokens())/l.Limit())

See this Go Playground for an example, and how token debt is handled.
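
As a worked example with assumed numbers (not taken from the playground): if l.Limit() is 48 and l.Tokens() is -12 because the bucket is in debt, the expression gives 100*(1 - (-12)/48) = 125, i.e. 125% usage, which is how the reported value can exceed 100%.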

Member

Do we want the percentage to be able to go over 100%, or should it be limited to 100%? The limiter should keep the actual usage more or less at or under 100%.

Author

I think it would be okay to go over 100%.

Member

I'm confused about how going above 100% should be interpreted. Is there an EAR or feature request that drove this feature? Maybe if I see that, it will make more sense to me.

davidby-influx previously approved these changes Sep 22, 2025
Contributor
@davidby-influx davidby-influx left a comment

LGTM

Member
@gwossum gwossum left a comment

The truncation to 2 decimal places could be improved. I also have a question about going over 100% on the usage.

monitor/stats.go Outdated
Comment on lines 29 to 33
i := fmt.Sprintf("%.2f", compactThroughputUsage)
compactThroughputUsageTrunc, err := strconv.ParseFloat(i, 2)
if err != nil {
return nil, err
}
Member

Rounding the percentage to 2 decimal places is an excellent idea! The less significant digits of a floating-point number are usually noise. This is not the best way to round it, though. Rounding to 2 decimal places would be:

compactThroughputUsageTrunc := math.Round(compactThroughputUsage * 100.0) / 100.0

This does less work and allocates less memory than converting to a string and back. It also allows more control over how we round than fmt.Sprintf does. For instance, math.Round could be changed to math.Ceil or math.Floor if we wanted to change the rounding direction.
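
For example, with an assumed input of 42.137: math.Round(42.137*100.0)/100.0 gives 42.14, while math.Floor(42.137*100.0)/100.0 would give 42.13.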

Member
@gwossum gwossum left a comment

LGTM. The floating-point calculations could be further refined, but that would be overkill for what we're doing here.

@devanbenz devanbenz merged commit 879e34a into master-1.x Oct 1, 2025
9 checks passed
@devanbenz devanbenz deleted the db/compact-tp-stats branch October 1, 2025 16:49