feat: Adds statistics measurement for compact-throughput #26754
Conversation
This PR is a work in progress that calculates the compact-throughput limiter usage and returns it as a percentage.

Regarding the algorithm to be used, the expected output is the following:

```
❯ curl -s -XPOST 'http://localhost:8086/debug/vars' | grep "compact-throughput-usage"
"stats": {"compact-throughput-usage":42.1},
```

where `42.1` is the percentage in use.
cmd/influxd/run/server.go
Outdated
```go
		return fmt.Errorf("open points writer: %s", err)
	}

	s.Monitor.WithLimiter(s.TSDBStore.EngineOptions.CompactionThroughputLimiter)
```
I think I'm going to change this to `WithCompactThroughputLimiter` or something, thinking about whether in the future we want to add other limiters to the metrics collection for `/debug/vars`. 🤔
Good idea
This seems only to cover Burst and not the long term limit.
```go
	m.RegisterDiagnosticsClient("network", &network{})
	m.RegisterDiagnosticsClient("system", &system{})

	if m.Limiter != nil {
```
Do we need a lock around the accesses to `m.limiter`?
Or are we assuming `Open` is safely single-threaded?
I believe `Open` should be single-threaded. It's called during `Server.Open`:

```go
for _, service := range s.Services {
	if err := service.Open(); err != nil {
		return fmt.Errorf("open service: %s", err)
	}
}
```
influxdb/cmd/influxd/run/server.go, line 425 in d64fc9e:

```go
func (s *Server) Open() error {
```

which is called from within `Main.Run` (lines 81 to 83 in d64fc9e):

```go
if err := cmd.Run(args...); err != nil {
	return fmt.Errorf("run: %s", err)
}
```
```go
	m.DeregisterDiagnosticsClient("network")
	m.DeregisterDiagnosticsClient("system")
	m.DeregisterDiagnosticsClient("stats")
	m.DeregisterDiagnosticsClient("config")
```
Was this missing from a previous PR?
Yes
```go
// available = current tokens in the rate limiter bucket (can be negative when in debt)
// burst = maximum tokens the bucket can hold
// usage percentage = ((burst - available) / burst) * 100
func (s *stats) CompactThroughputUsage() float64 {
```
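The commented formula can be sketched as a standalone function. In the PR, "available" would come from `rate.Limiter.Tokens()` and "burst" from `rate.Limiter.Burst()`; plain `float64` inputs are used here so the sketch runs without `golang.org/x/time/rate`, and `burstUsagePercent` is a hypothetical name, not one from the PR.

```go
package main

import "fmt"

// burstUsagePercent mirrors the formula in the code comment:
//   usage = ((burst - available) / burst) * 100
// available can be negative when the bucket is in token debt,
// which pushes the result above 100%.
func burstUsagePercent(available, burst float64) float64 {
	return ((burst - available) / burst) * 100
}

func main() {
	fmt.Println(burstUsagePercent(5, 10))  // half the bucket consumed → 50
	fmt.Println(burstUsagePercent(-2, 10)) // token debt → 120
}
```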
This only calculates the burst percentage (maximum tokens that can be requested). But we also need the overall percentage used of the limit over time (Limiter.Limit), don't we?
I can likely generate two fields:

- `compact-throughput-burst-usage`
- `compact-throughput-usage`

and return the usage of our burst limit and our standard limit?
I think the burst limiter is much less interesting; it's going to be highly variable moment to moment, and what I think Support wants is to answer the question: are we, on average, running up against our compaction throughput limit? Check with the FR author on the requirements.
Added a comment on the FR.
The complex part of this would be tracking a time window that's meaningful for grabbing our bytes per second. Running curl and getting this data at any moment in time would likely be meaningless without a moving average as far as I'm aware.
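The moving average described above could be sketched with a ring buffer of per-interval byte counts, so a `/debug/vars` scrape sees average bytes per second over a window rather than a meaningless instantaneous value. This is a hypothetical design sketch; none of these names (`movingRate`, `tick`, `bytesPerSec`) exist in the PR.

```go
package main

import "fmt"

// movingRate keeps bytes written per 1-second interval in a ring
// buffer, so bytesPerSec reports an average over the whole window.
type movingRate struct {
	buckets []float64 // bytes written in each interval
	next    int       // index of the interval to record next
}

func newMovingRate(windowSecs int) *movingRate {
	return &movingRate{buckets: make([]float64, windowSecs)}
}

// tick records the bytes written during the interval that just ended
// and advances to the next slot, overwriting the oldest entry.
func (m *movingRate) tick(bytes float64) {
	m.buckets[m.next] = bytes
	m.next = (m.next + 1) % len(m.buckets)
}

// bytesPerSec returns the average rate across the window.
func (m *movingRate) bytesPerSec() float64 {
	var sum float64
	for _, b := range m.buckets {
		sum += b
	}
	return sum / float64(len(m.buckets))
}

func main() {
	r := newMovingRate(4)
	for _, b := range []float64{100, 300, 100, 300} {
		r.tick(b)
	}
	fmt.Println(r.bytesPerSec()) // (100+300+100+300)/4 → 200
}
```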
Will see what Andrew's input is
I believe the correct computation on this is:

```go
100 * (1 - rate.Limit(l.Tokens())/l.Limit())
```

See this Go Playground for an example, and how token debt is handled.
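The suggested computation can be demonstrated with plain `float64` stand-ins for `l.Tokens()` and `l.Limit()`, so the sketch runs without `golang.org/x/time/rate`; `usagePercent` is an illustrative name, not one from the PR.

```go
package main

import "fmt"

// usagePercent mirrors the suggested computation
//   100 * (1 - rate.Limit(l.Tokens())/l.Limit())
// Negative tokens (debt) make the ratio negative, so the
// result exceeds 100%, as discussed in the thread.
func usagePercent(tokens, limit float64) float64 {
	return 100 * (1 - tokens/limit)
}

func main() {
	fmt.Println(usagePercent(10, 10)) // full bucket, nothing used → 0
	fmt.Println(usagePercent(0, 10))  // bucket drained → 100
	fmt.Println(usagePercent(-5, 10)) // token debt → 150
}
```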
Are we wanting the percentage to go over 100%, or be limited to 100%? The limiter should keep the actual usage more or less at or under 100%.
I think it would be okay to go over 100%.
I'm confused about how going above 100% should be interpreted. Is there an EAR or feature request that drove this feature? Maybe if I see that it will make more sense to me.
LGTM
The truncation to 2 decimal places could be improved. Also have a question about going over 100% on the usage.
monitor/stats.go
Outdated
```go
	i := fmt.Sprintf("%.2f", compactThroughputUsage)
	compactThroughputUsageTrunc, err := strconv.ParseFloat(i, 2)
	if err != nil {
		return nil, err
	}
```
Rounding the percentage to 2 decimal places is an excellent idea! The less significant digits of a floating-point number are usually noise. This is not the best way to round it, though. Rounding to 2 decimal places would be:

```go
compactThroughputUsageTrunc := math.Round(compactThroughputUsage * 100.0) / 100.0
```

This is less work and memory allocation than converting to a string and back. It also allows more control over how we round than using `fmt.Sprintf`. For instance, `math.Round` could be changed to `math.Ceil` or `math.Floor` if we wanted to change the rounding direction.
LGTM. The floating point calculations could be further refined, but it's overkill for what we're doing here.
This PR adds the following as an available `/debug/vars` field. This will show the current compaction throughput usage with respect to the available limiter tokens, as defined by the Go rate limiter:

https://pkg.go.dev/golang.org/x/time/rate#Limiter.Limit
https://pkg.go.dev/golang.org/x/time/rate#Limiter.Tokens

The algorithm for finding our usage is the following: