-
Notifications
You must be signed in to change notification settings - Fork 3.7k
feat: Adds statistics measurement for compact-throughput #26754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
c2b1baa
385f35d
f937361
2991925
da85112
cdf4c13
899948b
d9d3a12
538fed5
bbd4d8f
c68dd6a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,6 +16,7 @@ import ( | |
"github.com/influxdata/influxdb/logger" | ||
"github.com/influxdata/influxdb/models" | ||
"github.com/influxdata/influxdb/monitor/diagnostics" | ||
"github.com/influxdata/influxdb/pkg/limiter" | ||
"github.com/influxdata/influxdb/services/meta" | ||
"github.com/influxdata/influxdb/tsdb" | ||
"go.uber.org/zap" | ||
|
@@ -100,7 +101,8 @@ type Monitor struct { | |
// TSDB configuration for diagnostics | ||
TSDBConfig *tsdb.Config | ||
|
||
Logger *zap.Logger | ||
Logger *zap.Logger | ||
Limiter limiter.Rate | ||
} | ||
|
||
// PointsWriter is a simplified interface for writing the points the monitor gathers. | ||
|
@@ -150,6 +152,15 @@ func (m *Monitor) Open() error { | |
m.RegisterDiagnosticsClient("runtime", &goRuntime{}) | ||
m.RegisterDiagnosticsClient("network", &network{}) | ||
m.RegisterDiagnosticsClient("system", &system{}) | ||
|
||
if m.Limiter != nil { | ||
m.RegisterDiagnosticsClient("stats", &stats{ | ||
comp: compactThroughputStats{ | ||
limiter: m.Limiter, | ||
}, | ||
}) | ||
} | ||
|
||
if m.TSDBConfig != nil { | ||
m.RegisterDiagnosticsClient("config", m.TSDBConfig) | ||
} | ||
|
@@ -200,6 +211,12 @@ func (m *Monitor) writePoints(p models.Points) error { | |
return nil | ||
} | ||
|
||
func (m *Monitor) WithCompactThroughputLimiter(limiter limiter.Rate) { | ||
m.mu.Lock() | ||
defer m.mu.Unlock() | ||
m.Limiter = limiter | ||
} | ||
|
||
// Close closes the monitor system. | ||
func (m *Monitor) Close() error { | ||
if !m.open() { | ||
|
@@ -222,6 +239,8 @@ func (m *Monitor) Close() error { | |
m.DeregisterDiagnosticsClient("runtime") | ||
m.DeregisterDiagnosticsClient("network") | ||
m.DeregisterDiagnosticsClient("system") | ||
m.DeregisterDiagnosticsClient("stats") | ||
m.DeregisterDiagnosticsClient("config") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this missing from a previous PR? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
return nil | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
package monitor | ||
|
||
import ( | ||
"math" | ||
|
||
"github.com/influxdata/influxdb/monitor/diagnostics" | ||
"github.com/influxdata/influxdb/pkg/limiter" | ||
"golang.org/x/time/rate" | ||
) | ||
|
||
// stats captures statistics | ||
type stats struct { | ||
comp compactThroughputStats | ||
} | ||
|
||
type compactThroughputStats struct { | ||
limiter limiter.Rate | ||
} | ||
|
||
// CompactThroughputUsage calculates the percentage of burst capacity currently consumed by compaction. | ||
func (s *stats) CompactThroughputUsage() float64 { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This only calculates the burst percentage (maximum tokens that can be requested). But we also need the overall percentage used of the limit over time (Limiter.Limit), don't we? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can likely generate two fields
And return the usage of our burst limit and our standard limit? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the burst limiter is much less interesting; it's going to be highly variable moment to moment, and what I think Support wants is to answer the question: are we, on average, running up against our compaction throughput limit? Check with the FR author on the requirements. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a comment on the FR. The complex part of this would be tracking a time window that's meaningful for grabbing our bytes per second. Running curl and getting this data at any moment in time would likely be meaningless without a moving average as far as I'm aware. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will see what Andrew's input is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe the correct computation on this is
See this Go Playground for an example, and how token debt is handled. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we wanting the percentage to go over 100%, or be limited to 100%? The limiter should keep the actual usage more or less at or under 100%. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be okay to go over 100%. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm confused about how going above 100% should be interpreted. Is there an EAR or feature request that drove this feature? Maybe if I see that it will make more sense to me. |
||
percentage := 100 * (1 - rate.Limit(s.comp.limiter.Tokens())/s.comp.limiter.Limit()) | ||
return float64(percentage) | ||
} | ||
|
||
func (s *stats) Diagnostics() (*diagnostics.Diagnostics, error) { | ||
compactThroughputUsage := s.CompactThroughputUsage() | ||
compactThroughputUsageTrunc := math.Round(compactThroughputUsage*100.0) / 100.0 | ||
d := map[string]interface{}{ | ||
"compact-throughput-usage-percentage": compactThroughputUsageTrunc, | ||
} | ||
|
||
return diagnostics.RowFromMap(d), nil | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
package monitor_test | ||
|
||
import ( | ||
"testing" | ||
|
||
"github.com/influxdata/influxdb/monitor" | ||
"github.com/influxdata/influxdb/pkg/limiter" | ||
"github.com/influxdata/influxdb/tsdb" | ||
"github.com/stretchr/testify/require" | ||
) | ||
|
||
func TestDiagnostics_Stats(t *testing.T) { | ||
s := monitor.New(nil, monitor.Config{}, &tsdb.Config{}) | ||
compactLimiter := limiter.NewRate(100, 100) | ||
|
||
s.WithCompactThroughputLimiter(compactLimiter) | ||
|
||
require.NoError(t, s.Open(), "opening monitor") | ||
defer func() { | ||
require.NoError(t, s.Close(), "closing monitor") | ||
}() | ||
|
||
d, err := s.Diagnostics() | ||
require.NoError(t, err, "getting diagnostics") | ||
|
||
diags, ok := d["stats"] | ||
require.True(t, ok, "expected stats diagnostic client to be registered") | ||
|
||
got, exp := diags.Columns, []string{"compact-throughput-usage-percentage"} | ||
require.Equal(t, exp, got) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a lock around the accesses to
m.limiter
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are we assuming
Open
is safely single-threaded?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe
Open
should be single threaded. It's called duringServer.Open
influxdb/cmd/influxd/run/server.go
Line 425 in d64fc9e
Which is called from within
Main.Run
.influxdb/cmd/influxd/main.go
Lines 81 to 83 in d64fc9e