Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
Currently, there are three approaches to observability in the PD-Disaggregation architecture: Request Tracing, PD Metrics, and TimeStats Log. Each has its own interface and implementation. Although they serve different purposes, they share the same underlying logic: recording timestamps and deriving latency information from them. To improve maintainability and reduce duplication, I propose unifying these components behind a single shared interface, with the export format selected by command-line arguments.
Current State:
- Request Tracing (@sufeng-buaa):
  I am the author. I implemented a tracing package based on OpenTelemetry, which uses global variables to store the trace context. It exposes a set of APIs for static instrumentation to collect telemetry data.
  - [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 #9962
  - [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 2 #10804
- PD Metrics (@acelyc111):
  The request object caches the timestamp of the previous milestone. The latency between the current milestone and the previous one is calculated and exported to a metric collector.
  - [PD metrics] Add latency Histogram metrics of each stage for generate requests #8710
- TimeStats Log (@merrymercy, @LJL36):
  @merrymercy introduced TimeStats in "Support incremental streaming of logprob/token_ids between scheduler and detokenizer" #6225 to record timestamps at key points in a request's lifecycle and write latency information to the logs after the request completes. However, the timestamp fields were never actually populated. @LJL36 then submitted "[PD] TimeStats for PD disaggregation" #10815, which completes the TimeStats implementation.
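The logic that PD Metrics and TimeStats have in common (cache the previous milestone, compute the delta to the current one, report it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from #8710 or #10815: `MilestoneRecorder`, `mark`, and any milestone names are hypothetical, and a real implementation would push each delta into a Prometheus histogram instead of a Python list.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MilestoneRecorder:
    """Hypothetical per-request recorder (not an sglang API).

    PD-metrics style: each new milestone emits the latency since the previous one.
    TimeStats style: all raw timestamps are kept for a single log line at the end.
    """

    rid: str
    timestamps: Dict[str, float] = field(default_factory=dict)
    stage_latencies: List[Tuple[str, float]] = field(default_factory=list)
    _last: Optional[Tuple[str, float]] = None  # cached previous milestone

    def mark(self, milestone: str, now: Optional[float] = None) -> None:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        ts = time.perf_counter() if now is None else now
        self.timestamps[milestone] = ts
        if self._last is not None:
            prev_name, prev_ts = self._last
            # Latency of the stage that ends at this milestone.
            self.stage_latencies.append((f"{prev_name}->{milestone}", ts - prev_ts))
        self._last = (milestone, ts)

    def summary(self) -> str:
        # One log line per request, emitted after completion.
        parts = [f"{name}={lat * 1000:.2f}ms" for name, lat in self.stage_latencies]
        return f"rid={self.rid} " + " ".join(parts)
```

Both existing approaches reduce to the same `mark` call; they differ only in where the deltas end up, which is the duplication the proposal targets.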
Proposed Implementation
I suggest moving the trace context out of the tracing package and into the request object, under a name such as trace_metric_context. During request execution, a unified interface would record information such as the RequestStage, timestamps, and other attributes. On the backend, the observability configuration would select separate data-processing pipelines that ultimately export the data to the appropriate destination: an OpenTelemetry collector, Prometheus, or log files.
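A minimal sketch of the proposed design, assuming a per-request context that records stage start/end times and fans each finished stage out to whichever exporters the command line enables. All names here (`TraceMetricContext`, `StageEvent`, the `RequestStage` members, the `--observability-exporters` flag) are hypothetical, and the two exporters are in-memory stand-ins for a log sink and a Prometheus histogram. An OpenTelemetry exporter would plug in the same way but is omitted to keep the example dependency-free.

```python
import argparse
import logging
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Protocol


class RequestStage(Enum):
    # Hypothetical stage names for illustration only.
    PREFILL_QUEUE = "prefill_queue"
    PREFILL_FORWARD = "prefill_forward"
    KV_TRANSFER = "kv_transfer"
    DECODE = "decode"


@dataclass
class StageEvent:
    stage: RequestStage
    start: float
    end: float
    attrs: Dict[str, object] = field(default_factory=dict)  # e.g. span attributes

    @property
    def latency(self) -> float:
        return self.end - self.start


class Exporter(Protocol):
    def export(self, rid: str, event: StageEvent) -> None: ...


class LogExporter:
    """Stand-in for a TimeStats-style log sink."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def export(self, rid: str, event: StageEvent) -> None:
        line = f"rid={rid} stage={event.stage.value} latency={event.latency:.6f}s"
        self.lines.append(line)
        logging.getLogger("trace_metric").info(line)


class HistogramExporter:
    """Stand-in for a Prometheus histogram: keeps raw observations per stage."""

    def __init__(self) -> None:
        self.observations: Dict[RequestStage, List[float]] = {}

    def export(self, rid: str, event: StageEvent) -> None:
        self.observations.setdefault(event.stage, []).append(event.latency)


@dataclass
class TraceMetricContext:
    """Hypothetical per-request context stored on the request object."""

    rid: str
    exporters: List[Exporter]
    _open: Dict[RequestStage, float] = field(default_factory=dict)

    def start(self, stage: RequestStage, now: Optional[float] = None) -> None:
        self._open[stage] = time.perf_counter() if now is None else now

    def end(self, stage: RequestStage, now: Optional[float] = None, **attrs: object) -> None:
        start = self._open.pop(stage)
        end = time.perf_counter() if now is None else now
        event = StageEvent(stage, start, end, dict(attrs))
        # One recording call, many backends.
        for exporter in self.exporters:
            exporter.export(self.rid, event)


EXPORTERS = {"log": LogExporter, "histogram": HistogramExporter}


def build_exporters(argv: Optional[List[str]] = None) -> List[Exporter]:
    parser = argparse.ArgumentParser()
    # Hypothetical flag; not an existing sglang server argument.
    parser.add_argument("--observability-exporters", default="log")
    args = parser.parse_args(argv)
    names = [n.strip() for n in args.observability_exporters.split(",") if n.strip()]
    return [EXPORTERS[n]() for n in names]
```

For example, `--observability-exporters log,histogram` enables both sinks, so each `ctx.end(...)` call yields one log line and one histogram observation from a single instrumentation point.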
@acelyc111 @LJL36 @zhanghaotong Would you be interested in discussing the design and collaborating on the implementation? cc @stmatengss @ishandhanani @tonyluj @changhuaixin
Related resources
No response