[Refactor] OAI Server components #7167
Merged
zhyncs merged 40 commits into sgl-project:main from JustinTong0323:refactor_oai_server_serving on Jun 17, 2025
Changes from 12 commits
Commits
1d17465 Add refactored OpenAI API server modules implementation (JustinTong0323)
42bb560 Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
d9ceddd feat: add serving_embedding (JustinTong0323)
f8d604b Refactors request handling in OpenAI endpoints (JustinTong0323)
a86bf27 Adds documentation to OpenAI API endpoints (JustinTong0323)
5ddc8fc Simplifies getting enable_thinking value (JustinTong0323)
2ddbb40 rename serving_engine to serving_base (JustinTong0323)
26771ad Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
4596b52 Makes chat template caching instance-specific (JustinTong0323)
47d54dc Refactors logprobs processing (JustinTong0323)
8ac4349 Update python/sglang/srt/entrypoints/openai/protocol.py (JustinTong0323)
00b202c Improve test cases for eagle infer (#7173) (merrymercy)
fb4ae05 fix CI (JustinTong0323)
81f5e41 Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
2a10db7 Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
3b28fdb Removes unused utility functions (JustinTong0323)
012bcb5 Refactors request validation for OpenAI endpoints (JustinTong0323)
27341ae Improves OpenAI serving base class logic (JustinTong0323)
286751a Refactors error handling for OpenAI endpoints (JustinTong0323)
50d57d1 Refactors request ID generation (JustinTong0323)
960f917 Removes RequestContext (JustinTong0323)
30663a5 Simplifies enable_thinking handling and remove unused functions (JustinTong0323)
eb6784d Refactors sampling parameter building (JustinTong0323)
47da102 Renames OpenAI serving handler classes (JustinTong0323)
177efdc Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
c5a60e0 cleanup docs and imports (JustinTong0323)
d433e43 Fixes usage calculation in streaming mode (JustinTong0323)
ba42ea1 Refactors error response handling in OpenAIServingBase (JustinTong0323)
48586bf Apply suggestions from code review (JustinTong0323)
3e03b74 Refactors test fixtures for clarity and remove some tests (JustinTong0323)
ac908e1 Enables tool call constraint in sampling params (JustinTong0323)
69e41f7 move the `text = content["text"]` in serving_chat for Better readability (JustinTong0323)
590db9a lint (JustinTong0323)
4c140c8 remove redundant logic (JustinTong0323)
7190e6f logic for generate_completion_prompt (JustinTong0323)
40e97fc Add comments back (JustinTong0323)
84f6037 Merge branch 'main' into refactor_oai_server_serving (JustinTong0323)
b95a288 fix tests (JustinTong0323)
cc28f37 fix lint (JustinTong0323)
ea30a8c Merge branch 'main' into refactor_oai_server_serving (zhyncs)
@@ -1,48 +0,0 @@
-# Copyright 2023-2024 SGLang Team
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-# ==============================================================================
-"""
-OpenAI-compatible API server module for SGLang.
-
-This module provides OpenAI-compatible API endpoints that allow existing OpenAI client
-applications to seamlessly work with SGLang models. The implementation includes:
-
-Key Features:
-- Full OpenAI API compatibility for chat completions, text completions, and embeddings
-- Streaming support for real-time response generation
-- Batch processing capabilities for multiple requests
-- Function calling and tool use support
-- Multimodal input support (text, images, audio)
-- Advanced reasoning capabilities with separate reasoning content
-- Custom sampling parameters and constraints (regex, JSON schema, EBNF)
-- LoRA adapter support for fine-tuned models
-- Cache reporting and token usage tracking
-
-Supported Endpoints:
-- /v1/chat/completions - Chat-based completions with conversation history
-- /v1/completions - Text completions for single prompts
-- /v1/embeddings - Text/multimodal embeddings generation
-- /v1/models - Model listing and information
-
-The module is structured with separate handlers for each endpoint type, all inheriting
-from a common base class that provides shared functionality like request validation,
-error handling, and response formatting.
-
-Architecture:
-- OpenAIServingBase: Abstract base class for all endpoint handlers
-- ChatCompletionHandler: Handles chat completion requests
-- CompletionHandler: Handles text completion requests
-- EmbeddingHandler: Handles embedding requests
-- Protocol classes: Pydantic models for request/response validation
-- Utility functions: Shared helpers for formatting and validation
-"""