Description
Checklist
- 1. If the issue you raised is not a feature request but a question, please open a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- 2. Please use English; otherwise, the issue will be closed.
Motivation
In production environments where LoRA adapters are deployed for model fine-tuning, SGLang currently does not support LoRA for MoE models. vLLM provides partial support, but with a significant limitation: "vLLM currently does not support fused MoE LoRA inference. Please ensure that the loaded LoRA model does not contain expert weights." This restriction severely limits the practical use of LoRA in MoE scenarios, particularly when the LoRA adapters include expert-specific weights that are crucial for preserving the specialized capabilities of individual experts.
The absence of comprehensive MoE LoRA support in SGLang prevents users from leveraging the full potential of LoRA fine-tuning for MoE models, especially when expert weights need to be adapted for domain-specific tasks.
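For context, a quick way to tell whether a given adapter falls under this restriction is to inspect its tensor names for expert-specific entries. The sketch below is illustrative only; the file name and the `.experts.` key pattern are assumptions based on common PEFT/safetensors naming conventions, not SGLang or vLLM APIs.

```python
# Minimal sketch: check whether a LoRA adapter contains expert-specific weights.
# Assumptions: the adapter is stored as a safetensors file, and expert LoRA
# tensors carry ".experts." in their key names (typical PEFT-style naming).
from safetensors import safe_open

adapter_path = "adapter_model.safetensors"  # hypothetical local adapter file

with safe_open(adapter_path, framework="pt") as f:
    expert_keys = [k for k in f.keys() if ".experts." in k]

if expert_keys:
    print(f"Adapter contains {len(expert_keys)} expert LoRA tensors, e.g. {expert_keys[0]}")
else:
    print("No expert weights found; the adapter only targets non-expert layers.")
```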
Related resources
https://github.com/woct0rdho/transformers-qwen3-moe-fused
https://huggingface.co/chenrm/qwen3-30b-a3b-abliterated-lora