
[Feature] Comprehensive LoRA Adapter Support for MoE Models, Including Expert Weights Integration #9897

@ttw0

Description


Motivation

SGLang currently has no LoRA adapter support for MoE models, even though LoRA adapters are widely deployed for fine-tuning in production. vLLM offers only partial MoE support and carries a significant limitation: "vLLM currently does not support fused MoE LoRA inference. Please ensure that the loaded LoRA model does not contain expert weights." This restriction severely limits the practical application of LoRA to MoE models, particularly when adapters include expert-specific weights, which are crucial for preserving the specialized capabilities of individual experts.

Without comprehensive MoE LoRA support, SGLang users cannot realize the full potential of LoRA fine-tuning for MoE models, especially when expert weights must be adapted for domain-specific tasks.
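
To make the requested behavior concrete, here is a minimal sketch of how per-expert LoRA deltas could be applied alongside base expert weights in an MoE forward pass. This is an illustration only: every name here (`moe_forward_with_expert_lora`, `w_experts`, `lora_a`, `lora_b`) is hypothetical rather than an SGLang or vLLM API, it assumes top-1 routing for simplicity, and a real implementation would need to fuse the LoRA update into the grouped-GEMM expert kernels rather than loop over experts.

```python
import torch

def moe_forward_with_expert_lora(
    x: torch.Tensor,           # [num_tokens, hidden] input activations
    expert_ids: torch.Tensor,  # [num_tokens] top-1 routing decision per token
    w_experts: torch.Tensor,   # [num_experts, hidden, inter] base expert weights
    lora_a: torch.Tensor,      # [num_experts, hidden, rank] per-expert LoRA A
    lora_b: torch.Tensor,      # [num_experts, rank, inter] per-expert LoRA B
    scaling: float,            # LoRA scaling factor (alpha / rank)
) -> torch.Tensor:
    """Illustrative only: base expert projection plus a per-expert
    low-rank LoRA update, i.e. x @ (W_e + scaling * A_e @ B_e)."""
    out = torch.empty(
        x.shape[0], w_experts.shape[-1], dtype=x.dtype, device=x.device
    )
    for e in range(w_experts.shape[0]):
        mask = expert_ids == e
        if not mask.any():
            continue  # no tokens routed to this expert
        xe = x[mask]
        base = xe @ w_experts[e]
        # Low-rank delta computed as two thin matmuls, never materializing A @ B
        delta = (xe @ lora_a[e]) @ lora_b[e] * scaling
        out[mask] = base + delta
    return out
```

The key point the feature request hinges on is the `delta` term: adapters such as the one linked below carry `lora_a`/`lora_b` pairs for the expert weights themselves, which vLLM's current LoRA path rejects outright.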

Related resources

https://github.com/woct0rdho/transformers-qwen3-moe-fused
https://huggingface.co/chenrm/qwen3-30b-a3b-abliterated-lora
