Commit 134fa43

[NVIDIA] Change to use num_local_experts (#8453)

Parent: ccfe52a

2 files changed: 3 additions (+3), 2 deletions (-2)

docs/backend/server_arguments.md

Lines changed: 2 additions & 1 deletion

@@ -214,7 +214,8 @@ Please consult the documentation below and [server_args.py](https://github.com/s
 | `--ep-size` | The expert parallelism size. | 1 |
 | `--enable-ep-moe` | Enabling expert parallelism for moe. The ep size is equal to the tp size. | False |
 | `--enable-deepep-moe` | Enabling DeepEP MoE implementation for EP MoE. | False |
-| `--enable-flashinfer-moe` | Enabling Flashinfer MoE implementation. | False |
+| `--enable-flashinfer-cutlass-moe` | Enabling Flashinfer Cutlass MoE implementation for high throughput. | False |
+| `--enable-flashinfer-trtllm-moe` | Enabling Flashinfer Trtllm MoE implementation for low latency. | False |
 | `--deepep-mode` | Select the mode when enable DeepEP MoE, could be `normal`, `low_latency` or `auto`. Default is `auto`, which means `low_latency` for decode batch and `normal` for prefill batch. | auto |
 | `--ep-num-redundant-experts` | Allocate this number of redundant experts in expert parallel. | 0 |
 | `--ep-dispatch-algorithm` | The algorithm to choose ranks for redundant experts in expert parallel. | None |
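For context, a hedged usage sketch (not part of this commit): launching an sglang server with the renamed high-throughput flag via Python's subprocess module. The model path and expert-parallelism degree are placeholders; whether either flag applies depends on the model's MoE architecture.

import subprocess

# A sketch under stated assumptions, not a definitive invocation: the
# model path below is a placeholder, and the ep size is illustrative.
subprocess.run([
    "python", "-m", "sglang.launch_server",
    "--model-path", "some/moe-model",    # placeholder model id
    "--ep-size", "8",                    # expert-parallel degree (see table above)
    "--enable-flashinfer-cutlass-moe",   # high-throughput Flashinfer Cutlass path
], check=True)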

python/sglang/srt/layers/moe/ep_moe/layer.py

Lines changed: 1 addition & 1 deletion

@@ -1268,7 +1268,7 @@ def forward(self, hidden_states: torch.Tensor, router_logits: torch.Tensor):
             topk_group=self.topk_group,
             intermediate_size=self.w2_weight.shape[2],
             local_expert_offset=self.start_expert_id,
-            local_num_experts=self.num_experts_per_partition,
+            local_num_experts=self.num_local_experts,
             routed_scaling_factor=self.routed_scaling_factor,
             tile_tokens_dim=_get_tile_tokens_dim(
                 hidden_states.shape[0], self.top_k, self.num_experts
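To make the rename concrete, here is a minimal sketch, not sglang's actual code, of how the number of local experts per expert-parallel (EP) rank relates to the `local_expert_offset` and `local_num_experts` arguments in the diff above. The names `num_experts`, `ep_size`, and `ep_rank` are assumptions for illustration.

def local_expert_range(num_experts: int, ep_size: int, ep_rank: int) -> tuple[int, int]:
    # Evenly partition the global expert set across EP ranks.
    assert num_experts % ep_size == 0, "experts must divide evenly across EP ranks"
    num_local_experts = num_experts // ep_size     # plays the role of self.num_local_experts
    start_expert_id = ep_rank * num_local_experts  # plays the role of self.start_expert_id
    return start_expert_id, num_local_experts

# Example: 64 experts over 8 EP ranks -> 8 experts per rank;
# rank 3 owns expert ids [24, 32).
print(local_expert_range(64, 8, 3))  # (24, 8)

Under this even-split assumption, the old name `num_experts_per_partition` and the new `num_local_experts` describe the same quantity; the commit aligns the attribute name with the `local_num_experts` keyword it feeds.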
