
[Bug] 'tensor_model_parallel_all_reduce' is not defined #2931

@bakch92

Description

Describe the bug

I attempted to serve a LoRA fine-tuned Phi-4 model with tensor parallel size 2 using the sglang framework, but the following error occurred.

[Error Log]

[2025-01-17 01:51:55 TP0] LoRA manager ready.
[2025-01-17 01:51:57 TP1] Load weight end. type=Phi3ForCausalLM, dtype=torch.float16, avail mem=15.70 GB
[2025-01-17 01:52:00 TP1] LoRA manager ready.
[2025-01-17 01:52:00 TP0] Memory pool end. avail mem=39.54 GB
[2025-01-17 01:52:02 TP1] Memory pool end. avail mem=13.43 GB
[2025-01-17 01:52:02 TP1] max_total_num_tokens=16384, max_prefill_tokens=16384, max_running_requests=2049, context_len=16384
[2025-01-17 01:52:02 TP0] max_total_num_tokens=16384, max_prefill_tokens=16384, max_running_requests=2049, context_len=16384
[2025-01-17 01:52:02] INFO:     Started server process [649817]
[2025-01-17 01:52:02] INFO:     Waiting for application startup.
[2025-01-17 01:52:02] INFO:     Application startup complete.
[2025-01-17 01:52:02] INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
[2025-01-17 01:52:03] INFO:     127.0.0.1:47632 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-01-17 01:52:03 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-01-17 01:52:11 TP0] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 101, in forward_thread_func
    self.forward_thread_func_()
  File "/home/work/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 132, in forward_thread_func_
    logits_output, next_token_ids = self.worker.forward_batch_generation(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/managers/tp_worker.py", line 154, in forward_batch_generation
    logits_output = self.model_runner.forward(forward_batch)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 679, in forward
    return self.forward_extend(forward_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/model_executor/model_runner.py", line 648, in forward_extend
    return self.model.forward(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 337, in forward
    hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 288, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 237, in forward
    hidden_states = self.self_attn(
                    ^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/models/llama.py", line 175, in forward
    output, _ = self.o_proj(attn_output)
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/sglang/srt/lora/lora.py", line 248, in forward
    output_ = tensor_model_parallel_all_reduce(output_parallel)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'tensor_model_parallel_all_reduce' is not defined
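
The NameError indicates that tensor_model_parallel_all_reduce is used in sglang/srt/lora/lora.py (line 248 of the installed package) without being imported into that module's namespace. As a quick diagnostic, the sketch below checks whether the symbol is importable at all in this environment and whether the LoRA module can see it; the module names are my assumptions, based on the traceback and on vllm 0.6.4.post1 (where the collective op is normally defined) being installed alongside sglang 0.4.0.

```python
# Diagnostic sketch (my own helper, not part of sglang): check whether
# tensor_model_parallel_all_reduce exists where the LoRA code might expect it.
import importlib

CANDIDATES = [
    "vllm.distributed",      # where the collective op lives in vllm 0.6.x (assumption)
    "sglang.srt.lora.lora",  # the module raising the NameError in the traceback
]

for name in CANDIDATES:
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        print(f"{name}: import failed ({exc})")
        continue
    present = hasattr(module, "tensor_model_parallel_all_reduce")
    print(f"{name}.tensor_model_parallel_all_reduce: {'found' if present else 'MISSING'}")
```

If the symbol imports cleanly from vllm.distributed but is missing from sglang.srt.lora.lora, the failure is most likely a missing import in the LoRA tensor-parallel code path rather than an environment problem.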

Reproduction

Model Name: Microsoft Phi-4

nohup python -m sglang.launch_server \
    --model-path /home/work/ai/Microsoft_Phi-4/phi-4_quantized_8bit \
    --lora-paths lora=/home/work/ai/Microsoft_Phi-4/lora_tuning_1221 \
    --port 8001 \
    --mem-fraction-static 0.8 \
    --host 0.0.0.0 \
    --dtype auto \
    --disable-radix-cache \
    --disable-cuda-graph \
    --quantization gptq_marlin \
    --max-total-tokens 16384 \
    --tp 2 &
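
For reference, the server starts up fine and the crash only appears once the first request hits the LoRA + tensor-parallel path (the prefill batch at 01:52:03 in the log). A minimal request that exercises that path is sketched below; it assumes sglang's native /generate endpoint on port 8001 and the adapter name "lora" from --lora-paths above, so adjust it if your client or endpoint differs.

```python
# Minimal client sketch to hit the LoRA + TP path (assumptions: native
# /generate endpoint, port 8001, adapter registered as "lora").
import requests

resp = requests.post(
    "http://localhost:8001/generate",
    json={
        "text": "Hello, who are you?",
        "sampling_params": {"max_new_tokens": 32, "temperature": 0},
        "lora_path": "lora",  # adapter name from --lora-paths lora=<path>
    },
    timeout=60,
)
print(resp.status_code, resp.text)
```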

Environment

Python: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2: CUDA GPU
GPU 0,1,2 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.8, V11.8.89
CUDA Driver Version: 535.54.03
PyTorch: 2.5.1+cu124
sglang: 0.4.0
flashinfer: 0.1.6+cu121torch2.4
triton: 3.1.0
transformers: 4.48.0
torchao: 0.6.1
numpy: 1.26.4
aiohttp: 3.11.8
fastapi: 0.115.5
hf_transfer: 0.1.8
huggingface_hub: 0.27.0
interegular: 0.3.3
modelscope: 1.20.1
orjson: 3.10.12
packaging: 24.2
psutil: 6.1.0
pydantic: 2.10.4
multipart: 0.0.17
zmq: 26.2.0
uvicorn: 0.32.1
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.58.1
anthropic: Module Not Found
decord: 0.6.0
NVIDIA Topology:
      GPU0  GPU1  GPU2  NIC0  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0   X    PIX   NODE  SYS   1,3,5,7,9,11  1             N/A
GPU1  PIX    X    NODE  SYS   1,3,5,7,9,11  1             N/A
GPU2  NODE  NODE   X    SYS   1,3,5,7,9,11  1             N/A
NIC0  SYS   SYS   SYS    X

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

NIC Legend:

NIC0: mlx5_0

ulimit soft: 1048576

Labels: bug, lora
