Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of python collect_env.py
cuDNN version : Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.12.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.12.0
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 14
On-line CPU(s) list: 0-13
Vendor ID: ARM
Model name: -
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 14
Socket(s): -
Cluster(s): 1
Stepping: r0p0
CPU(s) scaling MHz: 45%
CPU max MHz: 2601.0000
CPU min MHz: 54.0000
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti ecv afp wfxt
L1d cache: 896 KiB (14 instances)
L1i cache: 896 KiB (14 instances)
L2 cache: 14 MiB (14 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-13
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
==============================
Versions of relevant libraries
==============================
[pip3] nvidia-cudnn-cu13==9.13.1.26
[pip3] nvidia-cudnn-frontend==1.14.1
[pip3] nvidia-cufft==12.0.0.15
[pip3] nvidia-cufile==1.15.0.42
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.3.29
[pip3] nvidia-cusparse==12.6.2.49
[pip3] nvidia-cusparselt-cu13==0.8.1
[pip3] nvidia-cutlass-dsl==4.2.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu13==2.28.3
[pip3] nvidia-nvjitlink==13.0.39
[pip3] nvidia-nvshmem-cu13==3.4.5
[pip3] nvidia-nvtx==13.0.39
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0+cu130
[pip3] torchaudio==2.9.0
[pip3] torchvision==0.24.0
[pip3] transformers==4.56.2
[pip3] triton==3.5.0
[conda] Could not collect
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.1.dev10013+gf28cc5943.d20250929 (git sha: f28cc5943, date: 20250929)
vLLM Build Flags:
CUDA Archs: 11.0a; ROCm: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-13 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
==============================
Environment Variables
==============================
CUDAARCHS=110a
CUDA_ARCHITECTURES=110a
CUDA_ARCH_LIST=110a
MAX_JOBS=8
NVCC_THREADS=1
NVIDIA_DRIVER_CAPABILITIES=all
NVIDIA_VISIBLE_DEVICES=all
TORCH_CUDA_ARCH_LIST=11.0a
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
🐛 Describe the bug
I started the server with:
OMP_NUM_THREADS=8 \
VLLM_USE_AITER_UNIFIED_ATTENTION=1 \
VLLM_ATTENTION_BACKEND=FLASHINFER \
VLLM_USE_FLASHINFER_MOE_FP8=1 \
vllm serve --async-scheduling \
  --gpu-memory-utilization 0.8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --model=Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
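Then the worker crashed during the memory profiling run with an AssertionError raised from fp8.py (`assert self.block_quant is None`):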
(Worker pid=2958) INFO 09-30 13:59:50 [default_loader.py:267] Loading weights took 29.90 seconds
(Worker pid=2958) INFO 09-30 13:59:51 [gpu_model_runner.py:2730] Model loading took 29.0972 GiB and 32.575064 seconds
(Worker pid=2958) INFO 09-30 13:59:59 [backends.py:548] Using cache directory: /home/jasl/.cache/vllm/torch_compile_cache/b222e3a507/rank_0_0/backbone for vLLM's torch.compile
(Worker pid=2958) INFO 09-30 13:59:59 [backends.py:559] Dynamo bytecode transform time: 7.64 s
(Worker pid=2958) INFO 09-30 14:00:00 [backends.py:197] Cache the graph for dynamic shape for later use
(Worker pid=2958) INFO 09-30 14:00:24 [backends.py:218] Compiling a graph for dynamic shape takes 24.27 s
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3455, in profile_run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] = self._dummy_run(self.max_num_tokens, is_profile=True)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3215, in _dummy_run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] outputs = self.model(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.runnable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_moe.py", line 675, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 310, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] output = self.compiled_callable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_moe.py", line 403, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] def forward(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return super().__call__(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] raise e
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "<eval_with_key>.98", line 449, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] submod_2 = self.submod_2(getitem_3, s72, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_scale_, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_scale_ = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.runnable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 90, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.compiled_graph_for_general_shape(*args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return compiled_fn(full_args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] all_outs = call_func_at_runtime_with_args(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] out = normalize_as_list(f(args))
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] outs = compiled_fn(args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return compiled_fn(runtime_args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.current_callable(inputs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 2962, in run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] out = model(new_inputs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.cache/vllm/torch_compile_cache/b222e3a507/rank_0_0/inductor_cache/3n/c3nzlrtzcrjju5baj2343pmuikuinhe4q7jocnacrmjywvx42dxq.py", line 687, in call
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] buf5 = torch.ops.vllm.moe_forward.default(buf3, buf4, 'model.layers.0.mlp.experts')
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2144, in moe_forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.forward_impl(hidden_states, router_logits)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2025, in forward_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] final_hidden_states = self.quant_method.apply(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 1049, in apply
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] assert self.block_quant is None
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] AssertionError
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3455, in profile_run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] = self._dummy_run(self.max_num_tokens, is_profile=True)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3215, in _dummy_run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] outputs = self.model(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.runnable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_moe.py", line 675, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 310, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] output = self.compiled_callable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in compile_wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_moe.py", line 403, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] def forward(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 414, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return super().__call__(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 413, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] raise e
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "<eval_with_key>.98", line 449, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] submod_2 = self.submod_2(getitem_3, s72, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_scale_, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_scale_ = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 121, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.runnable(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 90, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.compiled_graph_for_general_shape(*args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return fn(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return compiled_fn(full_args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] all_outs = call_func_at_runtime_with_args(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] out = normalize_as_list(f(args))
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] outs = compiled_fn(args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return compiled_fn(runtime_args)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 613, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.current_callable(inputs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 2962, in run
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] out = model(new_inputs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.cache/vllm/torch_compile_cache/b222e3a507/rank_0_0/inductor_cache/3n/c3nzlrtzcrjju5baj2343pmuikuinhe4q7jocnacrmjywvx42dxq.py", line 687, in call
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] buf5 = torch.ops.vllm.moe_forward.default(buf3, buf4, 'model.layers.0.mlp.experts')
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2144, in moe_forward
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] return self.forward_impl(hidden_states, router_logits)
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 2025, in forward_impl
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] final_hidden_states = self.quant_method.apply(
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 1049, in apply
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] assert self.block_quant is None
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671] AssertionError
(Worker pid=2958) ERROR 09-30 14:00:26 [multiproc_executor.py:671]
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] EngineCore failed to start.
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] Traceback (most recent call last):
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 499, in __init__
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 93, in __init__
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 191, in _initialize_kv_caches
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] result = result.result()
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] return self.__get_result()
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] raise self._exception
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] raise RuntimeError(
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:26 [core.py:712] RuntimeError: Worker failed with error '', please check the stack trace above for the root cause
(EngineCore_DP0 pid=2936) ERROR 09-30 14:00:28 [multiproc_executor.py:154] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=2936) Process EngineCore_DP0:
(EngineCore_DP0 pid=2936) Traceback (most recent call last):
(EngineCore_DP0 pid=2936) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2936) self.run()
(EngineCore_DP0 pid=2936) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2936) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 716, in run_engine_core
(EngineCore_DP0 pid=2936) raise e
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 703, in run_engine_core
(EngineCore_DP0 pid=2936) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 499, in __init__
(EngineCore_DP0 pid=2936) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 93, in __init__
(EngineCore_DP0 pid=2936) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 191, in _initialize_kv_caches
(EngineCore_DP0 pid=2936) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=2936) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=2936) result = result.result()
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=2936) return self.__get_result()
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=2936) raise self._exception
(EngineCore_DP0 pid=2936) File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2936) result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2936) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2936) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=2936) raise RuntimeError(
(EngineCore_DP0 pid=2936) RuntimeError: Worker failed with error '', please check the stack trace above for the root cause
(APIServer pid=2909) Traceback (most recent call last):
(APIServer pid=2909) File "/home/jasl/.venv/bin/vllm", line 10, in <module>
(APIServer pid=2909) sys.exit(main())
(APIServer pid=2909) ^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=2909) args.dispatch_function(args)
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=2909) uvloop.run(run_server(args))
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=2909) return __asyncio.run(
(APIServer pid=2909) ^^^^^^^^^^^^^^
(APIServer pid=2909) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=2909) return runner.run(main)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=2909) return self._loop.run_until_complete(task)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=2909) return await main
(APIServer pid=2909) ^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=2909) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=2909) async with build_async_engine_client(
(APIServer pid=2909) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2909) return await anext(self.gen)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=2909) async with build_async_engine_client_from_engine_args(
(APIServer pid=2909) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2909) return await anext(self.gen)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=2909) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1571, in inner
(APIServer pid=2909) return fn(*args, **kwargs)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=2909) return cls(
(APIServer pid=2909) ^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=2909) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=2909) return AsyncMPClient(*client_args)
(APIServer pid=2909) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=2909) super().__init__(
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=2909) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=2909) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=2909) next(self.gen)
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=2909) wait_for_engine_startup(
(APIServer pid=2909) File "/home/jasl/.venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=2909) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=2909) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.