[DO NOT MERGE] Hao integration #31

zhisbug · 2023-04-07T10:21:24Z

No description provided.

zhuohan123 · 2023-05-24T04:42:47Z

Changes in this PR have been added to the latest main branch.

Enabled int8 weights by default

…i_docker Docker.ubi: add missing package git

Within the existing `decoding` request parameter section:

```protobuf
enum ResponseFormat {
  // Plain text, no constraints
  TEXT = 0;
  // Valid json
  JSON = 1;
}

message StringChoices {
  repeated string choices = 1;
}

// Mutually-exclusive guided decoding options
oneof guided {
  // Output will be in the specified format
  ResponseFormat format = 3;
  // Output will follow the provided JSON schema
  string json_schema = 4;
  // Output will follow the provided regex pattern
  string regex = 5;
  // Output will be exactly one of the specified choices
  StringChoices choice = 6;
  // Output will follow the provided context free grammar
  string grammar = 7;
}
```

Signed-off-by: Nick Hill <[email protected]>
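The `oneof guided` group in that commit message allows at most one guided-decoding option per request. As a rough illustration of the constraint, here is a minimal Python sketch. `GuidedOptions` and its `which()` helper are hypothetical names invented for this example; they are not part of vLLM or of any generated protobuf code. The sketch also rejects conflicting options outright, whereas a real protobuf `oneof` keeps only the most recently set member.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class ResponseFormat(Enum):
    TEXT = 0  # plain text, no constraints
    JSON = 1  # valid JSON


@dataclass
class GuidedOptions:
    """Hypothetical stand-in for the `oneof guided` group (illustration only)."""

    format: Optional[ResponseFormat] = None
    json_schema: Optional[str] = None
    regex: Optional[str] = None
    choice: Optional[List[str]] = None
    grammar: Optional[str] = None

    def _set_fields(self) -> List[str]:
        # Names of all options that were explicitly provided.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def __post_init__(self) -> None:
        # Assumption: conflicts are treated as errors in this sketch.
        names = self._set_fields()
        if len(names) > 1:
            raise ValueError(f"guided options are mutually exclusive, got: {names}")

    def which(self) -> Optional[str]:
        # Name of the single option that is set, or None if unconstrained.
        names = self._set_fields()
        return names[0] if names else None
```

For example, `GuidedOptions(regex=r"\d+").which()` returns `"regex"`, while `GuidedOptions(regex="a", grammar="b")` raises `ValueError`.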

…on_opts vLLM lm head optimization (tpp)

Signed-off-by: Chen Zhang <[email protected]>

Co-authored-by: dengyunyang <[email protected]>

…oject#26)

* indexer metadata to separate prefill and decode
* deep_gemm prefill kernel
* decode kernel, can run for single batch
* bug fixing insert decode k into kv before gemm
* don't use tilelang quant function
* faster non-looping torch for kv cache insertion
* add chunked prefill impl
* change quant kernel back to tilelang for promotion
* fix format (vllm-project#31)

Signed-off-by: Chen Zhang <[email protected]>

* update unit tests
* Fp8 indexer prefill (vllm-project#33)
* init

Signed-off-by: Chen Zhang <[email protected]>

* can run

---------

Signed-off-by: Chen Zhang <[email protected]>

* remove debug comment

Signed-off-by: Chen Zhang <[email protected]>

* cleanup
* further cleanup

---------

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>

zhisbug added 9 commits April 4, 2023 14:06

changes

440915b

merge main

5eadcff

update stop_str

c858c58

recover

3f23520

fix a stop_str name

6442e3e

update

7b121fa

update

b85250e

not using fast tokenizer

0bdd814

add support for koala and alpaca

8e56ab6

zhuohan123 closed this May 24, 2023

zhuohan123 deleted the hao-integration branch June 18, 2023 07:25

shanshanpt mentioned this pull request Nov 17, 2023

Run long context error: CUDA error: an illegal memory access was encountered #1700

Closed

junior-zsy mentioned this pull request Nov 20, 2023

Error with 32k Long Text in chatglm2-6b-32k Model #1725

Closed

slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 22, 2024

Merge pull request vllm-project#31 from slyalin/int8_enabled_by_default

469a4d0

Enabled int8 weights by default

z103cb pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

Merge pull request vllm-project#31 from z103cb/ibm_main_add_git_to_ub…

38eed8a

…i_docker Docker.ubi: add missing package git

ZHJ19970917 mentioned this pull request Jul 14, 2024

[Bug]: When using qwen-32b-chat-awq with multi-threaded access, errors occur after approximately several hundred visits. "vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already." #6421

Closed

bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024

Merge pull request vllm-project#31 from intel-sandbox/jianan/generati…

25e4d7b

…on_opts vLLM lm head optimization (tpp)

alixiaodi mentioned this pull request Aug 2, 2024

[Bug]: #7072

Closed

surak mentioned this pull request Apr 1, 2025

[Bug]: building docker from Dockerfile #15872

Closed

1 task

hao-cold mentioned this pull request May 13, 2025

[Bug]: CUDA error: an illegal instruction was encountered #18045

Closed

1 task

markmc mentioned this pull request May 21, 2025

[Bug][Failing Test]: Distributed Comm Ops - distributed/test_shm_broadcast.py #18492

Closed

1 task

zerosurplus mentioned this pull request Jun 16, 2025

[Bug]: torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (172.17.0.9, 46229). #19670

Open

1 task

xiaomofang mentioned this pull request Jul 31, 2025

[Bug]: There is an issue with speculative inference in Eagle mode, where the context length of vLLM inference is constrained by the draft model. #21986

Open

1 task

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025

Move responses_api.py to examples (vllm-project#31)

e0bf571

Signed-off-by: Chen Zhang <[email protected]>

zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025

Move responses_api.py to examples (vllm-project#31)

19e469f

Signed-off-by: Chen Zhang <[email protected]>

Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request Sep 25, 2025

setting keepalive time (vllm-project#31)

3f3c455

Co-authored-by: dengyunyang <[email protected]>
