

zhisbug
Collaborator

zhisbug commented Apr 7, 2023

No description provided.

zhuohan123
Member

Changes in this PR have been added to the latest main branch.

zhuohan123 closed this May 24, 2023
zhuohan123 deleted the hao-integration branch June 18, 2023 07:25
slyalin pushed a commit to slyalin/vllm that referenced this pull request Apr 22, 2024
z103cb pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024
…i_docker

Docker.ubi: add missing package git
joerunde pushed a commit to joerunde/vllm that referenced this pull request Jun 17, 2024
Within the existing `decoding` request parameter section:

```protobuf
enum ResponseFormat {
  // Plain text, no constraints
  TEXT = 0;
  // Valid JSON
  JSON = 1;
}

message StringChoices {
  repeated string choices = 1;
}

// Mutually exclusive guided decoding options
oneof guided {
  // Output will be in the specified format
  ResponseFormat format = 3;
  // Output will follow the provided JSON schema
  string json_schema = 4;
  // Output will follow the provided regex pattern
  string regex = 5;
  // Output will be exactly one of the specified choices
  StringChoices choice = 6;
  // Output will follow the provided context-free grammar
  string grammar = 7;
}
```

Signed-off-by: Nick Hill <[email protected]>
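The `guided` oneof above means a request carries at most one guided decoding option. In protobuf, assigning one member of a oneof clears whichever member was set before, and generated Python message classes report the active member via `WhichOneof("guided")`. The sketch below models that behavior in plain Python without generated code; `GuidedOptions`, `set`, `get`, and `which_oneof` are hypothetical names chosen for illustration, not part of vLLM or the protobuf runtime.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class ResponseFormat(IntEnum):
    TEXT = 0  # plain text, no constraints
    JSON = 1  # valid JSON


@dataclass
class StringChoices:
    choices: list[str] = field(default_factory=list)


# Members of the `guided` oneof, in field-number order (3..7).
_GUIDED_FIELDS = ("format", "json_schema", "regex", "choice", "grammar")


class GuidedOptions:
    """Hypothetical plain-Python stand-in for the `guided` oneof.

    Only one member is held at a time: setting a member replaces
    (i.e. clears) whichever member was set previously.
    """

    def __init__(self) -> None:
        self._which: Optional[str] = None
        self._value: object = None

    def set(self, name: str, value: object) -> None:
        if name not in _GUIDED_FIELDS:
            raise ValueError(f"unknown guided option: {name}")
        self._which, self._value = name, value

    def which_oneof(self) -> Optional[str]:
        # Name of the active member, or None if nothing is set.
        return self._which

    def get(self, name: str) -> object:
        # Inactive members read as None in this sketch.
        return self._value if name == self._which else None


if __name__ == "__main__":
    opts = GuidedOptions()
    opts.set("regex", r"\d{3}-\d{4}")
    opts.set("choice", StringChoices(["yes", "no"]))
    print(opts.which_oneof())  # choice
    print(opts.get("regex"))   # None
```

Keeping the options in a single oneof, rather than as independent optional fields, lets the schema itself rule out conflicting constraints such as a regex and a grammar on the same request.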
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Jul 31, 2024
alixiaodi mentioned this pull request Aug 2, 2024
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
Bounty-hunter added a commit to Bounty-hunter/vllm that referenced this pull request Sep 25, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
…oject#26)

* indexer metadata to separate prefill and decode

* deep_gemm prefill kernel

* decode kernel, can run for single batch

* bug fix: insert decode k into kv before gemm

* don't use tilelang quant function

* faster non-looping torch for kv cache insertion

* add chunked prefill impl

* change quant kernel back to tilelang for promotion

* fix format (vllm-project#31)

Signed-off-by: Chen Zhang <[email protected]>

* update unit tests

* Fp8 indexer prefill (vllm-project#33)

* init

Signed-off-by: Chen Zhang <[email protected]>

* can run

---------

Signed-off-by: Chen Zhang <[email protected]>

* remove debug comment

Signed-off-by: Chen Zhang <[email protected]>

* cleanup

* further cleanup

---------

Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
2 participants