WoosukKwon
Collaborator

@WoosukKwon WoosukKwon commented Mar 26, 2023

TODO:

  • Test against HF implementation
  • Add TP support (@zhuohan123)

@WoosukKwon
Collaborator Author

@zhuohan123 Please feel free to approve and merge this PR once you think it's ready.

@zhuohan123 zhuohan123 self-requested a review March 29, 2023 06:37
@zhuohan123 zhuohan123 merged commit 80a2f81 into main Mar 30, 2023
@WoosukKwon WoosukKwon deleted the llama branch April 12, 2023 03:12
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
Don't error if user doesn't have kernels installed
bigPYJ1151 pushed a commit to bigPYJ1151/vllm that referenced this pull request Dec 29, 2023
@alixiaodi alixiaodi mentioned this pull request Aug 2, 2024
zeroorhero pushed a commit to zeroorhero/vllm that referenced this pull request Sep 23, 2024
juncgu pushed a commit to juncgu/vllm that referenced this pull request May 8, 2025
Suggestion: Generalize/streamline async loading (remote prefill) side
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 5, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 6, 2025
zyongye pushed a commit to zyongye/vllm that referenced this pull request Aug 7, 2025
heheda12345 added a commit to heheda12345/vllm that referenced this pull request Sep 29, 2025
* code from ds

Signed-off-by: youkaichao <[email protected]>

* doc from ds

Signed-off-by: youkaichao <[email protected]>

* Fixes for support_materials/2-tilelang/

Signed-off-by: mgoin <[email protected]>

* Fix example 1

Signed-off-by: mgoin <[email protected]>

* Fix Einsum in deepgemm

* Fix `libc10.so` unimported error

* fix reference code

Signed-off-by: youkaichao <[email protected]>

* adding missing indexer args

* passing index args into the module

* init

Signed-off-by: Chen Zhang <[email protected]>

* build indexer k cache metadata

* prefill indexer, but weight_proj will output -inf

* unquantized paged indexer, still has -inf issue

* remove support material

* adding topk_indices mask

* add weight scale

* unittest infrastructure and fix weight_proj, numeric error due to quantization

* varlen prefill passed

* paged prefill

* add indices mask

---------

Signed-off-by: youkaichao <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: Wentao Ye <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
2 participants