
Commit eede58e

GaoYusong and MahmoudAshraf97
authored and committed
Fix: resolve prefill of retracted request out-of-memory issue when ignore_eos is enabled (sgl-project#7434)
1 parent 2d96b83 commit eede58e
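The title names the failure mode: a retracted request that is re-admitted has to prefill its prompt plus the tokens it had already generated. With ignore_eos enabled, generation does not stop at EOS, so that generated tail can be long, and budgeting only for the original prompt can badly underestimate the prefill. The toy model below is a sketch under assumptions: ToyReq, old_estimate and new_estimate are hypothetical names, not sglang APIs, and the extend_input_len formula (prompt + generated tokens - cached prefix) is an assumed simplification of how the scheduler sizes a re-prefill.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReq:
    """Hypothetical stand-in for a scheduler request (not sglang's Req)."""
    origin_input_ids: List[int]
    output_ids: List[int] = field(default_factory=list)
    prefix_len: int = 0  # assumed number of tokens already in the prefix cache

    @property
    def extend_input_len(self) -> int:
        # Assumed: tokens that must actually be prefilled are the prompt plus
        # any tokens generated before retraction, minus the cached prefix.
        return len(self.origin_input_ids) + len(self.output_ids) - self.prefix_len


def old_estimate(req: ToyReq) -> int:
    # Pre-fix budget charge: only the original prompt length.
    return len(req.origin_input_ids)


def new_estimate(req: ToyReq) -> int:
    # Post-fix budget charge (ignoring page rounding): tokens actually prefilled.
    return req.extend_input_len


# A 100-token prompt that generated 900 tokens (e.g. with ignore_eos)
# before the request was retracted.
retracted = ToyReq(origin_input_ids=list(range(100)), output_ids=list(range(900)))
print(old_estimate(retracted), new_estimate(retracted))  # 100 1000
```

In this model the old charge covers only a tenth of the re-prefill, which is the kind of gap that can exhaust the KV cache. Note that for a fresh request with a prefix-cache hit, extend_input_len is smaller than the prompt length, so the same change also tightens the estimate in that direction.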

1 file changed (+3, -1 lines)

python/sglang/srt/managers/schedule_policy.py

Lines changed: 3 additions & 1 deletion
@@ -455,7 +455,9 @@ def add_req_state(r, insert_sort=False):
 if not self.is_hybrid:
     # Skip this logic for swa. The SWA has different memory management, and
     # this mechanism is underestimating the memory usage.
-    cur_rem_tokens = self.cur_rem_tokens - len(req.origin_input_ids)
+    cur_rem_tokens = self.cur_rem_tokens - self.ceil_paged_tokens(
+        req.extend_input_len
+    )
     tokens_freed = 0
     for i, (tokens_left, tokens_occupied) in enumerate(self.req_states):
         # tokens_left gives a reservative calculation as the last token is not stored
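The patched line charges the remaining-token budget with self.ceil_paged_tokens(req.extend_input_len) instead of len(req.origin_input_ids). The helper's name suggests the prefill length is rounded up to whole KV-cache pages; the sketch below assumes exactly that (ceiling to a multiple of a page_size) and writes it as free functions with hypothetical names, not as the actual methods on the scheduler object.

```python
def ceil_paged_tokens(tokens: int, page_size: int) -> int:
    """Round a token count up to a whole number of KV-cache pages.

    Hypothetical free-function version; page_size is assumed to be the
    number of tokens per KV-cache page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # -(-a // b) is ceiling division for non-negative ints in Python.
    return -(-tokens // page_size) * page_size


def remaining_after_prefill(cur_rem_tokens: int, extend_input_len: int, page_size: int) -> int:
    # Mirrors the patched line: subtract the page-rounded prefill length.
    return cur_rem_tokens - ceil_paged_tokens(extend_input_len, page_size)


print(ceil_paged_tokens(1000, 16))              # 1008
print(remaining_after_prefill(4096, 1000, 16))  # 3088
```

Rounding up matters because a partially filled page still occupies a whole page of KV cache. With page_size = 1 the rounding is a no-op, so any difference from the old line then comes only from charging extend_input_len rather than the prompt length.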