 if (m_num_registered_snapkv_aggregated_scores < m_snapkv_window_size) {
+    OPENVINO_ASSERT(num_snapkv_scores + m_num_registered_snapkv_aggregated_scores <= m_snapkv_window_size, "Total number of aggregated SnapKV scores during prefill phase may not be larger than the configured SnapKV window size");
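The invariant enforced by this assertion can be illustrated in isolation. The following is a minimal sketch, not the code from this PR: `SnapKVPrefillBudget` and its methods are hypothetical names, a plain exception stands in for `OPENVINO_ASSERT`, and the running total is assumed to be incremented after the check (the diff fragment above does not show that step).

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the SnapKV bookkeeping; illustrative only.
class SnapKVPrefillBudget {
public:
    explicit SnapKVPrefillBudget(std::size_t window_size) : m_window_size(window_size) {}

    // Accounts for the `num_snapkv_scores` reported by one register call.
    // Throws if the running total would exceed the configured window size.
    void register_scores(std::size_t num_snapkv_scores) {
        if (m_registered >= m_window_size) {
            return;  // window already filled, nothing left to account for
        }
        if (num_snapkv_scores + m_registered > m_window_size) {
            throw std::runtime_error("aggregated SnapKV scores exceed the SnapKV window size");
        }
        m_registered += num_snapkv_scores;  // assumption: total is updated after the check
    }

    bool window_complete() const { return m_registered >= m_window_size; }
    std::size_t registered() const { return m_registered; }

private:
    std::size_t m_window_size;
    std::size_t m_registered = 0;
};
```

With a window of 4, registering 3 scores and then 2 would trip the check, while registering 3 and then 1 fills the window exactly.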
src/cpp/src/continuous_batching/cache_eviction.hpp
32 additions & 5 deletions
@@ -29,9 +29,17 @@ class EvictionScoreManager {
  * @param num_decoder_layers Number of independent KV caches (each corresponding to a single attention layer) in the underlying LLM.
  * @param max_pool_window_size Window size for the max pooling step applied to the newly registered scores before aggregation.
  * @param aggregation_mode Aggregation mode for the scores across register calls.
- * @param ignore_first_n_blocks Number of blocks from the beginning of the per-token score vector, the scores for which will be disregarded and never aggregated.
+ * @param ignore_first_n_blocks Number of blocks from the beginning of the per-token score vector, the scores for which will
+ * be disregarded and never aggregated.
+ * @param snapkv_window_size Window size for the SnapKV algorithm in effect. If non-zero, then by the start of the generation phase
+ * for the tracked sequence (when the total number of `num_snapkv_scores` passed to each `register_new_token_scores` call reaches
+ * the `snapkv_window_size`) the internal occurrence counters will be:
+ * `| S | S | ... | S | S - 1 | S - 2 | ... | 2 | 1 |`,
+ * where `S` is equal to `snapkv_window_size`. In contrast, if this is set to 0, then the initial counter state would be
+ * `| L | L - 1 | ... | 2 | 1 |`,
+ * where L is the prompt size of the sequence in tokens.
* Registers new token scores and aggregates them internally as necessary. The token scores provided may be corresponding not to all
@@ -42,8 +50,9 @@ class EvictionScoreManager {
  * scores in a corresponding decoder layer.
  * @param skipped_logical_block_ids Logical block indices which had been skipped during inference call that produced the new scores, and
  * which are missing from the new scores.
+ * @param num_snapkv_scores Number of latest token scores that were aggregated together when computing the registered score. If SnapKV is not used, this should be set to 0.
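The two counter layouts described in the `snapkv_window_size` documentation can be reproduced with a short sketch. `expected_initial_counters` is a hypothetical helper written for illustration and is not part of the PR. It assumes that the counter for the token at position `i` is `min(S, L - i)` when `S` is non-zero and `L - i` otherwise, which matches both patterns quoted in the doc comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, illustrative only: expected occurrence counters at the
// start of the generation phase for a prompt of `prompt_len` tokens.
std::vector<std::size_t> expected_initial_counters(std::size_t prompt_len,
                                                   std::size_t snapkv_window_size) {
    std::vector<std::size_t> counters(prompt_len);
    for (std::size_t i = 0; i < prompt_len; ++i) {
        const std::size_t remaining = prompt_len - i;  // L - i
        // With SnapKV, only the last S prompt tokens contribute scores, so the
        // counter is capped at S; without it, every later token contributes.
        counters[i] = (snapkv_window_size == 0)
                          ? remaining
                          : std::min(snapkv_window_size, remaining);
    }
    return counters;
}
```

For a 6-token prompt with a window of 3 this yields `| 3 | 3 | 3 | 3 | 2 | 1 |`, and for a 4-token prompt with the window set to 0 it yields `| 4 | 3 | 2 | 1 |`.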