[KVCache] Per Layer Sliding Window #17928
Conversation
Updated main from 3fb27bb to 936d500
With some further testing and investigation, there is an additional MLC-LLM/TVM bug related to excessive prefilling (even without the per-layer sliding window changes outlined here) that may be causing inference slowdown.
LGTM, thank you @joshua-j-hong!
I just see some conflicts with upstream; we likely need to do a rebase. Related changes are the recent FFI refactor and a namespace rename from …
Conflicts and tests are fixed. The current plan is a follow-up change that will add optional parameters to the KVCache, some of which will be for the per-layer sliding window. This will ensure that no values need to be hardcoded and that the KVCache remains backwards compatible.
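To make the backwards-compatibility point concrete, here is a minimal sketch of what such optional parameters could look like. This is not the actual KVCache interface; `KVCacheConfig`, its field names, and the example values are all hypothetical and only illustrate the idea of opt-in, non-hardcoded settings.

```python
# Hypothetical sketch only -- not the real TVM KVCache API. It illustrates
# per-layer sliding-window settings (and a separate RoPE theta for local
# layers, as Gemma3 needs) passed as *optional* arguments, so existing
# callers keep working and nothing has to be hardcoded.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KVCacheConfig:
    num_layers: int
    page_size: int = 16
    # Optional per-layer sliding-window sizes; None keeps today's behavior
    # (all layers global), preserving backwards compatibility.
    per_layer_window_size: Optional[List[Optional[int]]] = None
    # Optional RoPE theta for local sliding-window layers (Gemma3 uses a
    # different value for these than for global layers).
    local_rope_theta: Optional[float] = None

    def window_size(self, layer: int) -> Optional[int]:
        """Window size for a given layer, or None for a global layer."""
        if self.per_layer_window_size is None:
            return None
        return self.per_layer_window_size[layer]

# Old call sites stay valid:
cfg = KVCacheConfig(num_layers=32)

# New call sites can opt in to per-layer sliding windows:
cfg_gemma = KVCacheConfig(
    num_layers=32,
    per_layer_window_size=[1024] * 31 + [None],
    local_rope_theta=10_000.0,
)
```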
Adds per-layer sliding window functionality to the KV cache. Correctness is mostly achieved, but there are some cases where single tokens are strange. The corresponding MLC-LLM PR is mlc-ai/mlc-llm#3248.

A full list of changes and additions is below:

- Add a new attention type for the per-layer sliding window, called `MHA_SLIDING`
- Add corresponding vectors for per-layer sliding window offset calculations (a minimal sketch of the windowing logic follows this list)
- For a KV cache with sliding window attention enabled, the regular sliding window is disabled to prevent page eviction
- Gemma3 has different RoPE parameters for local sliding window layers. These should be passed as parameters to the KVCache, but currently the values are hardcoded
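For readers unfamiliar with per-layer sliding windows, the sketch below shows the basic offset/window computation the list above refers to: sliding-window layers only attend to the last `window_size` positions, while global layers attend to the full causal prefix. The function name, the layer pattern, and the window size are illustrative assumptions, not the actual kernel code in this PR.

```python
# Illustrative sketch of per-layer sliding-window attention ranges.
# All names and sizes here are assumptions for explanation only.
from typing import List, Optional

def attendable_range(q_pos: int, window_size: Optional[int]) -> range:
    """KV positions that query position `q_pos` may attend to.

    `window_size=None` means a global (full causal) layer; an integer means
    a sliding-window layer that only sees the last `window_size` positions.
    """
    if window_size is None:
        return range(0, q_pos + 1)           # full causal attention
    start = max(0, q_pos - window_size + 1)  # per-layer sliding window offset
    return range(start, q_pos + 1)

# Example: Gemma3-style interleaving of local and global layers
# (window size and pattern are illustrative only).
layer_window_sizes: List[Optional[int]] = [1024, 1024, 1024, 1024, 1024, None] * 4

for layer, w in enumerate(layer_window_sizes[:6]):
    r = attendable_range(q_pos=4096, window_size=w)
    print(f"layer {layer}: attends to positions [{r.start}, {r.stop})")
```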