v2.3.2 (sync with upstream llama.cpp) #179

ngxson · 2025-06-06T21:12:31Z

Summary by CodeRabbit

Chores
- Updated the application version to 2.3.2.
- Updated internal references and URLs to use the new 2.3.2 version of WebAssembly binaries.
- Synchronized subproject dependencies to the latest commit.

coderabbitai · 2025-06-06T21:12:38Z

Walkthrough

The changes update the handling of key-value cache operations in the codebase by switching from direct context-based API calls to memory-based API calls. Additionally, the project updates the llama.cpp submodule, increments the package version to 2.3.2, and updates CDN URLs for WebAssembly binaries to match the new version.

Changes

| Files / Groups | Change Summary |
|---|---|
| cpp/actions.hpp | Replaced direct llama_kv_self_* calls with llama_memory_* calls using memory pointer. |
| llama.cpp | Updated submodule commit reference to a newer version. |
| package.json | Incremented version from 2.3.1 to 2.3.2. |
| src/wasm-from-cdn.ts | Updated CDN URLs for Wasm binaries from version 2.3.1 to 2.3.2. |

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant Context
    participant Memory

    App->>Context: llama_get_memory(ctx)
    App->>Memory: llama_memory_* (remove, add, clear)
    Memory-->>App: Operation result
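
The flow in the diagram can be sketched roughly as follows. This is an illustration only, not the code from cpp/actions.hpp: every `fake_*` type and function is a hypothetical stand-in that mirrors the shape of the upstream calls (llama_get_memory, llama_memory_seq_rm, llama_memory_seq_add, llama_memory_clear) so the snippet compiles without llama.cpp. The parameter lists are assumptions and should be checked against llama.h.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real llama.cpp types.
struct fake_memory {
    std::vector<std::string> calls; // records which operations ran, in order
};
struct fake_context {
    fake_memory mem;
};

// Mirrors the shape of llama_get_memory(ctx): context in, memory handle out.
fake_memory * fake_get_memory(fake_context * ctx) {
    return &ctx->mem;
}

// Mirrors the assumed shape of llama_memory_seq_rm(mem, seq_id, p0, p1).
bool fake_memory_seq_rm(fake_memory * mem, int seq_id, int p0, int p1) {
    (void) seq_id; (void) p0; (void) p1;
    mem->calls.push_back("seq_rm");
    return true;
}

// Mirrors the assumed shape of llama_memory_seq_add(mem, seq_id, p0, p1, delta).
void fake_memory_seq_add(fake_memory * mem, int seq_id, int p0, int p1, int delta) {
    (void) seq_id; (void) p0; (void) p1; (void) delta;
    mem->calls.push_back("seq_add");
}

// Mirrors the assumed shape of llama_memory_clear(mem, data).
void fake_memory_clear(fake_memory * mem, bool data) {
    mem->calls.push_back(data ? "clear(data)" : "clear(metadata)");
}

// New pattern: fetch the memory handle once, then pass it to each operation
// instead of passing the context to llama_kv_self_* style calls.
std::vector<std::string> run_kv_ops(fake_context * ctx) {
    fake_memory * mem = fake_get_memory(ctx);
    fake_memory_seq_rm(mem, 0, 4, 8);       // drop positions [4, 8) of sequence 0
    fake_memory_seq_add(mem, 0, 8, -1, -4); // shift the remaining tail back by 4
    fake_memory_clear(mem, true);           // reset the whole cache
    return mem->calls;
}
```

The point of the sketch is the call shape: the context is only used to obtain the memory handle, and every cache operation then takes that handle as its first argument.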

Poem

A hop and a skip, the cache now anew,
With memory pointers guiding us through.
The version hops up, the binaries align,
Llama’s new tricks are working just fine.
With each little patch, we leap to the sun—
Onward and upward, our work is well done!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4bd5e7 and a432e98.

⛔ Files ignored due to path filters (2)

src/multi-thread/wllama.wasm is excluded by !**/*.wasm
src/single-thread/wllama.wasm is excluded by !**/*.wasm

📒 Files selected for processing (4)

cpp/actions.hpp (5 hunks)
llama.cpp (1 hunks)
package.json (1 hunks)
src/wasm-from-cdn.ts (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

src/wasm-from-cdn.ts (1)

Learnt from: ngxson
PR: ngxson/wllama#0
File: :0-0
Timestamp: 2025-04-18T08:24:14.434Z
Learning: The file `generate_wasm_from_cdn.ts` in the wllama project is a generated script that gets stored on CDN, making it impossible to use `require('../package.json')` to dynamically access version information. This is why version references in this file need to be manually updated when the package version changes.

🔇 Additional comments (7)

llama.cpp (1)

1-1: Update llama.cpp submodule reference
Bump the submodule commit to 745aa5319b9930068aff5e87cf5e9eef7227339b to align with the v2.3.2 release.

package.json (1)

3-3: Version bump looks correct and consistent.

The version increment from "2.3.1" to "2.3.2" aligns with the PR title and corresponds to the CDN URL updates in the other files.

src/wasm-from-cdn.ts (1)

5-6: CDN URLs updated consistently with version bump.

The URLs for both single-thread and multi-thread WASM binaries have been correctly updated to reference version 2.3.2, maintaining consistency with the package.json version change.

cpp/actions.hpp (4)

652-652: Memory-based API migration implemented correctly.

The introduction of llama_get_memory(app.ctx) to obtain a memory pointer follows the new API pattern consistently.

689-690: KV clear operation migrated consistently.

The action_kv_clear function correctly uses the new memory-based API pattern.

771-771: Benchmark and perplexity functions updated consistently.

Both test functions correctly use llama_memory_clear(llama_get_memory(app.ctx), true) to clear the memory cache before starting their operations.

Also applies to: 842-842

658-659:
✅ Verification successful

Key-value cache operations migrated to memory-based API.

The replacement of llama_kv_self_* functions with llama_memory_* equivalents is consistent with the upstream llama.cpp changes. The function mappings appear correct:

llama_memory_seq_rm for removing a range of positions from a sequence

llama_memory_seq_add for shifting the positions in a sequence by a delta

llama_memory_clear for clearing the cache
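
To make the three operations concrete, here is a toy model of their semantics. It is a sketch under stated assumptions, not llama.cpp's implementation: the `toy_*` names are hypothetical, the cache is reduced to a list of (sequence, position) cells, and a negative p1 is treated as "until the end", which is assumed to match upstream behaviour.

```cpp
#include <algorithm>
#include <vector>

// Toy model of one cache cell: the sequence it belongs to and its position.
struct toy_cell {
    int seq_id;
    int pos;
};

struct toy_cache {
    std::vector<toy_cell> cells;
};

// Half-open range test [p0, p1); a negative p1 means "no upper bound" (assumption).
static bool in_range(int pos, int p0, int p1) {
    return pos >= p0 && (p1 < 0 || pos < p1);
}

// Analogue of seq_rm: drop the cells of seq_id whose position lies in [p0, p1).
void toy_seq_rm(toy_cache & c, int seq_id, int p0, int p1) {
    c.cells.erase(
        std::remove_if(c.cells.begin(), c.cells.end(),
            [&](const toy_cell & cell) {
                return cell.seq_id == seq_id && in_range(cell.pos, p0, p1);
            }),
        c.cells.end());
}

// Analogue of seq_add: shift the positions in [p0, p1) of seq_id by delta.
void toy_seq_add(toy_cache & c, int seq_id, int p0, int p1, int delta) {
    for (toy_cell & cell : c.cells) {
        if (cell.seq_id == seq_id && in_range(cell.pos, p0, p1)) {
            cell.pos += delta;
        }
    }
}

// Analogue of clear: forget every cell.
void toy_clear(toy_cache & c) {
    c.cells.clear();
}

// Helper for inspection: positions currently held for seq_id, in storage order.
std::vector<int> toy_positions(const toy_cache & c, int seq_id) {
    std::vector<int> out;
    for (const toy_cell & cell : c.cells) {
        if (cell.seq_id == seq_id) {
            out.push_back(cell.pos);
        }
    }
    return out;
}
```

With positions 0 through 7 stored for sequence 0, removing [2, 5) and then shifting [5, end) by -3 leaves positions 0 through 4, which is the usual remove-then-shift pairing when part of a context is discarded.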

Please verify that all llama_kv_self_* function calls have been migrated throughout the codebase:

Also applies to: 668-668, 672-672

🏁 Script executed:
#!/bin/bash
# Description: Check for any remaining llama_kv_self_* function calls that might need migration
echo "Searching for any remaining llama_kv_self_* function calls..."
rg "llama_kv_self_" --type cpp --type c

echo "Searching for new llama_memory_* function calls to verify migration..."
rg "llama_memory_" --type cpp --type c -A 2
Length of output: 1316
Key-value cache migration complete — no remaining llama_kv_self_* calls found

All llama_kv_self_* usages have been replaced with the appropriate llama_memory_* API calls. Verification in cpp/actions.hpp shows:

llama_memory_seq_rm and llama_memory_seq_add around lines 658–659 (and similar blocks at 668, 672)

llama_memory_clear at every point that previously cleared the cache

No further changes are needed.

v2.3.2

a432e98

ngxson merged commit 367be2f into master Jun 6, 2025
5 of 6 checks passed

felladrin mentioned this pull request Jul 2, 2025

v2.3.2 update made some large models stop working due to Out Of Memory #183

Open

coderabbitai bot mentioned this pull request Jul 14, 2025

sync with latest upstream llama.cpp #187

Merged

coderabbitai bot mentioned this pull request Jul 24, 2025

if KV rm fails, we should clear the whole cache #188

Merged
