fix: with the new llama.cpp version and chat templates rag_framework … #1937
Conversation
…can no longer assume chunks can only be of type content. Adjusted the code so it doesn't break.
Signed-off-by: Brian <[email protected]>
Reviewer's Guide
This PR modifies the rag_framework script to support the new llama.cpp version and updated chat templates by removing the hardcoded assumption that all chunks are of type "content" and introducing type checks and graceful handling for other chunk types.
Summary of Changes
Hello @bmahabirbu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical compatibility issue in the rag_framework script, which was breaking due to recent changes in how llama.cpp handles chat templates and response chunks. The modifications ensure that the framework can gracefully process AI responses even when the structure of chunk.choices varies, thereby enhancing the stability and robustness of the RAG system.
Highlights
- Compatibility Fix: Updated the rag_framework script to ensure compatibility with newer versions of llama.cpp and its chat templates.
- Robust Chunk Processing: Implemented a null check for chunk.choices when processing AI responses, preventing potential errors if the choices array is empty or undefined (a rough sketch of this pattern follows below).
- Code Refinement: Introduced a local variable content to store chunk.choices[0].delta.content for improved readability and maintainability within the response collection loop.
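The guarded loop described in these highlights can be sketched roughly as follows. This is only an illustration, not the actual rag_framework code: the SimpleNamespace stand-ins and the loop shape are assumptions used to show the pattern.

```python
from types import SimpleNamespace

# Fake chunks imitating the shapes newer llama.cpp servers can emit:
# one with an empty choices list, one with real delta content, one with None.
stream = [
    SimpleNamespace(choices=[]),  # e.g. a usage-only chunk
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
]

response = ""
for chunk in stream:
    # Guard: a chunk may arrive with no choices at all, so don't index blindly.
    if not chunk.choices:
        continue
    # Read the delta content once into a local variable instead of
    # repeating the attribute chain.
    content = chunk.choices[0].delta.content
    if content is not None:
        response += content

print(response)  # -> "Hello"
```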
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request addresses a potential IndexError when processing streaming responses from the language model. The change correctly adds a check to ensure the choices list in a response chunk is not empty before accessing its elements. This makes the code more robust, especially with newer llama.cpp versions that might send chunks without content. The introduction of a content variable also improves readability and avoids redundant attribute access. The fix is correct and well-implemented.
Easy way to test:
1. Grab the output and paste it without the -c "" like this:
2. Then, inside the container, run the llama.cpp server in the background.
3. Check the logs to see that it starts correctly.
4. cd to /usr/bin and copy in the changed file I have here.
5. Finally, run:
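As a rough illustration of what such a test exercises (not the commands from the comment above), a minimal streaming call against a local llama.cpp server might look like this; the base URL, port, API key placeholder, and model name are assumptions and would need to match your container setup.

```python
from openai import OpenAI

# Assumed endpoint: llama.cpp's OpenAI-compatible server on localhost:8080.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="default",  # assumed model name
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # usage-only or empty chunks no longer raise IndexError
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()
```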
I think you need to rebase for the test to pass.
LGTM
I ran into the same bug, thanks for the PR!
Hmm, a lot of the tests are failing because of
…can no longer assume chunks (containers#1937) can only be of type content. Adjusted the code so it doesn't break.
Signed-off-by: Brian <[email protected]>
Summary by Sourcery
Update RAG framework script to handle new chunk types introduced by the latest llama.cpp version and chat templates
Bug Fixes:
- Guard the response-collection loop against chunks whose choices list is empty, so the script no longer breaks with the new llama.cpp version and chat templates.

Enhancements:
- Store chunk.choices[0].delta.content in a local content variable for readability.