[Misc] Add KV cache event synchronization configuration examples #1410
base: main
Add sample configurations for enabling KV cache event synchronization:
- Network policy to allow ZMQ traffic on ports 5557-5558
- vLLM deployment with KV events enabled via CLI arguments
- vLLM deployment with KV events enabled via environment variables

These samples demonstrate how to configure vLLM pods to publish KV cache events via ZMQ for real-time cache state synchronization with the AIBrix gateway.

Signed-off-by: ZHENYU <[email protected]>
Summary of Changes
Hello @ae86zhizhi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request adds sample configuration files that demonstrate how to enable the KV cache event synchronization feature in AIBrix for vLLM pods. The examples show how to configure vLLM to publish cache events via ZMQ so that cache state can be synchronized with the AIBrix gateway in real time. They cover both a CLI-argument-based and an environment-variable-based approach, along with the network policy needed to allow the ZMQ traffic.
Highlights
- Network Policy for ZMQ Traffic: A new Kubernetes NetworkPolicy (allow-kv-events.yaml) has been added to permit ZMQ traffic on ports 5557-5558, which is required for KV event communication between vLLM pods and the AIBrix gateway.
- CLI-based KV Event Configuration Example: A new quickstart example (model-with-kv-events.yaml) demonstrates how to enable KV cache events in a vLLM deployment using command-line arguments, including the service configuration needed to expose the ZMQ ports.
- Environment-based KV Event Configuration Example: An alternative quickstart example (model-with-kv-events-env.yaml) enables KV cache events using environment variables and includes performance tuning parameters such as the ZMQ high water mark.
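To make the service part of the highlights concrete, here is a minimal sketch of a Service that exposes the ZMQ ports next to the inference port. The Service name, the model.aibrix.ai/name selector label, and the mapping of 5557 to the event stream and 5558 to a replay endpoint are assumptions for illustration; this is not the content of model-with-kv-events.yaml.

```yaml
# Minimal sketch, not the PR's file. Names and labels are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: my-model                     # hypothetical service name
  namespace: default                 # quickstart namespace; change for your deployment
spec:
  selector:
    model.aibrix.ai/name: my-model   # assumption: label carried by the vLLM pods
  ports:
    - name: serve                    # OpenAI-compatible HTTP API
      port: 8000
      targetPort: 8000
      protocol: TCP
    - name: kv-events                # assumption: ZMQ publisher for KV cache events
      port: 5557
      targetPort: 5557
      protocol: TCP
    - name: kv-replay                # assumption: ZMQ replay endpoint
      port: 5558
      targetPort: 5558
      protocol: TCP
```

One design caveat: each vLLM pod publishes its own cache state, so a subscriber that needs per-pod events would normally connect to individual pod IPs (or a headless Service) rather than through a load-balanced ClusterIP. Whether the AIBrix gateway works this way is not stated in this PR.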
Code Review
This PR adds useful sample configurations for enabling KV cache event synchronization in AIBrix. The examples cover network policies, CLI-based configuration, and environment-based configuration, which is great for users. My review focuses on improving the usability and robustness of these sample files. I've suggested making the namespace handling more flexible and adding health probes for consistency and better pod lifecycle management in Kubernetes.
- Add explanatory comments for the namespace field in all samples
  Clarifies that the 'default' namespace is used for quickstart purposes and provides guidance on deploying to different namespaces.
- Add health probes to model-with-kv-events-env.yaml
  Ensures consistency with other quickstart samples and provides a production-ready configuration with liveness, readiness, and startup probes.

These improvements address code review feedback while maintaining consistency with existing AIBrix sample conventions.

Signed-off-by: ZHENYU <[email protected]>
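The health-probe change described in this commit can be sketched as the following container-spec fragment. It assumes the vLLM OpenAI-compatible server listens on port 8000 and answers on its /health endpoint; the timing values are illustrative and are not taken from model-with-kv-events-env.yaml.

```yaml
# Fragment of a vLLM container spec; timing values are illustrative assumptions.
startupProbe:              # gives the model time to load before other probes run
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 60     # 60 x 10s = up to ~10 minutes of startup
livenessProbe:             # restarts the container if the server stops responding
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:            # removes the pod from Service endpoints while unhealthy
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 3
```

Kubernetes holds off liveness and readiness checks until the startup probe succeeds, so a long startup window for model loading does not weaken the liveness check once the model is serving.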
@@ -0,0 +1,24 @@
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
Did you see any issues without the NetworkPolicy?
Gateway plugins are deployed in the aibrix-system namespace, while KV event producers (model pods) are deployed in the default namespace (for quickstart) or in user-specified namespaces. The current NetworkPolicy only permits same-namespace traffic, but the actual deployment requires cross-namespace communication. I will prepare and submit a patch soon.
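A minimal sketch of the cross-namespace fix described above, assuming the gateway plugins run in aibrix-system, the model pods run in default, and namespaces carry the standard kubernetes.io/metadata.name label (set automatically by recent Kubernetes versions). The policy name and pod label are hypothetical; this is not the author's patch.

```yaml
# Sketch: allow ZMQ KV-event traffic from the AIBrix gateway namespace to model pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kv-events-from-gateway      # hypothetical name
  namespace: default                      # namespace of the model pods (quickstart)
spec:
  podSelector:
    matchLabels:
      model.aibrix.ai/name: my-model      # hypothetical label on the vLLM pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:              # traffic from the gateway's namespace
            matchLabels:
              kubernetes.io/metadata.name: aibrix-system
      ports:
        - protocol: TCP
          port: 5557                      # assumption: KV event publisher
        - protocol: TCP
          port: 5558                      # assumption: replay endpoint
```

Two caveats worth checking. The rule is ingress to the model pods on the assumption that vLLM binds the ZMQ sockets (as a tcp://*:5557 endpoint implies) and the gateway connects to them. Also, once any policy selects these pods for Ingress, all other inbound traffic (including HTTP on port 8000 from the gateway) is denied unless another policy allows it. Adding a podSelector inside the same from entry would narrow the rule to gateway pods only.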
@ae86zhizhi Are you planning to submit a new PR or a new commit in this PR?
Summary
This PR adds sample configuration files for enabling the KV cache event synchronization feature in AIBrix, demonstrating how to configure vLLM pods to publish cache events via ZMQ.
What this PR does / why we need it
The KV cache event synchronization feature requires specific configuration on vLLM pods to enable real-time cache state publishing. These sample files provide users with ready-to-use configurations showing different approaches to enable this feature.
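As an illustration of the CLI-argument approach, here is a minimal container sketch. It assumes a vLLM version that provides the --kv-events-config flag, which takes a JSON object with fields such as enable_kv_cache_events, publisher, endpoint, and replay_endpoint. The image tag and model name are placeholders, and the exact arguments in model-with-kv-events.yaml may differ.

```yaml
# Sketch of a vLLM container publishing KV cache events over ZMQ.
# Image tag and model are placeholders; check that your vLLM version supports --kv-events-config.
containers:
  - name: vllm-openai
    image: vllm/vllm-openai:latest           # placeholder tag
    command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
    args:
      - --host=0.0.0.0
      - --port=8000
      - --model=Qwen/Qwen2.5-0.5B-Instruct    # hypothetical model
      - --enable-prefix-caching              # assumption: events track prefix-cache blocks
      - --kv-events-config
      - '{"enable_kv_cache_events": true, "publisher": "zmq", "endpoint": "tcp://*:5557", "replay_endpoint": "tcp://*:5558"}'
    ports:
      - name: serve
        containerPort: 8000
      - name: kv-events
        containerPort: 5557
      - name: kv-replay
        containerPort: 5558
```

Keeping the JSON as a single args entry avoids quoting problems: Kubernetes passes each entry to the process verbatim, without a shell in between.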
Changes
samples/network-policies/allow-kv-events.yaml
samples/quickstart/model-with-kv-events.yaml
samples/quickstart/model-with-kv-events-env.yaml
Testing
Documentation
These samples serve as documentation for users implementing KV cache event synchronization. Each file includes inline comments explaining the configuration options.