
Conversation

ae86zhizhi (Contributor)

Summary

This PR adds sample configuration files for enabling the KV cache event synchronization feature in AIBrix, demonstrating how to configure vLLM pods to publish cache events via ZMQ.

What this PR does / why we need it

The KV cache event synchronization feature requires specific configuration on vLLM pods to enable real-time cache state publishing. These sample files provide users with ready-to-use configurations showing different approaches to enable this feature.

Changes

  • Network Policy: Added samples/network-policies/allow-kv-events.yaml
    • Configures network policy to allow ZMQ traffic on ports 5557-5558
    • Required for KV event communication between vLLM pods and AIBrix gateway (a minimal sketch of this policy follows this list)
  • CLI-based Configuration: Added samples/quickstart/model-with-kv-events.yaml
    • Full deployment example with KV events enabled via command-line arguments
    • Shows complete vLLM deployment with DeepSeek-R1-Distill-Llama-8B model
    • Includes service configuration exposing KV event ports
  • Environment-based Configuration: Added samples/quickstart/model-with-kv-events-env.yaml
    • Alternative approach using environment variables for configuration
    • Demonstrates configuration flexibility for different deployment scenarios
    • Includes performance tuning parameters like ZMQ high water mark
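
For orientation, the sketch below shows roughly what the network policy sample could look like. It is illustrative only: the pod label, namespace, and metadata are assumptions rather than the exact contents of samples/network-policies/allow-kv-events.yaml, and the policy shown permits ingress only from pods in the same namespace (a limitation discussed in the review thread further down).

```yaml
# Illustrative sketch only; selectors and metadata are assumptions, not the exact sample contents.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kv-events
  namespace: default                 # quickstart namespace; adjust for your deployment
spec:
  podSelector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-distill-llama-8b   # assumed label on the vLLM model pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}            # any pod in the same namespace
      ports:
        - protocol: TCP
          port: 5557                 # ZMQ KV event publisher
        - protocol: TCP
          port: 5558                 # ZMQ replay endpoint
```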

Testing

  • Network policy correctly allows ZMQ traffic between pods
  • vLLM pods start successfully with KV event publishing enabled
  • KV events are published on configured ZMQ ports (5557/5558)
  • AIBrix gateway can subscribe and receive events from vLLM pods
  • Both CLI and environment-based configurations work correctly

Documentation

These samples serve as documentation for users implementing KV cache event synchronization. Each file includes inline comments explaining the configuration options.

Add sample configurations for enabling KV cache event synchronization:
- Network policy to allow ZMQ traffic on ports 5557-5558
- vLLM deployment with KV events enabled via CLI arguments
- vLLM deployment with KV events enabled via environment variables

These samples demonstrate how to configure vLLM pods to publish KV cache events
via ZMQ for real-time cache state synchronization with the AIBrix gateway; illustrative fragments of both approaches follow below.

Signed-off-by: ZHENYU <[email protected]>
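
For illustration only, the two configuration approaches might look roughly like the container fragments below. The vLLM flag, its JSON fields, and the environment variable names are assumptions about how KV event publishing is typically wired up, not verbatim contents of the sample files; consult the samples and your vLLM version for the exact spelling.

```yaml
# CLI-based variant (model-with-kv-events.yaml): container args fragment.
# The --kv-events-config flag and its JSON fields are assumed, not verbatim sample contents.
args:
  - --model=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - '--kv-events-config={"enable_kv_cache_events": true, "publisher": "zmq", "endpoint": "tcp://*:5557"}'
---
# Environment-based variant (model-with-kv-events-env.yaml): container env fragment.
# Variable names here are hypothetical placeholders for illustration.
env:
  - name: VLLM_KV_EVENTS_ENABLED        # hypothetical
    value: "true"
  - name: VLLM_KV_EVENTS_ENDPOINT       # hypothetical
    value: "tcp://*:5557"
  - name: VLLM_KV_EVENTS_HWM            # hypothetical; ZMQ high water mark tuning
    value: "1000"
```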
@gemini-code-assist bot left a comment


Summary of Changes

Hello @ae86zhizhi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've added sample configuration files to demonstrate how to enable the KV cache event synchronization feature within AIBrix, specifically for vLLM pods. These examples show how to configure vLLM to publish cache events via ZMQ, which is critical for real-time cache state management. The configurations cover both CLI argument-based and environment variable-based approaches, along with a necessary network policy to allow the ZMQ traffic.

Highlights

  • Network Policy for ZMQ Traffic: A new Kubernetes NetworkPolicy (allow-kv-events.yaml) has been added to permit ZMQ traffic on ports 5557-5558, which is essential for KV event communication between vLLM pods and the AIBrix gateway.
  • CLI-based KV Event Configuration Example: A new quickstart example (model-with-kv-events.yaml) demonstrates how to enable KV cache events in a vLLM deployment using command-line arguments, including the necessary service configuration to expose the ZMQ ports (a sketch of such a Service follows this list).
  • Environment-based KV Event Configuration Example: An alternative quickstart example (model-with-kv-events-env.yaml) provides a configuration for enabling KV cache events using environment variables, showcasing flexibility and including performance tuning parameters like the ZMQ high water mark.
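
As a rough illustration of the Service mentioned in the CLI-based example, something along the following lines would expose the ZMQ event ports next to the normal inference port; the names, label, and the inference port number are assumptions rather than the exact sample contents.

```yaml
# Sketch only; names, labels, and the inference port are assumed.
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-llama-8b
spec:
  selector:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b   # assumed pod label
  ports:
    - name: http
      port: 8000              # vLLM OpenAI-compatible API (typical default)
      targetPort: 8000
    - name: kv-events
      port: 5557              # ZMQ KV event publisher
      targetPort: 5557
    - name: kv-events-replay
      port: 5558              # ZMQ replay endpoint
      targetPort: 5558
```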

@gemini-code-assist bot left a comment

Code Review

This PR adds useful sample configurations for enabling KV cache event synchronization in AIBrix. The examples cover network policies, CLI-based configuration, and environment-based configuration, which is great for users. My review focuses on improving the usability and robustness of these sample files. I've suggested making the namespace handling more flexible and adding health probes for consistency and better pod lifecycle management in Kubernetes.

- Add explanatory comments for namespace field in all samples
  Clarifies that 'default' namespace is used for quickstart purposes
  and provides guidance on deploying to different namespaces

- Add health probes to model-with-kv-events-env.yaml
  Ensures consistency with other quickstart samples and provides
  production-ready configuration with liveness, readiness, and startup probes (see the sketch after this commit message)

These improvements address code review feedback while maintaining
consistency with existing AIBrix sample conventions.

Signed-off-by: ZHENYU <[email protected]>
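
A minimal sketch of the probes the follow-up commit describes, assuming the vLLM container serves its OpenAI-compatible API on port 8000 and exposes the standard /health endpoint; the thresholds and timings are illustrative, not the exact values in the sample.

```yaml
# Illustrative probe configuration for the vLLM container; port, path, and timings are assumptions.
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 60        # allow time for model download and weight loading
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 3
```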
samples/network-policies/allow-kv-events.yaml (new file, excerpt)
@@ -0,0 +1,24 @@
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
Collaborator

Did you see any issues without using a NetworkPolicy?

Contributor Author

Gateway plugins are deployed in the aibrix-system namespace, while KV event producers (model pods) are deployed in the default namespace (for quickstart) or in user-specified namespaces. The current NetworkPolicy only permits same-namespace traffic, but the actual deployment requires cross-namespace communication. I will prepare and submit a patch soon.
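
A sketch of the cross-namespace rule described above: an additional ingress "from" entry that matches the aibrix-system namespace by label, alongside same-namespace pods. The kubernetes.io/metadata.name label is set automatically on namespaces in recent Kubernetes versions; on older clusters a custom namespace label would be needed.

```yaml
# Sketch of a cross-namespace ingress rule, to be added under the policy's spec.ingress list.
- from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: aibrix-system   # namespace where the gateway plugins run
    - podSelector: {}                                  # still allow pods in the same namespace
  ports:
    - protocol: TCP
      port: 5557
    - protocol: TCP
      port: 5558
```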

Collaborator

@ae86zhizhi Are you planning to submit a new PR or a new commit in this PR?
