
Conversation

ae86zhizhi (Contributor)

Summary

This PR adds sample configuration files for enabling the KV cache event synchronization feature in AIBrix, demonstrating how to configure vLLM pods to publish cache events via ZMQ.

What this PR does / why we need it

The KV cache event synchronization feature requires specific configuration on vLLM pods to enable real-time cache state publishing. These sample files provide users with ready-to-use configurations showing different approaches to enable this feature.

Changes

  • Network Policy: Added samples/network-policies/allow-kv-events.yaml
    • Configures network policy to allow ZMQ traffic on ports 5557-5558
    • Required for KV event communication between vLLM pods and AIBrix gateway (a minimal sketch of this policy follows this list)
  • CLI-based Configuration: Added samples/quickstart/model-with-kv-events.yaml
    • Full deployment example with KV events enabled via command-line arguments
    • Shows complete vLLM deployment with DeepSeek-R1-Distill-Llama-8B model
    • Includes service configuration exposing KV event ports
  • Environment-based Configuration: Added samples/quickstart/model-with-kv-events-env.yaml
    • Alternative approach using environment variables for configuration
    • Demonstrates configuration flexibility for different deployment scenarios
    • Includes performance tuning parameters like ZMQ high water mark
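
For orientation, the sketch below shows roughly what the network policy sample could look like. It is illustrative only: the pod label, namespace, and metadata are assumptions rather than the exact contents of samples/network-policies/allow-kv-events.yaml, and the policy shown permits ingress only from pods in the same namespace (a limitation discussed in the review thread further down).

```yaml
# Illustrative sketch only; selectors and metadata are assumptions, not the exact sample contents.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kv-events
  namespace: default                 # quickstart namespace; adjust for your deployment
spec:
  podSelector:
    matchLabels:
      model.aibrix.ai/name: deepseek-r1-distill-llama-8b   # assumed label on the vLLM model pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}            # any pod in the same namespace
      ports:
        - protocol: TCP
          port: 5557                 # ZMQ KV event publisher
        - protocol: TCP
          port: 5558                 # ZMQ replay endpoint
```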

Testing

  • Network policy correctly allows ZMQ traffic between pods
  • vLLM pods start successfully with KV event publishing enabled
  • KV events are published on configured ZMQ ports (5557/5558)
  • AIBrix gateway can subscribe and receive events from vLLM pods
  • Both CLI and environment-based configurations work correctly

Documentation

These samples serve as documentation for users implementing KV cache event synchronization. Each file includes inline comments explaining the configuration options.

Add sample configurations for enabling KV cache event synchronization:
- Network policy to allow ZMQ traffic on ports 5557-5558
- vLLM deployment with KV events enabled via CLI arguments
- vLLM deployment with KV events enabled via environment variables

These samples demonstrate how to configure vLLM pods to publish KV cache events
via ZMQ for real-time cache state synchronization with the AIBrix gateway; illustrative fragments of both approaches follow below.

Signed-off-by: ZHENYU <[email protected]>
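
For illustration only, the two configuration approaches might look roughly like the container fragments below. The vLLM flag, its JSON fields, and the environment variable names are assumptions about how KV event publishing is typically wired up, not verbatim contents of the sample files; consult the samples and your vLLM version for the exact spelling.

```yaml
# CLI-based variant (model-with-kv-events.yaml): container args fragment.
# The --kv-events-config flag and its JSON fields are assumed, not verbatim sample contents.
args:
  - --model=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - '--kv-events-config={"enable_kv_cache_events": true, "publisher": "zmq", "endpoint": "tcp://*:5557"}'
---
# Environment-based variant (model-with-kv-events-env.yaml): container env fragment.
# Variable names here are hypothetical placeholders for illustration.
env:
  - name: VLLM_KV_EVENTS_ENABLED        # hypothetical
    value: "true"
  - name: VLLM_KV_EVENTS_ENDPOINT       # hypothetical
    value: "tcp://*:5557"
  - name: VLLM_KV_EVENTS_HWM            # hypothetical; ZMQ high water mark tuning
    value: "1000"
```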
@gemini-code-assist bot left a comment


Summary of Changes

Hello @ae86zhizhi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've added sample configuration files to demonstrate how to enable the KV cache event synchronization feature within AIBrix, specifically for vLLM pods. These examples show how to configure vLLM to publish cache events via ZMQ, which is critical for real-time cache state management. The configurations cover both CLI argument-based and environment variable-based approaches, along with a necessary network policy to allow the ZMQ traffic.

Highlights

  • Network Policy for ZMQ Traffic: A new Kubernetes NetworkPolicy (allow-kv-events.yaml) has been added to permit ZMQ traffic on ports 5557-5558, which is essential for KV event communication between vLLM pods and the AIBrix gateway.
  • CLI-based KV Event Configuration Example: A new quickstart example (model-with-kv-events.yaml) demonstrates how to enable KV cache events in a vLLM deployment using command-line arguments, including the necessary service configuration to expose the ZMQ ports (a sketch of such a Service follows this list).
  • Environment-based KV Event Configuration Example: An alternative quickstart example (model-with-kv-events-env.yaml) provides a configuration for enabling KV cache events using environment variables, showcasing flexibility and including performance tuning parameters like the ZMQ high water mark.
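
As a rough illustration of the Service mentioned in the CLI-based example, something along the following lines would expose the ZMQ event ports next to the normal inference port; the names, label, and the inference port number are assumptions rather than the exact sample contents.

```yaml
# Sketch only; names, labels, and the inference port are assumed.
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-llama-8b
spec:
  selector:
    model.aibrix.ai/name: deepseek-r1-distill-llama-8b   # assumed pod label
  ports:
    - name: http
      port: 8000              # vLLM OpenAI-compatible API (typical default)
      targetPort: 8000
    - name: kv-events
      port: 5557              # ZMQ KV event publisher
      targetPort: 5557
    - name: kv-events-replay
      port: 5558              # ZMQ replay endpoint
      targetPort: 5558
```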

@gemini-code-assist bot left a comment

Code Review

This PR adds useful sample configurations for enabling KV cache event synchronization in AIBrix. The examples cover network policies, CLI-based configuration, and environment-based configuration, which is great for users. My review focuses on improving the usability and robustness of these sample files. I've suggested making the namespace handling more flexible and adding health probes for consistency and better pod lifecycle management in Kubernetes.

- Add explanatory comments for namespace field in all samples
  Clarifies that 'default' namespace is used for quickstart purposes
  and provides guidance on deploying to different namespaces

- Add health probes to model-with-kv-events-env.yaml
  Ensures consistency with other quickstart samples and provides
  production-ready configuration with liveness, readiness, and startup probes (see the sketch after this commit message)

These improvements address code review feedback while maintaining
consistency with existing AIBrix sample conventions.

Signed-off-by: ZHENYU <[email protected]>
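
A minimal sketch of the probes the follow-up commit describes, assuming the vLLM container serves its OpenAI-compatible API on port 8000 and exposes the standard /health endpoint; the thresholds and timings are illustrative, not the exact values in the sample.

```yaml
# Illustrative probe configuration for the vLLM container; port, path, and timings are assumptions.
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 60        # allow time for model download and weight loading
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 3
```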
samples/network-policies/allow-kv-events.yaml (new file, excerpt)
@@ -0,0 +1,24 @@
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
Collaborator

Did you see any issues without using a NetworkPolicy?

Contributor Author

Gateway plugins are deployed in the aibrix-system namespace, while KV event producers (model pods) are deployed in the default namespace (for quickstart) or in user-specified namespaces. The current NetworkPolicy only permits same-namespace traffic, but the actual deployment requires cross-namespace communication. I will prepare and submit a patch soon.
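
A sketch of the cross-namespace rule described above: an additional ingress "from" entry that matches the aibrix-system namespace by label, alongside same-namespace pods. The kubernetes.io/metadata.name label is set automatically on namespaces in recent Kubernetes versions; on older clusters a custom namespace label would be needed.

```yaml
# Sketch of a cross-namespace ingress rule, to be added under the policy's spec.ingress list.
- from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: aibrix-system   # namespace where the gateway plugins run
    - podSelector: {}                                  # still allow pods in the same namespace
  ports:
    - protocol: TCP
      port: 5557
    - protocol: TCP
      port: 5558
```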

Collaborator

@ae86zhizhi Are you planning to submit a new PR or a new commit in this PR?
