eero-t (Collaborator) commented May 16, 2025

Description

KubeAI updates:

  • Autoscale 8b Gaudi model
  • Change 70b Gaudi model to scale-from-zero, to avoid idle pods reserving 4 devices
  • Add example nodeSelectors
  • Update README / fix typos in it
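
As a rough illustration of what scale-from-zero means for the 70b model, the sketch below clamps a load-derived replica count to a [min, max] range. It is a simplified model for illustration only, not KubeAI's actual autoscaler; the function name, the `target_requests` parameter, and all numbers are hypothetical.

```python
import math


def desired_replicas(active_requests: int, target_requests: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed for the current load, clamped to [min_replicas, max_replicas].

    Hypothetical helper: a simplified stand-in for request-based autoscaling.
    """
    if target_requests <= 0:
        raise ValueError("target_requests must be positive")
    # One replica per `target_requests` in-flight requests, rounded up.
    wanted = math.ceil(active_requests / target_requests)
    return max(min_replicas, min(max_replicas, wanted))


# Scale-from-zero: no load means no pods, so no devices stay reserved.
idle_from_zero = desired_replicas(0, 100, min_replicas=0, max_replicas=2)
# With a minimum of one replica, a pod stays up even without load.
idle_min_one = desired_replicas(0, 100, min_replicas=1, max_replicas=2)
# Heavy load is capped at max_replicas.
busy = desired_replicas(500, 100, min_replicas=0, max_replicas=2)
print(idle_from_zero, idle_min_one, busy)
```

With `min_replicas=0` the model holds no pods, and so no Gaudi devices, while idle; with `min_replicas=1` one pod stays reserved regardless of load.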

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

n/a.

Tests

Manually tested the scaling behavior and the other changes.

eero-t requested review from poussa and mkbhanda as code owners May 16, 2025 16:28
eero-t (Collaborator, Author) commented May 16, 2025

@poussa The minReplicas/maxReplicas values are chosen so that these models, plus some additional model(s), can run on a single 8x Gaudi node, as long as not all of them are in use at the same time. IMHO this is a good default for examples like this.
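
The device budget behind this choice can be sketched as follows. Only the 8-device node size and the 4 devices per 70b pod come from this PR; the replica limits and the 1 device per 8b pod below are assumed values for illustration, not the ones in the model files.

```python
from dataclasses import dataclass

NODE_DEVICES = 8  # single 8x Gaudi node (from the comment above)


@dataclass
class ModelScaling:
    name: str
    devices_per_replica: int
    min_replicas: int
    max_replicas: int


def reserved_devices(models, at_max=False):
    """Devices held by all models at their min (idle) or max (peak) replica counts."""
    return sum(
        m.devices_per_replica * (m.max_replicas if at_max else m.min_replicas)
        for m in models
    )


models = [
    # 4 devices per pod is from the PR description; the replica limits are assumed.
    ModelScaling("llama-3.3-70b-instruct-gaudi", 4, min_replicas=0, max_replicas=1),
    # 1 device per pod and both replica limits are assumptions.
    ModelScaling("llama-3.1-8b-instruct-gaudi", 1, min_replicas=1, max_replicas=2),
]

idle = reserved_devices(models)               # 0*4 + 1*1
peak = reserved_devices(models, at_max=True)  # 1*4 + 2*1
print(f"idle: {idle}, peak: {peak}, free at peak: {NODE_DEVICES - peak}")
```

Under these assumed limits, even the peak leaves devices free on the node for additional models, which is the property the defaults aim for.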

eero-t requested review from Copilot and poussa and removed request for poussa and mkbhanda May 16, 2025 16:38
Copilot AI left a comment

Pull Request Overview

This PR improves KubeAI autoscaling support by updating autoscaling configurations and providing clearer examples for node selection and model deployment. Key changes include:

  • Adding commented-out example nodeSelectors to the OPEA values file.
  • Updating the 70b Gaudi model configuration to scale from zero with explanatory comments.
  • Adjusting the 8b Gaudi model configuration and updating the README with corrected deployment instructions.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| kubeai/opea-values.yaml | Added example nodeSelector blocks |
| kubeai/models/llama-3.3-70b-instruct-gaudi.yaml | Changed autoscaling parameters and added clarifying comments |
| kubeai/models/llama-3.1-8b-instruct-gaudi.yaml | Updated environment variable formatting and set maxReplicas |
| kubeai/README.md | Revised documentation to reflect updated autoscaling behavior |

eero-t force-pushed the kubeai-scaling branch 2 times, most recently from 6b7f22a to 0cba77f May 16, 2025 16:40
poussa (Member) left a comment

LGTM

poussa requested review from marquiz and mkbhanda May 19, 2025 13:36
eero-t added 6 commits May 19, 2025 16:58
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
Signed-off-by: Eero Tamminen <[email protected]>
(Review request.)

Signed-off-by: Eero Tamminen <[email protected]>
eero-t (Collaborator, Author) commented May 20, 2025

@mkbhanda OK to merge?

eero-t (Collaborator, Author) commented May 26, 2025

@marquiz, @mkbhanda OK to merge?

marquiz (Collaborator) left a comment

Thanks @eero-t, looks good to me

eero-t merged commit cd7d291 into opea-project:main May 27, 2025
7 checks passed
eero-t deleted the kubeai-scaling branch May 27, 2025 13:38