Requests could be bigger than limits #123

@kerthcet

Description

What happened:

When I create a Playground like this:

apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: llamacpp-speculator
spec:
  replicas: 1
  multiModelsClaim:
    modelNames:
      - name: llama2-7b-q8-gguf # the target model, should be the first one
        role: main
      - name: llama2-7b-q2-k-gguf  # the draft model
        role: draft
  backendConfig:
    name: llamacpp
    args:
      - -fa # use flash attention
    resources:
      requests:
        cpu: 4
        memory: "8Gi"

I got a Service with:

spec:
  multiModelsClaim:
    inferenceMode: SpeculativeDecoding
    modelNames:
    - llama2-7b-q8-gguf
    - llama2-7b-q2-k-gguf
  workloadTemplate:
    leaderWorkerTemplate:
      restartPolicy: Default
      size: 1
      workerTemplate:
        metadata: {}
        spec:
          containers:
          - args:
            - -fa
            command:
            - ./llama-server
            image: ghcr.io/ggerganov/llama.cpp:server
            name: model-runner
            ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
              requests:
                cpu: "4"
                memory: 8Gi

The requests are greater than the limits, which Kubernetes does not allow.
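To make the mismatch concrete, here is a minimal sketch of the kind of check that would catch it. It is not llmaz code: `parse_quantity`, `find_violations`, and `raise_limits` are hypothetical helpers, the parser handles only a subset of Kubernetes quantity suffixes, and raising the limits to match the requests is just one possible remedy, not necessarily the one the project should adopt.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes (assumption:
# exponent forms such as "1e3" are not handled in this sketch).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal(1) / 1000,
}


def parse_quantity(value):
    """Convert a quantity such as "4", "500m" or "8Gi" to a Decimal."""
    text = str(value).strip()
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def find_violations(resources):
    """Return the sorted resource names whose request exceeds its limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return sorted(
        name
        for name, req in requests.items()
        if name in limits and parse_quantity(req) > parse_quantity(limits[name])
    )


def raise_limits(resources):
    """Hypothetical remedy: lift each violated limit up to its request."""
    requests = resources.get("requests", {})
    limits = dict(resources.get("limits", {}))
    for name in find_violations(resources):
        limits[name] = requests[name]
    return {"requests": dict(requests), "limits": limits}


# The resources block from the generated Service above.
generated = {
    "limits": {"cpu": "2", "memory": "4Gi"},
    "requests": {"cpu": "4", "memory": "8Gi"},
}

violations = find_violations(generated)
print(violations)  # ['cpu', 'memory']
print(find_violations(raise_limits(generated)))  # []
```

Running such a check where the user-provided requests are merged with the backend's default limits would flag both `cpu` and `memory` before the workload is created.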

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • LWS version:
  • llmaz version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Metadata

Assignees: No one assigned

Labels: bug, help wanted, needs-kind, needs-priority, needs-triage
