Closed
Labels
bug, help wanted, needs-kind, needs-priority, needs-triage
Description
What happened:
When I create a Playground like this:
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: llamacpp-speculator
spec:
  replicas: 1
  multiModelsClaim:
    modelNames:
    - name: llama2-7b-q8-gguf # the target model, should be the first one
      role: main
    - name: llama2-7b-q2-k-gguf # the draft model
      role: draft
  backendConfig:
    name: llamacpp
    args:
    - -fa # use flash attention
    resources:
      requests:
        cpu: 4
        memory: "8Gi"
I get a Service with:
spec:
  multiModelsClaim:
    inferenceMode: SpeculativeDecoding
    modelNames:
    - llama2-7b-q8-gguf
    - llama2-7b-q2-k-gguf
  workloadTemplate:
    leaderWorkerTemplate:
      restartPolicy: Default
      size: 1
      workerTemplate:
        metadata: {}
        spec:
          containers:
          - args:
            - -fa
            command:
            - ./llama-server
            image: ghcr.io/ggerganov/llama.cpp:server
            name: model-runner
            ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
              requests:
                cpu: "4"
                memory: 8Gi
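The requests (cpu: 4, memory: 8Gi) are greater than the limits (cpu: 2, memory: 4Gi), which Kubernetes does not allow: a container's request for a resource must not exceed its limit.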
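To make the failing condition concrete, here is a minimal sketch of the check that the generated resources violate. It is not llmaz code: `parse_quantity` and `find_violations` are hypothetical helper names, and the suffix table covers only a small subset of Kubernetes quantity notation, which is enough for the values in this report.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes quantity suffixes (assumption:
# only the forms needed for this report are handled).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(value):
    """Parse a quantity such as "4", "500m" or "8Gi" into a Decimal."""
    text = str(value).strip()
    # Try the longer suffixes first so "Gi" is never matched as a shorter one.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def find_violations(resources):
    """Return (name, request, limit) tuples where the request exceeds the limit."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    violations = []
    for name, request in requests.items():
        limit = limits.get(name)
        # A resource with no limit set cannot violate the rule.
        if limit is not None and parse_quantity(request) > parse_quantity(limit):
            violations.append((name, str(request), str(limit)))
    return violations


# Resources copied from the generated Service in this report.
generated = {
    "limits": {"cpu": "2", "memory": "4Gi"},
    "requests": {"cpu": "4", "memory": "8Gi"},
}

for name, request, limit in find_violations(generated):
    print(f"{name}: request {request} exceeds limit {limit}")
```

Both `cpu` and `memory` are reported for the Service in this report. A check along these lines could run when the Playground's requests are combined with limits that come from elsewhere, so the conflict is caught before the workload is created.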
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- LWS version:
- llmaz version (use git describe --tags --dirty --always):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: