helm upgrade --install mistral-7b-instruct substratusai/vllm -f - << EOF
model: mistralai/Mistral-7B-Instruct-v0.1
replicaCount: 0
env:
- name: SERVED_MODEL_NAME
  value: mistral-7b-instruct-v0.1 # needs to be same as lingo model name
deploymentAnnotations:
  lingo.substratus.ai/models: mistral-7b-instruct-v0.1
  lingo.substratus.ai/min-replicas: "0" # needs to be string
  lingo.substratus.ai/max-replicas: "3" # needs to be string
EOF
Release "mistral-7b-instruct" does not exist. Installing it now.
W1118 19:41:03.659013 45860 warnings.go:70] autopilot-default-resources-mutator:Autopilot updated Deployment default/mistral-7b-instruct-vllm: adjusted resources to meet requirements for containers [vllm] (see http://g.co/gke/autopilot-resources)
Error: admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-gpu-limitation]":["You must specify a GPU type with node selector 'cloud.google.com/gke-accelerator' when GPU is requested on Autopilot workloads; supported values are: [nvidia-a100-80gb, nvidia-tesla-a100, nvidia-tesla-t4]."]}
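
The violation above says Autopilot needs a `cloud.google.com/gke-accelerator` node selector whenever a GPU is requested. A minimal sketch of the extra values that might satisfy it is below. It assumes the substratusai/vllm chart passes a top-level `nodeSelector` value through to the pod spec, which the output above does not confirm. The accelerator type is only an example taken from the supported values in the error message.

```yaml
# Assumption: the chart exposes a top-level `nodeSelector` value.
# Check with `helm show values substratusai/vllm` before relying on this.
nodeSelector:
  # Example only: any value listed in the error should pass the Autopilot check.
  cloud.google.com/gke-accelerator: nvidia-tesla-t4
```

If the chart does expose this value, adding these lines to the heredoc passed to `helm upgrade --install` should address the autogke-gpu-limitation constraint. Whether the chosen GPU has enough memory for the model is a separate question.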