Comments (12)
OpenLLM deployment

1. Install openllm
   pip install openllm
2. Install bentoml
   pip install bentoml
3. Update the openllm repo
   openllm repo update
4. Create a venv virtual environment
   python -m uv venv /home/tcx/.openllm/venv/998690274545817638
5. Activate the venv
   source /home/tcx/.openllm/venv/998690274545817638/bin/activate
6. Install the dependencies
   python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt
7. Clone the model repo from Hugging Face
   https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
   Local directory:
   /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
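The walkthrough does not show the download command itself. A minimal sketch of fetching the model into that directory; both commands need network access (and `git-lfs` or the `huggingface_hub` package respectively), so they are shown commented:

```shell
# Target directory from the walkthrough above.
MODEL_DIR=/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
# Option 1: huggingface-cli (from the huggingface_hub package):
#   huggingface-cli download Qwen/Qwen2-0.5B-Instruct --local-dir "$MODEL_DIR"
# Option 2: git with git-lfs installed:
#   git lfs install
#   git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct "$MODEL_DIR"
echo "$MODEL_DIR"
```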
8. Update the model repo parameters under
   /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6
   Update src/bentofile.yaml as follows:
conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:
- name: HF_TOKEN
exclude: []
include:
- '*.py'
- ui/*
- ui/chunks/*
- ui/css/*
- ui/media/*
- ui/chunks/pages/*
- bentovllm_openai/*.py
- chat_templates/chat_templates/*.jinja
- chat_templates/generation_configs/*.json
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
name: null
python:
  extra_index_url: null
  find_links: null
  index_url: null
  lock_packages: true
  no_index: null
  pack_git_packages: true
  packages: null
  pip_args: null
  requirements_txt: ./requirements.txt
  trusted_host: null
  wheels: null
service: service:VLLM
Update bento_constants.py as follows:
CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''
Update bento.yaml as follows:
service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:
- name: qwen2
  service: ''
  models: []
  dependencies: []
  config:
    name: qwen2
    resources:
      gpu: 1
      gpu_type: nvidia-rtx-3060
    traffic:
      timeout: 300
    envs:
    - name: HF_TOKEN
  schema:
    name: qwen2
    type: service
    routes:
    - name: chat
      route: /api/chat
      batchable: false
      input:
        properties:
          messages:
            default:
            - role: user
              content: what is the meaning of life?
            items:
              properties:
                role:
                  enum:
                  - system
                  - user
                  - assistant
                  title: Role
                  type: string
                content:
                  title: Content
                  type: string
              required:
              - role
              - content
              title: Message
              type: object
            title: Messages
            type: array
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
    - name: generate
      route: /api/generate
      batchable: false
      input:
        properties:
          prompt:
            default: Explain superconductors like I'm five years old
            title: Prompt
            type: string
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
apis: []
docker:
  distro: debian
  python_version: '3.9'
  cuda_version: null
  env:
    HF_TOKEN: ''
  system_packages: null
  setup_script: null
  base_image: null
  dockerfile_template: null
python:
  requirements_txt: ./requirements.txt
  packages: null
  lock_packages: true
  pack_git_packages: true
  index_url: null
  no_index: null
  trusted_host: null
  find_links: null
  extra_index_url: null
  pip_args: null
  wheels: null
conda:
  environment_yml: null
  channels: null
  dependencies: null
  pip: null
9. Activate the venv and start the service.
   Change into /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src and run:
   $ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
   $ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
   $ bentoml serve qwen2:0.5b-instruct-fp16-fcc6
   or:
   $ bentoml serve .
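Once the server is up (it listens on port 3000 by default, per the port check below), the /api/chat route accepts a JSON body matching the schema in bento.yaml above. A minimal sketch of building that body:

```python
import json

# Build a request body for the /api/chat route defined in bento.yaml.
payload = {
    "messages": [{"role": "user", "content": "what is the meaning of life?"}],
    "model": "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
    "max_tokens": 512,  # schema bounds: minimum 128, maximum 2048
}

# Sanity-check against the schema's role enum and max_tokens bounds before sending.
assert all(m["role"] in {"system", "user", "assistant"} for m in payload["messages"])
assert 128 <= payload["max_tokens"] <= 2048

body = json.dumps(payload)
print(body)
```

Send it with e.g. `curl -N -X POST http://localhost:3000/api/chat -H 'Content-Type: application/json' -d "$body"`; per the schema, the response streams back as text/event-stream.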
10. If the port is already in use, find and kill the occupying process:
    netstat -tulnp | grep 3000
    sudo kill -9 <PID>
It seems you have worked out a step-by-step solution. Is there anything we can help with?
I still do not know how to load a LoRA fine-tuned model, or where to modify the yaml file.

I don't think we have LoRA loading supported yet, but we can add this @bojiang

As for local-path models, I think we can support that.

thanks🌺
In openllm-models' service.py:
vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
Both set lora_modules=None. How do I set my LoRA model?
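For reference on the question above: recent vLLM versions represent LoRA adapters as LoRAModulePath entries passed via lora_modules, and the engine itself must be created with enable_lora=True. This is a rough, untested sketch only; the import location and field names vary across vLLM releases, the adapter path is hypothetical, and it is not verified against this openllm-models version:

```python
# Sketch, not tested: adjust the import to your vLLM version.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

lora_modules = [
    LoRAModulePath(name="my-lora", path="/path/to/my/lora-adapter")  # hypothetical adapter path
]
vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)
```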
🌼
https://zhuanlan.zhihu.com/p/711869222