Comments (12)
OpenLLM deployment

1. Install openllm
   pip install openllm
2. Install bentoml
   pip install bentoml
3. Update the openllm repo
   openllm repo update
4. Create a venv virtual environment
   python -m uv venv /home/tcx/.openllm/venv/998690274545817638
5. Activate the venv
   source /home/tcx/.openllm/venv/998690274545817638/bin/activate
6. Install the dependencies
   python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt
7. Clone the model repo from Hugging Face
   https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
   Local directory:
   /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
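The walkthrough does not show the download command itself. A minimal sketch of fetching the model into that directory; both commands need network access (and `git-lfs` or the `huggingface_hub` package respectively), so they are shown commented:

```shell
# Target directory from the walkthrough above.
MODEL_DIR=/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
# Option 1: huggingface-cli (from the huggingface_hub package):
#   huggingface-cli download Qwen/Qwen2-0.5B-Instruct --local-dir "$MODEL_DIR"
# Option 2: git with git-lfs installed:
#   git lfs install
#   git clone https://huggingface.co/Qwen/Qwen2-0.5B-Instruct "$MODEL_DIR"
echo "$MODEL_DIR"
```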
8. Update the model repo parameters under
   /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6
   Update src/bentofile.yaml as follows:
conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:
- name: HF_TOKEN
exclude: []
include:
- '*.py'
- ui/*
- ui/chunks/*
- ui/css/*
- ui/media/*
- ui/chunks/pages/*
- bentovllm_openai/*.py
- chat_templates/chat_templates/*.jinja
- chat_templates/generation_configs/*.json
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
name: null
python:
  extra_index_url: null
  find_links: null
  index_url: null
  lock_packages: true
  no_index: null
  pack_git_packages: true
  packages: null
  pip_args: null
  requirements_txt: ./requirements.txt
  trusted_host: null
  wheels: null
service: service:VLLM
Update bento_constants.py as follows:
CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''
Update bento.yaml as follows:
service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:
- name: qwen2
  service: ''
  models: []
  dependencies: []
  config:
    name: qwen2
    resources:
      gpu: 1
      gpu_type: nvidia-rtx-3060
    traffic:
      timeout: 300
    envs:
    - name: HF_TOKEN
  schema:
    name: qwen2
    type: service
    routes:
    - name: chat
      route: /api/chat
      batchable: false
      input:
        properties:
          messages:
            default:
            - role: user
              content: what is the meaning of life?
            items:
              properties:
                role:
                  enum:
                  - system
                  - user
                  - assistant
                  title: Role
                  type: string
                content:
                  title: Content
                  type: string
              required:
              - role
              - content
              title: Message
              type: object
            title: Messages
            type: array
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
    - name: generate
      route: /api/generate
      batchable: false
      input:
        properties:
          prompt:
            default: Explain superconductors like I'm five years old
            title: Prompt
            type: string
          model:
            default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
            title: Model
            type: string
          max_tokens:
            default: 2048
            maximum: 2048
            minimum: 128
            title: Max Tokens
            type: integer
          stop:
            default: null
            title: Stop
            items:
              type: string
            type: array
        title: Input
        type: object
      output:
        title: strIODescriptor
        type: string
        is_stream: true
        media_type: text/event-stream
apis: []
docker:
  distro: debian
  python_version: '3.9'
  cuda_version: null
  env:
    HF_TOKEN: ''
  system_packages: null
  setup_script: null
  base_image: null
  dockerfile_template: null
python:
  requirements_txt: ./requirements.txt
  packages: null
  lock_packages: true
  pack_git_packages: true
  index_url: null
  no_index: null
  trusted_host: null
  find_links: null
  extra_index_url: null
  pip_args: null
  wheels: null
conda:
  environment_yml: null
  channels: null
  dependencies: null
  pip: null
9. Activate the venv and start the service.
   Change into /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src and run:
   $ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
   $ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
   $ bentoml serve qwen2:0.5b-instruct-fp16-fcc6
   or:
   $ bentoml serve .
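Once the server is up (it listens on port 3000 by default, per the port check below), the /api/chat route accepts a JSON body matching the schema in bento.yaml above. A minimal sketch of building that body:

```python
import json

# Build a request body for the /api/chat route defined in bento.yaml.
payload = {
    "messages": [{"role": "user", "content": "what is the meaning of life?"}],
    "model": "/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
    "max_tokens": 512,  # schema bounds: minimum 128, maximum 2048
}

# Sanity-check against the schema's role enum and max_tokens bounds before sending.
assert all(m["role"] in {"system", "user", "assistant"} for m in payload["messages"])
assert 128 <= payload["max_tokens"] <= 2048

body = json.dumps(payload)
print(body)
```

Send it with e.g. `curl -N -X POST http://localhost:3000/api/chat -H 'Content-Type: application/json' -d "$body"`; per the schema, the response streams back as text/event-stream.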
10. If the port is already in use, find and kill the occupying process:
    netstat -tulnp | grep 3000
    sudo kill -9 <PID>
It seems you have worked out a step-by-step solution. Is there anything we can help with?
I still do not know how to load a LoRA fine-tuned model, or where to modify the yaml file.

I don't think we have LoRA loading supported yet, but we can add this @bojiang

As for local-path models, I think we can support that.

thanks🌺
In openllm-models' service.py:
vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
Both set lora_modules=None. How do I set my LoRA model?
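For reference on the question above: recent vLLM versions represent LoRA adapters as LoRAModulePath entries passed via lora_modules, and the engine itself must be created with enable_lora=True. This is a rough, untested sketch only; the import location and field names vary across vLLM releases, the adapter path is hypothetical, and it is not verified against this openllm-models version:

```python
# Sketch, not tested: adjust the import to your vLLM version.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

lora_modules = [
    LoRAModulePath(name="my-lora", path="/path/to/my/lora-adapter")  # hypothetical adapter path
]
vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)
```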
🌼
https://zhuanlan.zhihu.com/p/711869222