
Comments (12)

dsp6414 commented on September 28, 2024

OpenLLM deployment

  1. Install openllm
    pip install openllm
  2. Install bentoml
    pip install bentoml
  3. Update the openllm model repo
    openllm repo update

  4. Create a venv virtual environment
    python -m uv venv /home/tcx/.openllm/venv/998690274545817638
  5. Activate the venv virtual environment
    source /home/tcx/.openllm/venv/998690274545817638/bin/activate
  6. Install the dependencies
    python -m uv pip install -p /home/tcx/.openllm/venv/998690274545817638/bin/python -r /home/tcx/.openllm/venv/998690274545817638/requirements.txt

  7. Clone the model repository from Hugging Face (a scripted alternative is sketched right after this step)
    https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
    Local directory:
    /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
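A scripted alternative to cloning by hand is huggingface_hub (already a dependency of vLLM/transformers). A minimal sketch that downloads the same snapshot into the local directory used throughout this guide:

# Sketch: download the model into the local directory from step 7.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2-0.5B-Instruct",
    local_dir="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",
)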

  8. Update the model repository parameters under
    /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6

Update src/bentofile.yaml as follows:

conda:
  channels: null
  dependencies: null
  environment_yml: null
  pip: null
description: null
docker:
  base_image: null
  cuda_version: null
  distro: debian
  dockerfile_template: null
  env:
    HF_TOKEN: ''
  python_version: '3.9'
  setup_script: null
  system_packages: null
envs:
- name: HF_TOKEN
exclude: []
include:
- '*.py'
- ui/*
- ui/chunks/*
- ui/css/*
- ui/media/*
- ui/chunks/pages/*
- bentovllm_openai/*.py
- chat_templates/chat_templates/*.jinja
- chat_templates/generation_configs/*.json
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
name: null
python:
  extra_index_url: null
  find_links: null
  index_url: null
  lock_packages: true
  no_index: null
  pack_git_packages: true
  packages: null
  pip_args: null
  requirements_txt: ./requirements.txt
  trusted_host: null
  wheels: null
service: service:VLLM
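A quick way to confirm the edited bentofile.yaml is still valid YAML and points at the local model is to load it back. A minimal sketch, assuming PyYAML is available in the venv and using the repo path from step 8:

# Sketch: sanity-check the edited bentofile.yaml.
import yaml

BENTO_SRC = "/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src"

with open(f"{BENTO_SRC}/bentofile.yaml") as f:
    cfg = yaml.safe_load(f)

# The fields this guide changes: the model label and the entry service.
print(cfg["labels"]["model_name"])  # expect /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
print(cfg["service"])               # expect service:VLLM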

Update bento_constants.py as follows:

CONSTANT_YAML = '''
engine_config:
  dtype: half
  max_model_len: 2048
  model: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
extra_labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
project: vllm-chat
service_config:
  name: qwen2
  resources:
    gpu: 1
    gpu_type: nvidia-rtx-3060
  traffic:
    timeout: 300
'''
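The engine_config block above is what the service hands to the vLLM engine, so before building the Bento you can check that the local path, dtype and context length actually load on the GPU. A minimal sketch, run inside the same venv, reusing the values from the config (the prompt is arbitrary):

# Sketch: load the local model directly with vLLM using the same engine_config values.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct",  # local path, not an HF repo id
    dtype="half",
    max_model_len=2048,
)
outputs = llm.generate(["Hello, who are you?"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)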

Update bento.yaml as follows:

service: service:VLLM
name: qwen2
version: 0.5b-instruct-fp16-fcc6
bentoml_version: 1.2.20
creation_time: '2024-07-12T14:16:26.873508+00:00'
labels:
  model_name: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
  openllm_alias: 0.5b,0.5b-instruct
  platforms: linux
  source: https://github.com/bentoml/openllm-models-feed/tree/main/source/vllm-chat
models: []
runners: []
entry_service: qwen2
services:
- name: qwen2
  service: ''
  models: []
  dependencies: []
  config:
    name: qwen2
    resources:
      gpu: 1
      gpu_type: nvidia-rtx-3060
    traffic:
      timeout: 300
envs:
- name: HF_TOKEN
schema:
  name: qwen2
  type: service
  routes:
  - name: chat
    route: /api/chat
    batchable: false
    input:
      properties:
        messages:
          default:
          - role: user
            content: what is the meaning of life?
          items:
            properties:
              role:
                enum:
                - system
                - user
                - assistant
                title: Role
                type: string
              content:
                title: Content
                type: string
            required:
            - role
            - content
            title: Message
            type: object
          title: Messages
          type: array
        model:
          default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
          title: Model
          type: string
        max_tokens:
          default: 2048
          maximum: 2048
          minimum: 128
          title: Max Tokens
          type: integer
        stop:
          default: null
          title: Stop
          items:
            type: string
          type: array
      title: Input
      type: object
    output:
      title: strIODescriptor
      type: string
      is_stream: true
      media_type: text/event-stream
  - name: generate
    route: /api/generate
    batchable: false
    input:
      properties:
        prompt:
          default: Explain superconductors like I'm five years old
          title: Prompt
          type: string
        model:
          default: /home/tcx/bentoml/models/Qwen/Qwen2-0.5B-Instruct
          title: Model
          type: string
        max_tokens:
          default: 2048
          maximum: 2048
          minimum: 128
          title: Max Tokens
          type: integer
        stop:
          default: null
          title: Stop
          items:
            type: string
          type: array
      title: Input
      type: object
    output:
      title: strIODescriptor
      type: string
      is_stream: true
      media_type: text/event-stream
apis: []
docker:
  distro: debian
  python_version: '3.9'
  cuda_version: null
  env:
    HF_TOKEN: ''
  system_packages: null
  setup_script: null
  base_image: null
  dockerfile_template: null
python:
  requirements_txt: ./requirements.txt
  packages: null
  lock_packages: true
  pack_git_packages: true
  index_url: null
  no_index: null
  trusted_host: null
  find_links: null
  extra_index_url: null
  pip_args: null
  wheels: null
conda:
  environment_yml: null
  channels: null
  dependencies: null
  pip: null

  9. Activate the venv and launch the service
    Change into the /home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml/bentos/qwen2/0.5b-instruct-fp16-fcc6/src directory and run:

    $ export BENTOML_HOME=/home/tcx/.openllm/repos/github.com/bentoml/openllm-models/main/bentoml
    $ source /home/tcx/.openllm/venv/998690274545817638/bin/activate
    $ bentoml serve qwen2:0.5b-instruct-fp16-fcc6

    or simply:
    $ bentoml serve .

  10. If the port is already in use, find and kill the process holding it:
    netstat -tulnp | grep 3000
    sudo kill -9 <PID>
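Once the server is listening on port 3000, a quick way to exercise the /api/generate route described in bento.yaml is to stream it from Python. A minimal sketch with requests (the prompt is just an example; the model field defaults to the local path, so it can be omitted):

# Sketch: call the served /api/generate route and print the raw streamed chunks
# (the route is served as text/event-stream, so chunks may carry SSE framing).
import requests

resp = requests.post(
    "http://localhost:3000/api/generate",
    json={
        "prompt": "Explain superconductors like I'm five years old",
        "max_tokens": 256,
    },
    stream=True,
)
resp.raise_for_status()
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="", flush=True)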


dsp6414 commented on September 28, 2024

1


bojiang commented on September 28, 2024

It seems that you have a solution step by step. Anything we can help?


dsp6414 commented on September 28, 2024

It seems that you have a solution step by step. Anything we can help?

I still do not know how to load a LoRA fine-tuned model, or where in the YAML files to configure it.


aarnphm commented on September 28, 2024

I don't think we have LoRA loading supported yet, but we can add this @bojiang


bojiang commented on September 28, 2024

As for local-path models, I think we can support them.


dsp6414 commented on September 28, 2024

thanks🌺


dsp6414 commented on September 28, 2024

From service.py in openllm-models:

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)
vllm_api_server.openai_serving_completion = OpenAIServingCompletion(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    model_config=model_config,
    lora_modules=None,
    prompt_adapters=None,
    request_logger=None,
)

Both calls set lora_modules=None. How do I point them at my LoRA model?
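For reference, the lora_modules argument in vLLM's OpenAI-compatible serving layer takes a list of LoRA module descriptors. A rough sketch of what a local patch might look like, assuming a vLLM release (circa 0.5.x) where LoRAModulePath(name=..., path=...) lives in vllm.entrypoints.openai.serving_engine, that the AsyncLLMEngine is created with enable_lora=True, and a hypothetical adapter directory /home/tcx/loras/my-qwen2-lora; this is not something OpenLLM exposes out of the box:

# Sketch only (assumptions: enable_lora=True on the engine, vLLM ~0.5.x API,
# hypothetical adapter path). Not an officially supported OpenLLM option.
from vllm.entrypoints.openai.serving_engine import LoRAModulePath

lora_modules = [
    LoRAModulePath(
        name="my-qwen2-lora",                  # name clients pass as "model"
        path="/home/tcx/loras/my-qwen2-lora",  # local dir with adapter weights
    )
]

vllm_api_server.openai_serving_chat = OpenAIServingChat(
    engine=self.engine,
    served_model_names=[ENGINE_CONFIG["model"]],
    response_role="assistant",
    chat_template=chat_template,
    model_config=model_config,
    lora_modules=lora_modules,  # instead of None
    prompt_adapters=None,
    request_logger=None,
)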


dsp6414 commented on September 28, 2024

🌼


dsp6414 commented on September 28, 2024

https://zhuanlan.zhihu.com/p/711869222

