Comments (5)

YinSonglin1997 commented on July 21, 2024

To add to this: I am hitting the same problem as the original poster. Here is my QAEnsemble.log.
I0119 02:05:18.197207 86 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9e5c000000' with size 268435456
I0119 02:05:18.201188 86 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0119 02:05:18.208520 86 model_lifecycle.cc:462] loading: rerank:1
I0119 02:05:18.208561 86 model_lifecycle.cc:462] loading: embed:1
I0119 02:05:18.208588 86 model_lifecycle.cc:462] loading: base:1
I0119 02:05:18.211636 86 onnxruntime.cc:2504] TRITONBACKEND_Initialize: onnxruntime
I0119 02:05:18.211702 86 onnxruntime.cc:2514] Triton TRITONBACKEND API version: 1.12
I0119 02:05:18.211721 86 onnxruntime.cc:2520] 'onnxruntime' TRITONBACKEND API version: 1.12
I0119 02:05:18.211736 86 onnxruntime.cc:2550] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0119 02:05:18.277019 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: rerank (version 1)
I0119 02:05:18.277589 86 onnxruntime.cc:2608] TRITONBACKEND_ModelInitialize: embed (version 1)
I0119 02:05:18.277767 86 onnxruntime.cc:666] skipping model configuration auto-complete for 'rerank': inputs and outputs already specified
I0119 02:05:18.278371 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: rerank (GPU device 0)
I0119 02:05:18.278735 86 onnxruntime.cc:666] skipping model configuration auto-complete for 'embed': inputs and outputs already specified
I0119 02:05:18.280363 86 onnxruntime.cc:2651] TRITONBACKEND_ModelInstanceInitialize: embed (GPU device 0)
I0119 02:05:18.758885 86 libfastertransformer.cc:459] Before Loading Weights:
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_M_create
[d46a4f8365f8:00086] *** Process received signal ***
[d46a4f8365f8:00086] Signal: Aborted (6)
[d46a4f8365f8:00086] Signal code: (-6)
[d46a4f8365f8:00086] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f9eab095520]
[d46a4f8365f8:00086] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f9eab0e99fc]
[d46a4f8365f8:00086] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f9eab095476]
[d46a4f8365f8:00086] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f9eab07b7f3]
[d46a4f8365f8:00086] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f9eab31db9e]
[d46a4f8365f8:00086] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f9eab32920c]
[d46a4f8365f8:00086] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x7f9eab329277]
[d46a4f8365f8:00086] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x7f9eab3294d8]
[d46a4f8365f8:00086] [ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt20__throw_length_errorPKc+0x40)[0x7f9eab320449]
[d46a4f8365f8:00086] [ 9] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x14bc69)[0x7f9eab3c6c69]
[d46a4f8365f8:00086] [10] /opt/tritonserver/backends/qa_ensemble/libqa-shared.so(+0xa6ba3c)[0x7f9e1dbf2a3c]
[d46a4f8365f8:00086] [11] /opt/tritonserver/backends/qa_ensemble/libqa-shared.so(_ZN17fastertransformer21loadWeightFromBinFuncI6__halfS1_EEiPT_St6vectorImSaImEENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x187)[0x7f9e1dc0b227]
[d46a4f8365f8:00086] [12] /opt/tritonserver/backends/qa_ensemble/libqa-shared.so(_ZN17fastertransformer17loadWeightFromBinI6__halfEEiPT_St6vectorImSaImEENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_14FtCudaDataTypeE+0x282)[0x7f9e1dc0ed12]
[d46a4f8365f8:00086] [13] /opt/tritonserver/backends/qa_ensemble/libqa-shared.so(_ZN17fastertransformer11LlamaWeightI6__halfE16loadEncryptModelENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x184)[0x7f9e1d7cb0b4]
[d46a4f8365f8:00086] [14] /opt/tritonserver/backends/qa_ensemble/libqa-shared.so(_ZN16LlamaTritonModelI6__halfE19createSharedWeightsEii+0x2ad)[0x7f9e1d7b219d]
[d46a4f8365f8:00086] [15] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253)[0x7f9eab357253]
[d46a4f8365f8:00086] [16] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f9eab0e7ac3]
[d46a4f8365f8:00086] [17] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126660)[0x7f9eab179660]
[d46a4f8365f8:00086] *** End of error message ***
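
For a trace like the one above, the lines that matter most are the exception type, its what() message, and the backtrace frames that carry a symbol name. Below is a minimal Python sketch for pulling those out of a pasted log. The helper name summarize_crash and the regular expressions are illustrative assumptions, not part of QAnything or Triton.

```python
import re

# Hypothetical helper (not part of QAnything or Triton): summarize a pasted crash log.
EXC_RE = re.compile(r"terminate called after throwing an instance of '([^']+)'")
WHAT_RE = re.compile(r"what\(\):\s*(.+)")
# Backtrace frames look like: "[host:pid] [11] /path/lib.so(symbol+0x187)[0xaddr]"
FRAME_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\S+?)\(([^()+]*)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]"
)


def summarize_crash(log_text):
    """Return the exception type, its message, and the named backtrace frames."""
    exc = EXC_RE.search(log_text)
    what = WHAT_RE.search(log_text)
    frames = [
        (int(idx), lib.rsplit("/", 1)[-1], sym)
        for idx, lib, sym in FRAME_RE.findall(log_text)
        if sym  # skip frames that only have an offset, e.g. "(+0x42520)"
    ]
    return {
        "exception": exc.group(1) if exc else None,
        "what": what.group(1).strip() if what else None,
        "frames": frames,
    }
```

In the log above, the named frames inside libqa-shared.so are fastertransformer::loadWeightFromBin, LlamaWeight<__half>::loadEncryptModel and LlamaTritonModel<__half>::createSharedWeights, which suggests the abort happened while the LLM weight files were being read.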

from qanything.

yydxlv commented on July 21, 2024

The Triton service also reports a startup failure here. Inside the container, /model_repos/QAEnsemble_base/QAEnsemble_base.log shows: nohup: failed to run command '/opt/tritonserver/bin/tritonserver': No such file or directory


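xixihahaliu commented on July 21, 2024

yydxlv wrote: The Triton service also reports a startup failure here. Inside the container, /model_repos/QAEnsemble_base/QAEnsemble_base.log shows: nohup: failed to run command '/opt/tritonserver/bin/tritonserver': No such file or directory

Could you post the complete log file? That would make troubleshooting easier. Also take a look at FAQ_zh.md, which may help.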


xixihahaliu commented on July 21, 2024

In reply to YinSonglin1997's QAEnsemble.log above, which aborts with std::length_error (what(): basic_string::_M_create) inside fastertransformer::loadWeightFromBin right after "Before Loading Weights:":

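  • Cause 2: If GPU memory turns out to be sufficient, the failure is caused by the new model version being incompatible with some GPU models.
  • Solution: Switch to a compatible model and image. Download the model files manually, unzip them, replace the models directory, and then restart the service.
    • In docker-compose-xxx.yaml, change freeren/qanyxxx:v1.0.9 to freeren/qanyxxx:v1.0.8
    • git clone https://www.wisemodel.cn/Netease_Youdao/qanything.git
    • cd qanything
    • git reset --hard 79b3da3bbb35406f0b2da3acfcdb4c96c2837faf
    • unzip models.zip
    • Replace the existing models directory with the unzipped one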
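
The first step of the list above (moving the image tag from v1.0.9 back to v1.0.8) can also be scripted. This is a minimal sketch, assuming the compose file references the image as freeren/<name>:v1.0.9. The image name is elided in the thread, so the pattern matches any name under freeren/, and the helper name pin_image_tag is hypothetical.

```python
import re

# Assumption: the compose file references the image as "freeren/<name>:v1.0.9".
# The image name is elided in the thread, so any name under freeren/ is matched.
IMAGE_RE = re.compile(r"(freeren/[\w.\-]+):v1\.0\.9\b")


def pin_image_tag(compose_text, new_tag="v1.0.8"):
    """Return compose_text with every freeren/<name>:v1.0.9 image moved to new_tag."""
    return IMAGE_RE.sub(lambda m: f"{m.group(1)}:{new_tag}", compose_text)
```

To apply it, read whichever compose file you start the service with, pass its contents through pin_image_tag, and write the result back. Keeping a copy of the original file makes it easy to revert.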

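You can try the solution above. Note also that some GPU models do not support the current model, so please check in advance. Provided GPU memory is sufficient, the GPUs confirmed to work so far include the Nvidia 2080Ti, 30 series, 40 series, A30, A40, and A100.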


xixihahaliu commented on July 21, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

According to the logs, all services started successfully, but the check curl -s -w "%{http_code}" http://localhost:10000/v2/health/ready -o /dev/null never passes. After the timeout, once the container has stopped, the log file /model_repos/QAEnsemble_base/QAEnsemble_base.log does not exist either.
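
The readiness check described above can be reproduced outside the startup script to see what the endpoint actually returns. This is a minimal Python sketch: the URL is taken from the curl command, while the attempt count, interval, and function names are arbitrary assumptions rather than the values QAnything's own script uses.

```python
import time
import urllib.error
import urllib.request

# URL taken from the curl command above; timeouts and intervals are assumptions.
READY_URL = "http://localhost:10000/v2/health/ready"


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timed out, and similar failures


def wait_until_ready(probe=lambda: http_status(READY_URL),
                     attempts=60, interval=5.0, sleep=time.sleep):
    """Poll probe() until it returns 200; True on success, False after all attempts."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

A result of None from http_status means nothing is listening on the port at all, which points at tritonserver never having started rather than at a model that is still loading.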

[Screenshot: iShot_2024-01-19_09 30 13]

Expected Behavior

No response

Environment

- OS: Ubuntu 22.04 x86
- NVIDIA Driver: 535.146.02
- CUDA: 12.2
- Docker Compose: v2.24.0-birthday.10
- NVIDIA GPU Memory: 16GB

QAnything logs

root@f1376869a3c5:/workspace/qanything_local# cat api.log
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:17 +0800] [91] [INFO] Sanic v23.6.0
[2024-01-19 09:56:17 +0800] [91] [INFO] Goin' Fast @ http://0.0.0.0:8777
[2024-01-19 09:56:17 +0800] [91] [INFO] mode: production, w/ 4 workers
[2024-01-19 09:56:17 +0800] [91] [INFO] server: sanic, HTTP/1.1
[2024-01-19 09:56:17 +0800] [91] [INFO] python: 3.10.12
[2024-01-19 09:56:17 +0800] [91] [INFO] platform: Linux-6.5.0-14-generic-x86_64-with-glibc2.35
[2024-01-19 09:56:17 +0800] [91] [INFO] packages: sanic-routing==23.12.0, sanic-ext==23.6.0
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:27 +0800] [658] [INFO] Sanic Extensions:
[2024-01-19 09:56:27 +0800] [658] [INFO] > injection [0 dependencies; 0 constants]
[2024-01-19 09:56:27 +0800] [658] [INFO] > openapi [http://0.0.0.0:8777/docs]
[2024-01-19 09:56:27 +0800] [658] [INFO] > http
[2024-01-19 09:56:27 +0800] [658] [INFO] > templating [jinja2==3.1.3]
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:27 +0800] [657] [INFO] Sanic Extensions:
[2024-01-19 09:56:27 +0800] [657] [INFO] > injection [0 dependencies; 0 constants]
[2024-01-19 09:56:27 +0800] [657] [INFO] > openapi [http://0.0.0.0:8777/docs]
[2024-01-19 09:56:27 +0800] [657] [INFO] > http
[2024-01-19 09:56:27 +0800] [657] [INFO] > templating [jinja2==3.1.3]
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:27 +0800] [659] [INFO] Sanic Extensions:
[2024-01-19 09:56:27 +0800] [659] [INFO] > injection [0 dependencies; 0 constants]
[2024-01-19 09:56:27 +0800] [659] [INFO] > openapi [http://0.0.0.0:8777/docs]
[2024-01-19 09:56:27 +0800] [659] [INFO] > http
[2024-01-19 09:56:27 +0800] [659] [INFO] > templating [jinja2==3.1.3]
init local_doc_qa in local
init local_doc_qa in local
UPLOAD_ROOT_PATH: /workspace/qanything_local/QANY_DB/content
rerank_port: 10001
embed_port: 10001
[2024-01-19 09:56:27 +0800] [660] [INFO] Sanic Extensions:
[2024-01-19 09:56:27 +0800] [660] [INFO] > injection [0 dependencies; 0 constants]
[2024-01-19 09:56:27 +0800] [660] [INFO] > openapi [http://0.0.0.0:8777/docs]
[2024-01-19 09:56:27 +0800] [660] [INFO] > http
[2024-01-19 09:56:27 +0800] [660] [INFO] > templating [jinja2==3.1.3]
init local_doc_qa in local
init local_doc_qa in local
[2024-01-19 09:56:27 +0800] [658] [INFO] Starting worker [658]
[2024-01-19 09:56:27 +0800] [657] [INFO] Starting worker [657]
[2024-01-19 09:56:27 +0800] [659] [INFO] Starting worker [659]
[2024-01-19 09:56:27 +0800] [660] [INFO] Starting worker [660]

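Steps To Reproduce

No response

Anything else?

No response

The log file location currently differs between single-GPU and dual-GPU startup, because in single-GPU mode the multiple tritonserver services are started together to save GPU memory. From what I can see, you started in single-GPU mode, so please post the full contents of /model_repos/QAEnsemble/QAEnsemble.log. It should contain more information.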
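
Because the log location depends on how the service was started, it can help to check both paths mentioned in this thread and print the end of whichever exists. This is a minimal Python sketch: the two paths come from the thread, while the mapping of path to launch mode follows the reply above and is otherwise an assumption. The helper name tail_first_existing is hypothetical.

```python
from pathlib import Path

# Paths mentioned in this thread. Which path belongs to which launch mode is an
# assumption based on the reply above (QAEnsemble.log for single-GPU startup).
CANDIDATE_LOGS = (
    "/model_repos/QAEnsemble/QAEnsemble.log",
    "/model_repos/QAEnsemble_base/QAEnsemble_base.log",
)


def tail_first_existing(paths=CANDIDATE_LOGS, lines=50):
    """Return (path, last `lines` lines) for the first log that exists, else (None, [])."""
    for candidate in paths:
        path = Path(candidate)
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            return str(path), text.splitlines()[-lines:]
    return None, []
```

Run it inside the container, since the /model_repos paths refer to the container's filesystem. A result of (None, []) means neither log was written, which matches the situation described in the original report.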

