Comments (5)
Hi, this issue actually contains two parts:
- A bug caused by using low_bit fp16 in ipex_llm_worker.
- Feature request: support ipex_llm_worker with speculative decoding.
The first part has been fixed by PR #10907.
The second part will be supported by @hzjane.
from bigdl.
Just to provide a bit more information, @gc-fu - here the worker passes torch_dtype as "auto", but the fp16 example with self-speculative decoding shows that torch_dtype should be set to torch.float16. There are also other parameters in that example that aren't provided when launching via ipex_llm_worker, specifically "speculative" and "optimize_model". This is why I marked it as a feature request rather than a bug; I assumed this mode simply isn't supported yet for the ipex_llm_worker module (it would be nice if it were, though).
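For reference, a minimal sketch of how that fp16 self-speculative example loads the model, as opposed to what the worker currently passes; the model_path value is a placeholder and the keyword arguments follow the public ipex-llm GPU speculative-decoding examples, so treat this as an assumption rather than the worker's actual behavior:

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM

# Placeholder path; substitute a real checkpoint directory or HF repo id.
model_path = "meta-llama/Llama-2-7b-chat-hf"

# The speculative example sets these explicitly, whereas the worker
# currently only exposes a low-bit setting and leaves torch_dtype as "auto".
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="fp16",     # fp16 weights rather than int4
    torch_dtype=torch.float16,  # example uses float16, not "auto"
    optimize_model=True,        # enable ipex-llm optimizations
    speculative=True,           # turn on self-speculative decoding
    trust_remote_code=True,
    use_cache=True,
)
model = model.to("xpu")         # the fp16 speculative path targets Intel GPUs
```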
from bigdl.
@brosenfi
Self-speculative decoding with the FastChat worker will be supported in this PR.
However, the speculative example currently only supports running on Intel Max GPUs due to memory usage limitations. You can try it on a Max GPU or CPU later.
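A rough sketch of the corresponding generation step on a Max GPU, continuing from the loading snippet in the earlier comment (the `model` and `model_path` names are assumed from that sketch; a CPU run would simply keep the tensors on the host instead of moving them to "xpu"):

```python
import torch
from transformers import AutoTokenizer

# Continues from the earlier sketch: `model` and `model_path` are assumed to exist.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is speculative decoding?"
with torch.inference_mode():
    # On an Intel Max GPU the tensors live on "xpu"; drop .to("xpu") for a CPU run.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```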
from bigdl.
Thank you @gc-fu
from bigdl.
Hi, I am working on reproducing this issue.
from bigdl.
Related Issues (20)
- phi-3-mini support HOT 1
- IndexError: list index out of range when ipex_fp16_gpu test_api is used in all-in-one HOT 2
- Fastchat serving embeddings? HOT 4
- unable to run inference in linux environment HOT 9
- Performance drop for neural-chat 7b with new repo of ipex-llm(2.5.0b20240425) vllm serving. HOT 18
- 2nd latency of llama3-8B-Instruct with int4 & all-in-one tool issue HOT 1
- Unable to invoke the torch installed via the setup tutorial. HOT 2
- can not find gpu with linux system HOT 4
- MTL 165H ubuntu22.04 can't benchmark qwen/Qwen-7B-Chat HOT 1
- Docker image (intelanalytics/ipex-llm-xpu): Documentation stated I would need to disable iGPU to use A770. When will you fix this issue since disabling iGPU is problematic? HOT 6
- IPEX-LLM on Intel Max Series 1100 for inference libintel-ext-pt-gpu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev HOT 7
- Speech T5 on XPU on Intel Arc GPU 770 taking 8 seconds and for CPU it takes 3 seconds ?? HOT 2
- Phi-3 model performance on MeteorLake GPU HOT 2
- Main Memory continued decline with ipex-llm for local LLM inference on Intel Arc GPU. HOT 1
- stable version release requirement for arc GPU
- Crash when using llama.dll HOT 1
- ipex-llm Llama.cpp port inside ipex-llm Docker containers getting SIGBUS HOT 4
- Not able to profile LLAMA2 on iGFX (windows) HOT 1
- failed to run piqa test with sym_int4 precison by harness HOT 6
- [bug] LLAMA3-8B output error HOT 5