Comments (4)
Hi Team,
Read through the paper as well, great work.
- If there is enough space in VRAM to load the whole model, do these optimizations still help?
- What percentage of the training data is suggested for finding the DejaVu 'Predictors'?
- How do we obtain predictors for custom-trained models? Should we run inference with DejaVu again, or is there an alternate method?
Thanks
from powerinfer.
Hello, thank you for your interest.
- Yes, when there is enough space in VRAM, we fall back to Deja Vu. However, our code has not yet been optimized for complete offloading; we will support this feature later.
- Actually, I used about 1M data points for predictor training.
- We will open-source a tool for training predictors. For now, you can refer to the predictor-training implementation in DejaVu.
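Until that tool is released, the DejaVu-style predictor described above can be sketched roughly as follows. This is a minimal illustration with hypothetical shapes and names, not the actual PowerInfer or DejaVu code: a small low-rank MLP learns, from a layer's input hidden state, which FFN neurons will be nonzero after ReLU, using activation masks recorded during normal inference as labels.

```python
# Hypothetical sketch of a DejaVu-style activation predictor.
# Shapes and hyperparameters are illustrative, not the released ones.
import torch
import torch.nn as nn

hidden_dim, ffn_dim, rank = 4096, 11008, 1024

predictor = nn.Sequential(
    nn.Linear(hidden_dim, rank),   # low-rank projection keeps the predictor cheap
    nn.ReLU(),
    nn.Linear(rank, ffn_dim),      # one logit per FFN neuron
)

# Training data: hidden states x recorded during inference, paired with
# binary labels y marking which neurons were nonzero after ReLU.
# Random stand-ins are used here in place of real recorded data.
x = torch.randn(64, hidden_dim)
y = (torch.randn(64, ffn_dim) > 0).float()

loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)
loss = loss_fn(predictor(x), y)
loss.backward()
opt.step()

# At inference time, neurons with a low predicted score are skipped.
mask = torch.sigmoid(predictor(x)) > 0.5
```

In this framing, the fraction of training data needed (the "1M data points" above) is just the number of recorded (hidden state, activation mask) pairs collected per layer.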
I have a fine-tuned Vicuna 7B model. I tried to convert it for PowerInfer using the 'LLaMA(ReLU)-2-7B' predictor, but the inference output is incorrect. Is this because I used a different model's predictor rather than one trained for my fine-tuned model? How can I obtain these weights?
In the Todo section, I see 'Release core code of PowerInfer, supporting Llama-2, Falcon-40B.' marked as done.
Can we use PowerInfer with fine-tuned Vicuna/Llama models?
Thanks
Prerequisites
Before submitting your question, please ensure the following:
- I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
- I have carefully read and followed the instructions in the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
Thank you for your interest. First, for now we only support ReLU-based models, and every model has its own predictor.
We currently do not support fine-tuned Vicuna/Llama models because they are not ReLU-based. By the way, we will release a Mistral-based model in the future, and we will SFT and DPO fine-tune it.
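The ReLU requirement can be seen with a toy comparison (not PowerInfer code): ReLU produces exact zeros for every negative pre-activation, so those neurons' weights never need to be loaded, whereas SiLU (the activation used by stock Llama-2/Vicuna) produces small but nonzero values everywhere, leaving nothing to skip.

```python
# Toy illustration of why ReLU-based FFNs enable neuron skipping.
import numpy as np

rng = np.random.default_rng(0)
pre_act = rng.standard_normal(10_000)  # stand-in for FFN pre-activations

relu_out = np.maximum(pre_act, 0.0)
silu_out = pre_act / (1.0 + np.exp(-pre_act))  # SiLU = x * sigmoid(x)

relu_zero_frac = np.mean(relu_out == 0.0)  # roughly half: skippable neurons
silu_zero_frac = np.mean(silu_out == 0.0)  # no exact zeros: nothing to skip
```

This is why "ReLUfied" variants of these models (retrained with ReLU activations) are needed before PowerInfer's sparsity-based offloading applies.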
Dear Team,
I hope you're doing well. I'm following up on the discussion about the optimization for complete offloading and the fallback to Deja Vu.
Could you kindly provide any updates on the progress of this feature?
Thank you for your time.
Related Issues (20)
- How to assign the specified CUDA_VISIBLE_DEVICES?
- invalid device symbol
- Where is GGML_USE_HYBRID_THREADING defined or set?
- convert.py: error: the following arguments are required: mlp_model
- Unable to generate constant output
- The code for the figures in the paper
- Any plans to support llamafied Qwen1.5?
- CUDA not found on an A100-80G
- Are there any plans to support Llama 3 70B?
- Question about abnormal results measured on an A100 GPU
- Why AXPY?
- Will this work with Falcon 2?
- Model loading takes quite a long time
- ReluFalcon 40B produces invalid output on llama.cpp
- Inference error
- ggml-cuda.cu:8949: invalid argument
- Source for v2 (mobile inference engine)
- How to calculate the number of activated params in the TurboSparse paper?
- Supported quantization types
- How can I convert Llama-3 8B and 70B to a GGUF model?