Comments (8)
Which version of llama.cpp is PowerInfer extended from? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?
from powerinfer.
Thank you for your interest in our project. Our work is built on llama.cpp, which explains the overlap in our code. We are very grateful for the excellent, easily modifiable code structure that llama.cpp provides. On top of it, we improved the model-loading path to achieve a fine-grained, neuron-level split and adjusted the corresponding operators. We also enhanced the parallel processing capabilities of both the CPU and GPU operators. Overall, starting from scratch was not an attractive option; llama.cpp had already laid a solid code foundation for us.
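The neuron-level split mentioned above can be sketched roughly as follows. This is a hypothetical illustration, not PowerInfer's actual loader: it assumes an offline per-neuron activation-frequency profile and simply places the most frequently activated ("hot") neurons on the GPU, leaving the rest ("cold") on the CPU.

```python
# Hypothetical sketch of a neuron-level hot/cold split (not PowerInfer's real code).
# Neurons with high profiled activation frequency go to the GPU; the rest stay on CPU.

def split_neurons(activation_freq, gpu_budget):
    """activation_freq: per-neuron activation probability from offline profiling.
    gpu_budget: how many neurons fit within the GPU memory budget."""
    order = sorted(range(len(activation_freq)),
                   key=lambda i: activation_freq[i], reverse=True)
    gpu_neurons = set(order[:gpu_budget])   # "hot" neurons, served on the GPU
    cpu_neurons = set(order[gpu_budget:])   # "cold" neurons, served on the CPU
    return gpu_neurons, cpu_neurons

# Example: 8 FFN neurons, GPU has room for 3 of them.
freq = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.02, 0.6]
gpu, cpu = split_neurons(freq, 3)
# gpu == {0, 2, 4}
```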
> Which version of llama.cpp is PowerInfer extended from? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?
Not compatible at all; it requires ReLU-ified models.
The interface layer should be compatible; differences in the models do not affect the interface layer.
I am skeptical of the reported inference speed on the 4090. It cannot be lower than 10 t/s (note: that is the CPU inference speed).
The llama.cpp version you forked from only supported CPU in this case; you can pull the latest llama.cpp, which has already addressed this issue.
Then compare the speeds again.
> I am skeptical of the reported inference speed on the 4090. It cannot be lower than 10 t/s (note: that is the CPU inference speed). The llama.cpp version you forked from only supported CPU in this case; you can pull the latest llama.cpp, which has already addressed this issue. Then compare the speeds again.
We suggest you reproduce the relevant experiments following our paper and compare the performance of PowerInfer and llama.cpp on Falcon.
If you find any issues, you are welcome to bring your data and discuss it with us. Thank you.
> Which version of llama.cpp is PowerInfer extended from? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?
PowerInfer is forked from llama.cpp commit 6bb4908. Since then, llama.cpp has been continuously updated, with numerous changes to its external interfaces. Consequently, at the ABI level, PowerInfer is compatible with neither the latest llama.cpp nor the mainline version of llama-cpp-python.
I have attempted to make PowerInfer's ABI compatible with an earlier version of llama-cpp-python and created this fork. It enables normal model loading and inference, and I have used this as a basis to build PowerInfer's Gradio server. You are welcome to try out this library, but it is not recommended for use in any production environment. For more discussion, please see #64.
If you only need application-level interface compatibility, consider using examples/server to encapsulate the differences in internal implementation behind an API server.
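To illustrate why the API-server route sidesteps ABI concerns: a client only constructs HTTP requests, so the backend can change freely. The endpoint path and JSON field names below follow llama.cpp's server conventions and are assumptions; check the server binary you actually build.

```python
import json

# Sketch of an application-level client for an examples/server-style API.
# The "/completion" path and the request fields are assumptions based on
# llama.cpp's server conventions, not a documented PowerInfer interface.

def build_completion_request(prompt, n_predict=128, temperature=0.7):
    """Construct the JSON body for a completion call."""
    return json.dumps({
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": temperature,
    })

body = build_completion_request("Hello", n_predict=16)
# To send it (requires a running server, e.g. on localhost:8080):
#   import urllib.request
#   req = urllib.request.Request("http://localhost:8080/completion",
#                                data=body.encode(),
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Because the client never touches the C ABI, the same code works whether the server wraps PowerInfer or upstream llama.cpp.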
> Which version of llama.cpp is PowerInfer extended from? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?

> Not compatible at all; it requires ReLU-ified models.
PowerInfer exploits the high locality in the activation of the parameters of the two Linear layers in the MLP. That is quite innovative, and the performance is impressive!
However, PowerInfer currently requires the activation function in the model's MLP to be ReLU.
The original LLaMA models use SwiGLU, so PowerInfer does not support them for now; the SwiGLU in the model has to be replaced with ReLU.
Is my understanding correct?
Also, after simply swapping the activation function, how accurate is the model's inference without retraining or fine-tuning?
What is the reason PowerInfer is restricted to ReLU?
Has the same high locality of parameter activation in the MLP been observed for other activation functions?
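The role of ReLU in the question above can be illustrated numerically. This is a toy sketch, not PowerInfer code: ReLU maps every negative pre-activation to an exact zero, so roughly half the neurons of a layer with Gaussian-like pre-activations can be skipped entirely, whereas a smooth activation such as SiLU (the gate used inside SwiGLU) is almost never exactly zero.

```python
import math
import random

# Toy illustration: ReLU produces exact zeros (skippable neurons),
# SiLU produces small-but-nonzero values (nothing can be skipped exactly).
random.seed(0)
pre_acts = [random.gauss(0.0, 1.0) for _ in range(10_000)]

relu = [max(0.0, x) for x in pre_acts]
silu = [x / (1.0 + math.exp(-x)) for x in pre_acts]  # SiLU(x) = x * sigmoid(x)

relu_sparsity = sum(v == 0.0 for v in relu) / len(relu)  # ~0.5
silu_sparsity = sum(v == 0.0 for v in silu) / len(silu)  # 0.0
```

With exact zeros, an engine can predict which neurons will be inactive and skip their weight rows; with SiLU, any such skipping would be lossy, which is one plausible reading of why ReLU-ified models are required.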
Related Issues (20)
- Only 12GB of the 24GB VRAM is used and CUDA utilization is under 10%, while CPU usage is 100% and memory usage is 35GB HOT 1
- [Question]: High PPL on wikitext2 of ReLU-LLAMA-7B for language modeling tasks HOT 2
- Does PowerInfer support multi-GPU? HOT 1
- Will we have instruct fine-tuned model support in the future? HOT 1
- Clarification on Output Neuron Pruning Method in "Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time" HOT 2
- Segmentation fault (core dumped) in ggml test
- two questions that i want to solve HOT 2
- How to assign the specified CUDA_VISIBLE_DEVICE?
- invalid device symbol
- Where is the definition or addition location of GGML_USE_HYBRID_THREADING? HOT 2
- convert.py: error: the following arguments are required: mlp_model HOT 4
- Unable to generate constant output HOT 2
- The code about the figures in paper HOT 1
- Any plans to support llamafied Qwen1.5? HOT 2
- Unable to find CUDA on an A100-80G HOT 2
- Are there any plans to support Llama 3 70B?
- Question about abnormal benchmark results on an A100 GPU HOT 1
- Why AXPY? HOT 2
- Will this work with Falcon 2?