Comments (8)

1562668477 commented on May 16, 2024

Which version of llama.cpp is PowerInfer based on? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?

from powerinfer.

YixinSong-e commented on May 16, 2024

Thank you for your interest in our project. Our work is built on llama.cpp, which is why the two code bases overlap. We are very grateful for the excellent, easily modifiable code structure that llama.cpp provides. On top of it, we changed the model loading method to achieve a fine-grained, neuron-level split, and we adjusted and added the corresponding sparse operators. We also improved the parallel processing of both CPU and GPU operators. Overall, we do not see building the whole system from scratch as a good option; llama.cpp has already laid a solid code foundation for us.
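
As a rough illustration of what a neuron-level split could look like, the sketch below assigns individual MLP neurons to either the GPU or the CPU. The reply above does not say how neurons are assigned, so the ranking criterion (per-neuron activation frequency), the `gpu_budget` parameter, the function name `split_neurons`, and the sample numbers are all assumptions made for this example, not PowerInfer's actual loader.

```python
from typing import List, Tuple


def split_neurons(activation_freq: List[float], gpu_budget: int) -> Tuple[List[int], List[int]]:
    """Assign the most frequently activated neurons to the GPU, the rest to the CPU.

    Hypothetical helper: the ranking rule is an assumption for illustration.
    """
    # Rank neuron indices from most to least frequently activated.
    ranked = sorted(range(len(activation_freq)),
                    key=lambda i: activation_freq[i],
                    reverse=True)
    gpu_neurons = sorted(ranked[:gpu_budget])
    cpu_neurons = sorted(ranked[gpu_budget:])
    return gpu_neurons, cpu_neurons


# Made-up activation frequencies for one 6-neuron MLP layer.
freq = [0.90, 0.05, 0.70, 0.10, 0.02, 0.60]
gpu, cpu = split_neurons(freq, gpu_budget=3)
print("GPU neurons:", gpu)
print("CPU neurons:", cpu)
```

The point of the sketch is only that the unit of placement is a single neuron (a row or column of a weight matrix) rather than a whole layer, which is why the loading code and the operators both had to change.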

sorasoras commented on May 16, 2024

> Which version of llama.cpp is PowerInfer based on? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?

Completely incompatible. It requires ReLU-fied models.

1562668477 commented on May 16, 2024

It should be possible to stay compatible at the interface level; using a different model does not affect the interface.

2213601279 commented on May 16, 2024

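I have doubts about the inference speed reported for the 4090. It cannot be lower than 10 t/s (note that this is the speed of CPU inference).
[image] The version you pulled from llama.cpp only supports the CPU. The latest llama.cpp has already addressed this, so you can pull it and compare the speeds again.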

ZeyuMi commented on May 16, 2024

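> I have doubts about the inference speed reported for the 4090. It cannot be lower than 10 t/s (note that this is the speed of CPU inference). [image] The version you pulled from llama.cpp only supports the CPU. The latest llama.cpp has already addressed this, so you can pull it and compare the speeds again.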

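We suggest that you reproduce the relevant experiments as described in our paper and compare the performance of PowerInfer and llama.cpp on Falcon.
If you find any problems, you are welcome to bring your data and discuss them with us. Thank you.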

hodlen commented on May 16, 2024

> Which version of llama.cpp is PowerInfer based on? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?

PowerInfer is forked from llama.cpp at commit 6bb4908. Since then, llama.cpp has been updated continuously, with numerous changes to its external interfaces. Consequently, PowerInfer is not ABI-compatible with the latest llama.cpp, nor with the mainline version of llama-cpp-python.

I have tried making an earlier version of llama-cpp-python compatible with PowerInfer's ABI and created this fork. It supports normal model loading and inference, and I used it as the basis for PowerInfer's Gradio server. You are welcome to try the library, but using it in any production environment is discouraged. For more discussion, see #64.

If you only need application-level interface compatibility, consider using examples/server, which wraps the differences in internal implementation behind an API server.
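
To make the API-server suggestion concrete, here is a minimal client sketch in Python. It assumes the PowerInfer server keeps the `/completion` endpoint, the `prompt` and `n_predict` request fields, the `content` response field, and the default port 8080 of the llama.cpp examples/server it was forked from; none of that is confirmed in this thread, so check the server's README before relying on it. The helper names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt: str, n_predict: int = 64,
                             base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    """Build a POST request for the server's /completion endpoint (assumed)."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text; needs a running server."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        # "content" is the assumed name of the response field holding the text.
        return json.loads(resp.read().decode("utf-8"))["content"]


# Building the request does not touch the network, so this runs without a server.
req = build_completion_request("Hello")
print(req.get_method(), req.full_url)
```

Because the client only speaks HTTP and JSON, it is unaffected by the ABI differences between PowerInfer and current llama.cpp.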

shifang99 commented on May 16, 2024

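> > Which version of llama.cpp is PowerInfer based on? Have the original external interfaces been modified? Is it compatible with llama-cpp-python?

> Completely incompatible. It requires ReLU-fied models.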

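PowerInfer exploits the high locality of parameter activation in the two Linear layers of the MLP. That is quite innovative, and the performance is great!

However, PowerInfer currently requires the activation function in the model's MLP to be ReLU.
The original LLaMA models use SwiGLU, so PowerInfer does not yet support them; the SwiGLU in the model has to be replaced with ReLU.
Is my understanding correct?
Also, if the activation function is simply replaced without retraining or fine-tuning, how accurate is the model's inference?
Why does PowerInfer require ReLU?
For other activation functions, has the same high locality of parameter activation in the MLP been observed?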
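
One property relevant to the ReLU question can be checked with a small, self-contained Python sketch: ReLU outputs exact zeros for negative inputs, whereas SiLU (the gate nonlinearity inside SwiGLU) outputs small nonzero values. The Gaussian pre-activations and the helper names below are assumptions for this toy example; it says nothing about the sparsity levels of real models or about PowerInfer's internals.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def silu(x: float) -> float:
    # SiLU (swish) is the gate nonlinearity used inside SwiGLU.
    return x / (1.0 + math.exp(-x))


def zero_fraction(values):
    """Fraction of activations that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


random.seed(0)
# Toy assumption: pre-activations drawn from a standard normal distribution.
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]

relu_sparsity = zero_fraction([relu(x) for x in pre_activations])
silu_sparsity = zero_fraction([silu(x) for x in pre_activations])
print(f"ReLU zeros: {relu_sparsity:.2%}, SiLU zeros: {silu_sparsity:.2%}")
```

A neuron whose activation is exactly zero contributes nothing to the second Linear layer, so its weights can be skipped without changing the result. With SiLU the near-zero outputs would have to be thresholded, which changes the output.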
