
Inference performance? about chatglm-6b-qlora CLOSED

Nipi64310 commented on September 11, 2024
Inference performance?

from chatglm-6b-qlora.

Comments (6)

shuxueslpi commented on September 11, 2024

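I haven't run any performance tests on inference yet, but my understanding is that the 4-bit method used here should be equivalent to the model's built-in quantize method.
So if it really only reduces GPU memory usage, then a better inference solution is still needed, for example a dedicated inference engine such as TensorRT.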


Nipi64310 commented on September 11, 2024

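> I haven't run any performance tests on inference yet, but my understanding is that the 4-bit method used here should be equivalent to the model's built-in quantize method. So if it really only reduces GPU memory usage, then a better inference solution is still needed, for example a dedicated inference engine such as TensorRT.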

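Could you help test the performance after training? Just run a single prompt 10 times in a loop and average the elapsed time. I tested the untrained model, and it seems slightly faster than the built-in quantization but still slower than the unquantized model. It does look like real speedups can only come from CUDA kernel optimization; after converting the model with fasttransformers, the speedup was quite noticeable.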
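The measurement described above (one prompt, 10 runs, averaged latency) can be sketched roughly as follows. This is only an illustration: `average_latency` and `fake_generate` are hypothetical names, and `fake_generate` is a CPU stand-in for the real generation call, which is not shown in this thread.

```python
import time
from statistics import mean


def average_latency(run_once, n_runs=10, warmup=1):
    """Return the mean wall-clock seconds of run_once() over n_runs timed calls."""
    for _ in range(warmup):
        run_once()  # untimed warm-up call, so one-off setup cost is excluded
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def fake_generate():
    # Hypothetical stand-in for the real inference call on a fixed prompt.
    sum(i * i for i in range(10_000))


avg_seconds = average_latency(fake_generate, n_runs=10)
print(f"average latency over 10 runs: {avg_seconds:.6f} s")
```

When timing a model on a GPU, the clock should only be read after pending kernels have finished, for example by calling `torch.cuda.synchronize()` inside `run_once`; otherwise the numbers may only reflect kernel launch time.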


shuxueslpi commented on September 11, 2024

https://huggingface.co/blog/zh/hf-bitsandbytes-integration
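I found an article on the official blog that explains how the 8-bit method works, and it does look like it is only meant to save GPU memory. I'll run a test as well when I have time.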
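For a sense of scale, the memory side of this trade-off can be estimated with simple arithmetic. The sketch below assumes a nominal 6 billion parameters (taken from the "6b" in the model name, not an exact count) and counts weights only; activations, the KV cache, and quantization constants are ignored.

```python
GIB = 1024 ** 3


def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


# Assumption: nominal parameter count implied by the "6b" in the model name.
N_PARAMS = 6_000_000_000

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

Halving the bits per weight halves the weight footprint, but it says nothing about speed: the quantized weights usually still have to be dequantized during the forward pass.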


shuxueslpi commented on September 11, 2024

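@Nipi64310 Sorry, yesterday I was busy writing the script that merges the LoRA weights and quantizes the result with the built-in quantize method, so my answer to your original question may have gone off track.
After QLoRA training, what you get is a LoRA adapter file that is only a few megabytes in size. This adapter has to be merged with the original fp32 model to produce the complete model, so you can obtain a full fp32 model and then take it to another framework for acceleration.
The merge_lora_and_quantize.py file I updated yesterday does exactly this: it merges the model and quantizes it with the built-in quantize method.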
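To make the merge step concrete, here is a minimal sketch of the standard LoRA merge rule, W_merged = W + (alpha / r) * B @ A, written with plain Python lists. It is not the code in merge_lora_and_quantize.py; `matmul` and `merge_lora` are hypothetical helper names, and the tiny matrices are invented for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into the base weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[3.0, 4.0]]    # shape (rank, in_features)
lora_b = [[1.0], [2.0]]  # shape (out_features, rank)

merged = merge_lora(base, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

Once the update is folded in, the merged weight has the same shape as the base weight, which is why the result can be exported to another inference framework like any ordinary checkpoint.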


shuxueslpi commented on September 11, 2024

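> I haven't run any performance tests on inference yet, but my understanding is that the 4-bit method used here should be equivalent to the model's built-in quantize method. So if it really only reduces GPU memory usage, then a better inference solution is still needed, for example a dedicated inference engine such as TensorRT.

> Could you help test the performance after training? Just run a single prompt 10 times in a loop and average the elapsed time. I tested the untrained model, and it seems slightly faster than the built-in quantization but still slower than the unquantized model. It does look like real speedups can only come from CUDA kernel optimization; after converting the model with fasttransformers, the speedup was quite noticeable.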

I ran a test; you can take a look here: https://github.com/shuxueslpi/chatGLM-6B-QLoRA#%E6%8E%A8%E7%90%86%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95


Nipi64310 commented on September 11, 2024

Thanks, great work.

