Comments (9)

YixinSong-e commented on May 22, 2024

@jtoy
Yes, but not entirely. Currently, PowerInfer only helps when resources are limited, such as when the 7B model exceeds GPU memory. For scenarios where the model fits entirely in GPU VRAM, PowerInfer's advantage is not significant, and further optimization of the relevant code is still underway. Please wait for the next update, in which we will optimize the performance of the 7B-scale model. By the way, happy New Year. :)
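For context, here is a minimal, self-contained sketch of the hot/cold neuron split that the PowerInfer paper describes (all names, numbers, and the toy partition below are illustrative assumptions, not the project's actual code): frequently activated "hot" FFN neurons are kept on the GPU while rarely activated "cold" ones run on the CPU, which is also why the benefit shrinks once the whole model fits in VRAM.

```cpp
// Toy illustration of a PowerInfer-style hot/cold FFN split.
// Both "devices" are plain CPU loops here so the sketch compiles anywhere;
// in the real system the hot half would be a GPU kernel over VRAM weights.
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;

// Compute only the output rows owned by one device (hot = GPU, cold = CPU).
Vec partial_ffn(const std::vector<Vec>& w, const Vec& x,
                const std::vector<bool>& is_hot, bool run_hot) {
    Vec out(w.size(), 0.0f);
    for (size_t i = 0; i < w.size(); ++i) {
        if (is_hot[i] != run_hot) continue;  // row belongs to the other device
        for (size_t j = 0; j < x.size(); ++j)
            out[i] += w[i][j] * x[j];
    }
    return out;
}

int main() {
    const size_t d_in = 8, n_neurons = 16;
    std::vector<Vec> w(n_neurons, Vec(d_in, 0.01f));
    Vec x(d_in, 1.0f);

    // Offline profiling would mark frequently activated neurons as "hot";
    // here we arbitrarily mark the first half.
    std::vector<bool> is_hot(n_neurons);
    for (size_t i = 0; i < n_neurons; ++i) is_hot[i] = (i < n_neurons / 2);

    Vec out  = partial_ffn(w, x, is_hot, true);   // stand-in for the GPU half
    Vec cold = partial_ffn(w, x, is_hot, false);  // CPU half
    // Merging the halves is where a real hybrid run must synchronize the
    // GPU with the CPU -- once per FFN layer, for every generated token.
    for (size_t i = 0; i < n_neurons; ++i) out[i] += cold[i];

    printf("out[0] = %.4f\n", out[0]);
    return 0;
}
```

When everything already fits in VRAM, the cold half is empty and the split contributes only its synchronization cost, which matches the behavior described above.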

YixinSong-e commented on May 22, 2024

I will now provide some explanations. In fact, the target scenario for PowerInfer is when the model size exceeds GPU VRAM. The currently open-sourced version of the code is not suited to running entirely on the GPU for inference. Moreover, the open-source code, for ease of trial use, is not the same as the code tested in the paper, and it introduces roughly a 10% performance decline (we are still investigating the cause).

The open-source version of PowerInfer has the following issues when the entire model is on the GPU:

  1. There are synchronization issues between the CPU and GPU. Even if the model is entirely on the GPU, the feed-forward network (FFN) layers still require CPU-GPU synchronization, introducing significant overhead (a toy cost model of this effect follows the results below).
  2. To facilitate the predictor's computations, the predictor's results are currently stored on the CPU. This means that even when the model is on the GPU, some minor computations are still performed by the CPU.

Since our code has not been deeply optimized for models that fit entirely on the GPU, I attempted to eliminate the above two overheads as much as possible in our internal PowerInfer code. Preliminary tests of llama-2-7B on a 4090 yielded the following results:

  llama.cpp: 15.9 ms/token
  PowerInfer: 12.64 ms/token on average (roughly a 1.26x speedup, since 15.9 / 12.64 ≈ 1.26)

After further breakdown, I found that some computations are still placed on the CPU, meaning some overheads have not been eliminated, which prevents further improvement in PowerInfer. Here is the result:

[screenshot of benchmark results]
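To make overhead (1) concrete, here is a back-of-envelope cost model (a hedged sketch: the per-layer timings are invented assumptions; only the ms/token figures above are measured). If each of llama-2-7B's 32 transformer blocks pays a small fixed CPU-GPU synchronization cost, that cost alone can account for a double-digit share of per-token latency:

```cpp
// Back-of-envelope model of per-layer CPU<->GPU synchronization overhead.
// compute_ms and sync_ms are hypothetical numbers, chosen only to show the
// shape of the effect, not measurements of PowerInfer.
#include <cstdio>

int main() {
    const int n_layers = 32;         // transformer blocks in llama-2-7B
    const double compute_ms = 0.35;  // hypothetical GPU compute per layer
    const double sync_ms    = 0.05;  // hypothetical host sync per layer

    const double no_sync   = n_layers * compute_ms;
    const double with_sync = n_layers * (compute_ms + sync_ms);

    printf("per token, no sync:   %.2f ms\n", no_sync);    // 11.20 ms
    printf("per token, with sync: %.2f ms\n", with_sync);  // 12.80 ms
    printf("sync overhead:        %.1f%%\n",
           100.0 * (with_sync - no_sync) / no_sync);       // 14.3%
    return 0;
}
```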

Currently, I am considering whether to provide a testbed for those who wish to reproduce the results of our paper. Moreover, the level of interest from the community in this project has significantly surpassed our expectations. Please be aware that our open-source code is currently in a preliminary stage of development, and please be patient as we work on optimizing it further.

tusharsoni42909 commented on May 22, 2024

@jtoy Hi, will the issue be resolved after EOD?
Thank you,
Tushar

jtoy commented on May 22, 2024

Which commit is it? Has it already been pushed?

jtoy commented on May 22, 2024

And what is required to test it? Do we just recompile, or do we need to reconvert the weights?

Can we add the performance gains for llama2 7b to the paper? When I read it, it wasn't clear what gains we should expect.

YixinSong-e commented on May 22, 2024

Thank you for your feedback. In our previous tests, even when the entire model was placed on the GPU, PowerInfer still achieved a certain speedup (maybe 1.2-2x; this is the result I previously measured with OPT). We will check the results you mentioned. Currently, I believe there may be some performance issues with the open-source version of the code. We will get back to you as soon as possible about the cause of this result.

jtoy commented on May 22, 2024

Is there any way I can help test?

jtoy commented on May 22, 2024

@YixinSong-e is it right to say that llama2 7b might not get a good speedup with this library? Are there any new updates?

jtoy commented on May 22, 2024

Has there been any improvement with these smaller models?
