Comments (3)
Actually, I was wrong. After I tried my port with a newer version of Python and PyTorch, the outputs were as good as the CPU ones. I am happy that it worked after all!
from llama-int8.
No luck with this repo: the "bitsandbytes" dependency relies heavily on CUDA.
But there is a repo for CPU inference; just change the prompts to prompts[0] so it doesn't crash with max_batch_size=1.
It takes more than 10 minutes to produce output with max_gen_len=20; even GPT-J 6B took me only around a minute on CPU.
I also tried to make an MPS port with GPU acceleration. It runs faster, but the output is not good enough IMO; I'm not sure whether the output is always good on CPU or whether I just got lucky on my first generation. UPDATE: the model gives good outputs with Python 3.10 + pytorch-nightly.
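For anyone trying the same kind of MPS port, here is a minimal sketch of how the device could be picked at startup. It is not code from this repo or from the CPU fork: `pick_device` and its `prefer_gpu` parameter are hypothetical names made up for illustration, and the only PyTorch calls it assumes are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers.

    Hypothetical helper for illustration; not part of llama-int8.
    """
    try:
        import torch  # imported lazily so the sketch still runs without PyTorch
    except ImportError:
        return "cpu"

    if not prefer_gpu:
        return "cpu"

    # CUDA is what bitsandbytes needs, so check it first.
    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing in older PyTorch builds, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"running on: {device}")
```

The returned string can be passed to `tensor.to(device)` or `model.to(device)`. Calling `pick_device(prefer_gpu=False)` forces CPU, which is handy for comparing MPS output against the CPU output as described above.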
thanks!