Comments (12)
You are converting the bf16 model to fp32, so it becomes much larger.
Try using "cpu fp32i8" as the strategy in convert_model.py and chat.py; the model will then be much smaller.
from chatrwkv.
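The size difference comes straight from the storage width: fp32 stores 4 bytes per weight while int8 stores 1 (plus a small per-row scale). A minimal sketch of the kind of absmax quantization an i8 strategy performs, illustrative only and not ChatRWKV's actual implementation:

```python
import numpy as np

# Illustrative per-row absmax int8 quantization, similar in spirit to an
# "fp32i8" strategy; NOT ChatRWKV's actual code.
def quantize_int8(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, q.nbytes)  # int8 storage is 4x smaller than fp32
```

The reconstruction error is bounded by half a quantization step per weight, which is why an i8 strategy trades a little accuracy for a 4x memory saving over fp32.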
Tried, but it still runs out of memory. Maybe this needs some tuning, or a GPU, to really work properly.
I can't load the model to convert it with 32GB of RAM either. Definitely interested in hearing if anyone finds a way to do it.
@djaffer ok so if you have around 35 GB of RAM (including swap) you can call
python3 ./convert.py ... --strategy "cpu bf16"
and I didn't run out of memory doing this. Then you can also run the model you save this way with chat.py on CPU, and it's actually surprisingly fast. All of it seems to work without touching the GPU, though I haven't been able to get it working with my CUDA 12 installation.
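The ~35 GB figure is consistent with back-of-envelope math, assuming the 14B-parameter model is the one being converted (an assumption; the thread doesn't name the exact checkpoint):

```python
# Rough RAM estimate for holding an RWKV checkpoint in memory.
# 14e9 parameters is an assumption about which model is being used.
n_params = 14e9
bytes_per_param = {'fp32': 4, 'bf16': 2, 'int8': 1}

for dtype, nbytes in bytes_per_param.items():
    gib = n_params * nbytes / 2**30
    print(f'{dtype}: ~{gib:.1f} GiB')
# fp32 comes out around 52 GiB, which is why a 32 GB machine fails
# when the weights get upcast; bf16 is roughly 26 GiB, so ~35 GB of
# RAM+swap leaves headroom for conversion overhead.
```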
I only have 32 GB.
@djaffer I have 33.4 GB of actual RAM and 2.1 GB of swap, and it works for me at a reasonable speed. If you have fast storage you can try making a swapfile of several gigabytes, but definitely read up on it first, since constant IO on it is not necessarily great for your storage.
I converted the model as you suggested, using a Hugging Face Space with upgraded CPU.
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/rwkv/model.py", line 102, in __init__
    w['emb.weight'] = F.layer_norm(w['emb.weight'], (args.n_embd,), weight=w['blocks.0.ln0.weight'], bias=w['blocks.0.ln0.bias'])
KeyError: 'blocks.0.ln0.weight'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 25, in <module>
    model = RWKV(model=model_path, strategy='cpu bf16')
  File "/home/user/.local/lib/python3.8/site-packages/torch/jit/_script.py", line 292, in init_then_script
    original_init(self, *args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/rwkv/model.py", line 104, in __init__
    w['emb.weight'] = F.layer_norm(w['emb.weight'].float(), (args.n_embd,), weight=w['blocks.0.ln0.weight'].float(), bias=w['blocks.0.ln0.bias'].float())
KeyError: 'blocks.0.ln0.weight'
OK, so you're saying the convert worked using --strategy 'cpu bf16'? Are you using the chat.py file inside v2?
You will need to change the model path in v2/chat.py to point to the saved output of the convert.py call.
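In ChatRWKV the model path is typically set on an args object near the top of v2/chat.py. A hypothetical sketch of that edit, where the attribute names mirror that pattern and the path is a placeholder for your converted checkpoint:

```python
from types import SimpleNamespace

# Hypothetical sketch of the v2/chat.py edit; the path below is a
# placeholder for the file written by the convert call.
args = SimpleNamespace()
args.MODEL_NAME = '/path/to/converted/rwkv-model'  # converted checkpoint
args.strategy = 'cpu bf16'  # should match the strategy used at conversion

print(args.MODEL_NAME)
```

The key point is that the strategy string in chat.py should match the one used when the model was converted, otherwise loading can fail.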
> I only have 32 gb

I have 32 GB of RAM and encountered the same problem. I increased the swap to 16 GB, then 32 GB, and that finally solved it. The method for increasing swap memory can be found here.
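For anyone else hitting this, a typical Linux recipe for adding a swapfile looks like the following (a sketch only; size and path are placeholders, root is required, and note the storage-wear caveat mentioned earlier in the thread):

```shell
# Create and enable a 16 GiB swapfile at the placeholder path /swapfile.
sudo fallocate -l 16G /swapfile   # or: dd if=/dev/zero of=/swapfile bs=1M count=16384
sudo chmod 600 /swapfile          # swap must not be world-readable
sudo mkswap /swapfile             # format it as swap space
sudo swapon /swapfile             # enable it for the current boot
swapon --show                     # verify the new swap is active
```

To keep the swap across reboots, an entry also has to be added to /etc/fstab.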
Yeah, the performance is not that good. Turned off the machine. Curious if OP can enable inference on Hugging Face.
https://github.com/saharNooby/rwkv.cpp
Now with efficient CPU inference (WIP) by the community.