Comments (12)
You are converting the bf16 model to fp32, so it becomes much larger.
Try using "cpu fp32i8" as the strategy in convert_model.py and chat.py; the model will then be much smaller.
from chatrwkv.
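The size difference comes straight from the storage width: fp32 stores 4 bytes per weight while int8 stores 1 (plus a small per-row scale). A minimal sketch of the kind of absmax quantization an i8 strategy performs, illustrative only and not ChatRWKV's actual implementation:

```python
import numpy as np

# Illustrative per-row absmax int8 quantization, similar in spirit to an
# "fp32i8" strategy; NOT ChatRWKV's actual code.
def quantize_int8(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes, q.nbytes)  # int8 storage is 4x smaller than fp32
```

The reconstruction error is bounded by half a quantization step per weight, which is why an i8 strategy trades a little accuracy for a 4x memory saving over fp32.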
Tried, but it still runs out of memory. Maybe this needs some tuning, or a GPU, to really work properly.
I can't load the model to convert it with 32GB of RAM either. Definitely interested in hearing if anyone finds a way to do it.
@djaffer ok so if you have around 35 GB of RAM (including swap) you can call
python3 ./convert.py ... --strategy "cpu bf16"
and I didn't run out of memory doing this. Then you can also run the model you save this way with chat.py on CPU, and it's actually surprisingly fast. All of it seems to work without touching the GPU, though I haven't been able to get it working with my CUDA 12 installation.
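The ~35 GB figure is consistent with back-of-envelope math, assuming the 14B-parameter model is the one being converted (an assumption; the thread doesn't name the exact checkpoint):

```python
# Rough RAM estimate for holding an RWKV checkpoint in memory.
# 14e9 parameters is an assumption about which model is being used.
n_params = 14e9
bytes_per_param = {'fp32': 4, 'bf16': 2, 'int8': 1}

for dtype, nbytes in bytes_per_param.items():
    gib = n_params * nbytes / 2**30
    print(f'{dtype}: ~{gib:.1f} GiB')
# fp32 comes out around 52 GiB, which is why a 32 GB machine fails
# when the weights get upcast; bf16 is roughly 26 GiB, so ~35 GB of
# RAM+swap leaves headroom for conversion overhead.
```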
I only have 32 GB.
@djaffer I have 33.4 GB of actual RAM and 2.1 GB of swap, and it works for me at a reasonable speed. If you have fast storage you can try making a swapfile of several gigabytes, but definitely read up on it first, since constant IO on it is not necessarily great for your storage.
I converted the model as you suggested, using a Hugging Face Space with upgraded CPU.
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/rwkv/model.py", line 102, in __init__
    w['emb.weight'] = F.layer_norm(w['emb.weight'], (args.n_embd,), weight=w['blocks.0.ln0.weight'], bias=w['blocks.0.ln0.bias'])
KeyError: 'blocks.0.ln0.weight'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 25, in <module>
    model = RWKV(model=model_path, strategy='cpu bf16')
  File "/home/user/.local/lib/python3.8/site-packages/torch/jit/_script.py", line 292, in init_then_script
    original_init(self, *args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/rwkv/model.py", line 104, in __init__
    w['emb.weight'] = F.layer_norm(w['emb.weight'].float(), (args.n_embd,), weight=w['blocks.0.ln0.weight'].float(), bias=w['blocks.0.ln0.bias'].float())
KeyError: 'blocks.0.ln0.weight'
OK, so you're saying the convert worked using --strategy 'cpu bf16'? Are you using the chat.py file inside v2?
You will need to change the model path in v2/chat.py to point to the saved output of the convert.py call.
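In ChatRWKV the model path is typically set on an args object near the top of v2/chat.py. A hypothetical sketch of that edit, where the attribute names mirror that pattern and the path is a placeholder for your converted checkpoint:

```python
from types import SimpleNamespace

# Hypothetical sketch of the v2/chat.py edit; the path below is a
# placeholder for the file written by the convert call.
args = SimpleNamespace()
args.MODEL_NAME = '/path/to/converted/rwkv-model'  # converted checkpoint
args.strategy = 'cpu bf16'  # should match the strategy used at conversion

print(args.MODEL_NAME)
```

The key point is that the strategy string in chat.py should match the one used when the model was converted, otherwise loading can fail.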
> I only have 32 gb

I have 32 GB of RAM and encountered the same problem. I increased the swap to 16 GB, then 32 GB, and that finally solved it. The method for increasing swap memory can be found here.
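For anyone else hitting this, a typical Linux recipe for adding a swapfile looks like the following (a sketch only; size and path are placeholders, root is required, and note the storage-wear caveat mentioned earlier in the thread):

```shell
# Create and enable a 16 GiB swapfile at the placeholder path /swapfile.
sudo fallocate -l 16G /swapfile   # or: dd if=/dev/zero of=/swapfile bs=1M count=16384
sudo chmod 600 /swapfile          # swap must not be world-readable
sudo mkswap /swapfile             # format it as swap space
sudo swapon /swapfile             # enable it for the current boot
swapon --show                     # verify the new swap is active
```

To keep the swap across reboots, an entry also has to be added to /etc/fstab.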
Yeah, the performance is not that good. Turned off the machine. Curious if OP can enable inference on Hugging Face.
https://github.com/saharNooby/rwkv.cpp
Now with efficient CPU inference (WIP) by the community.