Comments (6)
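The model download was probably corrupted, since nobody else has reported this behavior. I just tested it and it works fine here. Try downloading it again.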
from chatrwkv.
Re-downloading still didn't work, but switching to a different download tool fixed it...
Are you using fp16? Take a look at the contents of probs; this points to a floating-point overflow.
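For anyone following along, here is a minimal sketch of how an fp16 overflow can turn into all-NaN probabilities. It uses NumPy rather than the actual model code, and the input values and the `softmax` helper are made up for illustration; it only shows the general mechanism (overflow to inf, then inf - inf = NaN), not what happens inside RWKV specifically.

```python
import numpy as np

# Largest finite float16 value; anything bigger overflows to inf.
FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0


def softmax(logits):
    """Numerically shifted softmax computed in float32 (illustrative helper)."""
    logits = logits.astype(np.float32)
    shifted = logits - logits.max()  # inf - inf becomes NaN if the max is inf
    e = np.exp(shifted)
    return e / e.sum()


with np.errstate(over="ignore", invalid="ignore"):
    x = np.array([300.0, 200.0, 1.0], dtype=np.float16)

    # 300 * 300 = 90000 exceeds FP16_MAX, so the first entry overflows to inf.
    overflowed = x * x
    probs_from_fp16 = softmax(overflowed)

    # The same arithmetic in float32 stays finite.
    x32 = x.astype(np.float32)
    probs_from_fp32 = softmax(x32 * x32)

print("fp16 intermediate:", overflowed)
print("probs after fp16 overflow:", probs_from_fp16)
print("probs with fp32 arithmetic:", probs_from_fp32)
```

Note that casting the result to float32 afterwards does not help: once an inf has been produced in fp16, the NaN propagates through everything downstream, which would be consistent with probs being float32 yet all NaN.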
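The settings were cuda - fp16 - RWKV_JIT_ON=1, tested on a Windows 10 machine.
With the medium-sized model, the error is raised at out = torch.multinomial(probs, num_samples=1)[0]. Here probs has dtype torch.float32 and every value is nan. With the small model this step works normally.
Looking further back, the out returned by out = run_rnn(tokenizer.tokenizer.encode(init_prompt)) is also all nan; with the small model it is fine as well.
Switching to cpu - fp32 - RWKV_JIT_ON=1 moves the error to out = np.random.choice(a=len(probs), p=probs), which reports: probabilities contain NaN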
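Both failures above only surface at the sampling call. A small guard in front of it can report the problem more directly. This is just a sketch: `sample_token` is a hypothetical helper, not part of ChatRWKV, and it assumes probs can be converted to a NumPy array.

```python
import numpy as np


def sample_token(probs, rng=None):
    """Sample an index from probs, failing early with a descriptive error.

    Hypothetical helper for illustration; not part of ChatRWKV.
    """
    probs = np.asarray(probs, dtype=np.float64)
    finite = np.isfinite(probs)
    if not finite.all():
        bad = int((~finite).sum())
        raise ValueError(
            f"probs has {bad} non-finite entries out of {probs.size}; "
            "the model output was probably already NaN/inf upstream"
        )
    total = probs.sum()
    if total <= 0:
        raise ValueError("probs sums to zero or less; nothing to sample from")
    if rng is None:
        rng = np.random.default_rng()
    # Renormalize so small rounding errors do not trip the sampler.
    return int(rng.choice(len(probs), p=probs / total))


print(sample_token([0.0, 1.0, 0.0]))
try:
    sample_token([float("nan"), 0.5, 0.5])
except ValueError as err:
    print("caught:", err)
```

This does not fix anything by itself; it only makes it obvious that the NaN comes from the model output rather than from the sampling step.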
What prompt did you enter? Try a very short sentence first.
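Very short inputs in Chinese, English, or digits all trigger the error.
After some debugging, I found that the earliest place nan appears is this part of forward() in model_run.py: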
for i in range(args.n_layer):
    if i == 0:
        x = self.LN(x, w.blocks[i].ln0)
    ww = w.blocks[i].att
    x = x + self.SA(self.LN(x, w.blocks[i].ln1), state, i,
        ww.time_mix_k, ww.time_mix_v, ww.time_mix_r, ww.time_first, ww.time_decay,
        ww.key.weight, ww.value.weight, ww.receptance.weight, ww.output.weight)
    ww = w.blocks[i].ffn
    x = x + self.FF(self.LN(x, w.blocks[i].ln2), state, i,
        ww.time_mix_k, ww.time_mix_r,
        ww.key.weight, ww.value.weight, ww.receptance.weight)
    if (i+1) % RWKV_RESCALE_LAYER == 0:
        x = x / 2
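With the medium-sized model, n_layer defaults to 32, and x becomes nan when i=18. If n_layer is set to 16, x is not nan and asking a question no longer raises an error, but the output becomes very poor and just repeats a single word.
Because x is nan here, the out returned by out = run_rnn(tokenizer.tokenizer.encode(init_prompt)) is nan, which in turn makes probs nan in out = torch.multinomial(probs, num_samples=1)[0], and that is what raises the error.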
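The layer-by-layer search described above can be sketched as a small helper. Everything here is a toy under stated assumptions: `first_nonfinite_layer` and `make_toy_layers` are hypothetical names, the "layers" simply grow x by 1.5x per step in fp16, and the optional halving only mimics the shape of the x = x / 2 step in the loop. It says nothing about the real model's numerics.

```python
import numpy as np


def first_nonfinite_layer(layers, x):
    """Apply layers in order and report where NaN/inf first shows up.

    Returns (index, x) for the first layer whose output is not fully
    finite, or (None, x) if every output stays finite.
    """
    for i, layer in enumerate(layers):
        x = layer(x)
        if not np.isfinite(x).all():
            return i, x
    return None, x


def make_toy_layers(n_layer, rescale_every=None):
    """Build toy residual layers (assumption: NOT the real RWKV blocks)."""
    def make_layer(i):
        def layer(x):
            # Residual-style update that grows the activations in fp16.
            x = x + x * np.float16(0.5)
            # Optional periodic halving, shaped like the rescale step.
            if rescale_every and (i + 1) % rescale_every == 0:
                x = x / 2
            return x
        return layer
    return [make_layer(i) for i in range(n_layer)]


x0 = np.ones(4, dtype=np.float16)
with np.errstate(over="ignore", invalid="ignore"):
    bad_layer, bad_x = first_nonfinite_layer(make_toy_layers(32), x0)
    ok_layer, ok_x = first_nonfinite_layer(
        make_toy_layers(32, rescale_every=6), x0
    )

print("without rescale, first non-finite layer:", bad_layer)
print("with rescale every 6 layers, first non-finite layer:", ok_layer)
```

With PyTorch tensors the equivalent check would be torch.isfinite(x).all(). Checking for inf as well as NaN is useful because an overflow usually shows up as inf one step before it turns into NaN.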