Comments (6)

BlinkDL commented on July 20, 2024

The model download was probably corrupted. Nobody else has reported this behavior, and it worked when I tested it just now. Try downloading it again.

from chatrwkv.

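Leo4zhou commented on July 20, 2024

The model download was probably corrupted. Nobody else has reported this behavior, and it worked when I tested it just now. Try downloading it again.

Downloading it again did not help, but downloading with a different download tool fixed it...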
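A corrupted download like the one suspected above can usually be confirmed by comparing a checksum of the local file with a reference digest. The following is a minimal sketch using only the Python standard library. It assumes a reference SHA-256 digest is available from wherever the model was downloaded; the function names are hypothetical and are not part of ChatRWKV.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file digest with a reference digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If two downloads of the same file produce different digests, at least one of them is damaged, which would be consistent with a different download tool fixing the problem.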

BlinkDL commented on July 20, 2024

Are you using fp16? Check the contents of probs; this points to a floating-point overflow.
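To illustrate the mechanism being hinted at here: half precision (fp16) can only represent finite values up to 65504, and once a value overflows to inf, a softmax over it yields NaN. The sketch below uses only the standard library rather than PyTorch, so it shows the arithmetic, not what ChatRWKV actually executes. Both helper functions are hypothetical.

```python
import math
import struct


def fits_in_fp16(value):
    """True if value can be stored as a finite IEEE 754 half-precision float."""
    if not math.isfinite(value):
        return False
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


def softmax(logits):
    """Plain softmax with the usual max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(fits_in_fp16(65504.0))  # largest finite fp16 value
print(fits_in_fp16(70000.0))  # beyond the fp16 range

# One overflowed logit is enough: inf - inf is NaN, and NaN spreads to the sum.
probs = softmax([float("inf"), 1.0, 2.0])
print(all(math.isnan(p) for p in probs))
```

This only shows that an overflow upstream is sufficient to make every entry of probs NaN; it does not establish that overflow is the cause in this particular report.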

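Leo4zhou commented on July 20, 2024

Are you using fp16? Check the contents of probs; this points to a floating-point overflow.

The settings were cuda - fp16 - RWKV_JIT_ON=1, tested on a Windows 10 machine.
With the medium model, the error is raised at out = torch.multinomial(probs, num_samples=1)[0]. At that point probs has dtype torch.float32 and every value is nan. With the small model this step works normally.
Looking further back, the out returned by out = run_rnn(tokenizer.tokenizer.encode(init_prompt)) is also all nan; again, the small model is fine.

With cpu - fp32 - RWKV_JIT_ON=1 instead, the error is raised at out = np.random.choice(a=len(probs), p=probs): probabilities contain NaN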
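Both samplers fail only at the very last step, which hides where the bad values came from. One way to get a clearer error is to validate the distribution before sampling. The following is a rough sketch, assuming probs is a plain Python list; sample_token is a hypothetical helper, and random.choices stands in for torch.multinomial or np.random.choice.

```python
import math
import random


def sample_token(probs, rng=random):
    """Sample an index from probs after checking the distribution is usable."""
    bad = [i for i, p in enumerate(probs) if not math.isfinite(p) or p < 0]
    if bad:
        raise ValueError(
            f"{len(bad)} of {len(probs)} probabilities are NaN, infinite, or "
            f"negative (first bad index: {bad[0]}); the model output is "
            "probably already invalid before sampling"
        )
    if sum(probs) <= 0:
        raise ValueError("probabilities sum to zero")
    # random.choices accepts unnormalized, non-negative weights
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With tensors, the equivalent check would be something like torch.isfinite(probs).all() before calling torch.multinomial.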

BlinkDL commented on July 20, 2024

What prompt did you enter? Try a very short sentence first.

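Leo4zhou commented on July 20, 2024

What prompt did you enter? Try a very short sentence first.

Very short Chinese, English, or numeric inputs all trigger the error.

After some debugging, I found that nan first appears in this part of forward() in model_run.py: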

        for i in range(args.n_layer):
            if i == 0:
                x = self.LN(x, w.blocks[i].ln0)
            ww = w.blocks[i].att
            x = x + self.SA(self.LN(x, w.blocks[i].ln1), state, i, 
                ww.time_mix_k, ww.time_mix_v, ww.time_mix_r, ww.time_first, ww.time_decay, 
                ww.key.weight, ww.value.weight, ww.receptance.weight, ww.output.weight)
            ww = w.blocks[i].ffn
            x = x + self.FF(self.LN(x, w.blocks[i].ln2), state, i, 
                ww.time_mix_k, ww.time_mix_r, 
                ww.key.weight, ww.value.weight, ww.receptance.weight)
            if (i+1) % RWKV_RESCALE_LAYER == 0:
                x = x / 2

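With the medium model, n_layer defaults to 32, and x becomes nan at i=18. If n_layer is set to 16, x is not nan and asking a question no longer raises an error, but the output becomes very poor: it just repeats a single word.
Because x is nan here, the out from out = run_rnn(tokenizer.tokenizer.encode(init_prompt)) is nan as well, which in turn makes probs nan in out = torch.multinomial(probs, num_samples=1)[0], hence the error.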
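The layer-by-layer search described above can be automated by checking the activation after every layer and stopping at the first non-finite value. The toy sketch below uses a single float in place of the tensor x and a hypothetical layer_fn in place of the SA and FF blocks. It also mimics the x = x / 2 step from the quoted loop, which presumably exists to keep activations within the fp16 range. None of this is ChatRWKV code, and the numbers are illustrative only.

```python
import math

FP16_MAX = 65504.0  # largest finite half-precision value


def to_fp16_range(v):
    """Crude stand-in for fp16 storage: out-of-range values become inf."""
    if math.isnan(v):
        return v
    if abs(v) > FP16_MAX:
        return math.copysign(math.inf, v)
    return v


def first_bad_layer(x, n_layer, layer_fn, rescale_every=0):
    """Return the index of the first layer whose output is non-finite, or None."""
    for i in range(n_layer):
        x = to_fp16_range(x + layer_fn(x))  # residual update, as in the loop
        if not math.isfinite(x):
            return i
        if rescale_every and (i + 1) % rescale_every == 0:
            x = x / 2  # same idea as the RWKV_RESCALE_LAYER step
    return None


# Toy layer that doubles the activation through the residual connection.
print(first_bad_layer(1.0, 32, lambda v: v))
print(first_bad_layer(1.0, 32, lambda v: v, rescale_every=1))
```

In the real forward(), the analogous check would be a torch.isfinite(x).all() test after each block, printing i when it first fails.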
