Comments (10)

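hit-computer commented on September 27, 2024

I've run into the repetition problem too. It seems hard to reproduce karpathy's results. Even training with his own program seems to produce repetition, so I'd guess it has to do with the parameter settings.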

from char-rnn.

liuhy0908 commented on September 27, 2024

Agreed. I looked through a lot of related projects on GitHub, and none of them produce results as good as karpathy's Python version (I'm not familiar with torch).

liuhy0908 commented on September 27, 2024

But it's probably not a parameter problem either. When training karpathy's program I tried all sorts of parameter settings, and the results were decent every time.

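hit-computer commented on September 27, 2024

Really? I haven't tried his torch code. I did try the simple model he wrote in numpy, and the results didn't seem very good either. I'd guess the torch model has a few small tricks in it. I'm not very familiar with torch either -_-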

zhang-jinyi commented on September 27, 2024

https://github.com/hejunqing/tf-char-cnn-lstm

Take a look at this updated version. It comes very close to Yoon Kim's paper.

apeterswu commented on September 27, 2024

So may I ask, what causes the translation to keep repeating? I've run into the same thing.

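hit-computer commented on September 27, 2024

@apeterswu There are actually two strategies at generation time: argmax and sample. This program uses argmax, and that strategy leads to repetition. The sample strategy does not, although its sentences are somewhat less coherent than with argmax (karpathy's program uses sample by default). Also, after recently rewriting this model in tensorflow, I found that adding more training data and using a multi-layer RNN lets the sequence run longer before the repetition sets in (when using argmax).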
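
To make the difference between the two strategies concrete, here is a minimal sketch. It does not use the char-rnn code from this repository: `TOY_LOGITS` is a made-up stand-in for a trained model in which the next-character scores depend only on the previous character, and all function names are hypothetical. It only illustrates why always taking the argmax can lock generation into a loop, while sampling from the same distribution can break out of it.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_argmax(probs):
    """Argmax strategy: always take the most probable next character."""
    return max(range(len(probs)), key=lambda i: probs[i])

def pick_sample(probs, rng):
    """Sample strategy: draw the next character according to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical toy "model" (an assumption, not a trained network).
VOCAB = ["a", "b", "c"]
TOY_LOGITS = {
    "a": [0.1, 2.0, 1.5],
    "b": [2.0, 0.1, 1.8],
    "c": [1.0, 1.0, 1.0],
}

def generate(start, length, strategy, rng=None):
    out = [start]
    for _ in range(length):
        probs = softmax(TOY_LOGITS[out[-1]])
        if strategy == "argmax":
            idx = pick_argmax(probs)
        else:
            idx = pick_sample(probs, rng)
        out.append(VOCAB[idx])
    return "".join(out)

greedy = generate("a", 8, "argmax")
sampled = generate("a", 8, "sample", random.Random(0))
print(greedy)   # deterministic: "a" and "b" keep selecting each other
print(sampled)  # depends on the random seed
```

With argmax the toy model alternates between "a" and "b" indefinitely, because each is the other's most likely successor. A real RNN carries hidden state, so its loops tend to be longer, but the mechanism is similar in spirit.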

apeterswu commented on September 27, 2024

@hit-computer But beam search is used at decoding time, which still amounts to argmax, so this problem would still be there?

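hit-computer commented on September 27, 2024

@apeterswu Yes. Beam search picks the top-N highest-scoring candidates at each step, so it runs into the repetition problem as well. My suggestion is still to add more training data, increase the hidden-layer size and the number of training iterations, and then use the sample strategy.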
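
As a rough illustration of why beam search does not escape the loop, the sketch below runs a small beam search over a made-up first-order model. `TOY_PROBS` and `beam_search` are hypothetical names introduced here, not part of this repository or of any decoder discussed above. Because every beam is ranked by total log-probability, the highest-scoring hypothesis is still the repetitive one.

```python
import math

VOCAB = ["a", "b", "c"]
# Hypothetical next-character probabilities given the previous character
# (an assumption standing in for a trained model's output).
TOY_PROBS = {
    "a": [0.05, 0.60, 0.35],
    "b": [0.55, 0.05, 0.40],
    "c": [0.34, 0.33, 0.33],
}

def beam_search(start, length, beam_width):
    """Keep the beam_width highest log-probability sequences at every step."""
    beams = [(0.0, start)]  # (cumulative log-probability, sequence so far)
    for _ in range(length):
        candidates = []
        for score, seq in beams:
            for ch, p in zip(VOCAB, TOY_PROBS[seq[-1]]):
                candidates.append((score + math.log(p), seq + ch))
        # Top-N selection: this is the "top-N max" step mentioned above.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams

best = beam_search("a", 8, beam_width=3)
for score, seq in best:
    print(f"{seq}  log-prob={score:.3f}")
```

In this toy setup the top beam is the same alternating sequence that greedy argmax decoding would produce. The other beams differ only where a lower-probability character was tolerated.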

xslittlegrass commented on September 27, 2024

Thanks for sharing. Do you know the parameters used for the examples in karpathy's blog?
