
Comments (19)

YingZiqiang commented on June 1, 2024

@ewrfcas Where can I find roberta-middle? I don't see it listed on the page.

from roberta_zh.

ewrfcas commented on June 1, 2024

@YingZiqiang It is Roberta_l24_zh_base: 24 layers, 12 heads, hidden size 768.


brightmart commented on June 1, 2024

In our tests, large performs better than middle. What hyperparameters did you train with? Could you post them, including the batch size?


ewrfcas commented on June 1, 2024

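@brightmart Thanks for the reply. I trained large on 5 GPUs with batch size 30, and middle with batch size 32. Both ran for 3 epochs with lr=3e-5/2e-5 and warmup=0.1. Apart from the batch size, the setup was essentially the same as for middle.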
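For readers unfamiliar with the warmup setting, here is a minimal sketch of one common interpretation of warmup=0.1: the learning rate rises linearly over the first 10% of steps and then decays linearly to zero. The function name `lr_at_step` and the step counts are hypothetical, and the training script used in this thread may schedule the rate differently.

```python
def lr_at_step(step, total_steps, base_lr=3e-5, warmup_proportion=0.1):
    """Linear warmup to base_lr, then linear decay to zero (illustrative only)."""
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale the rate up from 0 to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale the rate down from base_lr to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


if __name__ == "__main__":
    # Hypothetical run of 1000 steps: print the rate at a few checkpoints.
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total_steps=1000))
```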


ewrfcas commented on June 1, 2024

image
Also, large and middle should share the same vocabulary, right? If so, preprocessing should not be the problem.


brightmart commented on June 1, 2024

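The vocabularies are exactly the same. Check the file names in the large and middle folders. Could it be that the large checkpoint failed to load? Run it again, verify that the checkpoint loads successfully, and use the same batch size of 32.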


ymcui commented on June 1, 2024

Same question here.
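I tried three reading-comprehension datasets (CMRC 2018, DRCD, CJRC), and large performs relatively poorly on all of them (this is not a case of init_ckpt failing to load). On XNLI, however, I get better results than @brightmart reported. Perhaps large was not trained with max_seq_len=512?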


ewrfcas commented on June 1, 2024

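Loading should have succeeded. I compared the parameters, and the only weights that were not loaded are the pooler weights related to cls.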
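One way to run this kind of comparison is to diff the variable names the model expects against the names stored in the checkpoint. The sketch below is illustrative, not the code actually used in this thread; the function name `missing_from_checkpoint` and the variable names in the example are hypothetical. With a TensorFlow checkpoint, the stored names could be obtained from `tf.train.list_variables`.

```python
def missing_from_checkpoint(model_var_names, checkpoint_var_names):
    """Return model variables that have no counterpart in the checkpoint."""
    return sorted(set(model_var_names) - set(checkpoint_var_names))


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    model_vars = [
        "bert/embeddings/word_embeddings",
        "bert/pooler/dense/kernel",
        "bert/pooler/dense/bias",
    ]
    ckpt_vars = ["bert/embeddings/word_embeddings"]
    # Anything listed here would be left at its random initialization.
    print(missing_from_checkpoint(model_vars, ckpt_vars))
```

If only the pooler weights show up in the result, the encoder itself was restored and the checkpoint is unlikely to explain a large drop in span-extraction quality.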


brightmart commented on June 1, 2024

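@ymcui Yes, the current roberta models were trained with max_seq_len of 256, so they are suited to inputs within that range. For longer text, e.g. over 256, results may be poor.

What do the reading-comprehension test results look like?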

@ewrfcas


ymcui commented on June 1, 2024

@brightmart
OK, got it. Thanks.


ewrfcas commented on June 1, 2024

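My CMRC2018 results were all obtained with sequence length 512. Over 5 runs, middle reaches an F1 of 86~87, while large is about 10 points lower, at around 75~77. Results for large at length 256 are being tested now.
@brightmart I hope the max_position_embeddings value in the large model's config file can be adjusted.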


ewrfcas commented on June 1, 2024

Current dev results for roberta-large at length 256 on CMRC2018:
F1: 88.365, EM: 69.991
Best result at epoch 1 with lr=2e-5.
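For context on the two metrics, the sketch below shows a simplified character-level version of EM and F1 in the style of SQuAD's overlap-based scoring. It is an assumption-laden illustration: the official CMRC 2018 evaluation script applies its own normalization and matching rules, so its numbers can differ from what this code produces. The function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction, reference):
    """Character-overlap F1 (simplified; not the official CMRC 2018 metric)."""
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    if not pred_chars or not ref_chars:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_chars == ref_chars)
    # Multiset intersection counts each shared character at most
    # as often as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A prediction that contains the answer but is too long:
    # EM is 0 while F1 gives partial credit.
    print(exact_match("北京大学", "北京"), char_f1("北京大学", "北京"))
```

This is why F1 can sit well above EM, as in the numbers reported here: a span that overlaps the reference but has slightly different boundaries earns partial F1 credit and no EM credit.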


brightmart commented on June 1, 2024

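So, at first glance, how does it compare with other models on this reading-comprehension task? And how can the length be set this small for reading comprehension?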


ewrfcas commented on June 1, 2024

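This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256, it can still run by using a sliding window, but performance drops slightly.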


brightmart commented on June 1, 2024

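OK. @ewrfcas Could you test XLNet_zh_Large on the CMRC2018 dataset for comparison?

(The current XLNet_zh_Large is an early preview release; we will help resolve any problems that come up.)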


ewrfcas commented on June 1, 2024

@brightmart If xlnet uses sentencepiece, it does not work well for reading comprehension; see ymcui/Chinese-XLNet#11 for details.


oyjxer commented on June 1, 2024

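This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256, it can still run by using a sliding window, but performance drops slightly.

How exactly is the sliding window done? @ewrfcas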


ahzz1207 commented on June 1, 2024

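This result currently falls between ERNIE2.0 base and ERNIE2.0 large, which is fairly good among pretrained models.
With the length set to 256, it can still run by using a sliding window, but performance drops slightly.

How exactly is the sliding window done? @ewrfcas

Following this thread.. I'm curious too.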


ewrfcas commented on June 1, 2024

For the sliding window, you can refer to Google's official SQuAD code, or https://github.com/ewrfcas/bert_cn_finetune/blob/master/preprocess/cmrc2018_preprocess.py
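As a rough illustration of the idea, the sketch below splits a long passage into overlapping windows so that each window, together with the question and three special tokens, fits within the model's maximum length. It follows the general doc-stride approach used in SQuAD-style preprocessing; the function name `sliding_windows` and the default values are assumptions for illustration, and the linked script may differ in its details.

```python
def sliding_windows(doc_tokens, query_len, max_seq_len=256, doc_stride=128):
    """Return (start, length) spans covering doc_tokens with overlap."""
    # Reserve room for the question plus [CLS] and two [SEP] tokens.
    max_doc_len = max_seq_len - query_len - 3
    if max_doc_len <= 0:
        raise ValueError("question is too long for the given max_seq_len")
    spans = []
    start = 0
    while start < len(doc_tokens):
        length = min(max_doc_len, len(doc_tokens) - start)
        spans.append((start, length))
        if start + length == len(doc_tokens):
            break  # the last window reaches the end of the passage
        # Advance by the stride so consecutive windows overlap.
        start += min(length, doc_stride)
    return spans


if __name__ == "__main__":
    # Hypothetical passage of 500 tokens and a 13-token question.
    passage = ["tok"] * 500
    for start, length in sliding_windows(passage, query_len=13):
        print(start, start + length)
```

Each window is scored separately, and the answer span with the highest score across windows is kept. An answer that falls near a window boundary is covered in full by a neighbouring window as long as it is shorter than the overlap.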
