Comments (19)
@ewrfcas Where can I find roberta-middle? I don't see it listed on the page.
from roberta_zh.
@YingZiqiang It's Roberta_l24_zh_base: 24 layers, 12 heads, 768 hidden size.
In our tests, large performs better than middle. What hyperparameters did you train with? Could you post them, including the batch size?
@brightmart Thanks for the reply. For large I trained on 5 GPUs with batch size 30; for middle the batch size was 32. Both ran 3 epochs with lr=3e-5/2e-5 and warmup=0.1. Apart from the batch size, the setup is essentially identical to middle's.
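The warmup ratio quoted above translates into a concrete number of warmup steps. A minimal sketch of that arithmetic, assuming a hypothetical training-set size (the actual dataset size is not stated in this thread):

```python
# Compute linear-warmup steps from the hyperparameters reported above.
num_examples = 10000      # assumption: placeholder, not stated in the thread
batch_size = 30           # batch size used for large, as reported
epochs = 3
warmup_proportion = 0.1   # warmup=0.1, as reported

steps_per_epoch = num_examples // batch_size
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_proportion)

print(total_steps, warmup_steps)  # 999 99
```

With these assumed numbers, the learning rate ramps up over the first ~10% of optimization steps before decaying, which is the usual BERT-style schedule.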
Also, large and middle should share the same vocabulary, right? If so, preprocessing shouldn't be the problem.
The vocabularies are identical. Check the file names under the large and middle folders. Perhaps the large checkpoint failed to load. Run it again, verify that the checkpoint loads successfully, and use the same batch size of 32.
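One way to confirm the checkpoint actually loaded is to compare the variable names in the checkpoint against those the model expects. A framework-agnostic sketch (the variable names below are illustrative, not the actual roberta_zh names; with TensorFlow the checkpoint side would come from tf.train.list_variables):

```python
def report_unloaded(model_vars, ckpt_vars):
    """Return model variables that have no counterpart in the checkpoint."""
    ckpt_set = set(ckpt_vars)
    return [name for name in model_vars if name not in ckpt_set]

# Illustrative names only; in practice obtain them from the graph/checkpoint.
model_vars = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/layer_0/attention/self/query/kernel",
    "bert/pooler/dense/kernel",   # pooler weights are often freshly initialized
]
ckpt_vars = [
    "bert/embeddings/word_embeddings",
    "bert/encoder/layer_0/attention/self/query/kernel",
]

missing = report_unloaded(model_vars, ckpt_vars)
print(missing)
```

If the only names reported are pooler/cls weights, loading succeeded, which matches what is observed later in this thread.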
Same question here.
I tried three reading comprehension datasets: CMRC 2018, DRCD, and CJRC. Large performs poorly on all of them (it is not an init_ckpt loading problem). On XNLI, however, I get better results than those @brightmart reported. Perhaps large was not pretrained with max_seq_len=512?
Loading should have succeeded. I compared the parameters, and the only weights not loaded were the cls pooler-related ones.
@ymcui Yes, the current roberta was pretrained with max_seq_len=256, so it is well suited to sequences within that range; for longer texts exceeding 256 tokens, performance may suffer. What were your reading comprehension test results?
@brightmart
OK, got it. Thanks.
My CMRC 2018 results were all obtained with sequence length 512. Middle's F1 over 5 runs is 86-87, while large's F1 is about 10 points lower, around 75-77. Results for large at length 256 are still being tested.
@brightmart Please consider adjusting max_position_embeddings in the large model's config file.
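Adjusting max_position_embeddings amounts to editing one field in the BERT-style config JSON. A minimal sketch, with stand-in config contents (the real values come from the model's bert_config.json):

```python
import json

# Stand-in for the large model's bert_config.json contents.
config_text = '{"hidden_size": 1024, "max_position_embeddings": 256}'
config = json.loads(config_text)

# Raise the limit; note this is only valid if the checkpoint's
# position-embedding table actually contains this many rows.
config["max_position_embeddings"] = 512

updated = json.dumps(config)
print(updated)
```

If the pretrained position-embedding table has fewer rows than the new value, the checkpoint will fail to load (or the extra positions will be uninitialized), so the config must agree with the checkpoint.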
Current roberta-large results at length 256 on the CMRC 2018 dev set:
F1: 88.365, EM: 69.991
(best with lr=2e-5 at epoch 1)
So, at first glance, how does this compare with other models on this reading comprehension task? And why can the sequence length be set so small for reading comprehension?
This result currently falls between ERNIE 2.0 base and ERNIE 2.0 large, which is quite good among pretrained models.
With the length set to 256 it can still run by using a sliding window, though performance drops slightly.
OK. @ewrfcas Could you also test XLNet_zh_Large on the CMRC 2018 dataset for comparison?
(The current XLNet_zh_Large is an early preview release; we will help resolve any issues.)
@brightmart If xlnet uses sentencepiece, it performs poorly on reading comprehension; see ymcui/Chinese-XLNet#11 for details.
> This result currently falls between ERNIE 2.0 base and ERNIE 2.0 large, which is quite good among pretrained models. With the length set to 256 it can still run by using a sliding window, though performance drops slightly.

How exactly does the sliding window work? @ewrfcas
> This result currently falls between ERNIE 2.0 base and ERNIE 2.0 large, which is quite good among pretrained models. With the length set to 256 it can still run by using a sliding window, though performance drops slightly. How exactly does the sliding window work? @ewrfcas

Following this.. also curious.
For the sliding window, see Google's official SQuAD code, or https://github.com/ewrfcas/bert_cn_finetune/blob/master/preprocess/cmrc2018_preprocess.py
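The sliding-window idea in the linked preprocessing code can be sketched as follows: split a long context into overlapping spans of at most max_len tokens, stepping forward by doc_stride, so every token appears in at least one window (the sizes here are illustrative, not the script's exact parameters):

```python
def sliding_windows(tokens, max_len=256, doc_stride=128):
    """Split tokens into overlapping windows of at most max_len tokens,
    advancing by doc_stride, as in SQuAD-style preprocessing."""
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # last window already covers the end of the context
        start += doc_stride
    return windows

ctx = list(range(600))  # a 600-token context, longer than max_len
spans = sliding_windows(ctx, max_len=256, doc_stride=128)
print([(s[0], s[-1]) for s in spans])  # [(0, 255), (128, 383), (256, 511), (384, 599)]
```

At inference time each window is scored independently and the answer spans are merged across windows, which is why results at length 256 drop only slightly versus 512.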