
roberta_zh's Issues

The loss module

Which library does the loss in "from loss import bi_tempered_logistic_loss" come from? pip install loss cannot find such a package.
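For what it's worth, the loss referenced here looks like a local module shipped with the training scripts (a loss.py inside the roberta_zh repository) rather than a PyPI package, so installing it via pip will not work. A minimal sketch, assuming the repository has been cloned locally:

    # `loss` is assumed to be a local loss.py in the cloned roberta_zh repo,
    # not an installable package, so put the repo directory on sys.path first.
    import sys
    sys.path.append("/path/to/roberta_zh")  # directory containing loss.py (placeholder path)

    from loss import bi_tempered_logistic_loss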

Use Roberta in pytorch transformers

I have been using BertPreTrainedModel to load this roberta model, which works well.

I noticed that RoBERTa is also supported in pytorch_transformers:

from pytorch_transformers import (BertConfig, BertTokenizer,
                                  RobertaConfig, RobertaTokenizer)

Should I switch to RoBERTa? If so, what should I use for the merges_file parameter in RobertaTokenizer?
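Since this Chinese checkpoint was trained with BERT's character-level WordPiece vocabulary rather than a byte-level BPE, there is probably no merges.txt to supply at all, and one reasonable option is to keep loading it through the BERT classes. A minimal sketch, assuming a converted checkpoint directory in the usual BERT layout (vocab.txt, config.json, pytorch_model.bin):

    from pytorch_transformers import BertTokenizer, BertModel

    # Assumed local directory holding the converted Chinese RoBERTa weights.
    model_dir = "./roberta_zh_large_pytorch"
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir)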

[question] Pretrain longer

Thanks for open-sourcing this!

If I haven't miscalculated, RoBERTa-zh-Large was trained for about 3 epochs (8k batch size * 100k steps / 250 million samples).
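As a quick sanity check of that estimate (assuming an 8k batch size, 100k training steps and roughly 250 million training samples, as stated above):

    batch_size = 8_000           # sequences per step (assumption from this issue)
    steps = 100_000
    corpus_size = 250_000_000    # approximate number of training samples

    epochs = batch_size * steps / corpus_size
    print(epochs)                # -> 3.2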

The original RoBERTa paper shows that BERT without dynamic masking was trained for roughly 40 epochs:
[screenshot of the relevant table from the paper]

So perhaps training for longer (500k/1M steps) would give better downstream results?
Are there plans to train it further?

Error loading the PyTorch model

I get an error when loading the model with this line: model_bert.load_state_dict(torch.load(init_checkpoint, map_location='cpu')).

After changing it to model_bert.load_state_dict(torch.load(init_checkpoint), strict=False),
the model can be used, but its performance is very poor.
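One way to see why the strict load fails, rather than silently dropping weights with strict=False, is to diff the checkpoint keys against the model's expected keys. A sketch using the same model_bert and init_checkpoint as above; the prefix remark is only a common cause, not a confirmed diagnosis:

    import torch

    state_dict = torch.load(init_checkpoint, map_location="cpu")
    model_keys = set(model_bert.state_dict().keys())
    ckpt_keys = set(state_dict.keys())

    # Keys the model expects but the checkpoint lacks, and vice versa.
    print("missing:", sorted(model_keys - ckpt_keys)[:10])
    print("unexpected:", sorted(ckpt_keys - model_keys)[:10])
    # A frequent culprit is a 'bert.' or 'module.' prefix mismatch, which can be
    # fixed by renaming keys before calling load_state_dict with strict=True.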

Could you describe the general strategy for preprocessing the pretraining data?

I'm currently working on BERT model distillation and plan to use another Chinese corpus the author has open-sourced. The data should be processed the same way as for training RoBERTa: next-sentence prediction was dropped, so the training data only needs standalone sentences. For a long news article, should I split on full stops and treat each sentence as one training example, or should I pack as many consecutive sentences as possible into a single example of close to 512 characters? The latter approach runs into issues, for example an example may not end on a full stop and may be semantically incomplete, and handling that properly gets complicated. So I'd like to ask what strategy was used to preprocess the pretraining data for RoBERTa.
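For what it's worth, a minimal sketch of the second strategy (greedily packing consecutive sentences up to a length budget); whether this matches the preprocessing actually used for RoBERTa-zh is exactly the open question above:

    def pack_sentences(text, max_chars=500):
        # Split a document on Chinese full stops and greedily pack consecutive
        # sentences into chunks of at most max_chars characters (sketch only).
        sentences = [s + "。" for s in text.split("。") if s.strip()]
        chunks, current = [], ""
        for sent in sentences:
            if current and len(current) + len(sent) > max_chars:
                chunks.append(current)
                current = ""
            current += sent
        if current:
            chunks.append(current)
        return chunks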

Download fails; opening the link shows the following

This XML file does not appear to have any style information associated with it. The document tree is shown below.

AccessDenied
Access denied.

Anonymous caller does not have storage.objects.get access to roberta_zh/roberta_model/.

Which models do the benchmark numbers refer to?

[screenshots of the benchmark tables, 2019-10-17]

What is the difference between RoBERTa-zh-Large and RoBERTa-zh-Large (20w_steps), and which downloadable model does each correspond to?
Which downloadable models do brightmart roberta_middle and brightmart roberta_large correspond to?

About multi-GPU training

Hello, is there a way to pretrain RoBERTa on multiple GPUs?
No matter how I adjust the settings, only one GPU is actually used,
and the utilization of the other GPUs stays at 0%.
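Not the repository's own recipe, but one common pattern for multi-GPU training with TensorFlow estimators is a MirroredStrategy; whether it slots cleanly into this repo's run_pretraining.py (which is written around TPUEstimator) is an assumption to verify:

    import tensorflow as tf

    # MirroredStrategy replicates the model across all GPUs visible to the process;
    # which GPUs are visible is controlled by the CUDA_VISIBLE_DEVICES env variable.
    strategy = tf.distribute.MirroredStrategy()
    run_config = tf.estimator.RunConfig(train_distribute=strategy)
    # estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)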

[CLS],[SEP]

Hello,
I am using the HuggingFace PyTorch code (https://github.com/huggingface/pytorch-transformers). Even though I set add_special_tokens=True, I keep getting this error at runtime:
A sequence with no special tokens has been passed to the RoBERTa model. This model requires special tokens in order to work. Please specify add_special_tokens=True in your encoding.
Even if I ignore this warning, the results are far worse than BERT's.
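Since the released checkpoint keeps BERT's vocabulary and its [CLS]/[SEP] conventions, one hedged workaround is to load it through the BERT classes (where add_special_tokens=True inserts [CLS] and [SEP]) instead of the RoBERTa classes, which expect <s>/</s> and a BPE vocabulary. A sketch with an assumed checkpoint path:

    from pytorch_transformers import BertTokenizer, BertModel

    model_dir = "./roberta_zh_pytorch"  # assumed local checkpoint directory
    tokenizer = BertTokenizer.from_pretrained(model_dir)
    model = BertModel.from_pretrained(model_dir)

    # add_special_tokens=True prepends [CLS] and appends [SEP] for BERT-style tokenizers.
    input_ids = tokenizer.encode("这是一个测试句子。", add_special_tokens=True)
    print(tokenizer.convert_ids_to_tokens(input_ids))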

About continuing pretraining from your model

Hello, sorry to bother you.
Fine-tuning your RoBERTa model works very well,
but when I continue pretraining from your checkpoint, the MLM accuracy starts at 0. I also tried cloze-style tests, and the model indeed cannot make accurate predictions. So I suspect there may be a problem with the top (word-prediction) layer of the released checkpoint. Looking forward to your reply.
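One quick way to check whether the released weights include the MLM prediction head at all (the key names below assume the usual BERT naming scheme):

    import torch

    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    # The MLM head usually lives under keys such as 'cls.predictions.*'.
    head_keys = [k for k in state_dict if "cls.predictions" in k]
    print(head_keys or "no MLM head weights found in this checkpoint")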

Download fails

This XML file does not appear to have any style information associated with it. The document tree is shown below.
AccessDenied
Access denied.

Anonymous caller does not have storage.objects.get access to roberta_zh/roberta_model/roeberta_zh_L-24_H-768_A-12.zip.

About data preprocessing

Hello,
During pretraining, did the data preprocessing remove stop words or perform similar operations, or was the data trained on without any such processing?

Thanks.

About the PyTorch version

Hello, what is the corresponding PyTorch version of the Chinese RoBERTa model (RoBERTa-zh-Layer6)?

About BPE

Hi, I'd like to ask: this model doesn't use BPE, right? I see that the original RoBERTa uses BPE for its tokenizer (see the quick check below).

PS: the download link for roBERTa_12L seems to be broken.
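Regarding the tokenizer question: one simple way to confirm the scheme is to run the shipped vocab.txt through BertTokenizer and inspect the output, which for this model should be character-level WordPiece rather than byte-level BPE (the path below is an assumption):

    from pytorch_transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("./roberta_zh_pytorch")  # assumed path
    print(tokenizer.tokenize("中文预训练模型"))
    # Expected: one token per Chinese character, e.g. ['中', '文', '预', '训', '练', '模', '型']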

About the pretrained embeddings

Does the pretrained embedding consist of three parts (word embedding, position embedding, segment embedding) or two parts (word embedding, position embedding)? Thanks!
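One way to answer this empirically is to list the embedding tensors in the released checkpoint (sketch; the names assume the standard BERT parameter layout):

    import torch

    state_dict = torch.load("pytorch_model.bin", map_location="cpu")
    for name, tensor in state_dict.items():
        if "embeddings" in name:
            print(name, tuple(tensor.shape))
    # Standard BERT checkpoints contain word_embeddings, position_embeddings
    # and token_type_embeddings (the segment embedding).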

OOM when running run_classifier

Is there a minimum GPU memory requirement for the model? I have set seq_len, batch_size, etc. all the way down to 1 and still get OOM. I have 8 GB of GPU memory and I'm running the large version. Thanks.
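As a rough point of reference (assuming roughly 330M parameters for a standard 24-layer large model trained with fp32 Adam), the weights and optimizer state alone are close to an 8 GB budget before any activations are counted:

    params = 330e6           # approx. parameters of a standard 24-layer large model (assumption)
    bytes_per_param = 4 * 4  # fp32 weights + gradients + two Adam moment buffers
    print(f"~{params * bytes_per_param / 2**30:.1f} GiB")  # roughly 4.9 GiB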

cuda out of memory

Hello,
With bert-base-chinese I used a batch size of 32; now, with the PyTorch version of RoBERTa-large, I run out of GPU memory even with a batch size of 4. My PyTorch usage should be fine, since the code runs on a small dataset. How much larger is RoBERTa-large than bert-base-chinese?
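For a concrete comparison, one can simply count the parameters of the two loaded models; bert-base-chinese has roughly 100M parameters, while a standard 24-layer, 1024-hidden large model has roughly three times that. A sketch, assuming both models are already loaded as PyTorch modules:

    def count_parameters(model):
        # Total number of trainable parameters in a PyTorch model.
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # e.g. count_parameters(bert_base_model) vs count_parameters(roberta_large_model)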
