
convbert's People

Contributors

philipmay, zhoudaquan, zihangjiang


convbert's Issues

UnboundLocalError: local variable 'seq_length' referenced before assignment

Hi, I am using the ConvBertForTokenClassification model in transformers and hit a bug when passing only inputs_embeds to forward().
The traceback points to line 833 in modeling_convbert.py:

if token_type_ids is None:
    if hasattr(self.embeddings, "token_type_ids"):
        buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]

Here seq_length is referenced before assignment.

I noticed that just above this piece of code, in

elif input_ids is not None:
    input_shape = input_ids.size()
    batch_size, seq_length = input_shape
elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]

seq_length is never assigned when execution takes the elif inputs_embeds is not None branch.

I am not sure whether the batch_size, seq_length = input_shape assignment is simply missing from the inputs_embeds branch, or whether I am using the model incorrectly. A possible fix is sketched below.
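
A minimal sketch of the likely fix, mirroring the input_ids branch above (an assumption based on the quoted excerpt, not a confirmed patch):

elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]
    # inputs_embeds has shape (batch_size, seq_length, hidden_size), so
    # input_shape is (batch_size, seq_length); unpack it the same way the
    # input_ids branch does, so seq_length is always defined afterwards.
    batch_size, seq_length = input_shape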

Training on multiple GPUs for BASE or LARGE Models

Hi,

here #16 (comment) you say

Our code is only tested on a single V100 GPU.

But in your paper you write about BASE size ConvBERT models.

However, BASE size models cannot be trained (created) on a single GPU; from my experience you need eight GPUs.

Could you please explain this? I would like to train a new German BASE, or maybe even LARGE, language model.

At #16 (comment) you say that Hugging Face might be an option for multi-GPU training. From my experience they are good at downstream-task training, but not at the initial language-model creation.

I would be super happy about some help with creating a new ConvBERT BASE or larger model in different languages.

Many thanks
Philip

Question about mixed-attention inference speed

I would like to ask: the paper reports FLOPs of 26.5G and 19.3G. How were these numbers obtained? When I measure the 12-layer medium-small model myself, the whole encoder comes to roughly 1 GFLOPs. Also, under what conditions was the inference speed measured?
In my tests the inference speed is slower than the original self-attention. My guess is that although the floating-point operations are reduced, the time spent on data movement (reshape, transpose) increases.
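
Measurement conditions (device, batch size, sequence length, warm-up) strongly affect such comparisons. A minimal wall-clock latency sketch using the transformers ConvBertModel (the default config sizes here are illustrative, not the paper's benchmark setting):

import time
import torch
from transformers import ConvBertConfig, ConvBertModel

# Small ConvBERT built only for latency measurement.
model = ConvBertModel(ConvBertConfig()).eval()
input_ids = torch.randint(0, 30522, (1, 128))  # (batch, seq_len)

with torch.no_grad():
    for _ in range(10):                 # warm-up runs
        model(input_ids=input_ids)
    start = time.perf_counter()
    for _ in range(100):
        model(input_ids=input_ids)
    print((time.perf_counter() - start) / 100, "s per forward pass")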

Confusion about LConv

[figure: screenshot from the paper]
In the paper's description, this part is LConv (lightweight convolution), which I find a bit puzzling. I would appreciate an explanation, thanks.

Train on GPU instead of TPU - different distribution strategies

Hi,
many thanks for this nice new model type and your research.
We would like to train a ConvBERT but on GPU and not TPU.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies
between GPUs and TPUs (a sketch of the difference follows below).

Thanks
Philip
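
A generic TF2 illustration of the two distribution strategies (the ConvBERT codebase follows ELECTRA's TF1 estimator setup, so this is a conceptual sketch rather than a drop-in change; use_tpu and tpu_name are hypothetical flags):

import tensorflow as tf

use_tpu = False          # hypothetical flag
tpu_name = "my-tpu"      # hypothetical TPU name

if use_tpu:
    # TPU: resolve the device, initialize it, then use TPUStrategy.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
else:
    # GPU: MirroredStrategy replicates the model on every local GPU
    # and all-reduces the gradients after each step.
    strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Any model/optimizer created in this scope is distributed.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
    optimizer = tf.keras.optimizers.Adam(1e-4)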

Question about the span-based light conv

Hi, I would like to ask: in the span light conv, since tf.layers.separable_conv1d already produces the span-aware matrix key_conv_attn_layer, why is the element-wise multiplication with query_layer still needed, i.e. conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)? The multiplication does not seem necessary to me.
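
For reference, a minimal sketch of the two operations quoted above (names follow the identifiers in the question; projections and shapes are simplified, so treat this as an illustration of the data flow, not the repo's exact code):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

batch, seq_len, hidden = 2, 128, 256
hidden_states = tf.random.normal([batch, seq_len, hidden])

# Depthwise-separable conv over the sequence axis: each output position
# summarizes a local span of the input (the span-aware "keys").
key_conv_attn_layer = tf.layers.separable_conv1d(
    hidden_states, filters=hidden, kernel_size=9, padding="same")

# Query projection of the same hidden states.
query_layer = tf.layers.dense(hidden_states, hidden)

# Element-wise product: makes the result depend on both the local span
# and the current position's own query, so the dynamic-conv kernels
# generated downstream are conditioned on the query, not the span alone.
conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)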

Various NaN-loss problems when pretraining on my own data

Hello, thank you for open-sourcing this. When I pretrain on my own data with the default 2e-4 learning rate and the base model, the loss becomes NaN right at the start of training. After switching to the medium-small model, with both 2e-4 and 2e-5 the loss still becomes NaN after a few thousand steps and training exits. Could you advise on how to solve this?
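
Common first steps against NaN losses are a lower learning rate, longer warmup, and gradient clipping. A generic TF1-style sketch of global-norm gradient clipping with toy variables (illustrative, not the repo's actual optimizer code):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy variable and loss standing in for the model parameters.
w = tf.get_variable("w", shape=[10], initializer=tf.zeros_initializer())
loss = tf.reduce_sum(tf.square(w - 1.0))

# Clip the global gradient norm before applying updates; a standard
# guard against the exploding gradients that often precede NaN losses.
grads = tf.gradients(loss, [w])
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).apply_gradients(
    zip(clipped, [w]))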

Inference performance

Hi,
is there any inference-performance data available?
For example, for bert_base, bert_tiny, conv_bert, conv_bert_small.

Questions about pretraining

I would like to ask: during actual pretraining, how do you judge how many steps are enough? Also, roughly what value should the training loss reach? Mine keeps hovering around 9-11; is something wrong?

Are the released pretrained models Chinese or English, and what were they trained on? Could you share some details?

I downloaded your pretrained models convbert_base, convbert_medium, and convbert_small from the README. There is no vocabulary file in those model folders. Going by this project's vocab.txt (30522 entries), I take these to be English pretrained models; is that right? (I am judging by ELECTRA, where the English vocabulary has 30522 entries and the Chinese one has 21128.) Thanks for your answer.
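
A quick way to check which vocabulary a checkpoint expects is to count the entries in vocab.txt (30522 matches the English WordPiece vocabulary, 21128 the Chinese one):

# Count the vocabulary entries in the project's vocab.txt.
with open("vocab.txt", encoding="utf-8") as f:
    print(sum(1 for _ in f))  # 30522 -> English, 21128 -> Chinese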
