
convbert's People

Contributors

philipmay, zhoudaquan, zihangjiang


convbert's Issues

UnboundLocalError: local variable 'seq_length' referenced before assignment

Hi, I am using the ConvBertForTokenClassification model in transformers and hit a bug when passing only inputs_embeds to forward().
The traceback points to line 833 in modeling_convbert.py:

if token_type_ids is None:
    if hasattr(self.embeddings, "token_type_ids"):
        buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]

Here seq_length is referenced before assignment.

I noticed that just above this piece of code, in

elif input_ids is not None:
    input_shape = input_ids.size()
    batch_size, seq_length = input_shape
elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]

seq_length is never assigned when execution takes the elif inputs_embeds is not None branch.

I am not sure whether the batch_size, seq_length = input_shape assignment is simply missing from the inputs_embeds branch, or whether I am using the model incorrectly. A possible fix is sketched below.
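
A minimal sketch of the likely fix, mirroring the input_ids branch above (an assumption based on the quoted excerpt, not a confirmed patch):

elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]
    # inputs_embeds has shape (batch_size, seq_length, hidden_size), so
    # input_shape is (batch_size, seq_length); unpack it the same way the
    # input_ids branch does, so seq_length is always defined afterwards.
    batch_size, seq_length = input_shape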

Training on multiple GPUs for BASE or LARGE Models

Hi,

here #16 (comment) you say

Our code is only tested on a single V100 GPU.

But in your paper you write about BASE size ConvBERT models.

However, BASE size models cannot be trained (created) on a single GPU; from my experience you need eight GPUs.

Could you please explain this? I would like to train a new German BASE, or maybe even LARGE, language model.

At #16 (comment) you say that Hugging Face might be an option for multi-GPU training. From my experience they are good at downstream-task training, but not at the initial language-model creation.

I would be super happy about some help with creating a new ConvBERT BASE or larger model in different languages.

Many thanks
Philip

Question about mixed-attention inference speed

I would like to ask: the paper reports FLOPs of 26.5G and 19.3G. How were these numbers obtained? When I measure the 12-layer medium-small model myself, the whole encoder comes to roughly 1 GFLOPs. Also, under what conditions was the inference speed measured?
In my tests the inference speed is slower than the original self-attention. My guess is that although the floating-point operations are reduced, the time spent on data movement (reshape, transpose) increases.
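
Measurement conditions (device, batch size, sequence length, warm-up) strongly affect such comparisons. A minimal wall-clock latency sketch using the transformers ConvBertModel (the default config sizes here are illustrative, not the paper's benchmark setting):

import time
import torch
from transformers import ConvBertConfig, ConvBertModel

# Small ConvBERT built only for latency measurement.
model = ConvBertModel(ConvBertConfig()).eval()
input_ids = torch.randint(0, 30522, (1, 128))  # (batch, seq_len)

with torch.no_grad():
    for _ in range(10):                 # warm-up runs
        model(input_ids=input_ids)
    start = time.perf_counter()
    for _ in range(100):
        model(input_ids=input_ids)
    print((time.perf_counter() - start) / 100, "s per forward pass")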

Confusion about LConv

[figure: screenshot from the paper]
In the paper's description, this part is LConv (lightweight convolution), which I find a bit puzzling. I would appreciate an explanation, thanks.

Train on GPU instead of TPU - different distribution strategies

Hi,
many thanks for this nice new model type and your research.
We would like to train a ConvBERT but on GPU and not TPU.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies
between GPUs and TPUs (a sketch of the difference follows below).

Thanks
Philip
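
A generic TF2 illustration of the two distribution strategies (the ConvBERT codebase follows ELECTRA's TF1 estimator setup, so this is a conceptual sketch rather than a drop-in change; use_tpu and tpu_name are hypothetical flags):

import tensorflow as tf

use_tpu = False          # hypothetical flag
tpu_name = "my-tpu"      # hypothetical TPU name

if use_tpu:
    # TPU: resolve the device, initialize it, then use TPUStrategy.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
else:
    # GPU: MirroredStrategy replicates the model on every local GPU
    # and all-reduces the gradients after each step.
    strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Any model/optimizer created in this scope is distributed.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
    optimizer = tf.keras.optimizers.Adam(1e-4)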

Question about the span-based light conv

Hi, I would like to ask: in the span light conv, since tf.layers.separable_conv1d already produces the span-aware matrix key_conv_attn_layer, why is the element-wise multiplication with query_layer still needed, i.e. conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)? The multiplication does not seem necessary to me.
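
For reference, a minimal sketch of the two operations quoted above (names follow the identifiers in the question; projections and shapes are simplified, so treat this as an illustration of the data flow, not the repo's exact code):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

batch, seq_len, hidden = 2, 128, 256
hidden_states = tf.random.normal([batch, seq_len, hidden])

# Depthwise-separable conv over the sequence axis: each output position
# summarizes a local span of the input (the span-aware "keys").
key_conv_attn_layer = tf.layers.separable_conv1d(
    hidden_states, filters=hidden, kernel_size=9, padding="same")

# Query projection of the same hidden states.
query_layer = tf.layers.dense(hidden_states, hidden)

# Element-wise product: makes the result depend on both the local span
# and the current position's own query, so the dynamic-conv kernels
# generated downstream are conditioned on the query, not the span alone.
conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)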

Various NaN-loss problems when pretraining on my own data

Hello, thank you for open-sourcing this. When I pretrain on my own data with the default 2e-4 learning rate and the base model, the loss becomes NaN right at the start of training. After switching to the medium-small model, with both 2e-4 and 2e-5 the loss still becomes NaN after a few thousand steps and training exits. Could you advise on how to solve this?
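
Common first steps against NaN losses are a lower learning rate, longer warmup, and gradient clipping. A generic TF1-style sketch of global-norm gradient clipping with toy variables (illustrative, not the repo's actual optimizer code):

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Toy variable and loss standing in for the model parameters.
w = tf.get_variable("w", shape=[10], initializer=tf.zeros_initializer())
loss = tf.reduce_sum(tf.square(w - 1.0))

# Clip the global gradient norm before applying updates; a standard
# guard against the exploding gradients that often precede NaN losses.
grads = tf.gradients(loss, [w])
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).apply_gradients(
    zip(clipped, [w]))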

Inference performance

Hi,
is there any inference-performance data available?
For example, for bert_base, bert_tiny, conv_bert, conv_bert_small.

Questions about pretraining

I would like to ask: during actual pretraining, how do you judge how many steps are enough? Also, roughly what value should the training loss reach? Mine keeps hovering around 9-11; is something wrong?

Are the released pretrained models Chinese or English, and what were they trained on? Could you share some details?

I downloaded your pretrained models convbert_base, convbert_medium, and convbert_small from the README. There is no vocabulary file in those model folders. Going by this project's vocab.txt (30522 entries), I take these to be English pretrained models; is that right? (I am judging by ELECTRA, where the English vocabulary has 30522 entries and the Chinese one has 21128.) Thanks for your answer.
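
A quick way to check which vocabulary a checkpoint expects is to count the entries in vocab.txt (30522 matches the English WordPiece vocabulary, 21128 the Chinese one):

# Count the vocabulary entries in the project's vocab.txt.
with open("vocab.txt", encoding="utf-8") as f:
    print(sum(1 for _ in f))  # 30522 -> English, 21128 -> Chinese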
