yitu-opensource / convbert (License: Other)
Hi, I am using the ConvBertForTokenClassification model in transformers and encountered a bug when passing only inputs_embeds to forward().
The traceback points to line 833 in modeling_convbert.py:

if token_type_ids is None:
    if hasattr(self.embeddings, "token_type_ids"):
        buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]

Here seq_length is unassigned.
Just above this piece of code I noticed:

elif input_ids is not None:
    input_shape = input_ids.size()
    batch_size, seq_length = input_shape
elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]

seq_length is never assigned when execution takes the elif inputs_embeds is not None branch. I am not sure whether the batch_size, seq_length = input_shape unpacking is simply missing from the inputs_embeds branch, or whether I am using the model incorrectly.
Does anyone know what is going on here? Thank you.
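For what it's worth, here is a framework-free sketch of what I believe the fix would be (infer_seq_length is my own name, and shapes are plain tuples here instead of torch sizes): unpack seq_length after both branches, not only in the input_ids branch.

```python
def infer_seq_length(input_ids_shape=None, inputs_embeds_shape=None):
    # input_ids is (batch, seq); inputs_embeds is (batch, seq, hidden).
    if input_ids_shape is not None:
        input_shape = input_ids_shape
    elif inputs_embeds_shape is not None:
        input_shape = inputs_embeds_shape[:-1]  # drop the hidden dim
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")
    # Suspected fix: unpack AFTER the branches, so seq_length is always
    # assigned instead of only when input_ids is given.
    batch_size, seq_length = input_shape
    return batch_size, seq_length

print(infer_seq_length(inputs_embeds_shape=(2, 7, 768)))  # (2, 7)
```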
Hi,
here #16 (comment) you say
Our code is only tested on a single V100 GPU.
But in your paper you describe BASE-size ConvBERT models, and in my experience BASE-size models cannot be trained from scratch on a single GPU; you need something like 8 GPUs. Could you please explain this? I would like to train a German BASE or maybe even LARGE language model from scratch.
At #16 (comment) you say that Hugging Face might be an option for multi-GPU training. In my experience their tooling is good for downstream-task fine-tuning but not for the initial language-model pretraining.
I would be very happy about any help with training a new ConvBERT BASE or larger model in different languages.
Many thanks
Philip
A question: the paper reports FLOPs of 26.5G and 19.3G. How were these numbers obtained? When I benchmark the 12-layer medium-small model myself, the whole encoder comes out at roughly 1 GFLOPs. Also, under what conditions was the reported inference speed measured?
On my side, inference is actually slower than the original self-attention. My guess is that although there are fewer floating-point operations, more time is spent moving data around (reshape, transpose).
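A back-of-envelope accounting (my own, not the paper's measurement script) shows how sensitive such totals are to assumptions: sequence length, and whether a multiply-add counts as 1 or 2 FLOPs, together swing the result by an order of magnitude, which may explain the gap between a per-encoder measurement and the paper's totals.

```python
# Rough FLOPs for a BERT-base-like encoder, counting a multiply-add as 2 FLOPs.
def attention_flops(seq_len, hidden):
    proj = 4 * 2 * seq_len * hidden * hidden     # Q, K, V, output projections
    scores = 2 * 2 * seq_len * seq_len * hidden  # QK^T plus attn @ V
    return proj + scores

def ffn_flops(seq_len, hidden, expand=4):
    return 2 * 2 * seq_len * hidden * (expand * hidden)  # two linear layers

layers, seq_len, hidden = 12, 128, 768
total = layers * (attention_flops(seq_len, hidden) + ffn_flops(seq_len, hidden))
print(f"{total / 1e9:.1f} GFLOPs")  # → 22.3 GFLOPs
```

At sequence length 128 this lands in the same ballpark as the paper's 26.5G (embeddings and other layers would add the rest); counting MACs as 1 FLOP, or measuring a single layer, gives numbers closer to the ~1 GFLOPs you observed.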
As the title says.
Hi,
many thanks for this nice new model type and your research.
We would like to train a ConvBERT on GPU rather than TPU.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies between GPUs and TPUs.
Thanks
Philip
When will a PyTorch version be available?
Where is the Chinese ConvBERT model?
Hi, a question about the span-based light conv: since tf.layers.separable_conv1d already produces key_conv_attn_layer, a matrix that carries the span information, why is the elementwise product with query_layer still needed, i.e. conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)? The product does not seem necessary here.
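A toy NumPy illustration of the operation in question (names mirror the repo's, the shapes and values are mine): the separable conv alone gives every position an output that ignores the current query, while the elementwise product makes the subsequently generated conv kernel query-dependent, which is the "dynamic" part of span-based dynamic convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 4, 8

query_layer = rng.standard_normal((seq_len, hidden))
key_conv_attn_layer = rng.standard_normal((seq_len, hidden))  # separable-conv output

# tf.multiply(key_conv_attn_layer, query_layer) is a plain elementwise product:
# per-position, per-dimension gating of the conv output by the query.
conv_attn_layer = key_conv_attn_layer * query_layer

# Without this product, the kernel generator downstream would see only the
# conv output, so different queries at the same position would all receive
# the same convolution kernel.
print(conv_attn_layer.shape)  # (4, 8)
```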
Hello, thanks for open-sourcing this. I am pretraining on my own data. With the default 2e-4 learning rate, the base model hits NaN loss right at the start of training. After switching to the medium-small model, with both 2e-4 and 2e-5 the loss still goes NaN after a few thousand steps and training exits. Any advice on how to fix this?
hi,
Are there any inference benchmark numbers?
For example: bert_base, bert_tiny, conv_bert, conv_bert_small.
Thanks for sharing such a nice project!
Is there a PyTorch release?
Could you publish a PyTorch version?
Thanks
LSRA: Lite Transformer with Long-Short Range Attention.
LSRA also integrates convolution operations into transformer blocks. I'm just wondering what makes ConvBERT different from LSRA.
Can I use ConvBertModel as a decoder in autoregressive mode?
In practice, how do I judge when pretraining has run for enough steps? Also, roughly what loss value should I expect during training? Mine keeps hovering around 9-11. Is something wrong?
I downloaded your pretrained models convbert_base, convbert_medium, and convbert_small from the README. The model folders contain no vocabulary file. Based on the vocab.txt in this project (30522 entries), I assume these are English pretrained models. Is that right? (Judging from ELECTRA, the English vocab has 30522 entries and the Chinese one 21128.) Thanks.
The paper does not describe the compute resources used for training.