huggingface / transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Home Page: https://huggingface.co/transformers
License: Apache License 2.0
Sorry to bother you
I recently used your extract_features.py to extract features from a data set, but it failed. The error output is as follows:
/opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [11,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
File "examples/extract_features.py", line 405, in <module>
main()
File "examples/extract_features.py", line 375, in main
all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 610, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 328, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 313, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 273, in forward
self_output = self.self(input_tensor, attention_mask)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/pytorch_pretrained_bert/modeling.py", line 224, in forward
mixed_query_layer = self.query(hidden_states)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/home/jiaofangkai/anaconda3/envs/allennlp-env/lib/python3.7/site-packages/torch/nn/functional.py", line 1026, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1532584813488/work/aten/src/THC/THCGeneral.cpp:333
It seems that the index_select function in the model crashed. I read my own data from JSON files and construct examples from them. I set the batch size to 1 and changed max_seq_length to the maximum length of the input sentences.
Thanks for your help!
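One possible cause of this assertion (an assumption, since the input data is not shown) is an index that falls outside an embedding table: a token id that is not below the vocabulary size, or a sequence longer than the model's position embedding table. Here is a minimal, framework-free sketch of a check that could run before the forward pass. The helper name `check_bert_inputs` is hypothetical, and the defaults (30522 and 512) are the bert-base-uncased values, used here only for illustration.

```python
def check_bert_inputs(input_ids, vocab_size=30522, max_position_embeddings=512):
    """Return a list of human-readable problems found in one sequence of token ids."""
    problems = []
    if len(input_ids) > max_position_embeddings:
        problems.append(
            "sequence length %d exceeds max_position_embeddings %d"
            % (len(input_ids), max_position_embeddings)
        )
    # Ids must be valid row indices into the word embedding table.
    bad = [i for i in input_ids if not 0 <= i < vocab_size]
    if bad:
        problems.append("token ids out of range: %s" % bad[:5])
    return problems


ok = check_bert_inputs([101, 2000, 102])
too_long = check_bert_inputs([101] * 600)
bad_id = check_bert_inputs([101, 40000, 102])
```

If a check like this flags the length, truncating to the model's limit rather than raising max_seq_length to the longest input would be the thing to try first.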
Hi,
I have a question about Multi-GPU vs Distributed training, probably unrelated to BERT itself.
I have a 4-GPU server, and was trying to run run_classifier.py in two ways:
(a) run single-node distributed training with 4 processes and a minibatch of 32 each
(b) run Multi-GPU training with a minibatch of 128, with all other hyperparameters kept the same
Intuitively I believe (a) and (b) should yield similar accuracy and training times. Below please find my observations:
The first looks reasonable, since I guess the loss.mean() is done on the CPU, which may be slower than using NCCL directly? However, I don't quite understand the second observation. Can you please give any hint or reference about the possible cause?
Thanks!
When describing how you reproduced the MRPC results, you say:
"Our test ran on a few seeds with the original implementation hyper-parameters gave evaluation results between 82 and 87."
and you link to the SQuAD hyperparameters (https://github.com/google-research/bert#squad).
Is the link a mistake? Or did you use the SQuAD hyperparameters for tuning on MRPC? More generally, I'm wondering if there's a reason the MRPC dev set accuracy is slightly lower (in [82, 87] vs. [84, 88] reported by Google).
BERTConfig is not used for BERTIntermediate's activation function. intermediate_act_fn is always gelu. Is this normal?
https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/modeling.py#L240
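For illustration, here is one way the activation could be driven by the config instead of being hard-coded. This is a sketch, not the repository's code: it assumes the config carries a `hidden_act` field holding either a name or a callable, and the `ACT2FN` table and `resolve_activation` helper are hypothetical names. It uses scalar math so it runs without any framework.

```python
import math


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    return max(0.0, x)


# Hypothetical lookup table; the key would come from config.hidden_act.
ACT2FN = {"gelu": gelu, "relu": relu}


def resolve_activation(hidden_act):
    """Accept either an activation name or a callable."""
    if callable(hidden_act):
        return hidden_act
    return ACT2FN[hidden_act]


act = resolve_activation("gelu")
```

With a lookup like this, the intermediate layer would call `resolve_activation(config.hidden_act)` once at construction time.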
When running the following command to fine-tune on SQuAD, I get a minor error inside a logger call: TypeError: object of type 'NoneType' has no len(). Any thoughts on what the main cause of the problem could be?
Full log:
python3.6 examples/run_squad.py \
> --bert_model bert-base-uncased \
> --do_train \
> --do_predict \
> --train_file $SQUAD_DIR/train-v1.1.json \
> --predict_file $SQUAD_DIR/dev-v1.1.json \
> --train_batch_size 12 \
> --learning_rate 3e-5 \
> --num_train_epochs 2.0 \
> --max_seq_length 384 \
> --doc_stride 128 \
> --output_dir out
.
.
.
11/29/2018 23:10:14 - INFO - __main__ - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/29/2018 23:10:14 - INFO - __main__ - segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/29/2018 23:10:14 - INFO - __main__ - start_position: 47
11/29/2018 23:10:14 - INFO - __main__ - end_position: 48
11/29/2018 23:10:14 - INFO - __main__ - answer: the 1870s
11/29/2018 23:14:38 - INFO - __main__ - Saving train features into cached file /shared/shelley/khashab2/pytorch-pretrained-BERT/squad/train-v1.1.json_bert-base-uncased_384_128_64
11/29/2018 23:14:51 - INFO - __main__ - ***** Running training *****
11/29/2018 23:14:51 - INFO - __main__ - Num orig examples = 87599
Traceback (most recent call last):
File "examples/run_squad.py", line 989, in <module>
main()
File "examples/run_squad.py", line 884, in main
logger.info(" Num split examples = %d", len(train_features))
TypeError: object of type 'NoneType' has no len()
Traceback (most recent call last): | 1/87970 [00:00<8:35:35, 2.84it/s]
File "./run_squad.py", line 990, in <module>
main()
File "./run_squad.py", line 922, in main
is_nan = set_optimizer_params_grad(param_optimizer, model.named_parameters(), test_nan=True)
File "./run_squad.py", line 691, in set_optimizer_params_grad
if test_nan and torch.isnan(param_model.grad).sum() > 0:
File "/people/sanjay/anaconda2/envs/bert_pytorch/lib/python3.5/site-packages/torch/functional.py", line 289, in isnan
raise ValueError("The argument is not a tensor", str(tensor))
ValueError: ('The argument is not a tensor', 'None')
Command:
CUDA_VISIBLE_DEVICES=0 python ./run_squad.py
--vocab_file bert_large/uncased_L-24_H-1024_A-16/vocab.txt
--bert_config_file bert_large/uncased_L-24_H-1024_A-16/bert_config.json
--init_checkpoint bert_large/uncased_L-24_H-1024_A-16/pytorch_model.bin
--do_lower_case
--do_train
--do_predict
--train_file squad_dir/train-v1.1.json
--predict_file squad_dir/dev-v1.1.json
--learning_rate 3e-5
--num_train_epochs 2
--max_seq_length 384
--doc_stride 128
--output_dir outputs
--train_batch_size 4
--gradient_accumulation_steps 2
--optimize_on_cpu
The error occurs only when using --optimize_on_cpu.
It works fine without the argument.
GPU: Nvidia GTX 1080Ti Single GPU.
PS: I can only fit in train_batch_size 4 on the memory of a single GPU.
If I am running only evaluation and not training, there are errors as tr_loss and nb_tr_steps are undefined.
I downloaded the model from BERT. It only has model.ckpt.data, model.ckpt.meta and model.ckpt.index, and I don't know which one to load. What is the checkpoint file for convert.py?
I think I spotted a typo in the README file under the Usage header. There is a piece of code that uses BertTokenizer, and the typo is on this line:
tokenized_text = "Who was Jim Henson ? Jim Henson was a puppeteer"
I think tokenized_text should be replaced with text, since the next line is
tokenized_text = tokenizer.tokenize(text)
Hi!
In the config definition https://github.com/huggingface/pytorch-pretrained-BERT/blob/21f0196412115876da1c38652d22d1f7a14b36ff/pytorch_pretrained_bert/modeling.py#L848 in the example usage of BertForSequenceClassification in modeling.py, there are things I don't understand:
vocab_size is not an accepted parameter name, judging by the BertConfig class definition https://github.com/huggingface/pytorch-pretrained-BERT/blob/21f0196412115876da1c38652d22d1f7a14b36ff/pytorch_pretrained_bert/modeling.py#L70
Even after changing vocab_size into vocab_size_or_config_json_file, with the other params given in the example, i.e.
vocab_size=32000, hidden_size=512, num_hidden_layers=8, num_attention_heads=6, intermediate_size=1024
I get:
ValueError: The hidden size (512) is not a multiple of the number of attention heads (6)
I think that something similar may be true for the other classes as well: BertForQuestionAnswering, BertForNextSentencePrediction, etc.
Am I missing something?
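The ValueError follows from simple arithmetic: each attention head gets hidden_size / num_attention_heads dimensions, so the division has to be exact. The sketch below works through the numbers from the example; `valid_head_counts` is a hypothetical helper written for this illustration, not something the library provides.

```python
def valid_head_counts(hidden_size, max_heads=16):
    """Head counts up to max_heads that divide hidden_size evenly."""
    return [h for h in range(1, max_heads + 1) if hidden_size % h == 0]


hidden_size, num_attention_heads = 512, 6
remainder = hidden_size % num_attention_heads  # 512 = 6 * 85 + 2
candidates = valid_head_counts(hidden_size)
```

So with hidden_size=512, a head count such as 8 would pass the check, while 6 cannot.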
Hi, first of all, I admire your great work. But I ran into 2 problems when using it:
1. UnicodeDecodeError: 'gbk' codec can't decode byte 0x85 in position 4527: illegal multibyte sequence,
the same problem as ISSUE 52, when I execute BertTokenizer.from_pretrained('bert-base-uncased'), although BertForNextSentencePrediction.from_pretrained('bert-base-uncased') executes successfully. >.<
2. In pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py, line 761:
```
`token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
a `sentence B` token (see BERT paper for more details).
```
But the example that follows, at **line 784**, has `token_type_ids = torch.LongTensor([[0, 0, 1], [0, 2, 0]])`. Why does the '2' appear? I am confused. Also, is a pattern like '0, 1, 0' correct, or should it look like [000000111111], that is, a contiguous run of '0's followed by a contiguous run of '1's?
ty.
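On the second question, the docstring restricts the values to 0 and 1, and the usual layout for a sentence pair is a contiguous run of 0s followed by a contiguous run of 1s. Below is a minimal sketch of that layout, assuming the standard `[CLS] A [SEP] B [SEP]` packing; `build_segment_ids` is a hypothetical helper name, not a library function.

```python
def build_segment_ids(tokens_a, tokens_b=None):
    """Return (tokens, token_type_ids) in the [CLS] A [SEP] B [SEP] layout."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # [CLS], sentence A and its [SEP] are type 0
    if tokens_b:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)  # sentence B and its [SEP] are type 1
    return tokens, segment_ids


tokens, segment_ids = build_segment_ids(["who", "was", "jim"], ["a", "puppeteer"])
```

Under this layout an interleaved pattern such as 0, 1, 0 never occurs, and a value of 2 would fall outside a two-row token type embedding table.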
With this code, all parameters are decayed because the condition "parameter_name in no_decay" will never be satisfied.
I've made a PR #32 to fix it.
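To make the problem concrete, here is a small sketch contrasting exact membership with substring matching. The `no_decay` entries and the parameter names are assumptions chosen to resemble what `named_parameters()` returns, and the substring test is one common way to write the filter, not necessarily what the PR does.

```python
no_decay = ["bias", "gamma", "beta"]
# Hypothetical parameter names, shaped like those returned by named_parameters().
names = [
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.self.query.bias",
    "bert.encoder.layer.0.attention.output.LayerNorm.gamma",
]

# Exact membership: a full dotted name never equals "bias", so nothing is excluded.
exact = [n for n in names if n in no_decay]

# Substring match: excludes any parameter whose name contains a no_decay entry.
substring = [n for n in names if any(nd in n for nd in no_decay)]
```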
I just want to study the code; I don't need to match the original pre-training performance.
I have fine-tuned the TF model on SQuAD v1 and I've made the weights available at: https://s3.eu-west-2.amazonaws.com/nlpfiles/squad_bert_base.tgz
I get 88.5 F1 using these weights on SQuAD dev. (If I recall correctly I get roughly 82 EM).
I think it may be beneficial to have these weights here, so that people could play with SQuAD and BERT without the need of fine-tuning, which requires a decent enough setup. Let me know what you think!
AttributeError: 'BertForPreTraining' object has no attribute 'global_step'
Hi,
I am trying to understand the bert_model arg in run_classify.py. In the file, I can see
tokenizer = BertTokenizer.from_pretrained(args.bert_model)
where bert_model is expected to be the vocab text file of the model.
However, I also see
model = BertForSequenceClassification.from_pretrained(args.bert_model, len(label_list))
where bert_model is expected to be an archive file containing the model checkpoint and config.
Please advise on the correct use of bert_model if I have already converted my pretrained model locally.
Thanks!
The function convert_to_unicode is not in tokenization.py but used to be there in v0.1.2. When fine tuning with run_classifier.py, you get an ImportError: cannot import name 'convert_to_unicode'.
Converting samples to features is very slow.
Hello, the BertTokenizer seems to lose accents when convert_ids_to_tokens() is used:
Example:
Here the problem is "cafe", which has lost its accent. I'm using BertTokenizer.from_pretrained('Bert-base-multilingual') as the tokenizer. I also tried with "Bert-base-uncased" and experienced the same issue.
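A likely explanation, offered as an assumption about the tokenizer rather than a confirmed diagnosis: when lower-casing is enabled, the basic tokenization step also strips accents before the text is mapped to ids, so convert_ids_to_tokens() has nothing left to restore. The standard-library sketch below shows what such a stripping pass does. The function name `strip_accents` is illustrative.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


stripped = strip_accents("caf\u00e9")  # "café" written with an escape
```

If this is the cause, a cased model with lower-casing turned off would be the variant to compare against.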
Thanks for this great work!
Can you push the PyTorch code for the pre-training process, such as the MLM task, please?
I really want to study it, but I can't understand TensorFlow; it's so complex.
Thanks!
RuntimeError: Error(s) in loading state_dict for BertModel:
size mismatch for embeddings.token_type_embeddings.weight: copying a param of torch.Size([16, 768]) from checkpoint, where the shape is torch.Size([2, 768]) in current model.
Hi, I have a question about using BERT for a sequence labeling task.
Please correct me if I'm wrong.
My understanding is:
Is this entire process correct? I followed this procedure but could not get any results.
Thank you!
There is a bug in README.md about Command-line interface:
export BERT_BASE_DIR=chinese_L-12_H-768_A-12
Wrong:
pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch \
--tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt.index \
--bert_config_file $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
Right:
pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch \
$BERT_BASE_DIR/bert_model.ckpt.index \
$BERT_BASE_DIR/bert_config.json \
$BERT_BASE_DIR/pytorch_model.bin
I have a reasonable truncated normal approximation. (Actually that is what tf does).
https://discuss.pytorch.org/t/implementing-truncated-normal-initializer/4778/16?u=ruotianluo
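For readers who just want the idea, here is a plain-Python sketch of a truncated normal by rejection sampling: draw from a normal distribution and redraw anything more than two standard deviations from the mean. This is not necessarily the approach in the linked post. The function name is illustrative, and std=0.02 is only an assumed default in the spirit of BERT's initializer range.

```python
import random


def truncated_normal(n, mean=0.0, std=0.02, bound=2.0, seed=None):
    """Draw n samples from N(mean, std^2), rejecting any beyond bound * std."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, std)
        if abs(x - mean) <= bound * std:  # keep only values inside the window
            samples.append(x)
    return samples


weights = truncated_normal(1000, seed=0)
```

About 95% of draws fall inside two standard deviations, so the loop rejects few samples in practice.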
Installed pytorch-pretrained-BERT from source, Python 3.7, Windows 10
When I run the following snippet:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
I get the following:
UnicodeDecodeError Traceback (most recent call last)
in ()
3
4 # Load pre-trained model tokenizer (vocabulary)
----> 5 tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
~\Anaconda3\lib\site-packages\pytorch_pretrained_bert\tokenization.py in from_pretrained(cls, pretrained_model_name, do_lower_case)
139 vocab_file, resolved_vocab_file))
140 # Instantiate tokenizer.
--> 141 tokenizer = cls(resolved_vocab_file, do_lower_case)
142 except FileNotFoundError:
143 logger.error(
~\Anaconda3\lib\site-packages\pytorch_pretrained_bert\tokenization.py in __init__(self, vocab_file, do_lower_case)
93 "Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained "
94 "model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`".format(vocab_file))
---> 95 self.vocab = load_vocab(vocab_file)
96 self.ids_to_tokens = collections.OrderedDict(
97 [(ids, tok) for tok, ids in self.vocab.items()])
~\Anaconda3\lib\site-packages\pytorch_pretrained_bert\tokenization.py in load_vocab(vocab_file)
68 with open(vocab_file, "r", encoding="utf8") as reader:
69 while True:
---> 70 token = convert_to_unicode(reader.readline())
71 if not token:
72 break
~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 3920: character maps to <undefined>
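The cp1252 frame at the bottom of the traceback suggests the vocabulary file was decoded with the platform's default encoding rather than UTF-8. That is an inference from the traceback, not a confirmed diagnosis. Below is a minimal sketch of a vocabulary loader that always passes the encoding explicitly. It mirrors the shape of `load_vocab` but is written for illustration, and the temporary file only exists to make the example runnable.

```python
import collections
import io
import os
import tempfile


def load_vocab(vocab_file):
    """Read one token per line into an ordered token -> index mapping."""
    vocab = collections.OrderedDict()
    # Explicit encoding: without it, open() falls back to the locale's
    # preferred encoding (for example cp1252 or gbk on some Windows setups).
    with io.open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.rstrip("\n")] = index
    return vocab


# Round-trip a tiny vocabulary that contains non-ASCII tokens.
demo_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with io.open(demo_path, "w", encoding="utf-8") as writer:
    writer.write(u"[PAD]\ncaf\u00e9\n\u4e2d\n")
vocab = load_vocab(demo_path)
```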
Hi,
I have recently been researching keyphrase generation. People usually use a seq2seq model with attention for this problem. Specifically, I use the framework https://github.com/memray/seq2seq-keyphrase-pytorch, which is an implementation of http://memray.me/uploads/acl17-keyphrase-generation.pdf.
I have now replaced its encoder with BERT, but the results are not good. The experimental comparison of the two models is in the attachment.
Could you advise whether what I did is reasonable and whether BERT is suitable for this kind of task?
I'm pretty sure this comment:
should instead say:
# Sizes are [batch_size, 1, 1, to_seq_length]
# So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
When masking out tokens for attention, it doesn't matter what happens to attention from padding tokens, only that there is no attention to padding tokens.
I don't believe the code is doing what the comment currently suggests because that would be an implementation flaw.
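To illustrate the point about masking attention *to* padding only, here is a framework-free sketch with nested lists standing in for tensors. It assumes the usual additive-mask formulation, `(1 - mask) * -10000.0`, added to the raw scores before the softmax; the helper names are illustrative.

```python
import math


def additive_mask(attention_mask):
    """[batch][to_seq] 0/1 mask -> [batch][1][1][to_seq] additive mask."""
    return [[[[(1.0 - m) * -10000.0 for m in row]]] for row in attention_mask]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One sequence of length 4 whose last two positions are padding.
mask = additive_mask([[1, 1, 0, 0]])
raw_scores = [0.5, 1.5, 2.0, 3.0]  # scores from one query position to all keys
masked = [s + m for s, m in zip(raw_scores, mask[0][0][0])]
weights = softmax(masked)
```

The same row of the mask is added for every head and every query position, which is why the two singleton dimensions are enough for broadcasting.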
Thanks a lot for the port! I have some minor questions. In the run_squad file, I see two options for accumulating gradients, accumulate_gradients and gradient_accumulation_steps, but it seems to me that they could be combined into one. The other question is about the global_step variable: it seems we only count it and never use it in gradient accumulation. Thanks again!
There is an option save_checkpoints_steps that seems to control checkpointing. However, there is no actual saving operation in the run_* scripts. So, should we add that functionality or remove this argument?
Hi guys, I tried the run_squad example and hit this error:
Traceback (most recent call last): | 0/7331 [00:00<?, ?it/s]
File "examples/run_squad.py", line 973, in <module>
main()
File "examples/run_squad.py", line 904, in main
param.grad.data = param.grad.data / args.loss_scale
AttributeError: 'NoneType' object has no attribute 'data'
I found that one of the param.grad values is None, so param.grad.data doesn't exist.
By the way, I downloaded the data myself from the URLs in this project. My OS is Ubuntu 18.04, with pytorch 0.41 and a 1080t GPU.
Has anyone else encountered this situation?
Any help would be appreciated, thanks in advance...
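A parameter's grad stays None when it took no part in the backward pass, so one defensive option is to skip such parameters when unscaling. The sketch below is duck-typed and uses stand-in objects so it runs without PyTorch; `unscale_gradients` is a hypothetical helper, and whether skipping is the right fix for this script is an assumption.

```python
from types import SimpleNamespace


def unscale_gradients(named_parameters, loss_scale):
    """Divide each gradient by loss_scale, skipping parameters without one.

    Returns the names that were skipped so they can be inspected.
    """
    skipped = []
    for name, param in named_parameters:
        if param.grad is None:  # parameter did not take part in the backward pass
            skipped.append(name)
            continue
        param.grad.data = param.grad.data / loss_scale
    return skipped


# Stand-in objects for illustration only; real parameters would be torch tensors.
params = [
    ("used.weight", SimpleNamespace(grad=SimpleNamespace(data=256.0))),
    ("unused.weight", SimpleNamespace(grad=None)),
]
skipped = unscale_gradients(params, loss_scale=128.0)
```

Printing the skipped names is also a quick way to find out which part of the model receives no gradient.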
Is there a way to use any of the pre-trained models provided in the repository for a machine translation task?
Thanks
Hi, I tried running the Squad model this morning (on a single GPU with gradient accumulation over 3 steps) but after 3 hours of training, my job failed with the following output:
I was running the code, unmodified, from commit 3bfbc21
Is this an issue you know about?
11/08/2018 17:50:03 - INFO - __main__ - device cuda n_gpu 1 distributed training False
11/08/2018 17:50:18 - INFO - __main__ - *** Example ***
11/08/2018 17:50:18 - INFO - __main__ - unique_id: 1000000000
11/08/2018 17:50:18 - INFO - __main__ - example_index: 0
11/08/2018 17:50:18 - INFO - __main__ - doc_span_index: 0
11/08/2018 17:50:18 - INFO - __main__ - tokens: [CLS] to whom did the virgin mary allegedly appear in 1858 in lou ##rdes france ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend " ve ##ni ##te ad me om ##nes " . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . [SEP]
11/08/2018 17:50:18 - INFO - __main__ - token_to_orig_map: 17:0 18:0 19:0 20:1 21:2 22:3 23:4 24:5 25:6 26:6 27:7 28:8 29:9 30:10 31:10 32:10 33:11 34:12 35:13 36:14 37:15 38:16 39:17 40:18 41:19 42:20 43:20 44:21 45:22 46:23 47:24 48:25 49:26 50:27 51:28 52:29 53:30 54:30 55:31 56:32 57:33 58:34 59:35 60:36 61:37 62:38 63:39 64:39 65:39 66:40 67:41 68:42 69:43 70:43 71:43 72:43 73:44 74:45 75:46 76:46 77:46 78:46 79:47 80:48 81:49 82:50 83:51 84:52 85:53 86:54 87:55 88:56 89:57 90:58 91:58 92:59 93:60 94:61 95:62 96:63 97:64 98:65 99:65 100:65 101:66 102:67 103:68 104:69 105:70 106:71 107:72 108:72 109:73 110:74 111:75 112:76 113:77 114:78 115:79 116:79 117:80 118:81 119:81 120:81 121:82 122:83 123:84 124:85 125:86 126:87 127:87 128:88 129:89 130:90 131:91 132:91 133:91 134:92 135:92 136:92 137:92 138:93 139:94 140:94 141:95 142:96 143:97 144:98 145:99 146:100 147:101 148:102 149:102 150:103 151:104 152:105 153:106 154:107 155:108 156:109 157:110 158:111 159:112 160:113 161:114 162:115 163:115 164:115 165:116 166:117 167:118 168:118 169:119 170:120 171:121 172:122 173:123 174:123
11/08/2018 17:50:18 - INFO - __main__ - token_is_max_context: 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True 171:True 172:True 173:True 174:True
11/08/2018 17:50:18 - INFO - __main__ - input_ids: 101 2000 3183 2106 1996 6261 2984 9382 3711 1999 8517 1999 10223 26371 2605 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/08/2018 17:50:18 - INFO - __main__ - input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... [truncated] ...
Iteration: 100%|██████████| 29314/29324 [3:27:55<00:04, 2.36it/s]
Iteration: 100%|██████████| 29315/29324 [3:27:55<00:03, 2.44it/s]
Iteration: 100%|██████████| 29316/29324 [3:27:56<00:03, 2.26it/s]
Iteration: 100%|██████████| 29317/29324 [3:27:56<00:02, 2.35it/s]
Iteration: 100%|██████████| 29318/29324 [3:27:56<00:02, 2.44it/s]
Iteration: 100%|██████████| 29319/29324 [3:27:57<00:02, 2.25it/s]
Iteration: 100%|██████████| 29320/29324 [3:27:57<00:01, 2.35it/s]
Iteration: 100%|██████████| 29321/29324 [3:27:58<00:01, 2.41it/s]
Iteration: 100%|██████████| 29322/29324 [3:27:58<00:00, 2.25it/s]
Iteration: 100%|██████████| 29323/29324 [3:27:59<00:00, 2.36it/s]
Traceback (most recent call last):
File "code/run_squad.py", line 929, in <module>
main()
File "code/run_squad.py", line 862, in main
loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/0x0d4ff90d01fa4168983197b17d73bb0c_dependencies/code/modeling.py", line 467, in forward
start_loss = loss_fct(start_logits, start_positions)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 862, in forward
ignore_index=self.ignore_index, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1550, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1403, in nll_loss
if input.size(0) != target.size(0):
RuntimeError: dimension specified as 0 but tensor has no dimensions
Exception ignored in: <bound method tqdm.__del__ of Iteration: 100%|██████████| 29323/29324 [3:27:59<00:00, 2.36it/s]>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 931, in __del__
self.close()
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 1133, in close
self._decr_instances(self)
File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 496, in _decr_instances
cls.monitor.exit()
File "/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 52, in exit
self.join()
File "/usr/lib/python3.6/threading.py", line 1053, in join
raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
I have downloaded the model and vocab files into a specific location, using their original file names, so my directory for bert-base-cased contains:
bert-base-cased-vocab.txt
bert_config.json
pytorch_model.bin
But when I try to specify the directory which contains these files for the --bert_model parameter of extract_features.py, I get the following error:
ValueError: Can't find a vocabulary file at path <THEDIRECTORYPATHISPECIFIED> ...
When I specify a file that exists and is a proper file, the error messages seem to indicate that the program wants to untar and uncompress the files.
Is there no way to just specify a specific directory that contains the vocab, config, and model files?
I was trying to use BERT as a language model to assign a score (it could be a PPL score) to a given sentence. Something like
P("He is go to school")=0.008
P("He is going to school")=0.08
which indicates that the second sentence is more probable than the first. Is there a way to get a score like this?
Thanks
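BERT is trained as a masked language model, so it does not directly define a left-to-right sentence probability. One workaround, described here as an assumption rather than a feature of the repository, is to mask each token in turn, read off the probability the model assigns to the true token, and combine those values into a perplexity-like score. The sketch covers only the last step, and the probabilities are made up for illustration.

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability over the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Made-up per-token probabilities, for illustration only.
fluent = perplexity([0.5, 0.5, 0.5, 0.5])
awkward = perplexity([0.5, 0.5, 0.1, 0.5])
```

A lower score means the tokens were, on average, easier to predict from their context.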
Is there a plan to support FP16 on GPU, to allow a larger batch size or longer text documents?
foo@bar:~/foo/bar/pytorch-pretrained-BERT$ pytest -sv ./tests/
===================================================================================================================== test session starts =====================================================================================================================
platform linux -- Python 3.6.6, pytest-3.9.1, py-1.7.0, pluggy-0.8.0 -- /home/foo/.pyenv/versions/anaconda3-5.1.0/bin/python
cachedir: .pytest_cache
rootdir: /data1/users/foo/bar/pytorch-pretrained-BERT, inifile:
plugins: remotedata-0.3.0, openfiles-0.3.0, doctestplus-0.1.3, cov-2.6.0, arraydiff-0.2, flaky-3.4.0
collected 0 items / 3 errors
=========================================================================================================================== ERRORS ============================================================================================================================
___________________________________________________________________________________________________________ ERROR collecting tests/modeling_test.py ___________________________________________________________________________________________________________
ImportError while importing test module '/data1/users/foo/bar/pytorch-pretrained-BERT/tests/modeling_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/modeling_test.py:25: in <module>
import modeling
E ModuleNotFoundError: No module named 'modeling'
_________________________________________________________________________________________________________ ERROR collecting tests/optimization_test.py _________________________________________________________________________________________________________
ImportError while importing test module '/data1/users/foo/bar/pytorch-pretrained-BERT/tests/optimization_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/optimization_test.py:23: in <module>
import optimization
E ModuleNotFoundError: No module named 'optimization'
_________________________________________________________________________________________________________ ERROR collecting tests/tokenization_test.py _________________________________________________________________________________________________________
ImportError while importing test module '/data1/users/foo/bar/pytorch-pretrained-BERT/tests/tokenization_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/tokenization_test.py:22: in <module>
import tokenization
E ModuleNotFoundError: No module named 'tokenization'
===Flaky Test Report===
===End Flaky Test Report===
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 3 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================================================================================================== 3 error in 0.60 seconds ==================================================================================================================
In Python 3, python -m pytest -sv tests/ works fine.
If I convert the code to Python 2, it does not converge. Could you provide a Python 2 version?
I was wondering if there's a proper way of detokenizing the output tokens, i.e., reconstructing the sentence from the tokens, considering that the word-piece tokenisation introduces lots of #s.
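One common approach is to glue every token that starts with `##` onto the token before it. The sketch below assumes the tokens come from `tokenizer.tokenize` and that `##` marks a continuation piece. The helper name `detokenize` is hypothetical and not part of the library, and any lowercasing or accent stripping done by the tokenizer cannot be undone this way.

```python
def detokenize(tokens):
    """Merge WordPiece tokens back into whitespace-separated words."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Continuation piece: append it to the previous word without the marker.
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


print(detokenize(["jim", "morrison", "was", "a", "puppet", "##eer"]))
# jim morrison was a puppeteer
```

Punctuation spacing (for example the space before "?") would still need separate handling.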
Dear authors,
I have two questions.
First, how can I use the multilingual pre-trained BERT in PyTorch?
Is it enough to download the model to $BERT_BASE_DIR?
The second is a tokenization issue.
For Chinese and Japanese the tokenizer may work; for Korean, however, it gives a different result from what I expected:
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "안녕하세요"
tokenized_text = tokenizer.tokenize(text)
print(tokenized_text)
['ᄋ', '##ᅡ', '##ᆫ', '##ᄂ', '##ᅧ', '##ᆼ', '##ᄒ', '##ᅡ', '##ᄉ', '##ᅦ', '##ᄋ', '##ᅭ']
The result is split not by 'character' but by 'byte-based character'.
Maybe this comes from a Unicode issue. (I expected ['안녕', '##하세요'].)
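A possible explanation, offered as an assumption rather than a confirmed diagnosis: with lowercasing enabled, the tokenizer normalizes text to Unicode NFD before stripping accents, and NFD decomposes each precomposed Hangul syllable into its conjoining jamo. The standard-library sketch below shows only that decomposition, not the tokenizer itself.

```python
import unicodedata

text = "안녕하세요"
decomposed = unicodedata.normalize("NFD", text)

# Five syllables become twelve conjoining jamo code points.
print(len(text), len(decomposed))  # 5 12

# NFC recomposes the jamo into the original syllables.
print(unicodedata.normalize("NFC", decomposed) == text)  # True
```

If that is the cause, the twelve pieces in the output above are jamo, not raw bytes.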
Can you put together a working example for 'is next sentence'?
Is this expected to work properly?
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenized input
text = "Who was Jim Morrison ? Jim Morrison was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()
# Predict is Next Sentence ?
predictions = model(tokens_tensor, segments_tensors)
Hi there,
Thanks for releasing this awesome repo; it does lots of people like me a great favor.
So far I've tried the sentence-pair BertForSequenceClassification task, and it indeed works. I'd like to know whether BertForSequenceClassification can be used for a three-sentence classification problem, with input of the form:
**[CLS]A[SEP]B[SEP]C[SEP]**
Looking forward to your reply!
Thanks & Regards
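To make the input layout concrete, here is a minimal sketch of packing three already-tokenized segments into one sequence with matching segment ids. The helper name `pack_segments` is hypothetical. It assumes the pretrained segment embedding has only two entries, so ids stay within 0 and 1. Giving B and C the same id is just one possible choice, not something the repo documents.

```python
def pack_segments(segments, segment_ids):
    """Build `[CLS] A [SEP] B [SEP] C [SEP]` plus one segment id per token."""
    tokens = ["[CLS]"]
    type_ids = [segment_ids[0]]  # [CLS] shares the id of the first segment
    for seg_tokens, seg_id in zip(segments, segment_ids):
        tokens += seg_tokens + ["[SEP]"]
        type_ids += [seg_id] * (len(seg_tokens) + 1)  # +1 covers the [SEP]
    return tokens, type_ids


tokens, type_ids = pack_segments(
    [["how", "are", "you"], ["fine"], ["thanks"]],
    segment_ids=[0, 1, 1],
)
print(tokens)
print(type_ids)
```

Whether the classifier learns anything useful from such input would have to be checked empirically.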
Hi,
I launched two processes per node to run run_classifier.py in distributed mode. However, I occasionally get the error below:
11/20/2018 09:31:48 - INFO - pytorch_pretrained_bert.file_utils - copying /tmp/tmpa25_y4es to cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
93%|██████████| 381028352/407873900 [00:11<00:01, 14366075.22B/s]
94%|██████████| 383812608/407873900 [00:11<00:01, 16210783.00B/s]
95%|██████████| 386455552/407873900 [00:11<00:01, 16205260.89B/s]11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - creating metadata file for /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - removing temp file /tmp/tmpa25_y4es
95%|██████████| 388946944/407873900 [00:11<00:01, 18097539.03B/s]11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /tmp/tmpvxvnr8_1
97%|██████████| 393660416/407873900 [00:11<00:00, 22199883.93B/s]
98%|██████████| 399411200/407873900 [00:11<00:00, 27211860.00B/s]
99%|██████████| 405128192/407873900 [00:11<00:00, 32287252.94B/s]
100%|██████████| 407873900/407873900 [00:11<00:00, 34098120.40B/s]
11/20/2018 09:31:49 - INFO - pytorch_pretrained_bert.file_utils - copying /tmp/tmp5fcm4v8x to cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
Traceback (most recent call last):
File "examples/run_classifier.py", line 629, in <module>
main()
File "examples/run_classifier.py", line 485, in main
model = BertForSequenceClassification.from_pretrained(args.bert_model, len(label_list))
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/site-packages/pytorch_pretrained_bert-0.1.2-py3.6.egg/pytorch_pretrained_bert/modeling.py", line 495, in from_pretrained
archive.extractall(tempdir)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2007, in extractall
numeric_owner=numeric_owner)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2049, in extract
numeric_owner=numeric_owner)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2119, in _extract_member
self.makefile(tarinfo, targetpath)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 2168, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/tarfile.py", line 248, in copyfileobj
buf = src.read(bufsize)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/gzip.py", line 276, in read
return self._buffer.read(size)
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/azureml-envs/azureml_49b6ba977c83839baa597001c9b55a6f/lib/python3.6/gzip.py", line 482, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
It looks like a race condition in which two processes simultaneously write the model file to /root/.pytorch_pretrained_bert/.
Could you please advise on a workaround? Thanks!
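One generic way to avoid readers seeing a half-written cache file is to write to a temporary file in the same directory and then rename it into place. The sketch below shows only that pattern with the standard library. It is not how the library's cache code works, and the helper name `atomic_write` is hypothetical.

```python
import os
import tempfile


def atomic_write(data: bytes, dest: str) -> None:
    """Write `data` to `dest` so that readers never observe a partial file."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Keep the temp file in the destination directory so the rename
    # does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as cache_dir:
    target = os.path.join(cache_dir, "model.bin")
    atomic_write(b"weights", target)
    with open(target, "rb") as f:
        print(f.read())
```

A simpler operational workaround, also an assumption, is to run a single process once to populate the cache before launching the distributed job.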
Hi, I am running the same task with the same hyperparameters as the official Google TensorFlow implementation of BERT; however, I am getting around 1.5% lower accuracy. Can you please give any hint about the possible cause?
Thanks!
After converting the TF model to a PyTorch model, I ran a classification task on a new Chinese dataset, but got this:
CUDA_VISIBLE_DEVICES=3 python run_classifier.py --task_name weibo --do_eval --do_train --bert_model chinese_L-12_H-768_A-12 --max_seq_length 128 --train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir bert_result
11/18/2018 21:56:59 - INFO - __main__ - device cuda n_gpu 1 distributed training False
11/18/2018 21:56:59 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file chinese_L-12_H-768_A-12
Traceback (most recent call last):
File "run_classifier.py", line 661, in <module>
main()
File "run_classifier.py", line 508, in main
tokenizer = BertTokenizer.from_pretrained(args.bert_model)
File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 141, in from_pretrained
tokenizer = cls(resolved_vocab_file, do_lower_case)
File "/home/lin/jpmorgan/pytorch-pretrained-BERT/pytorch_pretrained_bert/tokenization.py", line 94, in __init__
"model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)".format(vocab_file))
ValueError: Can't find a vocabulary file at path 'chinese_L-12_H-768_A-12'. To load the vocabulary from a Google pretrained model use tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
Thank you so much for this well-documented and easy-to-understand implementation! I remember meeting you at WeCNLP and am so happy to see you push out usable implementations of the SOA in pytorch for the community!!!!!
I have a question: The convert_tokens_to_ids method in the BertTokenizer that provides input to the BertEncoder uses an OrderedDict for the vocab attribute, which throws an error (e.g. KeyError: 'ketorolac') for any words not in the vocab. Can I create another vocab object that adds unseen words and use that in the tokenizer? Does the pretrained BertEncoder depend on the default id mapping?
It seems to me that ideally in the long-term, this repo would incorporate character level embeddings to deal with unseen words, but idk if that is necessary for this use-case.
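The KeyError suggests the word reached the id lookup as a whole word. Normally a WordPiece step first splits unseen words into known sub-word pieces, so only pieces present in the vocabulary are looked up. Below is a minimal sketch of that greedy longest-match-first idea. The function name and the toy vocabulary are made up for illustration, and the real vocabulary will split this word differently.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark continuation pieces
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches at this position
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"ke", "##tor", "##ola", "##c", "[UNK]"}  # illustrative only
print(wordpiece("ketorolac", toy_vocab))
# ['ke', '##tor', '##ola', '##c']
```

Because every returned piece is in the vocabulary, the later id lookup cannot hit a missing key.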
Recently the Google team added support for SQuAD 2.0:
It would be great to also have it available in the PyTorch version.
Thanks for the great code. However, run_squad.py for BERT Large does not seem to have the vocab_file and bert_config_file (or other) options/arguments. Did you push the latest version?
Also, it is looking for a PyTorch model file (a bin file). Does it need to be there?
I also had to add this line to the file to make BERT base run on SQuAD 1.1:
parser.add_argument('--do_lower_case', action="store_true", default=True, help="Lowercase the input")
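One thing worth noting about the line above: with `action="store_true"` and `default=True`, the option is True whether or not the flag is passed, so it can never be switched off from the command line. The sketch below demonstrates this with the standard `argparse` module. The `--no_lower_case` flag is a hypothetical addition for illustration, not an option of the script.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--do_lower_case", action="store_true", default=True,
                    help="Lowercase the input")
# Hypothetical inverse flag that writes to the same destination.
parser.add_argument("--no_lower_case", dest="do_lower_case",
                    action="store_false", help="Keep the original casing")

print(parser.parse_args([]).do_lower_case)                   # True
print(parser.parse_args(["--do_lower_case"]).do_lower_case)  # True
print(parser.parse_args(["--no_lower_case"]).do_lower_case)  # False
```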
Got ValueError: Expected target size (1, 30522), got torch.Size([1, 11]) at line 744 of modeling.py. I think the line should be changed to masked_lm_loss = loss_fct(prediction_scores.view([-1, self.config.vocab_size]), masked_lm_labels.view([-1])).