
google-research / bert


TensorFlow code and pre-trained models for BERT

Home Page: https://arxiv.org/abs/1810.04805

License: Apache License 2.0

Python 76.32% Jupyter Notebook 23.68%
nlp google natural-language-processing natural-language-understanding tensorflow

bert's Introduction

Google Research

This repository contains code released by Google Research.

All datasets in this repository are released under the CC BY 4.0 International license, which can be found here: https://creativecommons.org/licenses/by/4.0/legalcode. All source files in this repository are released under the Apache 2.0 license, the text of which can be found in the LICENSE file.


Because the repo is large, we recommend you download only the subdirectory of interest:

SUBDIR=foo
svn export https://github.com/google-research/google-research/trunk/$SUBDIR

If you'd like to submit a pull request, you'll need to clone the repository; we recommend making a shallow clone (without history).

git clone git@github.com:google-research/google-research.git --depth=1

Disclaimer: This is not an official Google product.

Updated in 2023.

bert's People

Contributors

0xflotus, abhishekraok, aijunbai, ammarasmro, bogdandidenko, cbockman, craigcitro, dalequark, eric-haibin-lin, georgefeng, hsm207, imcaspar, iuliaturc-google, jacobdevlin-google, jasonjpu, msramalho, pengli09, qwfy, rodgzilla, slavpetrov, soloice, stefan-it, tianxin1860, ywkim, zhaoyongke


bert's Issues

What does the type token mean in modeling.py

In modeling.py, the BertModel class calls "embedding_postprocessor", which uses a token type embedding. Is this the segment A / segment B distinction used for next sentence prediction? If so, the token type vocabulary size ("type_vocab_size") should be 2, is that right? Thank you.
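For illustration, here is a minimal sketch (mine, not from the repo) of what the token type IDs look like for a sentence pair; with only segments A and B, a type vocabulary of size 2 is consistent with this reading:

# Hypothetical illustration of token type (segment) IDs for a sentence pair.
tokens = ["[CLS]", "how", "are", "you", "[SEP]", "i", "am", "fine", "[SEP]"]
# Segment A (including [CLS] and the first [SEP]) gets type 0, segment B gets type 1.
token_type_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1]
assert len(tokens) == len(token_type_ids)
assert max(token_type_ids) + 1 <= 2  # consistent with type_vocab_size = 2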

run_pretraining.py runs but uses the CPU instead of the GPU

  1. When I use run_pretraining.py to pre-train my model, I found that it allocates almost all of the GPU memory but does most of the computation on the CPU.
    Typing the command "nvtop" prints the following:

Device 0 [GeForce GTX 1080 Ti] PCIe GEN 1@16x RX: 0.000 kB/s TX: 0.000 kB/s
GPU 139MHz MEM 405MHz TEMP 44°C FAN 35% POW 18 / 250 W
GPU-Util[ 0%] MEM-Util[||||||||||||11.2G/11.7G] Encoder[ 0%] Decoder[ 0%]

Device 1 [GeForce GTX 1080 Ti] PCIe GEN 1@ 4x RX: 0.000 kB/s TX: 0.000 kB/s
GPU 139MHz MEM 405MHz TEMP 42°C FAN 34% POW 17 / 250 W
GPU-Util[ 0%] MEM-Util[ 0.0G/11.7G] Encoder[ 0%] Decoder[ 0%]

PID USER GPU TYPE MEM Command
15353 feynman 0 Compute 11168Mo 95.3% python

We can see that GPU-Util is 0% while MEM-Util is nearly 100%.
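Not an official answer, but a quick sanity check (assuming TF 1.x, which this repo targets) to confirm that the installed TensorFlow was built with CUDA and can actually see the GPUs; if no GPU devices are listed, the CPU-only wheel is probably installed:

# Sanity check: does this TensorFlow build see any GPUs? (TF 1.x)
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Devices:", [d.name for d in device_lib.list_local_devices()])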

Problem in generating the pre-trained output like ELMo

I followed the example to generate pre-trained features. However, unlike with the other examples, it cannot find the pre-trained model and prints an error message like this:
INFO:tensorflow:Could not find trained model in model_dir: /tmp/tmp6sn66z76, running initialization to predict.
This message is printed when running the line for result in estimator.predict(input_fn, yield_single_examples=True):

However, after this, tf.train.init_from_checkpoint is called inside the model_fn function, so I am not sure whether this is an actual problem or just a harmless log message.

failed to squad on cased model

SQUAD_DIR=./data/squad_1_1/
BERT_BASE_DIR=./models/cased_L-12_H-768_A-12/
nohup python3 run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=./output/ > output.txt &

INFO:tensorflow:start_position: 59
INFO:tensorflow:end_position: 63
INFO:tensorflow:answer: f ##eb ##ru ##ary 1848
INFO:tensorflow:***** Running training *****
INFO:tensorflow: Num orig examples = 87599
INFO:tensorflow: Num split examples = 88245
INFO:tensorflow: Batch size = 12
INFO:tensorflow: Num steps = 14599
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow: name = end_positions, shape = (12,)
INFO:tensorflow: name = input_ids, shape = (12, 384)
INFO:tensorflow: name = input_mask, shape = (12, 384)
INFO:tensorflow: name = segment_ids, shape = (12, 384)
INFO:tensorflow: name = start_positions, shape = (12,)
INFO:tensorflow: name = unique_ids, shape = (12,)
Traceback (most recent call last):
File "run_squad.py", line 1170, in
tf.app.run()
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run_squad.py", line 1104, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 376, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2162, in _call_model_fn
features, labels, mode, config)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2391, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1244, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1505, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "run_squad.py", line 581, in model_fn
tvars, init_checkpoint)
File "/home/zzt/bert/modeling.py", line 331, in get_assigment_map_from_checkpoint
init_vars = tf.train.list_variables(init_checkpoint)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 94, in list_variables
reader = load_checkpoint(ckpt_dir_or_file)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 63, in load_checkpoint
return pywrap_tensorflow.NewCheckpointReader(filename)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 306, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./models/cased_L-12_H-768_A-12//bert_model.ckpt
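A frequent cause of this error is that the path does not point at a valid checkpoint prefix (bert_model.ckpt is a prefix covering bert_model.ckpt.index and bert_model.ckpt.data-*). A minimal, hypothetical pre-flight check (assuming TF 1.x) might look like:

# Hypothetical sanity check: can TensorFlow read variables from this checkpoint prefix?
import os
import tensorflow as tf

ckpt = "./models/cased_L-12_H-768_A-12/bert_model.ckpt"  # checkpoint *prefix*, not a file
print("Index file present:", os.path.exists(ckpt + ".index"))
print("First variables:", tf.train.list_variables(ckpt)[:3])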

TensorFlow Hub Module?

Thanks for releasing BERT!

I'm just wondering if BERT will be available on TensorFlow Hub like ELMo (for either fine-tuning or extracting features)?

How to generate the vocab file that the BERT model was trained on?

I was wondering how to generate the vocab file that is specified via --vocab_file for create_pretraining_data.py.

I noticed that the released BERT models do include a vocab file, so how did you generate it, for instance from an English Wikipedia dump? I am going to do the pre-training from scratch. I appreciate your help!

Thanks,
Allen Zhang

Missing requirements.txt

Hope to see this file added so we can tell which versions of TensorFlow and other libraries are supported.

Need clarification for pre-training

In the README.md, it says for the pre-training:

It is important that these be actual sentences 
for the "next sentence prediction" task

and the example sample_text.txt does have each line ending with either . or ;.

Whereas in the BERT paper, it says

... we sample two spans of text from the corpus, which we refer to as "sentences" 
even though they are typically much longer than single sentences 
(but can be shorter also)

So it is unclear whether this implementation expects actual sentences per line, or just documents broken down into multiple lines arbitrarily.

Fine tuning Bert base/large on GPUs

Given the huge number of parameters in BERT, I wonder whether it is feasible at all to fine-tune on GPUs without going to the Google Cloud TPU offering. Has there been any benchmarking of the current implementation? If yes, what types of GPUs are expected to work, and with how many layers and attention heads?

Plans to support longer sequences?

Right now, the model (correct me if I'm wrong) appears to be locked down to sequences of max 512, based on running & playing with the code (and this makes sense in the context of the paper).

Are there any near-term plans to support longer sequences?

Offhand, this would potentially require multiple issues to be addressed, including 1) allowing positional embeddings that can extend for longer or perhaps arbitrary lengths (with some degradation over longer lengths than it has been trained on, of course) (possibly using something like multiple sinusoidal embeddings, like in the original transformer paper?) and 2) containing/limiting the Transformer quadratic memory explosion (my first gut would be to try something like the techniques in "Generating Wikipedia by Summarizing Long Sequences" https://arxiv.org/abs/1801.10198).

Right now--from first pass--it seems like the way to use this over longer sequences is to chunk the docs into sequences (either inline with fixed lengths, or possibly as pre-processing on boundaries like sentences or paragraphs) and apply BERT in a feature-input mode, and then feed into something else downstream (like universal transformer).

All of this seems doable, but is 1) more complicated from an engineering perspective and 2) loses the ability to fine-tune (at least in any way that is obvious to me).

(Of course, having a model adapted to longer sequences as in https://arxiv.org/abs/1801.10198 has model-power trade-offs, such that it is plausible that the feature-based approach could still be superior?)
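For what it's worth, here is a rough sketch (my own, not from the repo) of the chunking approach described above: slide a fixed-size window with overlap over a long token sequence and feed each chunk to BERT in feature-extraction mode. The window and stride values are illustrative.

# Rough sketch of sliding-window chunking for documents longer than max_seq_length.
def chunk_tokens(tokens, window=510, stride=384):
    """Split a long token list into overlapping chunks (2 slots left for [CLS]/[SEP])."""
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return chunks

print(len(chunk_tokens(["tok"] * 2000)))  # a 2000-token document -> 5 overlapping chunks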

Why is there an extra dense layer in the pooler?

I'm referring to this line

In the paper, you state

In order to obtain a fixed-dimensional pooled representation of the input sequence, we take the final hidden state (i.e., the output of the Transformer) for the first token in the input, which by construction corresponds to the special [CLS] word embedding. We denote this vector as C ∈ R^H. The only new parameters added during fine-tuning are for a classification layer W ∈ R^{K X H}, where K is the number of classifier labels.

But here you have an H x H dense layer, which contradicts the above. Even more perplexing to me is that the activation of this layer is tanh! I'm surprised all the models worked with tanh instead of a relu activation.

I suspect that I'm missing something here. Thanks for your patience.
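For context, the pooler the issue refers to takes the final hidden state of the first ([CLS]) token and applies an H x H dense layer with tanh; a simplified sketch (TF 1.x style, not the exact repo code, with illustrative shapes):

# Simplified sketch of the pooling step discussed above (TF 1.x).
import tensorflow as tf

batch, seq_len, hidden_size = 8, 128, 768                  # illustrative shapes
sequence_output = tf.zeros([batch, seq_len, hidden_size])  # stand-in for the encoder output
# Take the final hidden state of the first ([CLS]) token ...
first_token_tensor = tf.squeeze(sequence_output[:, 0:1, :], axis=1)  # [batch, H]
# ... and project it with an H x H dense layer and tanh activation.
pooled_output = tf.layers.dense(first_token_tensor, hidden_size, activation=tf.tanh)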

Clarification of document

In the paper, it says

For Wikipedia we extract only the text passages and ignore lists, tables, and headers

I wonder whether these text passages extracted from a document are kept as separate individual documents, or whether they are recombined into one document?

[Clarification] Feature vectors : Creating the input file

As I understand it, we need to give the extract_features.py script the dataset that we will use for the model built on top of the BERT embeddings. This allows the model to do supplementary training on data specific to the dataset. Two sentences are used (separated by '|||') in order to train the next sentence prediction feature. Right?


From the paper :

To generate each training input sequence, we sample two spans of text from the corpus, which we refer to as “sentences” even though they are typically much longer than single sentences (but can be shorter also)

If I want to create my input file from a dataset where the data is documents, should I take the same approach (splitting in the middle, even if there are more than 2 sentences), or strictly split at every sentence? Which approach will give the best accuracy?


For example, let's say I have this data row :

doc1 = "Sentence 1. Sentence 2. Sentence 3."
doc2 = "Sentence 4. Sentence 5."
label = X

Then should I split like this :

Sentence 1. Sentence 2. ||| Sentence 3.
Sentence 4. ||| Sentence 5.

Or like this :

Sentence 1. ||| Sentence 2.
Sentence 2. ||| Sentence 3.
Sentence 4. ||| Sentence 5.

Or any other way I didn't think of ?
(I should not link Sentence 3 and Sentence 4 together, right? As they potentially do not follow each other.)


Thanks again for the brilliant work.
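For illustration only (my own sketch, not an official recommendation), writing one "A ||| B" line per document by splitting it roughly in the middle would look like this:

# Hypothetical helper: split each document roughly in the middle into an "A ||| B" line
# for extract_features.py. The splitting strategy here is a guess, not an official answer.
docs = [
    "Sentence 1. Sentence 2. Sentence 3.",
    "Sentence 4. Sentence 5.",
]
with open("input.txt", "w") as f:
    for doc in docs:
        sentences = [s.strip() + "." for s in doc.split(".") if s.strip()]
        mid = max(1, (len(sentences) + 1) // 2)
        f.write(" ".join(sentences[:mid]) + " ||| " + " ".join(sentences[mid:]) + "\n")
# Produces: "Sentence 1. Sentence 2. ||| Sentence 3." and "Sentence 4. ||| Sentence 5."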

input_fn_builder() got an unexpected keyword argument 'features'

When training the model, the first error (a missing argument, output_file) was easy to solve (I specified output_file = "output").
But afterwards, run_classifier.convert_examples_to_features() throws an error about the keyword 'features'.
No hint how to solve this!

Stack trace:

INFO:tensorflow:guid: train-5
INFO:tensorflow:tokens: [CLS] the stock rose $ 2 . 11 , or about 11 percent , to close friday at $ 21 . 51 on the new york stock exchange . [SEP] pg & e corp . shares jumped $ 1 . 63 or 8 percent to $ 21 . 03 on the new york stock exchange on friday . [SEP]
INFO:tensorflow:input_ids: 101 1996 4518 3123 1002 1016 1012 2340 1010 2030 2055 2340 3867 1010 2000 2485 5958 2012 1002 2538 1012 4868 2006 1996 2047 2259 4518 3863 1012 102 18720 1004 1041 13058 1012 6661 5598 1002 1015 1012 6191 2030 1022 3867 2000 1002 2538 1012 6021 2006 1996 2047 2259 4518 3863 2006 5958 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
INFO:tensorflow:label: 1 (id = 1)
***** Started training at 2018-11-05 07:44:50.022688 *****
  Num examples = 3668
  Batch size = 32
INFO:tensorflow:  Num steps = 343
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-c5a5fb94a015> in <module>()
     10     seq_length=MAX_SEQ_LENGTH,
     11     is_training=True,
---> 12     drop_remainder=True)
     13 estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
     14 print('***** Finished training at {} *****'.format(datetime.datetime.now()))

TypeError: input_fn_builder() got an unexpected keyword argument 'features'

Are linear decay, L2 normalization and learned positional embs essential to the performance?

Hello, I've been training my own version of BERT (i.e. not from this repo, but I think the main ideas are implemented) on Chinese for over a week, but the performance is not very promising. (The problem could be the implementation, the dataset, the training time, or the difference between English and Chinese.) As for the optimizer, I use Adam without linear decay and without L2 normalization, and I use sinusoidal positional embeddings to reduce the number of variables. Could you tell me how important these are? Are they essential to the final performance? Any tricks for transferring to another language? Thanks very much!
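Not an authoritative answer, but for reference, the linear warmup plus linear decay schedule asked about here can be written as a small function like this (a sketch with illustrative numbers, not the repo's optimization.py):

# Sketch of a linear warmup + linear decay learning-rate schedule.
def learning_rate(step, base_lr=1e-4, warmup_steps=10000, total_steps=1000000):
    if step < warmup_steps:
        return base_lr * step / float(warmup_steps)            # linear warmup
    # linear decay from base_lr down to 0 over the remaining steps
    return base_lr * (total_steps - step) / float(total_steps - warmup_steps)

print([round(learning_rate(s), 6) for s in (0, 5000, 10000, 500000, 1000000)])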

Colab notebook is out of sync with the latest update

Hi, I am trying to run the Colab notebook: https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb

I was able to run it yesterday, before the last git update. There were substantial changes to the run_classifier.py code (e.g. the convert_examples_to_features function now requires an "output_file" argument which is not there in the Colab notebook, the input_fn_builder function no longer recognizes the 'features' argument but instead requires 'input_file', and so on).

Will the colab notebook be updated soon to reflect these changes? Thanks.
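A rough sketch of how the notebook's calls might need to change, based only on the argument names described above (the surrounding names such as train_examples, label_list, tokenizer and MAX_SEQ_LENGTH come from earlier notebook cells, and the exact function names and signatures should be checked against the current run_classifier.py):

# Hypothetical adjustment based on the reported argument changes; signatures not verified.
import run_classifier

TRAIN_FILE = "train.tf_record"                  # assumed: features are first written to a file
run_classifier.convert_examples_to_features(
    train_examples, label_list, MAX_SEQ_LENGTH, tokenizer,
    output_file=TRAIN_FILE)                     # new required argument per the report above

train_input_fn = run_classifier.input_fn_builder(
    input_file=TRAIN_FILE,                      # replaces the old features= argument
    seq_length=MAX_SEQ_LENGTH,
    is_training=True,
    drop_remainder=True)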

Adding domain specific vocabulary

Hi, thanks for the release!

I will need to add some domain-specific vocabulary; do you have any suggestions on how to do it?
I was thinking of replacing some [unused#] tokens in the vocab file (so, if I'm not mistaken, they already have existing weights in the checkpoint files) to avoid extending the matrices, and then fine-tuning the LM on a domain-specific corpus.
If that is feasible, I would also try a first LM fine-tuning pass with the existing vocab embeddings frozen, to learn only the new words, and then a second pass with everything unfrozen.

Do you think this is the right way to do it?
How can I freeze a subset of the embeddings? (Gradient masking?)
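Not an official answer, but one common way to freeze all but a chosen subset of embedding rows is a stop-gradient mask, roughly like this sketch (TF 1.x, illustrative sizes; the new-token indices are hypothetical):

# Sketch of gradient masking: rows marked 1 in `mask` receive gradients,
# rows marked 0 are treated as constants via tf.stop_gradient.
import numpy as np
import tensorflow as tf

vocab_size, hidden_size = 30522, 768              # illustrative sizes
embedding_table = tf.get_variable("embeddings", [vocab_size, hidden_size])

new_token_ids = [1, 2, 3]                         # hypothetical indices of replaced [unused#] rows
mask = np.zeros([vocab_size, 1], dtype=np.float32)
mask[new_token_ids] = 1.0
mask = tf.constant(mask)

# Forward values are unchanged; gradients only flow into the masked (new) rows.
effective_table = mask * embedding_table + (1.0 - mask) * tf.stop_gradient(embedding_table)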

There is an endless loop when max_seq_length=64.

Thanks for your hard work. There is an endless loop when max_seq_length=64.
your code:

max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
...
while start_offset < len(all_doc_tokens):
  length = len(all_doc_tokens) - start_offset
  if length > max_tokens_for_doc:
    length = max_tokens_for_doc
  doc_spans.append(_DocSpan(start=start_offset, length=length))
  if start_offset + length == len(all_doc_tokens):
    break
  start_offset += min(length, doc_stride)
I'm running fine-tuning on SQuAD 1.1. When max_seq_length is 64, max_tokens_for_doc is 0 for some training examples, so length is 0 and start_offset stays at 0. The loop above therefore never terminates, and memory keeps growing until my program is killed.
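One way to guard against this (my own sketch, not an official patch) is to check that there is room for at least one document token before entering the span loop, so start_offset always advances:

# Sketch of a guard for the loop above (not an official fix).
def check_doc_room(max_seq_length, query_tokens):
    max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
    if max_tokens_for_doc < 1:
        raise ValueError(
            "max_seq_length=%d leaves no room for document tokens with a %d-token query"
            % (max_seq_length, len(query_tokens)))
    return max_tokens_for_doc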

Why does the Chinese vocab contain ##word pieces?

In the Chinese vocab, we see many word pieces containing ##. In my understanding, those ## pieces only exist for rare words. But after tokenization the text is converted to a sequence of single characters, so in what cases do we use/need those ## vocab entries?

Training on data sets not in the discussed data sets

Would it be possible to replace the Microsoft Research Paraphrase data with data from an alternative source?

Take, for example, the following:

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

Does run_classifier support replacing the data_dir with my own dataset to train the classifier?
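Broadly, run_classifier.py routes each dataset through a task-specific processor class, so one way to use your own data is to add a processor for it. A rough sketch, assuming the DataProcessor/InputExample interface looks like the existing MRPC and CoLA processors and assuming a hypothetical tab-separated file layout (check run_classifier.py for the exact method names):

# Rough sketch of a custom processor for run_classifier.py (interface assumed, not verified).
import csv
import os
from run_classifier import DataProcessor, InputExample

class MyTaskProcessor(DataProcessor):
    def get_train_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "train.tsv"), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(os.path.join(data_dir, "dev.tsv"), "dev")

    def get_labels(self):
        return ["0", "1"]                          # hypothetical binary labels

    def _create_examples(self, path, set_type):
        examples = []
        with open(path) as f:
            for i, row in enumerate(csv.reader(f, delimiter="\t")):
                text_a, text_b, label = row        # hypothetical column layout
                examples.append(InputExample(
                    guid="%s-%d" % (set_type, i),
                    text_a=text_a, text_b=text_b, label=label))
        return examples

You would then register the new class in the processors dictionary in run_classifier.py's main() and pass the matching --task_name, pointing --data_dir at your own files.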

tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=/tmp/squad_base/

Should I install CUDA 8?

linear projection "bias=True" differs from the original t2t transformer implementation

Hi,

In modeling.py line 866:

with tf.variable_scope("output"):
  attention_output = tf.layers.dense(
      attention_output,
      hidden_size,
      kernel_initializer=create_initializer(initializer_range))

You leave the bias of the dense layer set to True, whereas in the original transformer implementation it is set to False. Is there any reason for doing that?

Thanks.

PyTorch implementation

Hello all,

We have released a PyTorch implementation/port of BERT!

Our scripts load Google's pre-trained models, and the port performs about the same as the TF implementation in our tests (see the README). We have also included gradient accumulation, multi-GPU and distributed training options to help you fine-tune these large models.

Here's the link: https://github.com/huggingface/pytorch-pretrained-BERT

We hope it will be useful!

Victor - HuggingFace 🤗

I get an attribute error when I run the classifier code

When I run code like:
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=SST-2 \
  --vocab_file=english_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=english_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=english_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=output

I get an AttributeError as follows:
Traceback (most recent call last):
File "run_classifier.py", line 754, in
tf.app.run()
File "/home/xiyu/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "run_classifier.py", line 658, in main
is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2
AttributeError: 'module' object has no attribute 'InputPipelineConfig'

My TensorFlow version is 1.4.0, and my GPU is a GTX 1080. Am I running the command incorrectly, or did I forget to install some packages?

Thanks very much.

run run_classifier.py on chinese data, Failed to find any matching files for /path/chinese_L-12_H-768_A-12/bert_model.ckpt

When running the classification script run_classifier.py as follows:

python run_classifier.py \
  --task_name=XNLI \
  --do_train=true \
  --do_eval=true \
  --data_dir=$XNLI_DIR \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=5e-5 \
  --num_train_epochs=0.01 \
  --output_dir=/tmp/xnli_output/

I get the following error; I cannot find the file "bert_model.ckpt".

INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /path/chinese_L-12_H-768_A-12//bert_model.ckpt
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "run_classifier.py", line 838, in <module>
tf.app.run()
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "run_classifier.py", line 794, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2400, in train
rendezvous.raise_errors()
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
saving_listeners=saving_listeners
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn
features, labels, mode, config)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2470, in _model_fn
features, labels, is_export_mode=is_export_mode)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1250, in call_without_tpu
return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1524, in _call_model_fn
estimator_spec = self._model_fn(features=features, **kwargs)
File "run_classifier.py", line 575, in model_fn
) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
File "/Users/xiaoqiugen/Project/tmp/bert/modeling.py", line 331, in get_assignment_map_from_checkpoint
init_vars = tf.train.list_variables(init_checkpoint)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 95, in list_variables
reader = load_checkpoint(ckpt_dir_or_file)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 64, in load_checkpoint
return pywrap_tensorflow.NewCheckpointReader(filename)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 314, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern), status)
File "/Library/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 526, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /path/chinese_L-12_H-768_A-12//bert_model.ckpt

Where is the XnliProcessor?

I want to run run_classifier.py with the XNLI dataset. I did what the CONTRIBUTING.md said, but I can't find the XnliProcessor.

Training WordPiece vocabulary

Hello, thanks for releasing this code! I need to pretrain a BERT model for non-English language using my own (domain-specific) data. I noticed that the training code for WordPiece vocabulary is not released, although there are a couple open source implementations mentioned in the README. It is however stated that "these are not compatible with our tokenization.py library". May I know whether this means I can't simply feed in the vocab file generated by an external WordPiece library (e.g. tensor2tensor's) as an argument for --vocab_file?

How would BERT perform on IMDB, TREC-6?

The paper mentions only GLUE datasets. I wonder how it would perform on binary, multi-class and multi-label text classification tasks, so that we can directly compare it with ELMo and ULMFiT.

Clarification : Fixed feature vectors

Please correct me if I'm wrong :

  • Feature vectors are word embeddings, for each token of the input file.
  • These vectors can be used like ELMo / GloVe embeddings: as a base for a bigger neural network.

If these assumptions are right, here is my question :

From the use example :

python extract_features.py \
  ... \
  --layers=-1,-2,-3,-4 \
  ...

Why would anyone be interested in feature vectors from layers other than the last one?

From my understanding, feature vectors from the last layer are complete, while feature vectors from other layers are not.

'Complete' is obviously the wrong word here, due to my lack of vocabulary / knowledge.

By the way, BERT is really amazing, congratulations and thank you for sharing it.
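As a usage illustration (my own sketch; the exact JSON field names should be checked against extract_features.py's output), one common reason for requesting several layers is to combine them ELMo-style, e.g. by summing or concatenating the top four layers per token:

# Sketch: read one JSON line produced by extract_features.py and build per-token vectors
# by summing the requested layers (field names assumed, not verified).
import json
import numpy as np

with open("output.jsonl") as f:
    record = json.loads(f.readline())

token_vectors = []
for feat in record["features"]:
    layers = [np.array(layer["values"]) for layer in feat["layers"]]  # layers -1,-2,-3,-4
    token_vectors.append(np.sum(layers, axis=0))   # alternative: np.concatenate(layers)
print(len(token_vectors), token_vectors[0].shape)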

plan to release SWAG code?

Hi, I just want to know if you plan to release fine-tuning and evaluation code for SWAG dataset.
If not, I wonder whether the training procedure is the same as for MRPC (more specifically, label 0 for distractors and 1 for the gold ending).

throwing bad_alloc after calling model_fn

Awesome research! This is a huge breakthrough for NLP.

I'm running BERT-Large on a Cloud TPU, fine-tuning for SQuAD, but I keep getting:

(screenshot of the bad_alloc error omitted)

I have nothing else running, so I'm not sure why the machine is running out of memory; I followed the setup steps exactly (i.e. putting the pre-trained model in a Google Cloud Storage bucket, setting up the TPU, etc.).
