xuyige / bert4doc-classification
Code and source for paper ``How to Fine-Tune BERT for Text Classification?``
License: Apache License 2.0
Hello, the Embedding Layer in BERT comes before Layer 0. Would it be better to set its learning rate to the Layer 0 learning rate multiplied by the weight ξ (0.95)?
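For readers following this question, here is a minimal sketch of the per-layer rates it describes. It assumes the top encoder layer gets the base learning rate and every layer below it is multiplied by ξ once more, with the embeddings placed one step below Layer 0 as the question proposes. The function name and the dictionary keys are hypothetical, and this is not taken from the repository's code.

```python
def layerwise_learning_rates(base_lr=2.0e-5, decay=0.95, num_layers=12):
    """Return a learning rate for each encoder layer and for the embeddings.

    The top layer (index num_layers - 1) keeps base_lr; each layer below it
    is scaled by `decay` one more time. Putting the embeddings one step
    below layer 0 is the assumption raised in the question above.
    """
    rates = {}
    for layer in range(num_layers):
        depth_from_top = num_layers - 1 - layer
        rates[f"layer_{layer}"] = base_lr * decay ** depth_from_top
    # Hypothetical choice: embeddings sit one decay step below layer 0.
    rates["embeddings"] = rates["layer_0"] * decay
    return rates


rates = layerwise_learning_rates()
for name in ("embeddings", "layer_0", "layer_11"):
    print(name, rates[name])
```

With this scheme the embeddings always receive the smallest rate, which is the behavior the question is asking about.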
Hello, which Python version does your code use?
Thanks for your hard work!
I have two questions. First, for Layer-wise Decreasing Layer Rate, did you also use warm-up or polynomial_decay, that is, are the warm-up rate and Layer-wise Decreasing Layer Rate applied simultaneously? Second, for BERT-large, how did you set the learning rate and decay factor, which the paper does not give?
The parser option for save_checkpoints_steps doesn't do anything for me.
I'm running:
python3 run_classifier_single_layer.py --task_name imdb --do_train --do_eval --do_lower_case --data_dir ./stock --vocab_file ./uncased_L-12_H-768_A-12/vocab.txt --bert_config_file ./uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint ./uncased_L-12_H-768_A-12/pytorch_model.bin --max_seq_length 512 --train_batch_size 16 --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir ./stock_output --seed 42 --layers 11 10 --trunc_medium -1 --save_checkpoints_steps 1000
Any idea how to solve this?
Dear Yige,
thanks a lot for sharing the code!
I was wondering if you could provide some more detail on "further pre-training" on the IMDB dataset, e.g. the hyperparameter settings for it.
Or, is it possible to share the BERT model which did the LM pre-training on the IMDB dataset?
I strongly suggest that Chinese researchers provide a corresponding Chinese version when they open-source their work.
Hi,
Thanks so much for sharing the code for this fantastic work!
In the paper you mentioned that "We empirically set the max number of the epoch to 4 and save the best model on the validation set for testing". I am wondering how you created the validation dataset for the classification tasks. Did you split the original train dataset into train/val? If so, what ratio did you use to split the train/validation datasets for IMDB, AGnews, etc.?
Thanks so much for your help in advance!
Hi, sorry to bother you, but I have one question.
Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings for each sentence, or do you remove all punctuation marks so the text won't have any [SEP] tokens?
Sorry to bother you!
It seems to me that run_classifier_single_layer.py does not save the model. What should I do to further fine-tune the fine-tuned model?
Thanks!
I got this error when doing further pre-training.
My environment:
Ubuntu 18.04.4 LTS (GNU/Linux 5.4.0-74-generic x86_64)
GPU 2080ti
I used the following command:
python run_pretraining.py
--input_file=./tmp/tf_AGnews.tfrecord
--output_dir=./uncased_L-12_H-768_A-12_AGnews_pretrain
--do_train=True
--do_eval=True
--bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json
--init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt
--train_batch_size=8
--max_seq_length=128
--max_predictions_per_seq=20
--num_train_steps=100000
--num_warmup_steps=10000
--save_checkpoints_steps=10000
--learning_rate=5e-5
I got this message and further pre-training does not work.
How can I fix this problem?
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 62 vs previous value: 62. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W0622 17:33:44.304897 140418054317888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 62 vs previous value: 62. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
In Section 5.4.3: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5."
Compared with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812
it seems that you divide the BERT layers into 3 parts (4 layers per part) and set a different learning rate for each part.
Some questions about it:
Hi, thanks for your great work.
While running run_pretraining.py, I kept getting OOM errors for any size of the matrix.
I already reduced the batch size to 1, but it didn't help.
I'm using a 960M, TensorFlow-gpu 1.10, and CUDA toolkit 9.0.
I'm wondering which version of TensorFlow you are using. Any thoughts on this issue?
Thanks in advance.
Hi,
I tried to use your code for classification on my own corpus, which consists of many short sentences. I want to try some experiments with further pre-training without the NSP task. But in your code for "create_pretraining_data.py", I found that you randomly choose a doc from the dataset and concatenate it to another doc after [SEP] as input, which confuses me a lot. Could you please explain why this is done? Thanks a lot.
When I do further pre-training on my own data, the ppl is very high, for example 709. I have 3582619 examples and use batch size=8, epoch=3, learning rate=5e-5. Is there any advice? Thanks a lot!
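To put the numbers in this question in context, here is a small sketch under two stated assumptions: that the reported ppl is the exponential of the mean cross-entropy loss in nats, and that training length is given to the script as a step count rather than as epochs. Both helper names are hypothetical; the script may compute its metrics differently.

```python
import math


def loss_from_perplexity(ppl):
    """Mean cross-entropy (in nats) implied by a perplexity value,
    assuming perplexity = exp(mean loss)."""
    return math.log(ppl)


def steps_for_epochs(num_examples, batch_size, epochs):
    """Number of optimizer steps needed to cover `epochs` passes over the
    data, rounding each epoch up to a whole number of batches."""
    steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
    return steps_per_epoch * epochs


# Figures quoted in the question above.
print(loss_from_perplexity(709.0))
print(steps_for_epochs(3582619, 8, 3))
```

Comparing the resulting step count with the value passed as num_train_steps shows whether the run actually covered the intended number of epochs.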
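Hello, and thank you very much; this project has been very helpful for my current work. I am working on automatic scoring of student essays. The data is 450 MB, about 930,000 student essays. I used the create_pretraining_data script to generate a 17 GB tf.records file, with max_sequence_length set to 128. My question is: in the step that generates the pre-training data, should max_sequence_length be set to the length of the longest essay, or to the length of the longest sentence?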
Hi,
First, thank you for sharing your code with us.
I am trying to further pre-train a BERT model on my own corpus on a Colab GPU, but I am getting a resource exhausted error.
Can someone tell me how to fix this?
Also, what is the expected output of this further pre-training?
Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?
Thank you
Hi,
Thank you for sharing your code. I met the following problem when running "python generate_corpus_agnews.py".
Traceback (most recent call last):
File "generate_corpus_agnews.py", line 18, in
f.write(str(test_data[i][1])+"\n")
IndexError: index 1 is out of bounds for axis 0 with size 1
Also, could you provide some guidelines on how I can apply your code to my own dataset?
@xuyige Time taken
!python run_pretraining.py
--input_file=./tmp/tf_examples.tfrecord
--output_dir=./tmp/pretraining_output
--do_train=True
--do_eval=True
--bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json
--init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt
--train_batch_size=32
--max_seq_length=128
--max_predictions_per_seq=20
--num_train_steps=100000
--num_warmup_steps=10000
--learning_rate=5e-5
--use_tpu=False
--save_checkpoints_steps=10000
Hi,
I followed your code to further pre-train a BERT model on my own corpus, but I got only checkpoint files without any config or vocab.txt file. Any ideas, please?
Thank you
Hey,
When I run create_pretraining_data.py, I see the following message:
INFO:tensorflow:*** Reading from input files ***
I1210 15:59:58.812381 140714487977856 create_pretraining_data.py:419] *** Reading from input files ***
INFO:tensorflow:*** Writing to output files ***
I1210 15:59:58.815751 140714487977856 create_pretraining_data.py:430] *** Writing to output files ***
INFO:tensorflow: tmp/tf_AGnews.tfrecord
I1210 15:59:58.815884 140714487977856 create_pretraining_data.py:432] tmp/tf_AGnews.tfrecord
WARNING:tensorflow:From create_pretraining_data.py:97: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.
W1210 15:59:58.816398 140714487977856 module_wrapper.py:139] From create_pretraining_data.py:97: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.
INFO:tensorflow:Wrote 0 total instances
I1210 15:59:58.819541 140714487977856 create_pretraining_data.py:162] Wrote 0 total instances
Does this mean no data is created? If yes, can you tell me why this is happening?
Thanks in advance.
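One quick check for the "Wrote 0 total instances" case is to confirm that the input text file is non-empty and laid out as expected. The sketch below assumes the script expects the layout used by the original BERT create_pretraining_data.py: one sentence per line, with a blank line between documents. The helper name is hypothetical, and an empty or misplaced input file is only one possible cause.

```python
def summarize_corpus(lines):
    """Count blank-line-separated documents and non-empty lines.

    `lines` can be any iterable of strings, for example an open file.
    """
    documents = 0
    sentences = 0
    in_document = False
    for raw in lines:
        if raw.strip():
            sentences += 1
            if not in_document:
                # First non-empty line after a blank line starts a document.
                documents += 1
                in_document = True
        else:
            in_document = False
    return documents, sentences


sample = ["First sentence.\n", "Second sentence.\n", "\n", "Another document.\n"]
print(summarize_corpus(sample))  # prints (2, 3)
```

Passing an open file handle for the corpus in place of `sample` gives the counts for the real input. A result of (0, 0) would mean the file was empty or the path was wrong.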