xuyige / bert4doc-classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"

License: Apache License 2.0

Python 100.00%
bert natural-language-processing text-classification

bert4doc-classification's People

Contributors

evrys, mhilmiasyrofi, xuyige


bert4doc-classification's Issues

The embedding layer in BERT

Hello, in BERT the embedding layer sits below Layer 0. Would it be better to set its learning rate to that of Layer 0 multiplied by the decay factor ξ (0.95)?
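For context, here is a minimal sketch (not the repository's actual code) of layer-wise learning-rate decay that also covers the embedding layer, treating it as sitting one step below encoder layer 0, so its rate is the base rate times ξ^(L+1). The parameter-name patterns, base rate, and decay factor are illustrative assumptions.

# Sketch: per-layer learning-rate groups with decay factor `decay`,
# treating the embedding layer as one level below encoder layer 0.
def layerwise_lr_groups(model, base_lr=2e-5, decay=0.95, num_layers=12):
    groups = []
    for depth in range(num_layers):                  # depth 0 = top encoder layer
        layer_idx = num_layers - 1 - depth
        pattern = f"encoder.layer.{layer_idx}."      # assumed BERT naming scheme
        params = [p for n, p in model.named_parameters() if pattern in n]
        if params:
            groups.append({"params": params, "lr": base_lr * decay ** depth})
    # Embeddings get one extra decay step, i.e. base_lr * decay ** num_layers.
    emb = [p for n, p in model.named_parameters() if "embeddings" in n]
    if emb:
        groups.append({"params": emb, "lr": base_lr * decay ** num_layers})
    # Anything else (pooler, classifier head) keeps the base learning rate.
    covered = {id(p) for g in groups for p in g["params"]}
    rest = [p for p in model.parameters() if id(p) not in covered]
    if rest:
        groups.append({"params": rest, "lr": base_lr})
    return groups

# Usage with any BERT-like PyTorch module:
# optimizer = torch.optim.AdamW(layerwise_lr_groups(model), lr=2e-5)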

For Layer-wise Decreasing Layer Rate

Thanks for your hard work!
I have two questions. First, for the layer-wise decreasing layer rate, did you also use warm-up or polynomial decay at the same time? That is, are the warm-up rate and the layer-wise decreasing layer rate applied simultaneously? Second, for BERT-large, how did you set the learning rate and decay factor, which the paper does not give?
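As a hedged illustration (not the authors' confirmed setup), warm-up and layer-wise decay can be combined: the per-layer decay fixes each parameter group's base learning rate, and a single warm-up/decay multiplier then scales every group each step, so the layer-wise ratios are preserved. A minimal sketch, assuming the optimizer was built with per-layer parameter groups; the step counts are illustrative.

# Sketch: shared linear warm-up / linear decay multiplier on top of
# per-layer base learning rates (applied via LambdaLR to every group).
def warmup_linear(step, warmup_steps=100, total_steps=1000):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print([round(warmup_linear(s), 2) for s in (0, 50, 100, 550, 1000)])
# -> [0.0, 0.5, 1.0, 0.5, 0.0]

# optimizer = torch.optim.AdamW(per_layer_groups, lr=2e-5)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_linear)
# ... then call scheduler.step() after each optimizer.step().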

save_checkpoints_steps doesn't work

The parser option for save_checkpoints_steps doesn't do anything for me.

I'm running:

python3 run_classifier_single_layer.py --task_name imdb --do_train --do_eval --do_lower_case --data_dir ./stock --vocab_file ./uncased_L-12_H-768_A-12/vocab.txt --bert_config_file ./uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint ./uncased_L-12_H-768_A-12/pytorch_model.bin --max_seq_length 512 --train_batch_size 16 --learning_rate 2e-5 --num_train_epochs 3.0 --output_dir ./stock_output --seed 42 --layers 11 10 --trunc_medium -1 --save_checkpoints_steps 1000

Any idea how to solve this?
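If the flag really is ignored by the PyTorch fine-tuning script, one workaround is to checkpoint manually inside the training loop. A minimal sketch, where the model, optimizer, and dataloader are whatever the script already builds (names here are illustrative):

import os
import torch

# Sketch: save a checkpoint every `save_checkpoints_steps` optimizer steps.
def train_with_checkpoints(model, optimizer, train_dataloader, num_epochs,
                           output_dir, save_checkpoints_steps=1000):
    os.makedirs(output_dir, exist_ok=True)
    global_step = 0
    for _ in range(num_epochs):
        for batch in train_dataloader:
            loss = model(**batch)          # assumes the forward pass returns the loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            global_step += 1
            if global_step % save_checkpoints_steps == 0:
                path = os.path.join(output_dir, f"checkpoint-{global_step}.bin")
                torch.save(model.state_dict(), path)
    return model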

Further Pre-Training on the IMDB dataset

Dear Yige,
thanks a lot for sharing the code!
I was wondering if you could provide some more detail on "further pre-training" on the IMDB dataset, e.g. the hyperparameter settings for it.
Or, is it possible to share the BERT model that underwent LM pre-training on the IMDB dataset?

Validation dataset split

Hi,
Thanks so much for sharing the code for this fantastic work!
In the paper you mentioned that "We empirically set the max number of the epoch to 4 and save the best model on the validation set for testing". I am wondering how you created the validation dataset for the classification tasks. Did you split the original training set into train/val? If so, what ratio did you use for the train/validation split for IMDB, AG News, etc.?

Thanks so much for your help in advance!
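Not an authoritative answer, but for anyone reproducing this, a common recipe is simply to hold out a slice of the original training set; the 10% ratio below is an illustrative choice, not the paper's stated setting.

# Sketch: carve a validation split out of the training data (toy example).
from sklearn.model_selection import train_test_split

texts = [f"document {i}" for i in range(20)]
labels = [i % 2 for i in range(20)]

train_texts, val_texts, train_labels, val_labels = train_test_split(
    texts, labels, test_size=0.1, random_state=42, stratify=labels)
print(len(train_texts), len(val_texts))   # 18 2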

Dealing with multiple sentences

Hi, sorry to bother you, but I have one question.

Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings for each sentence, or do you remove all punctuation so the text won't have any [SEP] tokens?
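For what it's worth, the paper keeps each document as a single input sequence (no per-sentence [SEP] splitting) and truncates long documents; its best-reported variant keeps the head and the tail of the text (the first 128 and the last 382 tokens). A minimal sketch of that idea on an already-tokenized document; the helper below is illustrative, not the repository's trunc_medium implementation:

# Sketch: "head + tail" truncation to fit BERT's 512-token limit,
# keeping the first 128 and the last 382 wordpiece tokens plus [CLS]/[SEP].
def head_tail_truncate(tokens, max_len=512, head=128):
    body_budget = max_len - 2               # reserve [CLS] and [SEP]
    if len(tokens) <= body_budget:
        kept = tokens
    else:
        tail = body_budget - head           # 382 when max_len=512, head=128
        kept = tokens[:head] + tokens[-tail:]
    return ["[CLS]"] + kept + ["[SEP]"]

print(len(head_tail_truncate([f"tok{i}" for i in range(1000)])))   # -> 512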

How to fine-tune the model on multiple tasks?

Sorry to bother you!
But it seems to me that run_classifier_single_layer.py does not save the model. What should I do to further fine-tune the fine-tuned model?
Thanks!
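In case it helps, here is a hedged sketch of saving the fine-tuned weights and loading them back as the starting point of a second fine-tuning run (the paths, and the idea of pointing --init_checkpoint at the saved file, are illustrative rather than the script's documented behaviour):

import torch

# Sketch: persist and restore fine-tuned weights between runs.
def save_finetuned(model, path="./output/pytorch_model.bin"):
    torch.save(model.state_dict(), path)

def load_for_further_finetuning(model, path="./output/pytorch_model.bin"):
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model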

further-pretraining

I got this error when doing further pre-training.

My environment:
Ubuntu 18.04.4 LTS (GNU/Linux 5.4.0-74-generic x86_64)
GPU: RTX 2080 Ti

I used the following command:
python run_pretraining.py \
  --input_file=./tmp/tf_AGnews.tfrecord \
  --output_dir=./uncased_L-12_H-768_A-12_AGnews_pretrain \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt \
  --train_batch_size=8 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --save_checkpoints_steps=10000 \
  --learning_rate=5e-5

I got the following message and further pre-training does not work.
How can I fix this problem?

WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 62 vs previous value: 62. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W0622 17:33:44.304897 140418054317888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 62 vs previous value: 62. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.

Questions about discriminative_fine_tuning

In Section 5.4.3: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5."
Compared to the code in https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812
It seems that you divide the BERT layers into 3 parts (4 layers per part) and set a different learning rate for each part.
Some questions about it:

  1. How does the decay factor 0.95 relate to the number 2.6 in the code?
  2. The last classification layer does not seem to be included; is there no need to set a learning rate for it?

OOM when batchSize=1

Hi, thanks for your great work.
While running run_pretraining.py, I kept getting OOM errors regardless of the matrix size.
I already reduced the batch size to 1, but it didn't help.
I'm using a 960M, tensorflow-gpu 1.10, and CUDA Toolkit 9.0.
I'm wondering which version of TensorFlow you are using? Any thoughts on this issue?
Thanks in advance.

Question about Further Pre-training

Hi,
I tried to use your code on my own corpus, which consists of many short sentences, to do classification. I want to try some experiments with further pre-training without the NSP task. But in your create_pretraining_data.py code, I found that you randomly choose a document from the dataset and concatenate it to another document after [SEP] as input, which confuses me a lot. Could you please explain why this is done? Thanks a lot.
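For context, this follows the original BERT pre-training data script: to create negative examples for the next-sentence-prediction (NSP) objective, roughly half of the segment pairs take segment B from a randomly chosen other document. A greatly simplified sketch of that logic (not the script's exact code; it assumes all_docs holds more than one document):

import random

# Sketch: ~50% of NSP pairs use a random document as segment B ("NotNext").
def make_nsp_pair(doc, all_docs, rng=random):
    seg_a = doc[0]                                    # doc = list of sentences
    if len(doc) > 1 and rng.random() >= 0.5:
        seg_b, is_random_next = doc[1], False         # the real next sentence
    else:
        other = rng.choice([d for d in all_docs if d is not doc])
        seg_b, is_random_next = rng.choice(other), True
    return seg_a, seg_b, is_random_next

If NSP were dropped entirely, this pairing step is what would need to be removed or replaced.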

High perplexity during further pre-training

When doing further pre-training on my own data, the perplexity is very high, for example 709. I have 3,582,619 examples and use batch size = 8, 3 epochs, and learning rate = 5e-5. Is there any advice? Thanks a lot!
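As a sanity check, perplexity is just the exponential of the average masked-LM cross-entropy (natural log), so a perplexity around 709 corresponds to a loss of roughly 6.6:

import math

# Perplexity <-> masked-LM loss conversion.
print(math.exp(6.56))   # ~ 706, i.e. a perplexity of about 700
print(math.log(709))    # ~ 6.56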

max_sequence_length in create_pretraining_data

Hello, thank you very much; this project is very helpful for my current work. I am working on automatic scoring of student essays; the data is about 450 MB, roughly 930,000 essays. I used the create_pretraining_data script to generate a 17 GB tf.records file with max_sequence_length set to 128. My question is: when generating the pre-training data, should max_sequence_length be the length of the longest essay, or the length of the longest sentence?

Resource exhausted

Hi,

First, thank you for sharing your code with us.

I am trying to further pre-train a BERT model on my own corpus on a Colab GPU, but I am getting a resource-exhausted error.
Can someone tell me how to fix this?

Also, what is the expected output of this further pre-training?
Is it the BERT TensorFlow files that we can use for fine-tuning (checkpoint, config, and vocab)?

Thank you.

Generate Further Pre-Training Corpus

Hi,
Thank you for sharing your code. I encountered the following problem when running "python generate_corpus_agnews.py".

Traceback (most recent call last):
File "generate_corpus_agnews.py", line 18, in
f.write(str(test_data[i][1])+"\n")
IndexError: index 1 is out of bounds for axis 0 with size 1

Also, could you provide some guidance on how I can apply your code to my own dataset?
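The IndexError suggests each row was parsed as a single field rather than (label, title, description). As a hedged sketch, this reads the AG News CSV explicitly with three columns; the column layout is assumed from the standard AG News release, and the file names are illustrative:

import pandas as pd

# Sketch: parse the CSV into three columns, then write title + description lines.
test_data = pd.read_csv("test.csv", header=None,
                        names=["label", "title", "description"]).values

with open("corpus.txt", "w", encoding="utf-8") as f:
    for i in range(len(test_data)):
        f.write(str(test_data[i][1]) + "\n")   # title
        f.write(str(test_data[i][2]) + "\n")   # description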

How much time did it take to run the further pre-training step?

@xuyige Time taken

!python run_pretraining.py \
  --input_file=./tmp/tf_examples.tfrecord \
  --output_dir=./tmp/pretraining_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --learning_rate=5e-5 \
  --use_tpu=False \
  --save_checkpoints_steps=10000

further pre-training

Hi,

I followed your code to further pre-train a BERT model on my own corpus, but I got only checkpoint files without any config or vocab.txt file. Any ideas, please?

Thank you.
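If it helps, run_pretraining.py only writes TensorFlow checkpoint files; further pre-training does not change the configuration or the vocabulary, so those can simply be copied over from the original BERT release. A minimal sketch with illustrative paths:

import shutil

# Sketch: assemble a complete model directory after further pre-training.
src = "./uncased_L-12_H-768_A-12"                    # original BERT release
dst = "./uncased_L-12_H-768_A-12_AGnews_pretrain"    # run_pretraining.py output_dir

shutil.copy(f"{src}/bert_config.json", dst)
shutil.copy(f"{src}/vocab.txt", dst)
# The TF checkpoint can then be converted to a PyTorch pytorch_model.bin with the
# usual BERT TF-to-PyTorch conversion script if the fine-tuning code needs it.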

0 instances written while further pre-training on my own dataset

Hey,
When I run create_pretraining_data.py, I see the following messages:

INFO:tensorflow:*** Reading from input files ***
I1210 15:59:58.812381 140714487977856 create_pretraining_data.py:419] *** Reading from input files ***
INFO:tensorflow:*** Writing to output files ***
I1210 15:59:58.815751 140714487977856 create_pretraining_data.py:430] *** Writing to output files ***
INFO:tensorflow: tmp/tf_AGnews.tfrecord
I1210 15:59:58.815884 140714487977856 create_pretraining_data.py:432] tmp/tf_AGnews.tfrecord
WARNING:tensorflow:From create_pretraining_data.py:97: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

W1210 15:59:58.816398 140714487977856 module_wrapper.py:139] From create_pretraining_data.py:97: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

INFO:tensorflow:Wrote 0 total instances
I1210 15:59:58.819541 140714487977856 create_pretraining_data.py:162] Wrote 0 total instances
Does this mean no data was created? If yes, can you tell me why this is happening?

Thanks in advance.
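For reference, the upstream create_pretraining_data.py expects plain text with one sentence per line and a blank line between documents; an input file the script cannot segment into documents that way typically yields "Wrote 0 total instances". A tiny sketch of writing a corpus in that format (content is illustrative):

# Sketch: one sentence per line, blank line between documents.
docs = [
    ["The first sentence of document one.", "Its second sentence."],
    ["Document two has a single sentence."],
]

with open("corpus.txt", "w", encoding="utf-8") as f:
    for doc in docs:
        for sentence in doc:
            f.write(sentence + "\n")
        f.write("\n")   # blank line marks the end of a document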
