mandarjoshi90 / coref

BERT for Coreference Resolution

License: Apache License 2.0

Python 51.92% C++ 0.63% Shell 1.60% Jupyter Notebook 7.47% HTML 0.24% JavaScript 0.80% Perl 37.34%
coreference-resolution natural bert spanbert nlp

coref's People

Contributors

kentonl, mandarjoshi90, wenyudu

coref's Issues

small README typos

A couple of really minor typos in README:

  1. "the the section"
  2. " author={Mandar Joshi and Omer Levy and Daniel S. Weld and Luke Zettlemoyer and Omer Levy},"

Thanks for releasing all of this!

can't run evaluate.py successfully

Hello,

After creating the test jsonlines files and changing eval_path in experiments.conf to one of the test files, I can only run bert_base with test.english.128.jsonlines, which gives an F1 score of 73.38. All other runs (I only tested bert_base and bert_large) failed with:
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested
[[{{node checkpoint_initializer_337}}]]
[[{{node checkpoint_initializer_3}}]]

Could you help me check what is wrong? Does the current repo run successfully?

Many thanks~

OntoNotes Training

The Setup for training section says:
./setup_training.sh <ontonotes/path/ontonotes-release-5.0> $data_dir

What does the path <ontonotes/path/ontonotes-release-5.0> refer to? How can I access it?

training with my dataset

Hello author,
I want to use this code to train a model on another dataset.
However, the original SpanBERT is released as a PyTorch .bin file, while the config file loads a ckpt model, so I can't load SpanBERT.

Best wishes

Unable to check Evaluation results

I want to test and evaluate the performance of this model on the GAP dataset. I am using the Google Colab notebook below to avoid machine dependency issues.
https://colab.research.google.com/drive/1SlERO9Uc9541qv6yH26LJz5IM9j7YVra#scrollTo=H0xPknceFORt

How can I view the results of evaluation metrics as shown in the mentioned research paper?

Secondly, I tried to run the command ! GPU=0 python evaluate.py $CHOSEN_MODEL in Colab, assuming it would generate the evaluation results, but I got the error below:

..
..
..
W0518 20:39:24.163360 140409457641344 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
W0518 20:39:24.184890 140409457641344 deprecation_wrapper.py:119] From /content/coref/optimization.py:64: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

bert:task 199 27
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-05-18 20:39:34.733225: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-05-18 20:39:34.735705: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-05-18 20:39:34.735782: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fcef0d45f3e2): /proc/driver/nvidia/version does not exist
2020-05-18 20:39:34.736239: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-18 20:39:34.750671: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200160000 Hz
2020-05-18 20:39:34.751024: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x78bf9c0 executing computations on platform Host. Devices:
2020-05-18 20:39:34.751074: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
Restoring from ./spanbert_base/model.max.ckpt
2020-05-18 20:39:40.322311: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0518 20:39:47.165590 140409457641344 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "evaluate.py", line 26, in <module>
    model.evaluate(session, official_stdout=True, eval_mode=True)
  File "/content/coref/independent.py", line 538, in evaluate
    self.load_eval_data()
  File "/content/coref/independent.py", line 532, in load_eval_data
    with open(self.config["eval_path"]) as f:
FileNotFoundError: [Errno 2] No such file or directory: './dev.english.384.jsonlines'

Any idea what could be the possible reason? I am new to this area/environment and following the colab code right now.

Looking for some suggestions / steps to do evaluation part.

Thanks in advance!

Input Format

Hi Mandar,

Thanks for this repo.

I am using the pretrained model. If I have a set of sentences in txt format, how should I predict coreference? Is there a way to directly convert my txt sentences into the input format the model needs?
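For reference, here is a minimal sketch of building one input example with the same fields as the trial.jsonlines sample quoted further down this page (doc_key, sentences, speakers, clusters, sentence_map, subtoken_map). It is not the repository's own conversion code. The function name `to_jsonlines_example` and the `tokenize` callable are hypothetical; in practice `tokenize` would be the WordPiece tokenizer that matches the model's vocab.txt, while the toy lookup below only stands in for it. The sketch handles a single segment and does not enforce max_segment_len.

```python
import json


def to_jsonlines_example(doc_key, sentences, tokenize):
    """Build a single-segment example; `sentences` is a list of word lists.

    `tokenize` (hypothetical) maps one word to its list of subtokens.
    The [CLS]/[SEP] conventions mirror the trial.jsonlines sample on this page.
    """
    subtokens, sentence_map, subtoken_map = ["[CLS]"], [0], [0]
    word_idx = -1
    for sent_idx, words in enumerate(sentences):
        for word in words:
            word_idx += 1
            for piece in tokenize(word):
                subtokens.append(piece)
                sentence_map.append(sent_idx)
                subtoken_map.append(word_idx)
    # In the sample, [SEP] gets the sentence count and the last word index.
    subtokens.append("[SEP]")
    sentence_map.append(len(sentences))
    subtoken_map.append(max(word_idx, 0))
    speakers = ["[SPL]"] + ["-"] * (len(subtokens) - 2) + ["[SPL]"]
    return {
        "doc_key": doc_key,          # starts with a genre code such as "bn"
        "sentences": [subtokens],    # one list per segment
        "speakers": [speakers],
        "clusters": [[]],            # empty, as in the sample
        "sentence_map": sentence_map,
        "subtoken_map": subtoken_map,
    }


# Toy stand-in for a WordPiece tokenizer (assumption, for illustration only).
toy_vocab = {"Ehud": ["E", "##hu", "##d"], "Barak": ["Bar", "##ak"]}


def toy_tokenize(word):
    return toy_vocab.get(word, [word])


example = to_jsonlines_example(
    "bn/demo_0",
    [["Ehud", "Barak", "spoke", "."], ["He", "left", "."]],
    toy_tokenize,
)
print(json.dumps(example))
```

Each example would be written as one JSON object per line of the .jsonlines file.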

Questions on Table 2 of Bert paper

  1. Table 2 shows results for many systems on GAP. Are these on the GAP dev set or the test set?
  2. I couldn't reproduce the c2f_coref result, and I'm not sure what is wrong with my files or parameters. Did you also use gap_to_jsonlines.py and to_gap_tsv.py for the c2f_coref system? Did you use the tokenizer in gap_to_jsonlines.py? And what doc_key did you set for each sample in the JSON, since it is required to come from one of the genres?

Thank you in advance.

Failed to finetune BERTResolver from the released checkpoint

Hi,
I am going to restore your checkpoint and finetune the model on a new dataset, but I failed to do this and received the following errors:

Traceback (most recent call last):
  File "train.py", line 23, in <module>
    model = util.get_model(config)
  File "/home/chaiha/bert-e2e-coref/util.py", line 21, in get_model
    return independent.CorefModel(config)
  File "/home/chaiha/bert-e2e-coref/independent.py", line 61, in __init__
    init_from_checkpoint(config['init_checkpoint'], assignment_map)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 190, in init_from_checkpoint
    _init_from_checkpoint, args=(ckpt_dir_or_file, assignment_map))
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1516, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1524, in _merge_call
    return merge_fn(self._distribution_strategy, *args, **kwargs)
  File "/home/chaiha/anaconda3/envs/e2e-gpu/lib/python3.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 229, in _init_from_checkpoint
    tensor_name_in_ckpt, str(variable_map[tensor_name_in_ckpt])
ValueError: Shape of variable coref_layer/slow_antecedent_scores/hidden_bias_0:0 ((1000,)) doesn't match with shape of tensor coref_layer/slow_antecedent_scores/hidden_bias_0 ([3000]) from checkpoint reader.

Here is my configuration:
finetune_bert_base = ${best}{
num_docs = 2802
bert_learning_rate = 1e-05
task_learning_rate = 0.0002
max_segment_len = 128
ffnn_size = 3000
train_path = ${data_dir}/train.english.128.jsonlines
eval_path = ${data_dir}/test.english.128.jsonlines
conll_eval_path = ${data_dir}/test.english.v4_gold_conll
max_training_sentences = 11
bert_config_file = ${best.log_root}/bert_base/bert_config.json
vocab_file = ${best.log_root}/bert_base/vocab.txt
tf_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
init_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
}
I also copied the 'bert_base' folder to a folder called 'finetune_bert_base', and that still fails. Thank you for any thoughts!
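One way to narrow this down is to compare the variable shapes the graph was built with against the shapes stored in the checkpoint. The sketch below is a hypothetical helper (`find_shape_mismatches` is not part of the repository) that works on plain dictionaries; the checkpoint side can usually be filled in from TensorFlow's `tf.train.list_variables(checkpoint_path)`, which returns (name, shape) pairs. In the error above the graph has 1000 units while the checkpoint has 3000, which suggests, without confirming it, that the `ffnn_size = 3000` override was not the value in effect when the model was built.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for variables found in
    both mappings whose shapes differ."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Shapes copied from the ValueError above.
model_shapes = {"coref_layer/slow_antecedent_scores/hidden_bias_0": (1000,)}
checkpoint_shapes = {"coref_layer/slow_antecedent_scores/hidden_bias_0": [3000]}

for name, (in_graph, in_ckpt) in find_shape_mismatches(model_shapes, checkpoint_shapes).items():
    print(name, "graph:", in_graph, "checkpoint:", in_ckpt)
```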

System pre-requisite to run this project

I am using Windows 10. What environment is required to set up and run this project on my machine?

PS: Anaconda3 is already installed. What else is required to execute the Python/shell commands in the Setup section?

Missing path $data_dir/spanbert_base/bert_config.json

Hi, thanks for the implementation. I'm trying to train my first model but got this error:

$ GPU=0 python train.py train_spanbert_base
...
tensorflow.python.framework.errors_impl.NotFoundError: ../output/bert/data/spanbert_base/bert_config.json; No such file or directory

../output/bert/data is an empty folder before I ran setup_all.sh and setup_training.sh. Am I missing some step here?

AssertionError in train.py

Thank you for your comment.

${data_dir}/*.jsonlines, ${data_dir}/*.v4_gold_conll were made successfully.

I tried GPU=0 python train.py train_1,
but an AssertionError occurred during evaluation.

Here is my experiments.conf:

data_dir = data

best {
  # Edit this
  data_dir = data
  model_type = independent
  # Computation limits.
  max_top_antecedents = 50
  max_training_sentences = 5
  top_span_ratio = 0.4
  max_num_speakers = 20
  max_segment_len = 256

  # Learning
  bert_learning_rate = 1e-5
  task_learning_rate = 2e-4
  num_docs = 2802

  # Model hyperparameters.
  dropout_rate = 0.3
  ffnn_size = 1000
  ffnn_depth = 1
  num_epochs = 20
  feature_size = 20
  max_span_width = 30
  use_metadata = true
  use_features = true
  use_segment_distance = true
  model_heads = true
  coref_depth = 2
  coarse_to_fine = true
  fine_grained = true
  use_prior = true

  # Other.
  train_path = train.english.jsonlines
  eval_path = dev.english.jsonlines
  conll_eval_path = dev.english.v4_gold_conll
  single_example = true
  genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
  eval_frequency = 1000
  report_frequency = 100
  #log_root = ${data_dir}
  log_root = models
  adam_eps = 1e-6
  task_optimizer = adam
}

train_1 = ${best}{
  num_docs = 2802
  bert_learning_rate = 1e-05
  task_learning_rate = 0.0002
  max_segment_len = 128
  ffnn_size = 200
  model_heads = false
  train_path = ${data_dir}/train.english.128.jsonlines
  eval_path = ${data_dir}/dev.english.128.jsonlines
  conll_eval_path = ${data_dir}/dev.english.v4_gold_conll
  max_training_sentences = 5
  bert_config_file = data/cased_L-12_H-768_A-12/bert_config.json
  vocab_file = data/cased_L-12_H-768_A-12/vocab.txt
  tf_checkpoint = data/cased_L-12_H-768_A-12/bert_model.ckpt
  init_checkpoint = data/cased_L-12_H-768_A-12/bert_model.ckpt
}

And the result:

I0829 11:00:12.977631 139776037099264 train.py:58] [1000] loss=nan, steps/s=5.96
Loaded 343 eval examples.
2019-08-29 11:00:18.389048: W tensorflow/core/kernels/queue_base.cc:277] _0_padding_fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "train.py", line 64, in <module>
    eval_summary, eval_f1 = model.evaluate(session, tf_global_step)
  File "/data/BERT-coref/coref/independent.py", line 559, in evaluate
    coref_predictions[example["doc_key"]] = self.evaluate_coref(top_span_starts, top_span_ends, predicted_antecedents, example["clusters"], coref_evaluator)
  File "/data/BERT-coref/coref/independent.py", line 524, in evaluate_coref
    predicted_clusters, mention_to_predicted = self.get_predicted_clusters(top_span_starts, top_span_ends, predicted_antecedents)
  File "/data/BERT-coref/coref/independent.py", line 499, in get_predicted_clusters
    assert i > predicted_index, (i, predicted_index)
AssertionError: (0, 0)

Is there anything I missed?
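To make the failing assertion easier to read: the check `i > predicted_index` requires every span's predicted antecedent to come earlier in the span list than the span itself. The following is a simplified sketch of antecedent-to-cluster linking under that invariant, not the repository's exact code; `build_clusters` and its arguments are hypothetical names. A pair such as (0, 0) means span 0 was linked to itself, which should not happen with valid scores; note that the log above reports loss=nan just before evaluation.

```python
def build_clusters(spans, predicted_antecedents):
    """Link each span to its predicted antecedent and collect clusters.

    spans: list of (start, end) tuples, in the order used for prediction.
    predicted_antecedents: for span i, the index of its antecedent span,
    or -1 when no antecedent is predicted.
    """
    span_to_cluster = {}
    clusters = []
    for i, antecedent in enumerate(predicted_antecedents):
        if antecedent < 0:
            continue
        if antecedent >= i:
            # Same invariant as the assertion in the traceback above.
            raise ValueError("antecedent must precede span: %r" % ((i, antecedent),))
        antecedent_span = spans[antecedent]
        if antecedent_span in span_to_cluster:
            cluster_id = span_to_cluster[antecedent_span]
        else:
            cluster_id = len(clusters)
            clusters.append([antecedent_span])
            span_to_cluster[antecedent_span] = cluster_id
        clusters[cluster_id].append(spans[i])
        span_to_cluster[spans[i]] = cluster_id
    return clusters


spans = [(0, 0), (2, 3), (5, 5), (7, 8)]
print(build_clusters(spans, [-1, 0, -1, 1]))
```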

Possibly low F1 when finetuning BERT base

Hi Mandar,

When I finetune BERT base, I get an OntoNotes dev F1 of 73.69. I was wondering if this is within the variance that you saw for BERT base, or could there be some problem with my setup?

I'm using the requirements versions from requirements.txt (except with the MarkupSafe version changed to 1.1.1, #40, and psycopg2 changed to psycopg2-binary), and am training on a V100 32GB, with these commands:

python train.py train_bert_base
python evaluate.py train_bert_base

When evaluating your finetuned BERT base model on dev (python evaluate.py bert_base), I get an F1 of 74.05. This is closer to the 74.3 dev F1 number from Table 4, but should it match exactly? I'm wondering if there could be some difference in my setup which affects eval a bit but gets magnified during training.

Thanks,
Daniel

Possible to generate predictions (but not train a model) on Mac?

Hi,

I tried getting this going on a new macbook pro but installation didn't seem possible. I just wanted to use your pretrained model to make cluster predictions from a set of documents. I'm wondering if this use case is supported yet. It seems like maybe this is windows/linux + GPU only at this point.

Is jsonlines_to_json.py available?

I need coreference resolution results expressed in the original token indices (not BERT subtoken indices).

kentonl's jsonlines_to_json.py doesn't work because the BERT tokenizer changes the indices.

Do you have a jsonlines_to_json.py for BERT coreference, or any idea how to solve this problem?
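One possible approach, sketched under assumptions: the jsonlines files carry a subtoken_map field (see the trial.jsonlines sample further down this page) that gives, for each subtoken position, the index of the original word. If predicted clusters are lists of inclusive (start, end) subtoken positions in the flattened document, they can be mapped back through that field. The helper name and the predicted cluster below are hypothetical.

```python
def clusters_to_word_indices(clusters, subtoken_map):
    """Map inclusive (start, end) subtoken spans to original word indices."""
    return [
        [(subtoken_map[start], subtoken_map[end]) for start, end in cluster]
        for cluster in clusters
    ]


# First 13 entries of the subtoken_map from the trial.jsonlines sample:
# [CLS] Meanwhile Prime Minister E ##hu ##d Bar ##ak told Israeli television he
subtoken_map = [0, 0, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8]

# Hypothetical prediction: "Prime Minister Ehud Barak" (subtokens 2..8) and "he" (12).
predicted = [[(2, 8), (12, 12)]]
print(clusters_to_word_indices(predicted, subtoken_map))
```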

Thanks!

scripts don't work.

issue#1

command : ./download_pretrained.sh bert-base
error : HTTP request sent, awaiting response... 404 Not Found


issue#2

command : GPU=0 python train.py best
error :
Traceback (most recent call last):
  File "train.py", line 22, in <module>
    model = util.get_model(config)
  File "/data/BERT-coref/coref/util.py", line 21, in get_model
    return independent.CorefModel(config)
  File "/data/BERT-coref/coref/independent.py", line 32, in __init__
    self.bert_config = modeling.BertConfig.from_json_file(config["bert_config_file"])
  File "/home/fairy_of_9/anaconda3/envs/bert/lib/python3.6/site-packages/pyhocon/config_tree.py", line 366, in __getitem__
    val = self.get(item)
  File "/home/fairy_of_9/anaconda3/envs/bert/lib/python3.6/site-packages/pyhocon/config_tree.py", line 209, in get
    return self._get(ConfigTree.parse_key(key), 0, default)
  File "/home/fairy_of_9/anaconda3/envs/bert/lib/python3.6/site-packages/pyhocon/config_tree.py", line 151, in _get
    raise ConfigMissingException(u"No configuration setting found for key {key}".format(key='.'.join(key_path[:key_index + 1])))
pyhocon.exceptions.ConfigMissingException: 'No configuration setting found for key bert_config_file'


experiments.conf

best {
  # Edit this
  data_dir = data_set
  model_type = independent
  # Computation limits.
  max_top_antecedents = 50
  max_training_sentences = 5
  top_span_ratio = 0.4
  max_num_speakers = 20
  max_segment_len = 64 #256

  # Learning
  bert_learning_rate = 1e-5
  task_learning_rate = 2e-4
  num_docs = 2802

  # Model hyperparameters.
  dropout_rate = 0.3
  ffnn_size = 500 #1000
  ffnn_depth = 1
  num_epochs = 20
  feature_size = 20
  max_span_width = 30
  use_metadata = true
  use_features = true
  use_segment_distance = true
  model_heads = false #true
  coref_depth = 2
  coarse_to_fine = true
  fine_grained = true
  use_prior = true

  # Other.
  train_path = data_set/train.english.jsonlines
  eval_path = data_set/dev.english.jsonlines
  conll_eval_path = data_set/dev.english.v4_gold_conll
  single_example = true
  genres = ["bc", "bn", "mz", "nw", "pt", "tc", "wb"]
  eval_frequency = 1000
  report_frequency = 100
  log_root = logs
  adam_eps = 1e-6
  task_optimizer = adam
}

bert_base = ${best}{
  num_docs = 2802
  bert_learning_rate = 1e-05
  task_learning_rate = 0.0002
  max_segment_len = 128
  ffnn_size = 3000
  train_path = data_set/train.english.128.jsonlines
  eval_path = data_set/dev.english.128.jsonlines
  conll_eval_path = data_set/dev.english.v4_gold_conll
  max_training_sentences = 11
  bert_config_file = ${best.log_root}/bert_base/bert_config.json
  vocab_file = ${best.log_root}/bert_base/vocab.txt
  tf_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
  init_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
}
...

Is there anything I missed?
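A plausible reading of issue#2, not confirmed by the maintainers: in the config above, `best` is only a base block, and keys such as bert_config_file are defined in the blocks that extend it (`bert_base = ${best}{...}`). Running train.py with `best` as the experiment name therefore looks up a key that block never defines. The sketch below imitates the HOCON merge with plain Python dictionaries; the `REQUIRED_KEYS` tuple is a hypothetical selection for illustration, and only bert_config_file is confirmed by the traceback.

```python
# Hypothetical list of keys to check; only bert_config_file appears in the traceback.
REQUIRED_KEYS = ("bert_config_file", "vocab_file", "tf_checkpoint", "init_checkpoint")

# A few values copied from the experiments.conf above.
best = {"ffnn_size": 500, "max_segment_len": 64, "log_root": "logs"}

# `bert_base = ${best}{...}`: start from a copy of `best`, then apply overrides.
bert_base = {
    **best,
    "ffnn_size": 3000,
    "max_segment_len": 128,
    "bert_config_file": "logs/bert_base/bert_config.json",
    "vocab_file": "logs/bert_base/vocab.txt",
    "tf_checkpoint": "logs/bert_base/model.max.ckpt",
    "init_checkpoint": "logs/bert_base/model.max.ckpt",
}


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that the experiment block does not define."""
    return [key for key in required if key not in config]


print("best is missing:", missing_keys(best))
print("bert_base is missing:", missing_keys(bert_base))
```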

Speed up training?

Hi, I'm trying to retrain the coref model starting from another BERT model trained on different data. It seems the loss values are not going down but another issue is that training seems slow and the GPU is underutilized (screenshot below). Any tips on how to speed up the training or fit more data in the gpu?

TITAN V          | 40'C,   0 % |   316 / 12036 MB
I0905 12:20:54.217084 140321920943872 train.py:59] [100] loss=2071.04, steps/s=0.20
I0905 12:21:45.037879 140321920943872 train.py:59] [110] loss=837.87, steps/s=0.20
I0905 12:22:37.386365 140321920943872 train.py:59] [120] loss=1475.69, steps/s=0.20
I0905 12:23:34.424523 140321920943872 train.py:59] [130] loss=1111.34, steps/s=0.20
I0905 12:24:26.693988 140321920943872 train.py:59] [140] loss=1088.69, steps/s=0.20
I0905 12:25:14.780310 140321920943872 train.py:59] [150] loss=792.43, steps/s=0.20
I0905 12:26:08.272615 140321920943872 train.py:59] [160] loss=1597.89, steps/s=0.20
I0905 12:26:55.389269 140321920943872 train.py:59] [170] loss=1087.88, steps/s=0.20

Loss is 0 in step 200 and assertion error.

I'm really really sorry to ask you a trivial question.

train.py train_bert_base works well!


I have Korean data in CoNLL-2012 format.
It works with the e2e-coref model.

I want to apply your model to this Korean data,
so I ran minimize.py and train.py.
But the loss is 0 at step 200, and an assertion error occurs at eval time.

assertion error log

Traceback (most recent call last):
  File "train.py", line 64, in <module>
    eval_summary, eval_f1 = model.evaluate(session, tf_global_step)
  File "/data/BERT-coref/coref/independent.py", line 559, in evaluate
    coref_predictions[example["doc_key"]] = self.evaluate_coref(top_span_starts, top_span_ends, predicted_antecedents, example["clusters"], coref_evaluator)
  File "/data/BERT-coref/coref/independent.py", line 524, in evaluate_coref
    predicted_clusters, mention_to_predicted = self.get_predicted_clusters(top_span_starts, top_span_ends, predicted_antecedents)
  File "/data/BERT-coref/coref/independent.py", line 499, in get_predicted_clusters
    assert i > predicted_index, (i, predicted_index)
AssertionError: (0, 0)

Here is my experiments.conf:

train_kor = ${best}{
  num_docs = 2411
  bert_learning_rate = 1e-05
  task_learning_rate = 0.0002
  max_segment_len = 128
  ffnn_size = 800
  train_path = ${data_dir}/train.kor.128.jsonlines
  eval_path = ${data_dir}/dev.kor.128.jsonlines
  conll_eval_path = ${data_dir}/dev.kor.v4_gold_conll
  max_training_sentences = 8
  bert_config_file = ${best.log_root}/multi_cased_L-12_H-768_A-12/bert_config.json
  vocab_file = ${best.log_root}/multi_cased_L-12_H-768_A-12/vocab.txt
  tf_checkpoint = ${best.log_root}/multi_cased_L-12_H-768_A-12/bert_model.ckpt
  init_checkpoint = ${best.log_root}/multi_cased_L-12_H-768_A-12/bert_model.ckpt
}

I changed the BERT model to multi_cased
and edited num_docs to 2411.
[image]
(This is the output of minimize.py. Is it correct?)

The default ffnn_size and max_training_sentences caused a memory error,
so I changed both of them.

Is there anything I missed?

Bert Retraining

Hi Mandar ,

When running the SpanBERT large model, it is not able to capture some words, such as Degree.
If I use another brand name, like Kurkure, in the same text, it is detected. Can I train the coref model with my own dataset?

How do I do it? I have access to OntoNotes 5. I'm new to the field of AI. Should I convert my dataset to the OntoNotes format? How can I train on a custom dataset?

error with requirements.txt

I ran: pip install -r requirements.txt
and got the following error.

ERROR: mxnet 1.4.0 has requirement numpy<1.15.0,>=1.8.2, but you'll have numpy 1.18.1 which is incompatible.
ERROR: bert-embedding 1.0.1 has requirement numpy==1.14.6, but you'll have numpy 1.18.1 which is incompatible.
Installing collected packages: PyYAML, awscli, certifi, jpype1, cython, mmh3, future, pystanforddependencies, soupsieve, beautifulsoup4, cort
Found existing installation: PyYAML 5.1.2
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Does the project depend on the specific package versions listed, or will higher versions also work?

Is there a bug in batch_gather?

in independent.py

top_fast_antecedent_scores = util.batch_gather(fast_antecedent_scores, top_antecedents) # [k, c]
sometimes returns [NaN, NaN ...]

I printed the values of the tensors using tf.Print().

batch_gather in util.py

def batch_gather(txt, emb, indices):
  batch_size = shape(emb, 0)
  seqlen = shape(emb, 1)
  if len(emb.get_shape()) > 2:
    emb_size = shape(emb, 2)
  else:
    emb_size = 1

  flattened_emb = tf.reshape(emb, [batch_size * seqlen, emb_size])  # [batch_size * seqlen, emb]
  offset = tf.expand_dims(tf.range(batch_size) * seqlen, 1)  # [batch_size, 1]
  gathered = tf.gather(flattened_emb, indices + offset) # [batch_size, num_indices, emb]
  gathered = tf.Print(gathered, [gathered], message='gathered')
  if len(emb.get_shape()) == 2:
    gathered = tf.squeeze(gathered, 2) # [batch_size, num_indices]
    gathered = tf.Print(gathered, [gathered], message=txt+'gathered2')
  return gathered

The results are as follows.

emb == [[-inf -inf -inf...]...]
flattened_emb == [[-inf][-inf][-inf]...]
indices + offset  == [[808 809 810...]...]
gathered == [[[nan][nan][nan]]...]
gathered2 == [[nan nan nan...]...]

Sometimes it doesn't happen (everything works), but it happens frequently (training stops because the loss is NaN).

Can you help me?
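For readers following the index arithmetic, here is a pure-Python sketch of the 2-D case of the function above: flatten the [batch_size, seqlen] input, add a per-row offset of `row * seqlen`, and pick the requested entries. It is an illustration with nested lists, not TensorFlow code, and it does not explain where the NaN comes from. It only shows that in this simplified version -inf inputs are copied through unchanged.

```python
import math


def batch_gather_2d(emb, indices):
    """Gather emb[b][i] for every i in indices[b], via flatten-plus-offset."""
    seqlen = len(emb[0])
    flattened = [value for row in emb for value in row]  # [batch_size * seqlen]
    return [
        [flattened[b * seqlen + i] for i in row]          # offset = b * seqlen
        for b, row in enumerate(indices)
    ]


emb = [[10, 11, 12], [20, 21, 22]]
indices = [[2, 0], [1, 1]]
print(batch_gather_2d(emb, indices))

neg_inf = float("-inf")
gathered = batch_gather_2d([[neg_inf, neg_inf]], [[1, 0]])
print(gathered, any(math.isnan(x) for row in gathered for x in row))
```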

setup_all check gcc version

Hi,
Thanks for releasing the code. While running predict.py I was getting

tensorflow.python.framework.errors_impl.NotFoundError: ./coref_kernels.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

I searched and found this link. It would be nice to either check the GCC version or keep this as a warning in the script to help anyone else stuck here!!

about max_training_sentences

Hello author,
I want to change the config parameter max_training_sentences from its original value of 3 to 5,
but I received an error:

ValueError: Shape of variable coref_layer/segment_distance/segment_distance_embeddings:0 ((5, 20)) doesn't match with shape of tensor coref_layer/segment_distance/segment_distance_embeddings ([3, 20]) from checkpoint reader.

What should I change?
Best wishes.
I look forward to your reply.

finetuning on pytorch model

Hi Mandar,

I wonder if the model can be finetuned using the PyTorch version of BERT. I saw that there are tf_checkpoint and init_checkpoint entries in the config file. Is it possible to modify them to support PyTorch BERT models?

How to get the model.max.ckpt file

experiments.conf:
tf_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
init_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
When I run train.py, it fails with: Failed to find any matching file for bert_base/model.max.ckpt.
I'm not sure where this file comes from, or whether it is generated at some intermediate step.
Is there a good way to get it?

Minor typo on readme

For pretrained models, it says "speanbert_base" instead of "spanbert_base".

This was slightly confusing when using the download.sh script, as I could not tell if it was an error w/ the script config or something else.

How do you deal with WordPiece tokenization?

BERT splits a word into several subwords. Do you combine them in some way into a single representation, or do you leave the clusters larger and predict over more tokens? And where is the code for this in the repo? This has bothered me for a long time.

Dropout bug

coref/independent.py

Lines 444 to 448 in 6db8cac

if segment_distance is not None:
  with tf.variable_scope('segment_distance', reuse=tf.AUTO_REUSE):
    segment_distance_emb = tf.gather(tf.get_variable("segment_distance_embeddings", [self.config['max_training_sentences'], self.config["feature_size"]], initializer=tf.truncated_normal_initializer(stddev=0.02)), segment_distance) # [k, emb]
  span_width_emb = tf.nn.dropout(segment_distance_emb, self.dropout)
  feature_emb_list.append(segment_distance_emb)

Hi, is the dropout above wrong? The result of tf.nn.dropout is assigned to span_width_emb, but segment_distance_emb is what gets appended.

Suggestion for doing coref for longer sequences?

Hi,

First, thank you for providing this valuable resource.
According to Table 4 of the BERT paper, performance declines for long sequences of length 1152+.
The average sequence length in my dataset is 1500+. Do you suggest using SpanBERT on my data as is, or is it better to segment the data into pieces of length 512?
Of course, both have drawbacks that negatively affect the performance of the pretrained model, but which approach do you suggest?

I think there is a typo in experiments

This is the original code.

bert_base = ${best}{
  num_docs = 2802
  bert_learning_rate = 1e-05
  task_learning_rate = 0.0002
  max_segment_len = 128
  ffnn_size = 3000
  train_path = ${data_dir}/train.english.128.jsonlines
  eval_path = ${data_dir}/dev.english.128.jsonlines
  conll_eval_path = ${data_dir}/dev.english.v4_gold_conll
  max_training_sentences = 11
  bert_config_file = ${best.log_root}/bert_base/bert_config.json
  vocab_file = ${best.log_root}/bert_base/vocab.txt
  tf_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
  init_checkpoint = ${best.log_root}/bert_base/model.max.ckpt
}

train_bert_base = ${bert_base}{
  tf_checkpoint = ${best.log_root}/cased_L-12_H-768_A-12/bert_model.ckpt
  init_checkpoint = ${best.log_root}/cased_L-12_H-768_A-12/bert_model.ckpt
}

I think the version below is correct
when training for the first time.

train_bert_base = ${bert_base}{
  bert_config_file = ${best.log_root}/cased_L-12_H-768_A-12/bert_config.json
  vocab_file = ${best.log_root}/cased_L-12_H-768_A-12/vocab.txt
  tf_checkpoint = ${best.log_root}/cased_L-12_H-768_A-12/bert_model.ckpt
  init_checkpoint = ${best.log_root}/cased_L-12_H-768_A-12/bert_model.ckpt
}

Assertion Error in predict.py

When I pass text in the required JSON format, it predicts properly for some examples but throws an assertion error for others. I found that this is caused by longer texts. Can't we use SpanBERT coref for a paragraph of long sentences totaling more than 512 words?

InvalidArgumentError (see above for traceback): assertion failed: [] [Condition x <= y did not hold element-wise:x (bert/embeddings/strided_slice_3:0) = ] [679] [y (bert/embeddings/assert_less_equal/y:0) = ] [512] [[Node: bert/embeddings/assert_less_equal/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](bert/embeddings/assert_less_equal/All, bert/embeddings/assert_less_equal/Assert/Assert/data_0, bert/embeddings/assert_less_equal/Assert/Assert/data_1, bert/embeddings/strided_slice_3, bert/embeddings/assert_less_equal/Assert/Assert/data_3, bert/embeddings/assert_less_equal/y)]]
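The assertion above reports a segment of 679 subtokens against a limit of 512. One way to stay under such a limit is to split the document into several segments before building the input, cutting at sentence ends where possible. The sketch below is a greedy illustration under assumptions and not the repository's preprocessing code: `split_into_segments` and the `sentence_end` flags are hypothetical names, and two positions per segment are reserved for [CLS] and [SEP].

```python
def split_into_segments(subtokens, sentence_end, max_segment_len):
    """Greedily split subtokens into segments of at most max_segment_len
    (including [CLS] and [SEP]), preferring to cut after a sentence end.

    sentence_end[i] is True when subtoken i is the last one of a sentence.
    Assumes max_segment_len > 2.
    """
    budget = max_segment_len - 2  # room for [CLS] and [SEP]
    segments = []
    start = 0
    while start < len(subtokens):
        last_allowed = min(start + budget, len(subtokens)) - 1
        cut = last_allowed
        # Back off to the nearest sentence end inside the window.
        while cut >= start and not sentence_end[cut]:
            cut -= 1
        if cut < start:
            cut = last_allowed  # no sentence end in the window: hard cut
        segments.append(["[CLS]"] + subtokens[start:cut + 1] + ["[SEP]"])
        start = cut + 1
    return segments


subtokens = ["a", "b", ".", "c", "d", "e", ".", "f", "."]
sentence_end = [tok == "." for tok in subtokens]
for segment in split_into_segments(subtokens, sentence_end, max_segment_len=7):
    print(segment)
```

Whether predictions across independently encoded segments are good enough for very long documents is a separate question that this sketch does not address.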

Path error when run bert_base and spanbert_base

Hi, I tried to run python train.py <experiment> following the instructions in the README. I tried <experiment>=bert_base and <experiment>=spanbert_base, and set export data_dir=/data/coref. I got the following errors:

  • bert_base
Restoring from: /data/coref/bert_base/model-57000
W1010 16:06:31.544241 140159242708736 deprecation.py:323] From /home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "train.py", line 41, in <module>
    saver.restore(session, ckpt.model_checkpoint_path)
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1278, in restore
    compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /data/coref/bert_base/model-57000

The bert_base directory created by ./download_pretrained.sh bert_base contains:

bert_config.json  checkpoint  events.out.tfevents.1551148806.learnfair2008  events.out.tfevents.1551148825.learnfair0213  events.out.tfevents.1551148826.learnfair0213  model.max.ckpt.data-00000-of-00001  model.max.ckpt.index  stdout.log  vocab.txt
  • spanbert_base
Restoring from: /checkpoint/danqi/coref_eval/final/base_pair_external_sl384_blr2e-05_tlr0.0001/model-57000
W1010 15:52:59.750342 139657016100608 deprecation.py:323] From /home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "train.py", line 41, in <module>
    saver.restore(session, ckpt.model_checkpoint_path)
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    if not checkpoint_management.checkpoint_exists(compat.as_text(save_path)):
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_management.py", line 372, in checkpoint_exists
    if file_io.get_matching_files(pathname):
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 363, in get_matching_files
    return get_matching_files_v2(filename)
  File "/home/jiezhong/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 384, in get_matching_files_v2
    compat.as_bytes(pattern))
tensorflow.python.framework.errors_impl.NotFoundError: /checkpoint/danqi/coref_eval/final/base_pair_external_sl384_blr2e-05_tlr0.0001; No such file or directory

It seems that the path /checkpoint/danqi/coref_eval/final/base_pair_external_sl384_blr2e-05_tlr0.0001 is hard-coded somewhere. Moreover, the spanbert_base directory created by ./download_pretrained.sh spanbert_base contains:

bert_config.json  checkpoint  events.out.tfevents.1561596094.learnfair1413  model.max.ckpt.data-00000-of-00001  model.max.ckpt.index  stdout.log  vocab.txt

Any help would be greatly appreciated!
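
A possible explanation, offered as a guess rather than a confirmed diagnosis: TensorFlow 1.x records checkpoint locations in the plain-text file named checkpoint, and the "Restoring from" line above suggests that file still points at the path from the original training machine. The sketch below rewrites the two path fields so they name the model.max.ckpt prefix that is actually present in the directory. The function name repoint_checkpoint is hypothetical and not part of the repo.

```python
import re

# Matches the two path fields that TensorFlow 1.x writes into the text
# file named "checkpoint" (a CheckpointState message in text format).
_PATH_FIELD = re.compile(
    r'^(model_checkpoint_path|all_model_checkpoint_paths):\s*"[^"]*"',
    re.MULTILINE,
)


def repoint_checkpoint(text, new_prefix):
    """Return the checkpoint-state text with every recorded path set to new_prefix.

    Hypothetical helper for illustration; it only edits text and does not
    touch TensorFlow or the file system.
    """
    return _PATH_FIELD.sub(
        lambda m: '{}: "{}"'.format(m.group(1), new_prefix), text
    )


# Example input built from the path that appears in the log above.
old = ("/checkpoint/danqi/coref_eval/final/"
       "base_pair_external_sl384_blr2e-05_tlr0.0001/model-57000")
example = (
    'model_checkpoint_path: "{0}"\n'
    'all_model_checkpoint_paths: "{0}"\n'
).format(old)

fixed = repoint_checkpoint(example, "model.max.ckpt")
print(fixed)
```

To try this on the real file, read spanbert_base/checkpoint, pass its contents through the function, and write the result back after saving a backup copy. A relative prefix such as model.max.ckpt should be resolved against the directory that holds the checkpoint file, but that is worth verifying on your TensorFlow version.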

question about subtoken_map

Hi,
I wanted to ask about subtoken_map in the jsonlines files. In trial.jsonlines:

{"doc_key": "bn/voa/02/voa_0210_0", 
"sentences": [["[CLS]", "Meanwhile", "Prime", "Minister", "E", "##hu", "##d", "Bar", "##ak", "told", "Israeli", "television", "he", "doubts", "a", "peace", "deal", "can", "be", "reached", "before", "Israel", "'", "s", "February", "6th", "election", ".", "He", "said", "he", "will", "now", "focus", "on", "suppress", "##ing", "Palestinian", "violence", ".", "[SEP]"]], 
"speakers": [["[SPL]", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "-", "[SPL]"]], 
"clusters": [[]], 
"sentence_map": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2], 
"subtoken_map": [0, 0, 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30, 31, 32, 33, 33]
}

The subtoken_map value is the same (33) for the "." and "[SEP]" tokens, while in the README example it differs between the full stop and the [SEP] token.
Which of the two conventions is expected?
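
This does not settle which convention is intended, but a small helper makes the mapping easier to inspect. The sketch below groups subtokens by their subtoken_map index, using the first nine entries of the example above. The function name group_by_word is hypothetical.

```python
from collections import OrderedDict


def group_by_word(subtokens, subtoken_map):
    """Group subtokens under the word index that subtoken_map assigns them.

    Hypothetical helper for inspecting a jsonlines record; not part of the repo.
    """
    if len(subtokens) != len(subtoken_map):
        raise ValueError("subtokens and subtoken_map must have the same length")
    groups = OrderedDict()
    for token, word_index in zip(subtokens, subtoken_map):
        groups.setdefault(word_index, []).append(token)
    return groups


# First nine subtokens and map entries copied from the trial.jsonlines record.
subtokens = ["[CLS]", "Meanwhile", "Prime", "Minister",
             "E", "##hu", "##d", "Bar", "##ak"]
subtoken_map = [0, 0, 1, 2, 3, 3, 3, 4, 4]

groups = group_by_word(subtokens, subtoken_map)
for word_index, pieces in groups.items():
    print(word_index, pieces)
```

In this record the special tokens share an index with their neighbour: [CLS] is grouped with "Meanwhile" under 0, just as [SEP] is grouped with "." under 33 at the end of the full list.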

Let's improve the state of the art once again, by using it

Hi!
I see you beat the state of the art on coreference resolution (which is a fundamental NLP task!) by using and tuning BERT.

Yet BERT has been "obsoleted" by XLNet.
So would you like to be the first to use and fine-tune it for coreference resolution, just as you did with BERT? That would be very helpful for everyone who needs highly accurate coref.

https://github.com/zihangdai/xlnet

BTW I suggested the same idea to a researcher in dependency/constituency parsing a few weeks ago and they are now number 1 on the paperswithcode.com leaderboards!

Question about the Independent variant

I have a question about the Independent variant.

If I use Independent with max_segment_len = 128, it divides every document into segments of 128 tokens. Each 128-token segment is then passed through BERT, which produces an embedding for each token.

In this situation:
(1) Does it look for an antecedent only within the same 128 tokens?
OR
(2) Does it pass all segments through BERT, collect the embeddings for all tokens,
and then look for an antecedent among all tokens in the document?
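
To make option (2) concrete, here is a toy sketch of what "encode each segment independently, then search over the whole document" would look like. It is an illustration of the question only, not a confirmed description of the repo: encode_segment is a stand-in for BERT, all function names are hypothetical, and the fixed-size split ignores sentence boundaries and the [CLS]/[SEP] tokens that the real preprocessing adds.

```python
def split_into_segments(tokens, max_segment_len):
    """Cut a token list into consecutive, non-overlapping segments."""
    return [tokens[i:i + max_segment_len]
            for i in range(0, len(tokens), max_segment_len)]


def encode_segment(segment):
    """Stand-in for BERT: each output depends only on its own segment."""
    return [(token, len(segment)) for token in segment]


def encode_document(tokens, max_segment_len):
    """Encode segments independently, then flatten to one document-level list.

    Under option (2), antecedent search would run over this flattened list,
    so a mention could link to a token from an earlier segment.
    """
    flat = []
    for segment in split_into_segments(tokens, max_segment_len):
        flat.extend(encode_segment(segment))
    return flat


tokens = ["He", "said", "he", "will", "now"]
segments = split_into_segments(tokens, 2)
flat = encode_document(tokens, 2)
print(segments)
print(len(flat))
```

Under option (1), by contrast, each inner list in segments would be searched on its own and the flattening step would not exist.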

Issues using colab

Is anyone here using Google Colab to train this model? I have run into several problems:
1- The model was unable to create the gold_conll files in Colab. To work around this, I built them locally and uploaded them to my Drive.
2- I am unable to train the model. It reports the following pyhocon error:

Traceback (most recent call last):
  File "train.py", line 18, in <module>
    config = util.initialize_from_env()
  File "/content/coref/util.py", line 37, in initialize_from_env
    config = pyhocon.ConfigFactory.parse_file("experiments.conf")[name]
  File "/usr/local/lib/python3.6/dist-packages/pyhocon/__init__.py", line 32, in parse_file
    return ConfigFactory.parse_string(content, os.path.dirname(filename))
  File "/usr/local/lib/python3.6/dist-packages/pyhocon/__init__.py", line 60, in parse_string
    return ConfigParser().parse(content, basedir)
  File "/usr/local/lib/python3.6/dist-packages/pyhocon/__init__.py", line 198, in parse
    ConfigParser._resolve_substitutions(config, substitutions)
  File "/usr/local/lib/python3.6/dist-packages/pyhocon/__init__.py", line 241, in _resolve_substitutions
    is_optional_resolved, resolved_value = ConfigParser._resolve_variable(config, substitution)
  File "/usr/local/lib/python3.6/dist-packages/pyhocon/__init__.py", line 223, in _resolve_variable
    col=col(substitution.loc, substitution.instring)))
pyhocon.exceptions.ConfigSubstitutionException: Cannot resolve variable ${data_dir} (line: 95, col: 16)

Any help with this?
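
A likely cause, stated as an assumption: experiments.conf refers to ${data_dir} without defining it, and pyhocon falls back to an environment variable of the same name when a substitution is not defined in the file. If data_dir is not set in the Python process that runs train.py, the substitution fails as shown. The sketch below is a minimal pre-flight check; require_env is a hypothetical helper and /content/data is a placeholder path.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper; `environ` can be any mapping, which makes the
    function easy to test without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            "Environment variable {!r} is not set; experiments.conf "
            "appears to need it for ${{{}}}".format(name, name)
        )
    return value


# Demonstration with an explicit mapping (placeholder path, not a real one).
demo_env = {"data_dir": "/content/data"}
print(require_env("data_dir", demo_env))
```

In a notebook, setting os.environ["data_dir"] in a cell before launching training is one way to make the variable visible. A line such as !export data_dir=... runs in its own subshell, so the value does not persist to later cells.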

evaluate.py is broken

I simply want to use state-of-the-art coreference resolution.
It should take arbitrary, valid English text as input and output the resolved pronouns.
How can I achieve this basic usage?

I tried both evaluate and predict.

GPU=0 python evaluate.py spanbert_base gives:

...
UNKNOWN ERROR (303)
2020-06-02 15:17:49.112473: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (stephane): /proc/driver/nvidia/version does not exist
2020-06-02 15:17:49.112826: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-02 15:17:49.143598: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2901210000 Hz
2020-06-02 15:17:49.143965: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565419b6ea50 executing computations on platform Host. Devices:
2020-06-02 15:17:49.144000: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
Restoring from /home/stephane/b_data/spanbert_base/model.max.ckpt
2020-06-02 15:17:55.723471: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
W0602 15:17:58.632378 140016442066752 deprecation.py:323] From /home/stephane/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "evaluate.py", line 26, in <module>
    model.evaluate(session, official_stdout=True, eval_mode=True)
  File "/home/stephane/coref/independent.py", line 538, in evaluate
    self.load_eval_data()
  File "/home/stephane/coref/independent.py", line 532, in load_eval_data
    with open(self.config["eval_path"]) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/stephane/b_data/dev.english.384.jsonlines'
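
The traceback shows that the checkpoint was restored and that the failure comes from the file named by eval_path not existing, so evaluate.py appears to need the preprocessed dev set in the data directory. A small check like the one below reports missing input files before the model loads. It is a sketch: missing_paths is a hypothetical helper, and conll_eval_path is assumed to be a second path key in the config.

```python
import os
import tempfile


def missing_paths(config, keys=("eval_path", "conll_eval_path")):
    """Return {key: path} for every listed config key whose file is absent.

    Hypothetical helper; `config` is any mapping, for example the dict-like
    object that train.py and evaluate.py build from experiments.conf.
    """
    return {
        key: config[key]
        for key in keys
        if key in config and not os.path.exists(config[key])
    }


# Demonstration with one real temporary file and one path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "dev.english.v4_gold_conll")
    with open(present, "w") as handle:
        handle.write("")
    demo_config = {
        "eval_path": os.path.join(tmp, "dev.english.384.jsonlines"),
        "conll_eval_path": present,
    }
    report = missing_paths(demo_config)

print(sorted(report))
```

Running such a check at the top of a script gives a message that names the missing file directly, rather than failing only after the checkpoint has been restored.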
