
e2e-coref's Introduction

Higher-order Coreference Resolution with Coarse-to-fine Inference

Introduction

This repository contains the code for replicating results from Higher-order Coreference Resolution with Coarse-to-fine Inference (Kenton Lee, Luheng He, and Luke Zettlemoyer; NAACL 2018).

Getting Started

  • Install python (either 2 or 3) requirements: pip install -r requirements.txt
  • Download pretrained models at https://drive.google.com/file/d/1fkifqZzdzsOEo0DXMzCFjiNXqsKG_cHi
    • Move the downloaded file to the root of the repo and extract: tar -xzvf e2e-coref.tgz
  • Download GloVe embeddings and build custom kernels by running setup_all.sh.
    • There are 3 platform-dependent ways to build custom TensorFlow kernels. Please comment/uncomment the appropriate lines in the script.
  • To train your own models, run setup_training.sh
    • This assumes access to OntoNotes 5.0. Please edit the ontonotes_path variable.

Training Instructions

  • Experiment configurations are found in experiments.conf
  • Choose an experiment that you would like to run, e.g. best
  • Training: python train.py <experiment>
  • Results are stored in the logs directory and can be viewed via TensorBoard (see the example after this list).
  • Evaluation: python evaluate.py <experiment>
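
To monitor training, you can point TensorBoard at the log directory. A minimal invocation, assuming logs is the log root as in the bullet above (the exact subdirectory layout may vary):

tensorboard --logdir logs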

Demo Instructions

  • Command-line demo: python demo.py final
  • To run the demo with other experiments, replace final with your configuration name.

Batched Prediction Instructions

  • Create a file where each line is a document in the following JSON format (strip newlines so each line is well-formed JSON):
{
  "clusters": [],
  "doc_key": "nw",
  "sentences": [["This", "is", "the", "first", "sentence", "."], ["This", "is", "the", "second", "."]],
  "speakers": [["spk1", "spk1", "spk1", "spk1", "spk1", "spk1"], ["spk2", "spk2", "spk2", "spk2", "spk2"]]
}
  • clusters should be left empty; it is only used for evaluation.
  • doc_key indicates the genre, which can be one of the following: "bc" (broadcast conversation), "bn" (broadcast news), "mz" (magazine), "nw" (newswire), "pt" (pivot text), "tc" (telephone conversation), "wb" (web)
  • speakers indicates the speaker of each word. These can all be empty strings if there is only one known speaker.
  • Run python predict.py <experiment> <input_file> <output_file>, which outputs the input jsonlines with predicted clusters (see the sketch after this list).
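
As a worked example, here is a minimal Python sketch that writes one document in the format above and reads back the predictions. The predicted_clusters field name is my assumption about what predict.py adds to each line; check your version's output if it differs.

import json

# Each document must be one line of JSON; json.dumps emits no newlines,
# so every document is well-formed JSON on its own line.
doc = {
    "clusters": [],
    "doc_key": "nw",
    "sentences": [["This", "is", "the", "first", "sentence", "."],
                  ["This", "is", "the", "second", "."]],
    "speakers": [["spk1"] * 6, ["spk2"] * 5],
}

with open("input.jsonlines", "w") as f:
    f.write(json.dumps(doc) + "\n")

# After running: python predict.py final input.jsonlines output.jsonlines
with open("output.jsonlines") as f:
    for line in f:
        example = json.loads(line)
        # Assumed field name; clusters are lists of [start, end] token spans.
        print(example.get("predicted_clusters"))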

Other Quirks

  • It does not use GPUs by default. Instead, it looks for the GPU environment variable, which the code treats as shorthand for CUDA_VISIBLE_DEVICES (see the example after this list).
  • The training runs indefinitely and needs to be terminated manually. The model generally converges at about 400k steps.
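
For example, to run training on the first GPU, set GPU on the command line; per the note above, the code turns this into CUDA_VISIBLE_DEVICES:

GPU=0 python train.py best

Leaving GPU unset keeps everything on the CPU.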

e2e-coref's People

Contributors

heyitsmine, kentonl, nelson-liu, saltyhash123


e2e-coref's Issues

Error while running the pretrained model

I am trying to use the pretrained model on my own data set to find coreference resolutions. However, after loading the pretrained model using setup_pretrained.sh, when I try to run python singleton.py best I get an IOError that the glove.840B.300d.txt.filtered file is not found. I checked the stack trace and my directory, and the .txt.filtered file is never created during execution.

Am I missing anything here? Please clarify. Also, can you tell me what exactly I need to do to use your model on my dataset? How do I specify my input file (assume it is a text file with the data) for the program to process and produce output?

NotFoundError when trying to run Singleton.py

Hi.

I am trying to use your module for a project I am currently working on, and I'm facing an issue when I set it up per the given instructions.

The following traceback is generated when I run python singleton.py best.
I am using Python 2.7 with TensorFlow 1.4.0 on an Ubuntu 16.04 LTS base. I'm not sure how to resolve this error, or where I should ideally look to resolve it.

Traceback (most recent call last): File "singleton.py", line 12, in <module> import coref_model as cm File "/home/sgoutam/e2e-coref/coref_model.py", line 10, in <module> import coref_ops File "/home/sgoutam/e2e-coref/coref_ops.py", line 4, in <module> coref_op_library = tf.load_op_library("./coref_kernels.so") File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library lib_handle = py_tf.TF_LoadLibrary(library_filename, status) File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__ c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.NotFoundError: ./coref_kernels.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

GPU error

I tried to use the GPU to train the model and ran into an error:
F tensorflow/stream_executor/cuda/cuda_dnn.cc:222] Check failed: s.ok() could not find cudnnCreate in cudnn DSO; dlerror: /home/chenbo/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cudnnCreate
Aborted (core dumped)

CUDA Version 8.0.61
Cudnn version #define CUDNN_MAJOR 6

Does running on GPU require a specific CUDA or cuDNN version? And how do I select GPUs from the shell?
GPU=0,1 python singleton.py best doesn't work.

Report error for batched prediction

When I try to do batched prediction with the command python decoder.py final output.txt, it gives me an error that the shapes of the two tensors don't match, as follows:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,8] rhs shape= [115,8]
[[Node: save/Assign_15 = Assign[T=DT_FLOAT, _class=["loc:@char_embeddings"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](char_embeddings/Adam, save/RestoreV2_15)]]

I had successfully run setup_all.sh and setup_pretrained.sh before doing this.
Thanks in advance.

Error occurred in the middle of building kernel

I got this error when I tried running setup_all.sh.
I'm on a Mac and uncommented the right line for Mac.
I'm using virtualenv to set up the environment, where the TensorFlow version is 1.0.0rc2.

In file included from coref_kernels.cc:5:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/shape_inference.h:21:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/node_def_util.h:23:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/attr_value_util.h:22:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/tensorflow/core/framework/partial_tensor_shape.h:21:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:4:
In file included from /Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/Tensor:149:
/Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h:39:7: error: class template partial specialization is not
more specialized than the primary template [-Winvalid-partial-specialization]
class TensorStorage<T, FixedDimensions, Options_>
^
/Users/jhshin/tensorflow/lib/python2.7/site-packages/tensorflow/include/unsupported/Eigen/CXX11/src/Tensor/TensorStorage.h:34:63: note: template is declared here
template<typename T, typename Dimensions, int Options_> class TensorStorage;

Tagged Clusters

Within the tagged clusters, are we able to determine which word is the main speaker?

I followed your requirements, and when running python demo.py final I get the errors below. I had run setup_all.sh before running demo.py.

Traceback (most recent call last):
  File "demo.py", line 14, in <module>
    import coref_model as cm
  File "/home/wlk/PycharmProjects/e2e/coref_model.py", line 10, in <module>
    import coref_ops
  File "/home/wlk/PycharmProjects/e2e/coref_ops.py", line 4, in <module>
    coref_op_library = tf.load_op_library("./coref_kernels.so")
  File "/home/wlk/anaconda3/envs/e2e/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: ./coref_kernels.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

Training specs

You mentioned that training took 48 hours to reach convergence at 400k steps, but I can't find the specs of your training machine. Was that on GPU or CPU? How big?
I am training your model on Arabic CoNLL and it took about 11 hours to complete only 7,600 steps. Simple math shows that at this rate, reaching 400k steps would take 578 hours! I am running on a machine with 8 vCPUs, and I wanted to check with you before questioning my training setup.

What is the doc_key genre?

doc_key indicates the genre, which can be one of the following: "bc", "bn", "mz", "nw", "pt", "tc", "wb"

I am not sure what the different options mean and which I should use.

Error running pre-trained: coref_kernels.so: undefined symbol

I'm running into a problem running the pre-trained setup on Ubuntu, Python 2.7, TF==1.0.0:

e2e-coref$ python demo.py final
Traceback (most recent call last):
  File "demo.py", line 14, in <module>
    import coref_model as cm
  File "/home/ghost/e2e_coref/e2e-coref/coref_model.py", line 10, in <module>
    import coref_ops
  File "/home/ghost/e2e_coref/e2e-coref/coref_ops.py", line 4, in <module>
    coref_op_library = tf.load_op_library("./coref_kernels.so")
  File "/home/ghost/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: ./coref_kernels.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

How about the performance on OntoNotes v5.0?

Have you ever run experiments on the OntoNotes v5.0 dataset?
I tried it without changing any training configuration except switching the data from v4.0 to v5.0 and setting lm_path to None. But the best average F1 score on the development set is only 61 after 200k steps, and it has plateaued. Hoping for your reply; thanks a lot.

Out of memory

I set up the environment correctly and generated a small test input of 2 records, which worked, so my setup should be correct.
But the OS killed the process with an out-of-memory error on my real input. Either the program consumes too much memory for my input, or there is a memory leak in the program.

My evaluation input is about 200 MB.

Missing requirements.txt

In the installation instructions, it is stated that:

Install python (either 2 or 3) requirements: pip install -r requirements.txt

but there is no requirements.txt in the repo.

Evaluation on GPU

When I set GPU=0, TensorFlow does recognize the GPU, but after loading the checkpoint the code silently dies.
The code runs fine without setting the GPU environment variable. Is this a known problem?

P.S. I'm running with CUDA version 9.0 and cuDNN version 7.1.

char_vocab.english.txt not accessible

Hello,

I'm trying to go through the README steps to run python demo.py final.

It seems the latest commit removed the lines that curl char_vocab.english.txt.
Running curl -o char_vocab.english.txt https://www.googleapis.com/storage/v1/b/e2e-coref/o/char_vocab.english.txt?alt=media directly leaves me with an empty file, though.

After that, demo.py crashes at line 47 with a tensor-shape mismatch in coref_model.py at line 90.
I suspect it might be related to my char_vocab.english.txt being empty.

Could you tell me how I could get this file?

EDIT: Found what I needed here: https://raw.githubusercontent.com/luheng/lsgn/master/embeddings/char_vocab.english.txt

The result seems strange

[217400] loss=0.10, steps/s=5933.69

I ran test_singleton at step 217400 and only got the following results:

Identification of Mentions: Recall: (9392 / 19155) 49.03% Precision: (9392 / 10875) 86.36% F1: 62.55%

Coreference: Recall: (6130 / 14610) 41.95% Precision: (6130 / 8390) 73.06% F1: 53.3%

Official result for bcub
version: 8.01 /home/chenbo/e2e-coref-master/conll-2012/scorer/v8.01/lib/CorScorer.pm
Repeated mention in the key: 656, 656 7290

====== TOTALS =======
Identification of Mentions: Recall: (9392 / 19155) 49.03% Precision: (9392 / 10875) 86.36% F1: 62.55%

Coreference: Recall: (5691.75228764641 / 19156) 29.71% Precision: (6630.35763531834 / 10875) 60.96% F1: 39.95%

Official result for ceafe
version: 8.01 /home/chenbo/e2e-coref-master/conll-2012/scorer/v8.01/lib/CorScorer.pm
Repeated mention in the key: 656, 656 7290

====== TOTALS =======
Identification of Mentions: Recall: (9392 / 19155) 49.03% Precision: (9392 / 10875) 86.36% F1: 62.55%

Coreference: Recall: (1271.20096754289 / 4546) 27.96% Precision: (1271.20096754289 / 2485) 51.15% F1: 36.15%

Is this the right result, or have I done something wrong?

Some questions about your pretrained model.

Kenton Lee:
Hello, I am a student at XJTU in China. I have some questions about your project e2e-coref on GitHub (https://github.com/kentonl/e2e-coref).
When I run your pretrained model, the Avg. F1 on the dev set is about 73.4%, and the output on the test set matches the paper (Higher-order Coreference Resolution with Coarse-to-fine Inference). But when I train my own model with the code on GitHub, the Avg. F1 on the dev set is only about 73.1%. I have trained it three times, following the description on the website and converging at about 400k steps each time; the output was always about 73.1%.
Is there anything wrong with my procedure? Could you help me resolve this?
Sincerely,
Xiangyu Zhou

About the Chinese Model.

Sorry, I tried my best to get access to the OntoNotes 5.0 data, but it didn't work.
Can you offer the pretrained Chinese model? Thanks.

Evaluate

How do I change experiments.conf for evaluation only?
I'm really stuck on that.

Is the model suitable for Chinese?

I want to train a Chinese coreference resolution model, but I don't have the OntoNotes 5.0 corpus.
Do you have a pretrained Chinese model, or the OntoNotes 5.0 corpus?
Thank you very much!

It seems hard to download the word embeddings

I got this error when I ran the script setup_all.sh:
curl: (7) Failed to connect to appositive.cs.washington.edu port 80: Connection timed out

How can I get the word embeddings directly? I'm in China.

setup_all.sh Error!

~/Downloads/e2e-coref-master » ./setup_all.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1239M  100 1239M    0     0   689k      0  0:30:42  0:30:42 --:--:-- 1405k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2075M  100 2075M    0     0  4884k      0  0:07:15  0:07:15 --:--:-- 3470k
Archive:  glove.840B.300d.zip
  inflating: glove.840B.300d.txt     
Traceback (most recent call last):
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: @rpath/libcublas.8.0.dylib
  Referenced from: /Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
  Reason: image not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 51, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 52, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: dlopen(/Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so, 10): Library not loaded: @rpath/libcublas.8.0.dylib
  Referenced from: /Users/nityansuman/anaconda3/envs/e2e-coref/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
  Reason: image not found


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
[The same ImportError traceback is printed a second time.]
coref_kernels.cc:3:10: fatal error: 'tensorflow/core/framework/op.h' file not found
#include "tensorflow/core/framework/op.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

The code is compatible with TensorFlow 1.12

I tested the code with TF 1.12 and it works well.
This is helpful for people who use TF 1.12, because the cuDNN versions required by TF 1.7 and TF 1.12 aren't compatible.

This is a suggestion more than an issue; feel free to close it.

Example train

Can you upload an example file of the training dataset?
(english_v4_auto_conll, train.english.jsonlines)
Thanks.

How exactly does head embedding get used?

  1. glove_50_300_2.txt is downloaded as the head embeddings. What exactly does the 50 refer to?
  2. How are these used in coref_model.py? It seems that context_outputs is computed by an LSTM over the ELMo embeddings. Then, for each span, the context_outputs give a distribution over the tokens, and this distribution is used to take a weighted sum over the head embeddings (see the sketch after this list):
    span_head_emb = tf.reduce_sum(span_attention * span_text_emb, 1) # [k, emb]

    What I want to confirm is that the head embeddings themselves are not used to calculate this distribution?
  3. Also, from the paper's "Experimental Setup", I didn't follow the meaning of "window size" for the word embeddings and the LSTM inputs:
    using GloVe word embeddings (Pennington et al., 2014) with a window size of 2 for the head word embeddings and a window size of 10 for the LSTM inputs.
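
For reference, here is a minimal NumPy sketch of the computation as I read it in item 2; the dimensions and the projection vector w are hypothetical, not the actual model sizes:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One hypothetical candidate span of 3 tokens.
context_outputs = np.random.randn(3, 400)  # LSTM outputs over ELMo inputs
head_emb = np.random.randn(3, 300)         # context-independent head word embeddings
w = np.random.randn(400)                   # learned projection to a scalar score per token

# The attention distribution is computed from context_outputs only...
span_attention = softmax(context_outputs @ w)   # shape [3]
# ...and is then used as weights for a sum over the head embeddings.
span_head_emb = span_attention @ head_emb       # shape [300]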

Memory Usage Inquiries

First off, thank you for the code! I wanted to know your thoughts on bringing down the memory usage while running demo.py. Am I right in assuming that not using the full GloVe set would hurt performance?

Runtime Error

I installed the code and it worked for some time. However, several days ago it suddenly stopped working. I am getting the following error:
Done loading word embeddings.
Traceback (most recent call last):
  File "demo.py", line 45, in <module>
    model = cm.CorefModel(config)
  File "/Users/vadims/anaphora/e2e-coref-master/coref_model.py", line 58, in __init__
    self.predictions, self.loss = self.get_predictions_and_loss(self.input_tensors)
  File "/Users/vadims/anaphora/e2e-coref-master/coref_model.py", line 250, in get_predictions_and_loss
    elmo_module = hub.Module("https://tfhub.dev/google/elmo/2")
  File "/Users/vadims/anaconda/lib/python2.7/site-packages/tensorflow_hub/module.py", line 147, in __init__
    self._spec = as_module_spec(spec)
  File "/Users/vadims/anaconda/lib/python2.7/site-packages/tensorflow_hub/module.py", line 36, in as_module_spec
    return load_module_spec(spec)
  File "/Users/vadims/anaconda/lib/python2.7/site-packages/tensorflow_hub/module.py", line 61, in load_module_spec
    return registry.loader(path)
  File "/Users/vadims/anaconda/lib/python2.7/site-packages/tensorflow_hub/registry.py", line 45, in __call__
    self._name, args, kwargs))
RuntimeError: Missing implementation that supports: loader(
('/var/folders/2g/b2mxd7dn23j_myf6hm3yfbzc0000gn/T/tfhub_modules/9bb74bc86f9caffc8c47dd7b33ec4bb354d9602d',), **{})

Any ideas?
Thanks.

Can't find coref_kernels.so

Could you show me where the 'coref_kernels.so' file is?

Traceback (most recent call last): File "train.py", line 10, in <module> import coref_model as cm File "/home/theanh/git/e2e-coref/coref_model.py", line 17, in <module> import coref_ops File "/home/theanh/git/e2e-coref/coref_ops.py", line 8, in <module> coref_op_library = tf.load_op_library("./coref_kernels.so") File "/home/theanh/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library lib_handle = py_tf.TF_LoadLibrary(library_filename, status) File "/home/theanh/anaconda3/envs/coref/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__ c_api.TF_GetCode(self.status.status)) tensorflow.python.framework.errors_impl.NotFoundError: ./coref_kernels.so: cannot open shared object file: No such file or directory

Small issue at setup_pretrained.sh last commit

Hi,

The last commit of 2 days ago updated setup_pretrained.sh (some download links for the resources were not available), and I think you forgot to remove an old command:

old commit:

curl -O http://lsz-gpu-01.cs.washington.edu/resources/coref/$ckpt_file

new commit:

download_from_gcs $ckpt_file
curl -O http://lsz-gpu-01.cs.washington.edu/resources/coref/$ckpt_file

In the new commit, I would say the second line is the old, now-duplicated link.

How to train with one GPU

My machine has only one GPU. I changed experiments.conf as follows:

 two_local_gpus {
   addresses {
     ps = [localhost:2222]
-    worker = [localhost:2223, localhost:2224]
+    worker = [localhost:2223]
   }
-  gpus = [0, 1]
+  gpus = [0]
 }

When I run python train.py best, it prints:
2018-10-23 12:01:57.495403: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

I verified TensorFlow's GPU support with:

import tensorflow as tf
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

with tf.Session() as sess:
    print (sess.run(c))

2018-10-23 12:05:53.137069: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-23 12:05:53.137468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Quadro P3000 major: 6 minor: 1 memoryClockRate(GHz): 1.215
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 4.95GiB
2018-10-23 12:05:53.137501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-10-23 12:05:53.321772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-23 12:05:53.321825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-10-23 12:05:53.321834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-10-23 12:05:53.322015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4718 MB memory) -> physical GPU (device: 0, name: Quadro P3000, pci bus id: 0000:01:00.0, compute capability: 6.1)
[[22. 28.]
[49. 64.]]

What's wrong with it?

Converts " to ``

Submitting something with quotation marks, say

he said "I shall have no more of this!"

gives back

he said ``I shall have no more of this!''

The first instance of " is converted to two backticks and the second is converted to two single '

Is this intended?

Faster inference

Is there a way to run considerably faster inference for a small tradeoff in accuracy, without retraining the model? For example, are there any parameters I can change in experiments.conf?

Lee et al (2017) vs Lee et al (2018)

Which model does python demo.py final use? The model described in Lee et al. (2017), or the latest one introduced in Lee et al. (2018)? If it is the latter, is there a pretrained model for Lee et al. (2017) that we can use in the same way?

coref_kernels.so not found

I am using TensorFlow 1.11.0.
While running python demo.py final, I get this error:
Traceback (most recent call last):
  File "demo.py", line 7, in <module>
    import coref_model as cm
  File "C:\Users\ks\H_S\e2e-coref\coref_model.py", line 17, in <module>
    import coref_ops
  File "C:\Users\ks\H_S\e2e-coref\coref_ops.py", line 8, in <module>
    coref_op_library = tf.load_op_library("./coref_kernels.so")
  File "C:\Users\ks\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: .\coref_kernels.so not found

How to solve this?

CoNLL Scores for .jsonlines

Hi,
first of all, thank you for releasing the code. I managed to train on my own data and got a bunch of models. My next step is to get the CoNLL scores; however, I only have my test documents converted to .jsonlines. Is there an easy way to get the CoNLL scores for those?

How does the use of speaker data affect the model benchmarks?

speakers indicates the speaker of each word. These can be all empty strings if there is only one known speaker.

In practical usage it's hard to know the speaker of each word in advance, so it would be useful to know how much this affects the performance of the pretrained model.
