Comments (5)
Sorry about the late response. Here's the pipeline. $gap_file_prefix is
the path of the GAP file without the .tsv extension, and $vocab_file is
the cased BERT vocab file.
#!/bin/bash
gap_file_prefix=$1
vocab_file=$2
python gap_to_jsonlines.py "$gap_file_prefix.tsv" "$vocab_file"
GPU=0 python predict.py bert_base "$gap_file_prefix.jsonlines" "$gap_file_prefix.output.jsonlines"
python to_gap_tsv.py "$gap_file_prefix.output.jsonlines"
python2 ../gap-coreference/gap_scorer.py --gold_tsv "$gap_file_prefix.tsv" --system_tsv "$gap_file_prefix.output.tsv"
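For orientation, a record in the intermediate .jsonlines file might look like the sketch below. This is a hypothetical example following the usual bert-coref input conventions (doc_key, sentences, speakers, clusters); the exact fields that gap_to_jsonlines.py emits may differ.

```python
import json

# Hypothetical minimal GAP example in the .jsonlines format that predict.py
# consumes. Field names follow common bert-coref conventions and are an
# assumption, not taken from gap_to_jsonlines.py itself.
example = {
    "doc_key": "test-1",  # GAP example ID (hypothetical)
    "sentences": [["He", "said", "that", "Paul", "left", "."]],
    "speakers": [["-", "-", "-", "-", "-", "-"]],  # one speaker per token
    "clusters": [],  # empty at prediction time; filled in the output file
}

# Each line of the .jsonlines file is one JSON-serialized document.
with open("gap-test.jsonlines", "w") as f:
    f.write(json.dumps(example) + "\n")
```

Each document is one line of JSON, which is why the pipeline passes whole files between the scripts rather than individual examples.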
- Table 2 is on test.
- The results seem to be off by about 0.3 for BERT base; I'm not sure what changed. The genre has very little effect on the number (up to 0.1, IIRC). I got 82.4 with the default genre (bc).
from coref.
- I found that all 4 numbers for e2e-coref in the first row are exactly the same as the results in the last row of Table 4 of the paper "Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns". However, that paper reports results on the GAP development set, and it seems very unlikely that dev and test results would be identical. Could you please confirm that the results in Table 2 really are on the GAP test set?
- Thank you for your pipeline and the bert_base result. I also got an Overall score of 82.4, so that is fine. However, my question is about the c2f_coref model. The pipeline could be the same, but the code should differ slightly to adapt to c2f_coref. Can you reproduce the 4 numbers of the c2f-coref model?
Thanks a lot.
- I did not run the e2e-coref model. Looks like we copied from the wrong table for that row. I will amend the paper. We definitely evaluated on the test set for BERT.
- I don't have that handy right now, and I'm traveling until mid-November. IIRC the only change needed is to make sure that each element of the "sentences" field is a natural-language sentence (as opposed to a whole paragraph, as with BERT). This is because c2f-coref contextualizes each sentence independently with LSTMs.
If that doesn't work, I'll take a look after I'm back. Thanks for your patience.
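The change described above could be sketched as follows: re-segment a paragraph-level "sentences" field into one entry per sentence. The function name and the naive punctuation-based splitter are my own placeholders; a real conversion would use a proper sentence tokenizer.

```python
def split_paragraph(example):
    """Resegment a jsonlines example whose "sentences" field holds one
    paragraph-level token list into one token list per sentence, so that
    c2f-coref's per-sentence LSTMs see natural-language sentences.
    Hypothetical sketch: splits on sentence-final punctuation tokens."""
    tokens = [t for sent in example["sentences"] for t in sent]
    speakers = [s for sent in example["speakers"] for s in sent]
    sentences, spk, cur_t, cur_s = [], [], [], []
    for tok, speaker in zip(tokens, speakers):
        cur_t.append(tok)
        cur_s.append(speaker)
        if tok in {".", "!", "?"}:  # naive sentence boundary
            sentences.append(cur_t)
            spk.append(cur_s)
            cur_t, cur_s = [], []
    if cur_t:  # trailing fragment with no sentence-final punctuation
        sentences.append(cur_t)
        spk.append(cur_s)
    # Flat token offsets are unchanged, so "clusters" spans remain valid.
    return dict(example, sentences=sentences, speakers=spk)
```

Since only the grouping of tokens changes, not their order, any character- or token-offset annotations carried alongside should survive the re-segmentation.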
- Because the gap_to_jsonlines.py file works with the tokenizer set to None, I used it that way. The Overall F1 score I got is 68.5, not the 73.5 in your paper. If you could run it again and check which code you used, I would appreciate it very much.
@HaixiaChai
Could you please share the detailed steps to test and evaluate this model on the GAP dataset? (I'd like to know what changes were made to the environment setup, commands, data, etc.)
I am new to this research area and want to reproduce the results on both the GAP and OntoNotes datasets. Your help would be much appreciated.
Thanks!