Comments (5)
Sorry about the late response. Here's the pipeline. $gap_file_prefix is
the path of the GAP file without the .tsv extension, and $vocab_file is
the cased BERT vocab file.
#!/bin/bash
gap_file_prefix=$1
vocab_file=$2
python gap_to_jsonlines.py "$gap_file_prefix.tsv" "$vocab_file"
GPU=0 python predict.py bert_base "$gap_file_prefix.jsonlines" "$gap_file_prefix.output.jsonlines"
python to_gap_tsv.py "$gap_file_prefix.output.jsonlines"
python2 ../gap-coreference/gap_scorer.py --gold_tsv "$gap_file_prefix.tsv" --system_tsv "$gap_file_prefix.output.tsv"
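For orientation, a record in the intermediate .jsonlines file might look like the sketch below. This is a hypothetical example following the usual bert-coref input conventions (doc_key, sentences, speakers, clusters); the exact fields that gap_to_jsonlines.py emits may differ.

```python
import json

# Hypothetical minimal GAP example in the .jsonlines format that predict.py
# consumes. Field names follow common bert-coref conventions and are an
# assumption, not taken from gap_to_jsonlines.py itself.
example = {
    "doc_key": "test-1",  # GAP example ID (hypothetical)
    "sentences": [["He", "said", "that", "Paul", "left", "."]],
    "speakers": [["-", "-", "-", "-", "-", "-"]],  # one speaker per token
    "clusters": [],  # empty at prediction time; filled in the output file
}

# Each line of the .jsonlines file is one JSON-serialized document.
with open("gap-test.jsonlines", "w") as f:
    f.write(json.dumps(example) + "\n")
```

Each document is one line of JSON, which is why the pipeline passes whole files between the scripts rather than individual examples.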
- Table 2 is on test.
- The results seem to be off by about 0.3 for BERT base; I'm not sure what changed. The genre has very little effect on the number (up to 0.1, IIRC). I got 82.4 with the default genre (bc).
from coref.
- I found that all 4 numbers for e2e-coref in the first row are exactly the same as the results in the last row of Table 4 of the paper "Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns". However, that paper reports results on the GAP development set, and it seems very unlikely that dev and test results would be identical. Could you please confirm that the results in Table 2 really are on the GAP test set?
- Thank you for your pipeline and the bert_base result. I also got an Overall score of 82.4, so that is fine. However, my question is about the c2f_coref model. The pipeline could be the same, but the code should differ slightly to adapt to c2f_coref. Can you reproduce the 4 numbers of the c2f-coref model?
Thanks a lot.
- I did not run the e2e-coref model. Looks like we copied from the wrong table for that row. I will amend the paper. We definitely evaluated on the test set for BERT.
- I don't have that handy right now, and I'm traveling until mid-November. IIRC the only change needed is to make sure that each element of the "sentences" field is a natural-language sentence (as opposed to a whole paragraph, as with BERT). This is because c2f-coref contextualizes each sentence independently with LSTMs.
If that doesn't work, I'll take a look after I'm back. Thanks for your patience.
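The change described above could be sketched as follows: re-segment a paragraph-level "sentences" field into one entry per sentence. The function name and the naive punctuation-based splitter are my own placeholders; a real conversion would use a proper sentence tokenizer.

```python
def split_paragraph(example):
    """Resegment a jsonlines example whose "sentences" field holds one
    paragraph-level token list into one token list per sentence, so that
    c2f-coref's per-sentence LSTMs see natural-language sentences.
    Hypothetical sketch: splits on sentence-final punctuation tokens."""
    tokens = [t for sent in example["sentences"] for t in sent]
    speakers = [s for sent in example["speakers"] for s in sent]
    sentences, spk, cur_t, cur_s = [], [], [], []
    for tok, speaker in zip(tokens, speakers):
        cur_t.append(tok)
        cur_s.append(speaker)
        if tok in {".", "!", "?"}:  # naive sentence boundary
            sentences.append(cur_t)
            spk.append(cur_s)
            cur_t, cur_s = [], []
    if cur_t:  # trailing fragment with no sentence-final punctuation
        sentences.append(cur_t)
        spk.append(cur_s)
    # Flat token offsets are unchanged, so "clusters" spans remain valid.
    return dict(example, sentences=sentences, speakers=spk)
```

Since only the grouping of tokens changes, not their order, any character- or token-offset annotations carried alongside should survive the re-segmentation.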
- Because the gap_to_jsonlines.py file works with the tokenizer set to None, I used it that way. The Overall F1 score I got is 68.5, not the 73.5 in your paper. If you could run it again and check which code you used, I would appreciate it very much.
@HaixiaChai
Could you please share the detailed steps to test and evaluate this model on the GAP dataset? (I'd like to know what changes were made to the environment setup, commands, data, etc.)
I am new to this research area and want to reproduce the results on both the GAP and OntoNotes datasets. Your help would be much appreciated.
Thanks!