sheffieldnlp / stance-conditional

Stance Detection with Conditional Encoding

Python 94.28% Perl 5.72%

stance-conditional's Issues

Stopping criterion for training

Use a more sensible stopping criterion than just a fixed number of epochs
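One such criterion is early stopping on a development-set score with a patience window. The minimal sketch below assumes hypothetical callbacks `train_one_epoch` and `evaluate_dev` that stand in for the repository's training and evaluation code. The dev score could be accuracy or Macro F1.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[int], None],
    evaluate_dev: Callable[[], float],
    max_epochs: int = 50,
    patience: int = 3,
) -> Tuple[int, float, List[float]]:
    """Stop once the dev score has not improved for `patience` consecutive epochs.

    Hypothetical helper; returns (best_epoch, best_score, dev score history).
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    history: List[float] = []
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate_dev()
        history.append(score)
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
            # A real training loop would save a checkpoint of the model here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, best_score, history
```

With `patience=3`, training ends after three consecutive epochs without a new best dev score. The checkpoint from `best_epoch` is the one to keep.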

Cannot read the JSON file when doing word2vec training

When I try to read the files (additionalTweetsStanceDetection.json and additionalTweetsStanceDetectionBig.json) to train word2vec, I can't decode the JSON and get a UnicodeError. Could you please help train the word2vec model with 300 dimensions? Thank you very much!
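A common cause of a UnicodeError is opening a file without an explicit encoding. Python then falls back to the platform's default codec. The sketch below assumes the files hold one JSON object per line. If they are a single JSON array, call `json.load` on the whole file instead. It replaces undecodable bytes with U+FFFD rather than failing, and it skips malformed lines. `read_json_lines` is a hypothetical helper, not part of the repository.

```python
import json
from typing import Any, Dict, Iterator


def read_json_lines(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per line of a JSON-lines file (assumed layout).

    Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError,
    and lines that are not valid JSON are skipped.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or malformed records
```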

Cannot find the automatic labeling method

The paper says that "the automatic labeling method is publicly available", but I cannot find the automatic labeling code in this repository.
Could you provide the automatic labeling code, please?
Thanks.

Make sure that initialized parameters are small (e.g. between -0.1 and 0.1) to avoid extreme predictions at the beginning of training
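A minimal NumPy sketch of this initialization draws every weight uniformly from [-0.1, 0.1]. `init_uniform` is a hypothetical helper. In TensorFlow the same effect is usually obtained with `tf.random_uniform_initializer(-0.1, 0.1)`.

```python
import numpy as np


def init_uniform(shape, scale=0.1, seed=None):
    """Hypothetical helper: draw initial weights uniformly from [-scale, scale]."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=shape)
```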

Use DropoutWrapper from tensorflow.models.rnn.rnn_cell
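DropoutWrapper applies dropout to an RNN cell's inputs and/or outputs through its `input_keep_prob` and `output_keep_prob` arguments. In later TensorFlow 1.x releases it lives at `tf.nn.rnn_cell.DropoutWrapper`. The NumPy sketch below only illustrates the underlying operation, inverted dropout with a keep probability. It is not the TensorFlow API, and `dropout` is a hypothetical helper.

```python
import numpy as np


def dropout(x, keep_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and scale
    the survivors by 1 / keep_prob, so the expected activation is unchanged."""
    if not training or keep_prob >= 1.0:
        return x  # no dropout at evaluation time
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob
```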

problem with imports

I got a fresh copy from GitHub and initialized the submodules and PYTHONPATH:

echo $PYTHONPATH
/Users/andreasvlachos/Work/git/stance-conditional/twokenize_wrapper:/Users/andreasvlachos/Work/git/stance-conditional/

But when I run:
stancedetection andreasvlachos$ python3 word2vec_training.py

I get:
Traceback (most recent call last):
  File "word2vec_training.py", line 4, in <module>
    from preprocess import tokenise_tweets, build_dataset, transform_tweet, transform_labels
  File "/Users/andreasvlachos/Work/git/stance-conditional/stancedetection/preprocess.py", line 4, in <module>
    from twokenize_wrapper.twokenize import tokenize
ImportError: No module named 'twokenize_wrapper.twokenize'; 'twokenize_wrapper' is not a package

Could you check? I was getting past this in the previous commit.
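The message means that the name `twokenize_wrapper` resolved to something Python does not treat as a package. A small diagnostic sketch using the standard library's `importlib.util.find_spec` can show what a given name resolves to under the current PYTHONPATH. `describe_import` is a hypothetical helper.

```python
import importlib.util


def describe_import(name: str) -> str:
    """Report whether `name` resolves to a package, a plain module, or nothing."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised for dotted names whose parent is missing or is not a package.
        return f"{name}: parent package not found or not a package"
    if spec is None:
        return f"{name}: not found on sys.path"
    if spec.submodule_search_locations is not None:
        return f"{name}: package at {list(spec.submodule_search_locations)}"
    return f"{name}: plain module at {spec.origin}"
```

Running `print(describe_import("twokenize_wrapper"))` from the same shell shows whether the name is picked up as a package or as a plain module, and from which location.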

Separate embedding matrices for tweet and target
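In this minimal NumPy sketch, the tweet and the target each get their own embedding matrix, so the same word ID can map to different vectors depending on where it appears. The class and its random initialization are hypothetical. In practice both matrices could start from the same word2vec vectors and then be fine-tuned separately.

```python
import numpy as np


class SeparateEmbeddings:
    """Hypothetical sketch: distinct lookup tables for tweet tokens and target tokens."""

    def __init__(self, vocab_size: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Two independent matrices over the same vocabulary.
        self.tweet_emb = rng.uniform(-0.1, 0.1, size=(vocab_size, dim))
        self.target_emb = rng.uniform(-0.1, 0.1, size=(vocab_size, dim))

    def embed_tweet(self, token_ids):
        """Look up tweet tokens: returns an array of shape (len(token_ids), dim)."""
        return self.tweet_emb[np.asarray(token_ids)]

    def embed_target(self, token_ids):
        """Look up target tokens in the target-specific matrix."""
        return self.target_emb[np.asarray(token_ids)]
```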

Replicating the results.

We have experimented extensively with the model but are unable to replicate the results of the paper. The best F1 score we obtained is 0.5637, with the hyperparameters already present in the code. Any change to them, even just changing the random seed, makes the result worse.
What can we do to replicate the results?

fullp variable

I ran conditional.py successfully, but at the end I got the message below. I had a look in conditional.py, but fullp is not set anywhere.

Applying to test data, getting predictions for NONE/AGAINST/FAVOR
Num testing samples 70 Acc 0.34285714285714286 Correct 24 Total 70
Num testing samples 140 Acc 0.40714285714285714 Correct 57 Total 140
Num testing samples 210 Acc 0.4 Correct 84 Total 210
Num testing samples 280 Acc 0.40714285714285714 Correct 114 Total 280
Num testing samples 350 Acc 0.3914285714285714 Correct 137 Total 350
Num testing samples 420 Acc 0.3761904761904762 Correct 158 Total 420
Num testing samples 490 Acc 0.37551020408163266 Correct 184 Total 490
Num testing samples 560 Acc 0.38392857142857145 Correct 215 Total 560
Num testing samples 630 Acc 0.3904761904761905 Correct 246 Total 630
Num testing samples 700 Acc 0.3942857142857143 Correct 276 Total 700
Num testing samples 770 Acc 0.4090909090909091 Correct 315 Total 770
Traceback (most recent call last):
  File "conditional.py", line 727, in <module>
    readInputAndEval(tests, outfile, hid, max_epochs, "tanh", drop, "most", str(i), modelt, w2v, acc_thresh=1)
  File "conditional.py", line 668, in readInputAndEval
    writer.eval(testdata, outfile, evalscript=fullp + "eval.pl")
NameError: name 'fullp' is not defined
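One way to resolve the NameError is to define `fullp` at module level, before `readInputAndEval` is called. It should be a directory prefix ending in a path separator, so that `fullp + "eval.pl"` forms a valid path. The sketch assumes eval.pl sits in a directory named by a hypothetical `STANCE_EVAL_DIR` environment variable, falling back to the current working directory. Neither the variable nor the fallback comes from the repository.

```python
import os
from typing import Mapping, Optional


def eval_dir_prefix(env: Optional[Mapping[str, str]] = None) -> str:
    """Directory holding eval.pl, returned with a trailing separator.

    Hypothetical: STANCE_EVAL_DIR is not a setting defined by the repository;
    the fallback is the current working directory.
    """
    env = os.environ if env is None else env
    base = env.get("STANCE_EVAL_DIR", os.getcwd())
    return os.path.join(os.path.abspath(base), "")  # joining with "" appends the separator


# Module-level definition, so readInputAndEval can use fullp + "eval.pl".
fullp = eval_dir_prefix()
```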

generating stance results with new dataset?

I am trying to figure out the best way to use what you have and move beyond the SemEval-2016 datasets. I was able to follow and execute word2vec_training.py and generate the output file "skip_nostop_single_100features_5minwords_5context_big". The output file looks like a binary file and is not readable (is that just a model file?). I am wondering how to use this code to take a list of tweets and produce a file resembling the gold_toy.txt file used with eval.pl. Ideally, I'd take a JSON file of tweets and then get a list of AGAINST, PRO, NONE values towards the target.

This is not really an issue.
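For the output side of this question, here is a sketch that writes predictions as tab-separated lines. It assumes the files read by eval.pl follow the SemEval-2016 Task 6 layout, with a header line `ID<TAB>Target<TAB>Tweet<TAB>Stance` and the labels AGAINST, FAVOR and NONE. Check this against gold_toy.txt before relying on it. The predicted labels themselves would come from the trained model. `write_stance_file` is a hypothetical helper.

```python
from typing import Iterable, TextIO, Tuple

LABELS = {"AGAINST", "FAVOR", "NONE"}  # assumed label set


def _one_line(text) -> str:
    """Collapse tabs and newlines so a field cannot break the tab-separated columns."""
    return " ".join(str(text).split())


def write_stance_file(rows: Iterable[Tuple[object, str, str, str]], out: TextIO) -> None:
    """Write (id, target, tweet, stance) rows under an ID/Target/Tweet/Stance header."""
    out.write("ID\tTarget\tTweet\tStance\n")
    for tweet_id, target, tweet, stance in rows:
        if stance not in LABELS:
            raise ValueError(f"unexpected stance label: {stance!r}")
        out.write(f"{tweet_id}\t{_one_line(target)}\t{_one_line(tweet)}\t{stance}\n")
```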

cannot download corpus

We cannot download the file "additionalTweetsStanceDetection.json". Why?

Add parameter for changing loss for NONE

...because it doesn't contribute to Macro F1

Ignore the NONE training examples:

either by multiplying the total loss by 1 - target[0] (so if the class is NONE, the loss is 0),
or by removing NONE examples from the training+dev corpus.

Or downweigh them:
loss = (1-(target[0] * (1 - alpha))) * loss

So for target class NONE and alpha = 0.1 we would get:
loss = (1 - (1 * 0.9)) * loss = 0.1 * loss
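The down-weighting rule can be sketched per example in NumPy. The sketch assumes a one-hot target whose index 0 is NONE (as in `target[0]` above) and a cross-entropy loss. `none_weighted_loss` is a hypothetical helper. With alpha = 0 it reduces to the first option (loss multiplied by 1 - target[0]), and with alpha = 1 the loss is unchanged.

```python
import numpy as np

NONE_INDEX = 0  # assumption from the issue: target[0] == 1 marks the NONE class


def cross_entropy(probs: np.ndarray, target: np.ndarray) -> float:
    """Cross-entropy of one predicted distribution against a one-hot target."""
    return -float(np.sum(target * np.log(np.clip(probs, 1e-12, 1.0))))  # clip avoids log(0)


def none_weighted_loss(probs: np.ndarray, target: np.ndarray, alpha: float) -> float:
    """loss = (1 - target[0] * (1 - alpha)) * loss, i.e. NONE examples are scaled by alpha."""
    weight = 1.0 - target[NONE_INDEX] * (1.0 - alpha)
    return weight * cross_entropy(probs, target)
```

Because 1 - 0.9 is not exactly 0.1 in floating point, comparisons against 0.1 * loss should use a tolerance.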

Set a random seed in TF for reproducible runs
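This sketch seeds Python, NumPy and, if it is installed, TensorFlow. `set_seed` is a hypothetical helper. It calls `tf.random.set_seed` on TensorFlow 2.x and the graph-level `tf.set_random_seed` on 1.x. In TF 1.x the graph-level seed applies to the default graph, so set it before the model is built. Some GPU kernels can still be non-deterministic.

```python
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed every random number generator the training code may use."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed; Python and NumPy are seeded
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)  # TensorFlow 2.x
    else:
        tf.set_random_seed(seed)  # TensorFlow 1.x graph-level seed
```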

Bi-directional conditional encoding

Encode target bi-directionally: you get vectors y_l and y_r (one representation from reading the target from left, the other one from reading the target from the right)
Conditionally encode tweet, but in a bi-directional way, that is (i) encode from left to right with initialization using y_l yielding x_l and (ii) encode from right to left with initialization using y_r yielding x_r
For classification concatenate x_l and x_r
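The steps above can be sketched compactly in NumPy. To keep it short, a plain tanh RNN stands in for the LSTM, and the weights are random. The four encoders (target and tweet, each in both directions) have separate parameters. All names are hypothetical.

```python
import numpy as np


def rnn_final_state(xs, h0, W, U, b):
    """Run a plain tanh RNN over the rows of xs from initial state h0; return the last state."""
    h = h0
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
    return h


def make_params(input_dim, hidden_dim, rng):
    """Random (W, U, b) for one directional encoder."""
    return (
        rng.uniform(-0.1, 0.1, size=(hidden_dim, input_dim)),
        rng.uniform(-0.1, 0.1, size=(hidden_dim, hidden_dim)),
        np.zeros(hidden_dim),
    )


def bidirectional_conditional_encode(target, tweet, params):
    """target: (T, d) and tweet: (N, d) word vectors; returns the concatenation [x_l; x_r]."""
    hidden_dim = params["target_l"][1].shape[0]
    zeros = np.zeros(hidden_dim)
    y_l = rnn_final_state(target, zeros, *params["target_l"])        # target read left to right
    y_r = rnn_final_state(target[::-1], zeros, *params["target_r"])  # target read right to left
    x_l = rnn_final_state(tweet, y_l, *params["tweet_l"])            # tweet L->R, initialized with y_l
    x_r = rnn_final_state(tweet[::-1], y_r, *params["tweet_r"])      # tweet R->L, initialized with y_r
    return np.concatenate([x_l, x_r])                                # input to the classifier
```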

Missing twokenize_wrapper module?

I am getting the following when I run python stancedetection/word2vec_training.py.

If I understand correctly, this is needed:

https://github.com/brendano/tweetmotif

But the twokenize submodule is empty. Am I missing something?

Attention

Since targets and tweets are quite short, attention might not have a big impact here; hence this is low priority.

Alternative accuracy hook that ignores NONE

Add an alternative accuracy hook for evaluation on the training set that ignores the NONE class as it doesn't contribute to Macro F1.
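Here is a sketch of such a hook. It interprets "ignores NONE" as scoring only examples whose gold label is not NONE. Alongside it is the average F1 of FAVOR and AGAINST, which is how SemEval-2016 Task 6 computes its Macro F1. The function names are hypothetical.

```python
from typing import Sequence


def accuracy_ignoring_none(gold: Sequence[str], pred: Sequence[str], none_label: str = "NONE") -> float:
    """Accuracy over examples whose gold label is not NONE (nan if none remain)."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g != none_label]
    if not pairs:
        return float("nan")
    return sum(g == p for g, p in pairs) / len(pairs)


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """Per-class F1 from true positives, false positives and false negatives."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_favor_against(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Average of the FAVOR and AGAINST F1 scores; NONE is not averaged in."""
    return (f1_for_label(gold, pred, "FAVOR") + f1_for_label(gold, pred, "AGAINST")) / 2
```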

Feeding target representation during tweet processing

Another variant of conditional encoding that could be helpful is to encode the target, resulting in a vector v_t, and then feed the word representations of the tweet as well as the target representation at every step of LSTM encoding. That way the tweet LSTM does not have to maintain the representation of the target in its memory.
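In this minimal NumPy sketch of the variant, the target vector v_t is concatenated with each tweet word vector before every recurrent step. A plain tanh RNN stands in for the LSTM. For brevity, v_t is passed in directly; in the proposed variant it would come from encoding the target. Names and shapes are hypothetical.

```python
import numpy as np


def encode_with_target_feed(tweet, v_t, W, U, b):
    """Plain tanh RNN over the tweet that sees [word vector; v_t] at every step.

    tweet: (N, d) word vectors, v_t: (k,) target vector,
    W: (H, d + k), U: (H, H), b: (H,). Returns the final hidden state (H,).
    """
    h = np.zeros(U.shape[0])
    for x in tweet:
        step_input = np.concatenate([x, v_t])  # the target is re-supplied at each step
        h = np.tanh(W @ step_input + U @ h + b)
    return h
```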

downloaded_Donald_Trump.txt

When I run it, I get this:

python word2vec_training.py 
Traceback (most recent call last):
  File "word2vec_training.py", line 67, in <module>
    tweets_trump, targets_trump, labels_trump, ids_trump = reader.readTweetsOfficial("../data/downloaded_Donald_Trump.txt", "utf-8", 1)
  File "/Users/andreasvlachos/Work/git/stance-conditional/readwrite/reader.py", line 17, in readTweetsOfficial
    for line in io.open(tweetfile, encoding=encoding, mode='r'):
FileNotFoundError: [Errno 2] No such file or directory: '../data/downloaded_Donald_Trump.txt'

Is downloaded_Donald_Trump.txt the same file as downloaded_Donald_Trump_all.txt (which is in the dropbox folder)?

Version issue

  Hello, I am running your program, but my Python version is 3.6, so I have some issues. Can you give me some ideas on how to deal with them?

File not found

After running word2vec training successfully, I get a file named:

skip_nostop_single_100features_5minwords_5context

I am running:

python3 conditional.py

But I get:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/Isabelle/Documents/TextualEntailment/SemEvalStance/stance-conditional-acl2016-fresh/out/skip_nostop_single_100features_5minwords_5context_big'

I looked at the code, and it seems that some paths are hard-coded; I am not sure which branch I should follow. Could you have a look?
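The issue shows that word2vec_training.py wrote skip_nostop_single_100features_5minwords_5context, while conditional.py looks for the `_big` variant. Passing the model path on the command line avoids editing hard-coded paths. The sketch below uses argparse with a hypothetical `--w2v-model` flag, which is not an existing option of conditional.py. The default relative path is also an assumption.

```python
import argparse
import os

# Assumed default: the file name produced by word2vec_training.py, in an "out"
# directory one level above the working directory (hypothetical layout).
DEFAULT_W2V = os.path.join("..", "out", "skip_nostop_single_100features_5minwords_5context")


def parse_args(argv=None):
    """Hypothetical CLI: --w2v-model is not an existing option of conditional.py."""
    parser = argparse.ArgumentParser(description="Stance detection with conditional encoding")
    parser.add_argument("--w2v-model", default=DEFAULT_W2V,
                        help="path to the word2vec model written by word2vec_training.py")
    return parser.parse_args(argv)


def check_model_path(path):
    """Fail early with an actionable message instead of a bare FileNotFoundError later on."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"word2vec model not found at {path!r}; pass --w2v-model to point at the file "
            "that word2vec_training.py produced"
        )
    return path
```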
