sheffieldnlp / stance-conditional
Stance Detection with Conditional Encoding
Use a more sensible stopping criterion than just a fixed number of epochs
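One common alternative to a fixed epoch count is patience-based early stopping on a development-set score. The following is a minimal sketch of that idea, not code from this repository: the class name, the `patience` and `min_delta` parameters, and the assumption that a per-epoch dev score (for example Macro F1) is available are all illustrative.

```python
class EarlyStopping:
    """Stop when the dev score has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch, score):
        """Record a dev score; return True when training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_with_early_stopping(dev_scores, patience=3):
    """Replay per-epoch dev scores; return (best_epoch, stopped_at)."""
    stopper = EarlyStopping(patience=patience)
    stopped_at = len(dev_scores) - 1
    for epoch, score in enumerate(dev_scores):
        if stopper.update(epoch, score):
            stopped_at = epoch
            break
    return stopper.best_epoch, stopped_at
```

In a real training loop, `update` would be called once per epoch with the dev score, and the model from `best_epoch` would be the one kept.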
The paper says "the automatic labeling method is publicly available", but I cannot find the automatic labeling code in this repository.
Can you provide the automatic labeling code, please?
Thanks.
Got a fresh copy from GitHub and initialized the modules and PYTHONPATH:
echo $PYTHONPATH
/Users/andreasvlachos/Work/git/stance-conditional/twokenize_wrapper:/Users/andreasvlachos/Work/git/stance-conditional/
But when I run:
stancedetection andreasvlachos$ python3 word2vec_training.py
I get:
Traceback (most recent call last):
File "word2vec_training.py", line 4, in <module>
from preprocess import tokenise_tweets, build_dataset, transform_tweet, transform_labels
File "/Users/andreasvlachos/Work/git/stance-conditional/stancedetection/preprocess.py", line 4, in <module>
from twokenize_wrapper.twokenize import tokenize
ImportError: No module named 'twokenize_wrapper.twokenize'; 'twokenize_wrapper' is not a package
Could you check? I was getting past this in the previous commit.
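The message "'twokenize_wrapper' is not a package" means Python resolved the name `twokenize_wrapper` to a plain module rather than to a directory package. A small diagnostic like the one below, using only the standard library, shows where a top-level name resolves and whether it is a package. The helper name `describe_module` is hypothetical, and whether a shadowing file on PYTHONPATH is the actual cause here is an assumption to check, not a confirmed diagnosis.

```python
import importlib.util


def describe_module(name):
    """Return (origin, is_package) for a top-level module name, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages carry a list of locations to search for submodules; plain modules do not.
    is_package = spec.submodule_search_locations is not None
    return spec.origin, is_package
```

Running `print(describe_module("twokenize_wrapper"))` from the same shell would show which file or directory is picked up first.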
We have experimented a lot with the model and are unable to replicate the results of the paper. The best F1 score we obtained is 0.5637, using the hyperparameters already present in the code. Any change to them, or even to the random seed, makes the result worse.
What can we do to replicate the results?
conditional.py ran successfully, but at the end I got the error below. I had a look in conditional.py, but fullp is not set anywhere.
Applying to test data, getting predictions for NONE/AGAINST/FAVOR
Num testing samples 70 Acc 0.34285714285714286 Correct 24 Total 70
Num testing samples 140 Acc 0.40714285714285714 Correct 57 Total 140
Num testing samples 210 Acc 0.4 Correct 84 Total 210
Num testing samples 280 Acc 0.40714285714285714 Correct 114 Total 280
Num testing samples 350 Acc 0.3914285714285714 Correct 137 Total 350
Num testing samples 420 Acc 0.3761904761904762 Correct 158 Total 420
Num testing samples 490 Acc 0.37551020408163266 Correct 184 Total 490
Num testing samples 560 Acc 0.38392857142857145 Correct 215 Total 560
Num testing samples 630 Acc 0.3904761904761905 Correct 246 Total 630
Num testing samples 700 Acc 0.3942857142857143 Correct 276 Total 700
Num testing samples 770 Acc 0.4090909090909091 Correct 315 Total 770
Traceback (most recent call last):
File "conditional.py", line 727, in <module>
readInputAndEval(tests, outfile, hid, max_epochs, "tanh", drop, "most", str(i), modelt, w2v, acc_thresh=1)
File "conditional.py", line 668, in readInputAndEval
writer.eval(testdata, outfile, evalscript=fullp + "eval.pl")
NameError: name 'fullp' is not defined
I am trying to figure out the best way to use this code beyond the SemEval-2016 datasets. I was able to run word2vec_training.py and generate the output file "skip_nostop_single_100features_5minwords_5context_big". The output looks like a binary file and is not human-readable (is it just a model file?). I am wondering how to use this code to take a list of tweets and produce a file resembling the gold_toy.txt file from eval.pl. Ideally, I'd take a JSON file of tweets and get a list of AGAINST, PRO, NONE values towards a target.
This is not really an issue.
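As a rough sketch of the last step described above, the function below pairs tweet records from a JSON string with already-computed labels and renders tab-separated rows. Several things here are assumptions rather than facts about this repository: the column layout (ID, Target, Tweet, Stance with a header row), the JSON field names `id`, `target` and `text`, and the function name itself. Producing the labels would still require running the trained model.

```python
import json

# Assumed column layout for the output file; adjust to match gold_toy.txt.
HEADER = "ID\tTarget\tTweet\tStance"


def to_stance_lines(tweets_json, predictions):
    """Pair each tweet record with a predicted label and render tab-separated rows."""
    records = json.loads(tweets_json)
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per tweet")
    lines = [HEADER]
    for record, label in zip(records, predictions):
        # Tabs and newlines inside a tweet would break the column layout.
        text = " ".join(record["text"].split())
        lines.append("\t".join([str(record["id"]), record["target"], text, label]))
    return lines
```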
We cannot download the file "additionalTweetsStanceDetection.json". Why?
...because it doesn't contribute to Macro F1
Ignore the NONE training examples:
Or downweigh them:
loss = (1-(target[0] * (1 - alpha))) * loss
So for target class NONE and alpha 0.1 we would get:
loss = (1 - (1 * 0.9)) * loss = 0.1 * loss
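The downweighting formula can be checked with a few lines of plain Python. This sketch assumes, as the worked example suggests, that targets are one-hot vectors with NONE at index 0; the function name is hypothetical and the code is not taken from conditional.py.

```python
def downweight_none_losses(losses, targets, alpha=0.1):
    """Scale each per-example loss by (1 - target[0] * (1 - alpha)).

    Assumes one-hot targets with NONE at index 0, so NONE examples are
    multiplied by alpha and all other examples are left unchanged.
    """
    return [(1 - (t[0] * (1 - alpha))) * loss for loss, t in zip(losses, targets)]
```

Setting `alpha=0` reduces this to ignoring the NONE examples entirely, and `alpha=1` leaves every loss unchanged.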
I am getting the following when I run python stancedetection/word2vec_training.py
.
If I understand correctly, this is required:
https://github.com/brendano/tweetmotif
But the twokenize submodule is empty. Am I missing something?
Since targets and tweets are quite short, attention might not have a big impact here, hence low priority.
Add an alternative accuracy hook for evaluation on the training set that ignores the NONE class as it doesn't contribute to Macro F1.
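A minimal sketch of such a metric is shown below. It assumes gold and predicted labels are available as parallel lists of strings and that examples are filtered by their gold label; the function name and the choice to return 0.0 when no non-NONE examples exist are illustrative, and wiring it into the training hooks is left out.

```python
def accuracy_ignoring_none(gold, predicted, none_label="NONE"):
    """Accuracy over examples whose gold label is not NONE."""
    kept = [(g, p) for g, p in zip(gold, predicted) if g != none_label]
    if not kept:
        return 0.0
    correct = sum(1 for g, p in kept if g == p)
    return correct / len(kept)
```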
Another variant of conditional encoding that could be helpful is to encode the target into a vector v_t and then feed both the word representations of the tweet and the target representation at every step of the LSTM encoding. That way the tweet LSTM does not have to maintain the representation of the target in its memory.
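The input construction for this variant can be sketched without any deep learning library: each tweet word vector is concatenated with the same target vector v_t before being passed to the tweet LSTM. The function name is hypothetical, vectors are plain Python lists for illustration, and the LSTM itself is not shown.

```python
def with_target_at_every_step(tweet_vectors, target_vector):
    """Concatenate the target vector v_t onto every tweet word vector.

    The result has one row per tweet token, each of size
    len(word_vector) + len(target_vector).
    """
    return [list(word_vec) + list(target_vector) for word_vec in tweet_vectors]
```

The tweet LSTM's input dimension would then grow by the size of v_t.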
When I run it, I get this:
python word2vec_training.py
Traceback (most recent call last):
File "word2vec_training.py", line 67, in <module>
tweets_trump, targets_trump, labels_trump, ids_trump = reader.readTweetsOfficial("../data/downloaded_Donald_Trump.txt", "utf-8", 1)
File "/Users/andreasvlachos/Work/git/stance-conditional/readwrite/reader.py", line 17, in readTweetsOfficial
for line in io.open(tweetfile, encoding=encoding, mode='r'):
FileNotFoundError: [Errno 2] No such file or directory: '../data/downloaded_Donald_Trump.txt'
Is downloaded_Donald_Trump.txt the same file as downloaded_Donald_Trump_all.txt (which is in the Dropbox folder)?
Hello, I am running your program, but my Python version is 3.6 and I am hitting some issues. Can you give me some ideas on how to deal with them?
After running word2vec successfully, I get the file named:
skip_nostop_single_100features_5minwords_5context
I am running:
python3 conditional.py
But I get:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/Isabelle/Documents/TextualEntailment/SemEvalStance/stance-conditional-acl2016-fresh/out/skip_nostop_single_100features_5minwords_5context_big'
I looked at the code and it seems like there is some hard coding and I am not sure which branch I should follow. Could you have a look?
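Note that the generated file name above lacks the `_big` suffix that the failing path expects. One way to make such a mismatch visible, instead of relying on a hard-coded absolute path, is to compare the expected name against the files actually present in the output directory. This is a hypothetical helper, not part of the repository, and whether the two files are interchangeable is not something the source confirms.

```python
def pick_model_name(available_names, expected_name):
    """Return the expected model name if present, else a closely related one, else None.

    Two names count as related when one is a prefix of the other,
    e.g. "..._5context" and "..._5context_big".
    """
    if expected_name in available_names:
        return expected_name
    related = sorted(
        name for name in available_names
        if name and (expected_name.startswith(name) or name.startswith(expected_name))
    )
    return related[0] if related else None
```

A caller could pass `os.listdir(out_dir)` as `available_names` and print a warning when the returned name differs from the expected one.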