omidrohanian / irony_detection
Code and data used for participation in SemEval-2018 Task 3: "Irony detection in English tweets"
When running the feature_generator_TaskA notebook, specifically cell 9, I get the following error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-9-0b9719d949b3> in <module>
5 for tweet in corpus_preprocessed:
6 chunks = chunkIt(tweet, 2)
----> 7 polarity_vectors.append(np.concatenate(((polarity(chunks[0])[1], polarity(chunks[1])[1])), axis=0))
8
9 assert len(ekphrasis_feats) == len(polarity_vectors)
~/Documents/scriptie/irony_detection/venv/lib/python3.6/site-packages/ekphrasis/utils/nlp.py in polarity(doc, neg_comma, neg_modals)
213
214 _scores = numpy.mean(numpy.array(scores), axis=0)
--> 215 _polarity = _scores[0] - _scores[1]
216
217 return _polarity, _scores
IndexError: invalid index to scalar variable.
All the preceding cells seem to run fine, so I don't know what could be causing this. Any ideas?
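One possible cause, offered as a guess rather than a confirmed diagnosis: if the `scores` list inside `polarity` is empty for some chunk, `numpy.mean(..., axis=0)` returns a scalar `nan`, and indexing that scalar raises this same IndexError. The sketch below reproduces the behaviour with plain NumPy and shows one way to guard against it. `mean_scores` and its `width=2` default are hypothetical, chosen only because the traceback reads `_scores[0]` and `_scores[1]`; they are not part of ekphrasis.

```python
import warnings

import numpy as np

# Reproduce the failure mode: the mean of an empty array along axis 0 is a scalar.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence "Mean of empty slice"
    empty_mean = np.mean(np.array([]), axis=0)

try:
    empty_mean[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True  # "invalid index to scalar variable."


def mean_scores(scores, width=2):
    """Column-wise mean of score rows, or zeros when there are no rows.

    `width` (number of score columns) is an assumption, not an ekphrasis setting.
    """
    arr = np.asarray(scores, dtype=float)
    if arr.ndim != 2 or arr.shape[0] == 0:
        return np.zeros(width)
    return arr.mean(axis=0)


safe_empty = mean_scores([])
safe_normal = mean_scores([[1.0, 0.0], [0.0, 1.0]])
```

To check whether this is what happens in cell 9, it may help to print the tweets for which either chunk returned by `chunkIt(tweet, 2)` is empty before calling `polarity` on them.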
I tried running the feature_generator_TaskA.ipynb and it gave me this error.
UnicodeDecodeError Traceback (most recent call last)
in ()
1 if TRAIN:
2 dataset='../datasets/train/SemEval2018-T3-train-taskA_emoji.txt'
----> 3 corpus, _ = parse_dataset(dataset)
4 corpus_preprocessed = json.load(open('../extra_resources/train_preprocessed.txt','r'))
5 else:
~\Documents\Special Problem\198.2\irony_detection\subtaskA\load.py in parse_dataset(dataset)
5 dataset_name = dataset.lower()
6 with open(dataset, 'r') as data_in:
----> 7 for line in data_in:
8 if not line.lower().startswith("tweet index"): # discard first line if it contains metadata
9 line = line.rstrip() # remove trailing whitespace
~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2779: character maps to <undefined>
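The traceback goes through `encodings\cp1252.py`, so `open(dataset, 'r')` is falling back to the Windows default encoding. Assuming the dataset file is UTF-8 (likely for tweets with emoji, but not confirmed here), passing `encoding='utf-8'` to `open` should avoid the error. Below is a minimal, self-contained sketch of the difference; the sample line and the temporary file are made up for illustration.

```python
import os
import tempfile

# A made-up tweet line; the closing curly quote (U+201D) ends in byte 0x9d in UTF-8.
sample = "1\t1\tSo \u201cgreat\u201d weather today\n"

with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

# Reading the UTF-8 file as cp1252 fails, because 0x9d is undefined in that code page.
try:
    with open(path, "r", encoding="cp1252") as data_in:
        data_in.read()
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Reading with an explicit UTF-8 encoding works on any platform.
with open(path, "r", encoding="utf-8") as data_in:
    lines = [line.rstrip() for line in data_in]

os.remove(path)
```

In `load.py` the corresponding change would be `open(dataset, 'r', encoding='utf-8')` in `parse_dataset`; the `json.load(open(...))` call in the notebook cell may need the same argument.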
Hi, for a uni-project, I am trying to reproduce the paper and see if I am able to get the same results.
I picked your paper because I liked it and it seemed quite accessible, since everything is readily available on GitHub, which is really nice!
I have run the project multiple times, mimicking your setup as closely as possible. However, I am unable to get the 13 features mentioned in your paper: RFEC selects only 12, leaving out the contrast feature. I have tried multiple environments (a couple of Windows machines and one Mac). The difference seems to be related to the "train_feats_taskA.npy" file: when I regenerate it, I get 12 features, and when I don't regenerate it, I get the same set of features as in your paper.
I suspect it may be related to using a newer version of Stanford CoreNLP and the other packages, but I thought maybe you could shed some light on this. Do you have a suggestion as to what could be causing it?
The code in the Jupyter notebook "feature_generator_TaskA.ipynb" is the same as in your repo.
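One way to narrow this down might be to compare the shipped feature matrix with the regenerated one, column by column, and see which features actually changed. The sketch below assumes both files load as 2-D arrays (samples by features) of the same shape. `differing_columns` is a hypothetical helper, and the small arrays stand in for the real data.

```python
import numpy as np


def differing_columns(reference, regenerated, atol=1e-8):
    """Return the indices of feature columns whose values differ between two matrices."""
    reference = np.asarray(reference, dtype=float)
    regenerated = np.asarray(regenerated, dtype=float)
    if reference.shape != regenerated.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {regenerated.shape}"
        )
    # A column counts as unchanged only if every row matches within the tolerance.
    same = np.isclose(reference, regenerated, atol=atol, equal_nan=True).all(axis=0)
    return np.flatnonzero(~same).tolist()


# Toy stand-ins for the shipped and regenerated matrices (illustration only).
shipped = np.array([[1.0, 0.0, 3.0], [2.0, 1.0, 4.0]])
rebuilt = np.array([[1.0, 0.5, 3.0], [2.0, 1.0, 4.0]])
changed = differing_columns(shipped, rebuilt)
```

With the real data, the two inputs would come from `np.load` on the original "train_feats_taskA.npy" and on a regenerated copy saved under a different name. If the changed columns map to the features that depend on CoreNLP, that would support the version hypothesis.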