cyhex / streamcrab
Real-Time, Twitter sentiment analyzer engine
Home Page: http://www.streamcrab.com
The collector/trainer/twitterCollector.py has this stream:
http://stream.twitter.com/1/statuses/filter.json
It should be this:
https://stream.twitter.com/1/statuses/filter.json
(notice the https)
The directions say basically to start in the tracker directory. I get this error when doing that:
Traceback (most recent call last):
  File "tests/moodClientServerTest.py", line 4, in <module>
    from tracker.lib.moodClassifierClient import MoodClassifierTCPClient
ImportError: No module named tracker.lib.moodClassifierClient
dwmcqueen@dwmcqueen-VirtualBox:~/smm/tracker$
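A possible cause, assuming the test script adds a relative path such as '../../' to sys.path (as the copy of moodClientServerTest.py quoted later in this thread does): a relative entry is resolved against the current working directory, so the import only works when the script is started from inside the tests directory. A minimal sketch of a workaround is to derive the path from the script's own location instead. The helper name add_project_root and the "two levels up" layout are assumptions, not part of the project.

```python
import os
import sys


def add_project_root(script_path, levels_up=2):
    """Put the directory `levels_up` above the script's folder on sys.path.

    Hypothetical helper: assumes the package to import (here `tracker`)
    sits `levels_up` directories above the folder containing the script.
    """
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Usage at the top of the test script, before the tracker import:
#   add_project_root(__file__)
#   from tracker.lib.moodClassifierClient import MoodClassifierTCPClient
```

With this in place the script should import the package regardless of which directory it is launched from, provided the layout assumption holds.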
Hi,
I ran python start-classifier.py.
The program then gives this error:
Traceback (most recent call last):
  File "start-classifier.py", line 14, in <module>
    pool = ClassifierWorkerPool()
  File "/home/ilkay/Masaüstü/streamcrab-master/smm/classifier/pool.py", line 50, in __init__
    self.trained_classifier = row.get_classifier()
AttributeError: 'NoneType' object has no attribute 'get_classifier'
What should I do?
Thanks
Loaded maxEntTestCorpus
Classify: Bloomberg –He's the man of the Year!
/Users/fenek/Documents/pp/pingpongowl/streamcrab/smm/classifier/textprocessing.py:37: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if t in stopwords:
/Users/fenek/Applications/anaconda/anaconda/lib/python2.7/site-packages/nltk/stem/porter.py:275: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if word[-1] == 's':
Traceback (most recent call last):
  File "toolbox/shell-classifier.py", line 34, in <module>
    features = config.classifier_tokenizer.getFeatures(txt)
  File "/Users/fenek/Documents/pp/pingpongowl/streamcrab/smm/classifier/textprocessing.py", line 144, in getFeatures
    return dict.fromkeys(cls.getClassifierTokens(text), 1)
  File "/Users/fenek/Documents/pp/pingpongowl/streamcrab/smm/classifier/textprocessing.py", line 131, in getClassifierTokens
    tokes = cls.stemm(tokes)
  File "/Users/fenek/Documents/pp/pingpongowl/streamcrab/smm/classifier/textprocessing.py", line 152, in stemm
    tokens[i] = stemmer.stem(t)
  File "/Users/fenek/Applications/anaconda/anaconda/lib/python2.7/site-packages/nltk/stem/porter.py", line 633, in stem
    stem = self.stem_word(word.lower(), 0, len(word) - 1)
  File "/Users/fenek/Applications/anaconda/anaconda/lib/python2.7/site-packages/nltk/stem/porter.py", line 591, in stem_word
    word = self._step1ab(word)
  File "/Users/fenek/Applications/anaconda/anaconda/lib/python2.7/site-packages/nltk/stem/porter.py", line 289, in _step1ab
    if word.endswith("ied"):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Feneks-MacBook-Pro:streamcrab fenek$ python toolbox/shell-classifier.py maxEntTestCorpus
exit: ctrl+c
Loaded maxEntTestCorpus
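For what it's worth, byte 0xe2 is the first byte of the UTF-8 encoding of the en dash in "–He's". On Python 2 the shell input arrives as a byte string, and the stemmer then compares it with unicode literals, which triggers an implicit ASCII decode. A minimal sketch of one way around it is to decode the input to text before tokenizing. The helper name to_text is hypothetical, and UTF-8 is an assumption about the terminal encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a text string, decoding byte strings if needed.

    Hypothetical helper; `encoding` is assumed to match the terminal.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The raw bytes of the failing input: en dash (E2 80 93) followed by ASCII.
raw = b"\xe2\x80\x93He's the man of the Year!"
text = to_text(raw)  # now safe to hand to a tokenizer or stemmer
```

Where exactly to call such a helper (for example on txt before getFeatures) depends on the project code, so treat the placement as a guess.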
/dev/streamcrab$ sudo python toolbox/collect-tweets.py happy 2000
Traceback (most recent call last):
  File "toolbox/collect-tweets.py", line 28, in <module>
    models.connect()
  File "/home/venkat/dev/streamcrab/smm/models.py", line 125, in connect
    mongoengine.connect(**conf)
  File "build/bdist.linux-x86_64/egg/mongoengine/connection.py", line 173, in connect
  File "build/bdist.linux-x86_64/egg/mongoengine/connection.py", line 135, in get_connection
mongoengine.connection.ConnectionError: Cannot connect to database default :
False is not a read preference.
While running "toolbox/train-classifier.py" I get the following exception:
Traceback (most recent call last):
  File "toolbox/train-classifier.py", line 13, in <module>
    from smm import models
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.4-py2.7.egg/nltk/classify/maxent.py", line 315, in train
    gaussian_prior_sigma, **cutoffs)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0.4-py2.7.egg/nltk/classify/maxent.py", line 1440, in train_maxent_classifier_with_scipy
    model.fit(algorithm=algorithm)
  File "/usr/lib/python2.7/dist-packages/scipy/maxentropy/maxentropy.py", line 1026, in fit
    return model.fit(self, self.K, algorithm)
  File "/usr/lib/python2.7/dist-packages/scipy/maxentropy/maxentropy.py", line 226, in fit
    callback=callback)
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 636, in fmin_cg
    gfk = myfprime(x0)
  File "/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 176, in function_wrapper
    return function(x, *args)
  File "/usr/lib/python2.7/dist-packages/scipy/maxentropy/maxentropy.py", line 420, in grad
    G = self.expectations() - self.K
ValueError: operands could not be broadcast together with shapes (800) (1636)
Could you please help me create the classifier?
classifier_tokenizer = StopStemmTwitterProcessor not found
Classify: today is good
Classification: negative with 64.81%
Feature          negativ  positiv
today==1 (1)       0.325
good==1 (1)       -0.044
today==1 (1)               -0.631
good==1 (1)                 0.031
TOTAL:             0.281   -0.600
PROBS:             0.648    0.352
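The numbers above are at least internally consistent: the two weights listed for each label add up to the TOTAL row, and the PROBS row matches what you get by treating each total as a base-2 log score and normalizing. The sketch below reproduces that arithmetic. The base-2 interpretation is an inference from the printed values, not something confirmed from the project code, and the function name is hypothetical.

```python
def probs_from_totals(totals):
    """Normalize per-label scores into probabilities.

    Assumption: each total is a log2 weight, so label weight = 2 ** total.
    """
    weights = {label: 2.0 ** score for label, score in totals.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# TOTAL row from the output above.
totals = {"negative": 0.325 + (-0.044), "positive": -0.631 + 0.031}
probs = probs_from_totals(totals)
# probs["negative"] is about 0.648 and probs["positive"] about 0.352
```

So "today is good" comes out negative mainly because the learned weight for today==1 favours the negative label, which points at the training data rather than at a bug in the classification step.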
Hi
Firstly, I have to say this SMM is fantastic and I'm looking forward to being able to implement some of the classifiers you mention...
I have a question, though, and hope you can help.
I've run all the tests per your readme and everything is 'OK'.
When I try to start the program on my local machine, though, I get the following error:
python tests/moodClientServerTest.py
Traceback (most recent call last):
  File "tests/moodClientServerTest.py", line 4, in <module>
    from tracker.lib.moodClassifierClient import MoodClassifierTCPClient
ImportError: No module named tracker.lib.moodClassifierClient
Can you provide any insight, please?
Thanks in advance
What fields should be replaced? I see username / password - is this Twitter username / password? Anything else?
On OS X, remember to install the twitter Python package (or the examples won't work in the default config):
sudo pip install twitter
Hi,
Would you like to compare your classifier with the teams that competed at SemEval-2014 and tell me the score?
http://alt.qcri.org/semeval2014/task9/
paper and results: http://alt.qcri.org/semeval2014/cdrom/pdf/SemEval2014009.pdf
Thanks,
Gerald
Timor, how can I add Russian in the settings so that Russian tweets are analyzed?
Hi again,
Did anyone manage to get the tweetClassifier.py script working? I don't think the current documentation mentions the prerequisite modules. Also, it points to hard-coded data paths.
Thanks, Ahmed
Hello,
I wonder what role these libraries play in the project.
That is, are the classifiers part of NumPy, SciPy, or NLTK?
Hi,
I would like to quickly try out smm but I can't due to these two missing data files:
tweetsPFile = "/home/gx/Sites/SMM/trunk/tracker/data/tweets_positive_test.dat"
tweetsNFile = "/home/gx/Sites/SMM/trunk/tracker/data/tweets_negative_test.dat"
Did you generate them yourself? If so, is it possible to commit them too?
Thanks !
I have installed all the modules and followed all the steps to work with SMM. When I run the Python client program to connect to the server, the server side dumps this error.
I'm currently stuck on this:
tracker # python moodClassifierd.py debug
starting debug mode...
Exception happened during processing of request from ('127.0.0.1', 51861)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 560, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
    self.handle()
  File "moodClassifierd.py", line 60, in handle
    raise e
moodClassifierd.py is throwing an exception on line 60. Do you know what is happening?
try:
    data_to_send = []
    for r in recvData:
        text = r.get('text')
        r['x_lang'] = self.server.langCls.detect(text)[0]
        r['x_mood'] = self.server.moodCls.classify(text, r['x_lang'])
        data_to_send.append(r)
except Exception, e:
    raise e  # <--- HERE (line 60)
    return False
self._send(data_to_send)
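One reason the traceback stops at line 60 without showing the real failure: on Python 2, `raise e` starts a new traceback at that line, so the frame where the exception originally happened is lost. A bare `raise` keeps the original traceback, and logging it first makes the cause visible in the daemon output. The sketch below shows the idea outside the server class. process_records, detect_lang and classify are hypothetical stand-ins for the handler and for the langCls / moodCls calls.

```python
import logging
import traceback

log = logging.getLogger("moodClassifierd")


def process_records(records, detect_lang, classify):
    """Tag each record with language and mood, logging the full traceback on failure.

    `detect_lang` and `classify` are hypothetical callables standing in for
    the server's language detector and mood classifier.
    """
    results = []
    for record in records:
        try:
            text = record.get("text")
            lang = detect_lang(text)
            # Build a new dict instead of modifying the record in place.
            results.append(dict(record, x_lang=lang, x_mood=classify(text, lang)))
        except Exception:
            log.error("failed on record %r\n%s", record, traceback.format_exc())
            raise  # bare raise keeps the original traceback
    return results
```

With the original traceback visible, the failing call (language detection, classification, or something in the input format) should show up directly.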
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
The calls that trigger this are:
pool.start()
worker.start()
Both are made when running start-classifier.py.
The same calls also cause the following error:
TypeError: can't pickle thread.lock objects
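Both errors are typical of multiprocessing on Windows, where child processes are started by re-importing the main module and pickling whatever is handed to them. The first is addressed by the idiom the error message itself describes: anything that starts processes goes behind an `if __name__ == '__main__':` guard. The sketch below shows that shape with a standard multiprocessing.Pool. It is not the project's ClassifierWorkerPool, and classify_stub is a hypothetical stand-in for the real work.

```python
import multiprocessing


def classify_stub(text):
    """Hypothetical stand-in for the work a classifier worker would do."""
    return len(text.split())


def main():
    # Processes are only created from inside main(), never at import time.
    pool = multiprocessing.Pool(processes=2)
    try:
        print(pool.map(classify_stub, ["today is good", "bad day"]))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    multiprocessing.freeze_support()  # only matters for frozen Windows executables
    main()
```

The pickling error usually means a worker object holds something that cannot be copied to the child, such as a lock, logger handler, or database connection. A common approach is to create such resources inside the worker's run method rather than in its constructor, but whether that applies here depends on the project's worker classes.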
After starting the daemon in debug mode and running the test client (python moodClientServerTest.py in the "test" directory), I see this error in the client window:
Traceback (most recent call last):
  File "moodClientServerTest.py", line 11, in <module>
    print MCC.classify(test_data, 'search')
  File "../../tracker/lib/moodClassifierClient.py", line 57, in classify
    self._readResults()
  File "../../tracker/lib/moodClassifierClient.py", line 28, in _readResults
    dataLen = int(dataLen)
ValueError: invalid literal for int() with base 10: ''
With this on the daemon window:
Exception happened during processing of request from ('127.0.0.1', 44979)
Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 582, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 639, in __init__
    self.handle()
  File "moodClassifierd.py", line 61, in handle
    raise e
RuntimeError: dictionary changed size during iteration
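The client-side ValueError is likely just a consequence of the daemon error: the handler died before sending a reply, so the client read an empty string where it expected a length. The daemon error itself means some dict gained or lost keys while a loop was iterating over it. The traceback does not show where in the server that happens, so the following is only a generic sketch of the usual fix, which is to iterate over a snapshot of the items. The annotate function and the x_len_ keys are made up for illustration.

```python
def annotate(record):
    """Add an 'x_len_<key>' entry for every existing key of `record`."""
    # list(...) takes a snapshot first, so adding keys inside the loop
    # does not change the collection being iterated.
    for key, value in list(record.items()):
        record["x_len_" + key] = len(str(value))
    return record


tagged = annotate({"text": "abc"})
```

Looping directly over `record` or `record.items()` while adding keys is what raises the RuntimeError on Python 3, and the same applies to `for key in record` and `iteritems()` on Python 2.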
Hi,
I want to consume it from a .NET application. I just want a function to which I give a string and which returns a score or a positive/negative result.
thanks
I can't find the classes named in this import:
import StopStemmTwitterProcessor, StopTwitterProcessor
Line 50
49 row = TrainedClassifiers.objects(name=config.classifier).first()
50 self.trained_classifier = row.get_classifier()
row is not recognized as a TrainedClassifiers object, so you can't call the get_classifier() method on it.
Thank you
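A note on the two lines quoted above: in MongoEngine, .first() returns None when the query matches no document, so the AttributeError most likely means no trained classifier with the name in config.classifier has been saved yet, not that row has the wrong type. A minimal sketch of a guard that turns this into a readable error follows. The function and exception names are hypothetical, and the hint about toolbox/train-classifier.py assumes that script is what stores the classifier.

```python
class ClassifierNotTrained(RuntimeError):
    """Raised when the requested trained classifier is not in the database."""


def load_trained_classifier(row, name):
    """Return the classifier stored on `row`, or fail with a clear message.

    `row` is whatever TrainedClassifiers.objects(name=name).first() returned.
    """
    if row is None:
        raise ClassifierNotTrained(
            "No trained classifier named %r found; train and save one first "
            "(for example with toolbox/train-classifier.py)." % name
        )
    return row.get_classifier()
```

In pool.py this would replace the direct row.get_classifier() call, with config.classifier passed as the name.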
Hi, I have built everything successfully and was able to create the databases with the following commands.
To build the training dataset:
python collector/trainer/twitterCollector.py
python collector/trainer/tweetClassifier.py
These files have been created in the /data dir:
-rw-r--r-- 1 root root 293204 Oct 11 20:53 mood_traing_150k_1k_0.6.dat
-rw-r--r-- 1 root root 6104719 Oct 11 20:52 tweets_negative_raw.dat
-rw-r--r-- 1 root root 6581580 Oct 11 20:47 tweets_positive_raw.dat
My service starts OK:
domU-12-31-39-06-8E-37 tracker # ./moodClassifierd.py start
starting...
OK
tests # more moodClientServerTest.py
import sys
sys.path.append('../../')
from tracker.lib.moodClassifierClient import MoodClassifierTCPClient
MCC = MoodClassifierTCPClient('127.0.0.1',6666)
test_data = {'text':'I am sad because i have a bad iphone So the 4S is announced yet preorder is sold out? alright then'}
print MCC.classify(test_data, 'search')
The output is:
[{'text': u'I am sad because i have a bad iphone So the 4S is announced yet preorder is sold out? alright then', 'x_mood': 0.0, 'x_lang': 'en'}]
I'm not able to tell whether the polarity is POSITIVE or NEGATIVE. What am I doing wrong? The result looks neutral; I assumed I would get something like -.xxx for negative and close to 1 for positive.
Do you have a technical paper about the result of applying those classifiers on data? Is the data read from the public stream api?