Coder Social home page Coder Social logo

usent's Introduction

usent โ€” The attached code is a Python implementation of a dictionary-based sentiment classification procedure which combines two different bootstrapping procedures, namely for subjectivity and polarity detection (as in [3, 4] respectively). The rule-based polarity classifier is an extension of the one that was presented in [5].

  • E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 conference on Empirical methods in natural language processing, 2003.
  • D. K. M Wiegand. Bootstrapping supervised machine-learning polarity classifiers with rule-based classification. In Proceedings of the ECAI-Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, 2009.
  • T. Wilson, J. Wiebe, and P. Hoffmann. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005.

The code was used for an opinion mining and retrieval system presented at CICLing 2013 [1], and for improving one-class collaborative filtering [2]. There is also a folder called /TED_comment_annotations that contains the files of the human study we conducted on TED comment sentiment classification (with 6 human annotators).

@incollection{pappas13c,
 location = {Samos, Greece},
 year = {2013},
 booktitle = {Computational Linguistics and Intelligent Text Processing},
 volume = {7817},
 doi = {10.1007/978-3-642-37256-8_17},
 title = {Distinguishing the Popularity between Topics: A System for Up-to-Date Opinion Retrieval and Mining in the Web},
 author = {Pappas, Nikolaos and Katsimpras, Georgios and Stamatatos, Efstathios},
 pages = {197-209}
}

@inproceedings{pappas13a,
 author = {Pappas, Nikolaos and Popescu-Belis, Andrei},
 title = {Sentiment Analysis of User Comments for One-Class Collaborative Filtering Over {TED} Talks},
 booktitle = {Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval},
 series = {SIGIR '13},
 year = {2013},
 isbn = {978-1-4503-2034-4},
 location = {Dublin, Ireland},
 pages = {773--776},
 numpages = {4},
} 

Installing dependencies

The available code for unsupervised sentiment classification requires Python programming language and pip package manager to run. For detailed installing instructions please refer to the following links:
http://www.python.org/getit/
http://www.pip-installer.org/en/latest/

After installing them, you should be able to install the following packages:

$ pip install nltk  
$ pip install stemmer 
$ pip install numpy
$ pip install pickle 

After you install nltk you will need some corpora to train the sequential POS tagger (pos.py) and the nltk tokenizer.

$ python 
import nltk 
nltk.download() 

The issue of the above command will load a graphical interface that lets you manage several corpora related to nltk library. From the list select and download the following corpora: tokenizers/punkt/english, wordnet, brown, conll2000 and treebank.

Lastly, pyml library is needed for the SVM classifier that is used currently in our code.
Download http://pyml.sourceforge.net/ and then issue:

 $ tar zxvf PyML-0.7.11.tar.gz
 $ cd PyML-0.7.11
 $ python setup.py build
 $ python setup.py install 

Processing pipeline

The current pipline that is implemented in sentiment.py is depicted in the following diagram. Initially, the input text is split into sentences and each sentence is fed to a high precision subjectivity classifier. If the sentence is classified as subjective then syntactic patterns are learned from this instance. In case that the sentence is not detected as such then it is fed to the pattern-based classifier. The pattern-based classifier outputs the class of the sentence based on the learned patterns so far. If the instance is subjective then again more patterns are learned from it, otherwise it is fed to a high precision objectivity classifier. If the sentence is classified as objective, then it is ignored, otherwise it is fed to the polarity classifier. Finally, the polarity classifier estimates the numerical sentiment and normalized sentiment values and outputs the result. The instances with high confidence from the polarity classifier can be further used to train an SVM classifier to improve further the classification performance (see paper for further details). At the current version this option is disabled, but you can easily enable it. Similarly, you can remove some of the components from the pipeline according to your needs (e.g. skip subjectivity classification).

Examples

To estimate the total sentiment and total normalized sentiment (as described in the papers), you can simply execute the sentiment.py file and give the desired block of text as an argument. Make sure that you escape symbols such as '"' and '!'. Apart from the command line execution you can integrate the library to your code and use directly the returned results. Below you can find two simple examples for demonstrating purposes:

$ python sentiment.py "I have to give much love and respect to Rony. Your work is Amazing\\!"
[+] Loaded existing UBT tagger!
[+] Loaded existing pattern knowledge!

[*] Checking block of text:
[1] I have to give much love and respect to Rony.
[2] Your work is Amazing!

[*] Overall sentiment analysis:

 Parts:  2
 Sentiments:  ['positive', 'positive']
 Scores:  [4, 6.0]
 Results:  {'positive': {'count': 2, 'score': 10.0, 'nscore': 1.9},
	    'neutral': {'count': 0, 'score': 0, 'nscore': 0},
	    'negative': {'count': 0, 'score': 0, 'nscore': 0}}


 subjective-----> 100.00%
 objective------> 0.00%


 positive-------> 100.00%
 neutral--------> 0.00%
 negative-------> 0.00%

[x] positive (10.00, 1.90)
$ python sentiment.py "I was blown away by some of the comments here posted by people who is either 
uneducated, ignorant, self-righteous or al-of-the-above. I'm irritated and saddened as I read these 
finger-pointing \"i'm right and you're wrong\" type of posts\!"
[+] Loaded existing UBT tagger!
[+] Loaded existing pattern knowledge!

[*] Checking block of text:
[1] I was blown away by some of the comments here posted by people who is either uneducated, ignorant, self-righteous or al-of-the-above.
[2] I'm irritated and saddened as I read these finger-pointing "i'm right and you're wrong" type of posts!

[*] Overall sentiment analysis:

 Parts:  2
 Sentiments:  ['negative', 'negative']
 Scores:  [-4, -4.0]
 Results:  {'positive': {'count': 0, 'score': 0, 'nscore': 0},
	    'neutral': {'count': 0, 'score': 0, 'nscore': 0},
	    'negative': {'count': 2, 'score': -8.0, 'nscore': -0.3722943722943723}}


 subjective-----> 100.00%
 objective------> 0.00%


 positive-------> 0.00%
 neutral--------> 0.00%
 negative-------> 100.00%

[x] negative (-8.00, -0.37)

usent's People

Contributors

nik0spapp avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.