
In this repository, an example of natural language processing (NLP) for document classification is performed using a support vector machine (SVM) model.


Natural-Language-Processing-With-SVM-For-Sentiment-Analysis

There are a significant number of steps to carry out between viewing a text document on a website and using its content as input to an automated trading strategy. In particular, the following steps must be carried out:

  • Automate the download of multiple, continually generated articles from external sources at a potentially high throughput.

  • Parse these documents for the relevant sections of text/information that require analysis, even if the format differs between documents.

  • Convert arbitrarily long passages of text (over many possible languages) into a consistent data structure that can be understood by a classification system.

  • Determine a set of groups (or labels) that each document will be a member of. Examples include “positive” and “negative” or “bullish” and “bearish”.

  • Create a training corpus of documents that have known labels associated with them. For instance, a thousand financial articles may need tagging with the “bullish” or “bearish” labels.

  • Train the classifier(s) on this corpus by means of a software library such as Scikit-Learn.

  • Use the classifier to label new documents, in an automated, ongoing manner.

  • Assess the “classification rate” and other associated performance metrics of the classifier.

  • Integrate the classifier into an automated trading system, either by means of filtering other trade signals or generating new ones.

  • Continually monitor the system and adjust it as necessary if its performance begins to degrade.
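The core of this list (vectorise a corpus, train a classifier, label new documents) can be sketched with Scikit-Learn. The tiny corpus, the "bullish"/"bearish" labels and the linear kernel below are illustrative assumptions, not the exact configuration used in `reuters_svm.py`:

```python
# Minimal sketch of the vectorise -> train -> predict steps using a
# hand-made toy corpus; documents and labels here are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

train_docs = [
    "stocks rally on strong earnings",
    "shares plunge after profit warning",
    "markets surge to record highs",
    "index falls amid recession fears",
]
train_labels = ["bullish", "bearish", "bullish", "bearish"]

vectoriser = TfidfVectorizer()           # convert text to a sparse TF-IDF matrix
X_train = vectoriser.fit_transform(train_docs)

clf = SVC(kernel="linear")               # linear support vector classifier
clf.fit(X_train, train_labels)

new_docs = ["earnings surge lifts stocks"]
X_new = vectoriser.transform(new_docs)   # reuse the fitted vocabulary
print(clf.predict(X_new)[0])             # -> bullish
```

Note that the vectoriser fitted on the training corpus must be reused (via `transform`, not `fit_transform`) on new documents, so that both share the same vocabulary.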

In this particular script we make use of a dataset that already comes with its own labels. This allows us to concentrate on the implementation of the classification pipeline, rather than spending a substantial amount of time obtaining and tagging documents. While beyond the scope of this study, it is possible to use Python libraries such as Scrapy and BeautifulSoup to automatically download many web-based articles and extract their text from the HTML making up each page. Under the assumption that we have a pre-labelled document corpus (the process of which is outlined below), we begin by taking the training corpus and loading it into a Python data structure suitable for pre-processing and consumption by the classifier.
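Although scraping itself is outside the scope of this repository, extracting the visible text from a downloaded page can be sketched with the standard library alone (BeautifulSoup offers a more robust equivalent). The HTML snippet below is a made-up example:

```python
# Minimal sketch of pulling article text out of raw HTML using only the
# standard library; real projects would typically use BeautifulSoup or Scrapy.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text found inside <p> tags, ignoring other markup."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph and data.strip():
            self.chunks.append(data.strip())


html = "<html><body><h1>Ignore me</h1><p>Shares rose sharply.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.chunks))  # -> Shares rose sharply.
```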

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need Python 3.x to run the following code. You can have multiple Python versions (2.x and 3.x) installed on the same system without problems. Install Python first, then the SciPy stack, and finally Scikit-Learn for the machine learning tools.

In Ubuntu, Mint and Debian you can install Python 3 like this:

sudo apt-get install python3 python3-pip

Alongside Python, the SciPy packages are also required. In Ubuntu and Debian, the Python 3 versions of the SciPy ecosystem can be installed with:

sudo apt-get install python3-numpy python3-scipy python3-matplotlib ipython3 python3-pandas python3-sympy python3-nose

Finally, install the latest release of the Scikit-Learn machine learning package with pip:

pip install -U scikit-learn

For other Linux flavors, macOS and Windows, packages are available at:

http://www.python.org/getit/
https://www.scipy.org/install.html
https://scikit-learn.org/stable/install.html
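Once the packages are installed, a quick import check confirms that the stack is usable (the version numbers printed will vary by system):

```python
# Verify that the scientific stack imports cleanly; exact versions vary.
import numpy
import scipy
import sklearn

print(numpy.__version__, scipy.__version__, sklearn.__version__)
```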

File descriptions

  • The 'Data' directory contains the different files on which the NLP work is carried out.
  • 'NLP_SVM_analysis.pdf' explains in depth how to carry out such NLP studies, detailing the different steps taken in the code and the accuracy of the results obtained.
  • 'reuters_svm.py' is the Python script that runs the SVM model for document classification.

Running the program

The 'reuters_svm.py' script and the 'Data' directory need to be placed in the same folder. The code is then ready to be used, and just requires running the following command:

python reuters_svm.py

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

See also the list of contributors who participated in this project.

