Naive_Bayes_Classifier

A program that implements the Naive Bayes algorithm to classify documents according to a fixed vocabulary of tags.
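As a rough illustration of the technique only, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not taken from Hash_Tagger.cpp or Hash_Tester.cpp. The class name `NaiveBayesSketch`, the whitespace tokenization, and the restriction to one tag per document are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical sketch of multinomial Naive Bayes; not the project's own code.
class NaiveBayesSketch {
public:
    // Records one whitespace-tokenized document labeled with a single tag.
    void train(const std::string& text, const std::string& tag) {
        TagStats& stats = tags_[tag];
        ++stats.docs;
        ++total_docs_;
        std::istringstream words(text);
        std::string word;
        while (words >> word) {
            ++stats.counts[word];
            ++stats.words;
            vocabulary_.insert(word);
        }
    }

    // Returns the tag with the highest log posterior, or "" if untrained.
    std::string classify(const std::string& text) const {
        std::string best_tag;
        double best_score = -std::numeric_limits<double>::infinity();
        for (const auto& entry : tags_) {
            const TagStats& stats = entry.second;
            // Log prior: fraction of training documents carrying this tag.
            double score = std::log(static_cast<double>(stats.docs) / total_docs_);
            const double denom =
                static_cast<double>(stats.words + vocabulary_.size());
            std::istringstream words(text);
            std::string word;
            while (words >> word) {
                // Words never seen in training carry no evidence; skip them.
                if (vocabulary_.count(word) == 0) continue;
                const auto it = stats.counts.find(word);
                const std::size_t count = (it == stats.counts.end()) ? 0 : it->second;
                // Add-one smoothing keeps unseen (word, tag) pairs above zero.
                score += std::log((count + 1.0) / denom);
            }
            if (score > best_score) {
                best_score = score;
                best_tag = entry.first;
            }
        }
        return best_tag;
    }

private:
    struct TagStats {
        std::size_t docs = 0;   // documents labeled with this tag
        std::size_t words = 0;  // total word occurrences under this tag
        std::map<std::string, std::size_t> counts;  // per-word occurrences
    };
    std::map<std::string, TagStats> tags_;
    std::set<std::string> vocabulary_;
    std::size_t total_docs_ = 0;
};
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. After training on one document per language, calling `classify` on a short phrase returns the tag whose training words best match it.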

Setting Up

This system was developed on, and primarily works on, Linux systems, specifically Ubuntu. However, the source code is provided as *.cpp files so that it can be ported to other systems. The instructions below assume a Linux system. Binary files are provided, but the following commands can be used to regenerate them.

g++ Hash_Tagger.cpp -o hashtagger -std=c++11
g++ Hash_Tester.cpp -o hashtest -std=c++11

Data Format

This system requires 3 main items:

  1. A vocabulary which is used to assign tags or classes to documents.
  2. Training data so that the system can start drawing inferences between data and tags.
  3. Test data which requires classification.

Data inside angle brackets, such as <data>, is ignored in training and test data. This allows raw HTML to be supplied as input.
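The bracket-skipping rule can be pictured with a small helper like the one below. This is a sketch under assumptions, not the project's code: the function name `strip_angle_brackets` is hypothetical, and how the real program treats an unclosed `<` or a stray `>` is not documented. Here an unclosed `<` drops the rest of the input and a stray `>` is kept.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `text` with every '<' ... '>' span
// (brackets included) removed.
std::string strip_angle_brackets(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    bool inside = false;  // true while between '<' and the next '>'
    for (char c : text) {
        if (c == '<') {
            inside = true;
        } else if (c == '>' && inside) {
            inside = false;
        } else if (!inside) {
            out.push_back(c);  // ordinary text, or a stray '>' (assumption)
        }
    }
    return out;
}
```

For instance, passing a line of HTML such as `<p>Hello <b>world</b></p>` through this helper leaves only the visible text.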

Vocabulary

Vocabulary.txt is an example vocabulary file. Each entry has the following format:
ID,Name,Category

ID - A unique number from 1 to 4 billion that identifies the tag.
Name - A number or string that represents the tag. It is given as input in the training data and generated as output.
Category - The category that the tag falls under.
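A vocabulary entry in this format could be parsed along the following lines. This is a minimal sketch, not the project's parser: `VocabEntry` and `parse_vocab_line` are hypothetical names, the ID is assumed to fit a 32-bit unsigned integer (suggested by the "1 to 4 billion" range), and names are assumed not to contain commas.

```cpp
#include <cctype>
#include <cstdint>
#include <exception>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical record for one "ID,Name,Category" entry.
struct VocabEntry {
    std::uint32_t id = 0;
    std::string name;
    std::string category;
};

// Parses one entry into `entry`. Returns false (leaving `entry` untouched)
// if a field is missing or the ID is not a positive 32-bit number.
bool parse_vocab_line(const std::string& line, VocabEntry& entry) {
    std::istringstream fields(line);
    std::string id_field;
    VocabEntry parsed;
    if (!std::getline(fields, id_field, ',')) return false;
    if (!std::getline(fields, parsed.name, ',')) return false;
    if (!std::getline(fields, parsed.category)) return false;
    if (id_field.empty() || parsed.name.empty()) return false;

    // Accept decimal digits only, so "-1" or "7abc" are rejected.
    for (char c : id_field) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    try {
        const unsigned long value = std::stoul(id_field);
        if (value == 0 || value > std::numeric_limits<std::uint32_t>::max()) {
            return false;
        }
        parsed.id = static_cast<std::uint32_t>(value);
    } catch (const std::exception&) {
        return false;  // e.g. std::out_of_range for very long digit strings
    }
    entry = parsed;
    return true;
}
```

A line such as `7,English,Language` (made-up values) would yield an entry with ID 7, name `English`, and category `Language`.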

Training Data

Training data should consist of text files named in consecutive numerical order, from 0.txt to n.txt. Each text file should have a matching key file (0.key to n.key) that lists the tags assigned to that file, one tag per line.

Test Data

As with training data, test data should consist of text files named in consecutive numerical order, from 0.txt to n.txt. In this case, the key files are generated by the system.

Example Use

Provided is a simple walkthrough of the system, along with test files necessary to begin execution.
Vocabulary.txt - List of languages.
train/ - Contains various documents in different languages.
