
ryckes / spam-classifier-comparator


Utility to measure some characteristics of a spam classifier (false positive rate, false negative rate, precision...) when fed with a training set and a test set, so it can be compared with other classifiers with the same purpose.

License: GNU General Public License v3.0

PHP 100.00%

spam-classifier-comparator's Introduction

Spam classifier comparator

In this project I provide a set of interfaces for spam classifiers, classification runners (normal, k-fold cross-validation...), results formatters, etc., together with some implementations of them (a random classifier and a Graham classifier, a normal runner and a cross-validation runner, a default formatter...). These can be used to test a set of classifiers against the same test data, after training all of them on the same training data, so that their results can be compared.
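Whatever the runner, once a classifier has labelled the test set the comparison comes down to counting true/false positives and negatives. The metrics named in the project description (false positive rate, false negative rate, precision) are then derived from those counts. The following is a minimal sketch of that bookkeeping. The function names `computeMetrics()` and `ratio()` and the returned keys are assumptions, not the project's API. A ratio whose denominator is zero is reported as `null`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the project's API): compares predicted labels with
 * the true labels (true = spam/positive, false = ham/negative) and derives the
 * metrics a comparison needs.
 *
 * @param bool[] $predicted
 * @param bool[] $actual
 * @return array<string, int|float|null>
 */
function computeMetrics(array $predicted, array $actual): array
{
    $predicted = array_values($predicted);
    $actual = array_values($actual);
    if (count($predicted) !== count($actual)) {
        throw new InvalidArgumentException('Prediction and label counts differ.');
    }

    // Confusion-matrix counts.
    $tp = $fp = $tn = $fn = 0;
    foreach ($actual as $i => $isSpam) {
        $saysSpam = $predicted[$i];
        if ($saysSpam && $isSpam) {
            $tp++;
        } elseif ($saysSpam) {
            $fp++; // ham flagged as spam
        } elseif (!$isSpam) {
            $tn++;
        } else {
            $fn++; // spam that slipped through
        }
    }

    return [
        'tp' => $tp, 'fp' => $fp, 'tn' => $tn, 'fn' => $fn,
        'falsePositiveRate' => ratio($fp, $fp + $tn), // FP / (FP + TN)
        'falseNegativeRate' => ratio($fn, $fn + $tp), // FN / (FN + TP)
        'precision'         => ratio($tp, $tp + $fp), // TP / (TP + FP)
    ];
}

/** Returns null instead of dividing by zero. */
function ratio(int $num, int $den): ?float
{
    return $den === 0 ? null : $num / $den;
}

print_r(computeMetrics([true, true, false, false, true], [true, false, false, true, true]));
```

Running every classifier through the same function on the same test set is what makes the numbers directly comparable.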

Why? Why???!!!

I've had problems with spam in the comments on my blog, and I'd like to try a few different content-based classifiers and see how they perform on my data. I thought it would be nice to support a wider range of message types (some users may want to include additional metadata in the classifier computations), so I'm writing it with some degree of genericity, or polymorphism. To that end, I have decided to represent the result of a classification as a simple boolean: true for positive (spam), false for negative (not spam).
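The following is a minimal sketch of what such a boolean-returning contract could look like (PHP 7.1 or later). The interface name `Classifier` appears in this README, but its methods `train()` and `classify()` are hypothetical. The message parameter is deliberately left untyped, so plain strings or richer objects carrying extra metadata can both be passed. The sketch also includes `RandomClassifier`, a hypothetical random baseline in the spirit of the random classifier mentioned above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical classifier contract (method names are assumptions, not the
 * project's actual interface). Messages are untyped on purpose.
 */
interface Classifier
{
    /**
     * @param array  $messages training messages of any type
     * @param bool[] $labels   true = positive (spam), false = negative
     */
    public function train(array $messages, array $labels): void;

    /** Returns true for positive (spam), false for negative. */
    public function classify($message): bool;
}

/** Baseline that ignores the content and flags messages at random. */
final class RandomClassifier implements Classifier
{
    /** @var float probability of answering "spam" */
    private $spamProbability;

    public function __construct(float $spamProbability = 0.5, ?int $seed = null)
    {
        $this->spamProbability = $spamProbability;
        if ($seed !== null) {
            mt_srand($seed); // reproducible runs when comparing classifiers
        }
    }

    public function train(array $messages, array $labels): void
    {
        // This baseline learns nothing from the training data.
    }

    public function classify($message): bool
    {
        // Uniform draw in [0, 1] compared against the configured probability.
        return mt_rand() / mt_getrandmax() < $this->spamProbability;
    }
}

$classifier = new RandomClassifier(0.5, 42);
$classifier->train(['cheap pills!!!', 'see you at lunch'], [true, false]);
var_dump($classifier->classify('buy now'));
```

Because the only thing a comparator needs back is a `bool`, any classifier, however it works internally, can be scored the same way.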

My goal is that, if you already have your own classes for representing messages and classifiers, you can write a simple Adapter for your classifier, plug it in together with a training set and a test set, and watch the numbers pop out. This means the interfaces have to stay simple, so that you don't need to modify the internals of your classifier.
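The Adapter idea can be sketched as follows:

- `BlogComment` and `KeywordFilter` are hypothetical classes. They stand in for existing code you don't want to touch.
- `KeywordFilterAdapter` wraps them behind a hypothetical `Classifier` interface. The method signatures are assumptions, not the project's actual ones.
- The adapter turns the filter's numeric score into the boolean result with a threshold.

```php
<?php
declare(strict_types=1);

// Hypothetical classifier contract expected by the comparator (an assumption).
interface Classifier
{
    public function train(array $messages, array $labels): void;
    public function classify($message): bool;
}

// ---- Existing classes you'd rather not modify (hypothetical) ----
final class BlogComment
{
    private $body;

    public function __construct(string $body)
    {
        $this->body = $body;
    }

    public function getBody(): string
    {
        return $this->body;
    }
}

final class KeywordFilter
{
    private $spamWords = [];

    public function learn(string $text, bool $isSpam): void
    {
        if ($isSpam) {
            foreach (self::words($text) as $word) {
                $this->spamWords[$word] = true;
            }
        }
    }

    /** Fraction of the words in $text that were previously seen in spam. */
    public function score(string $text): float
    {
        $words = self::words($text);
        if ($words === []) {
            return 0.0;
        }
        $hits = 0;
        foreach ($words as $word) {
            if (isset($this->spamWords[$word])) {
                $hits++;
            }
        }
        return (float) $hits / count($words);
    }

    private static function words(string $text): array
    {
        return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}

// ---- The Adapter: bridges the two without touching their internals ----
final class KeywordFilterAdapter implements Classifier
{
    private $filter;
    private $threshold;

    public function __construct(KeywordFilter $filter, float $threshold = 0.5)
    {
        $this->filter = $filter;
        $this->threshold = $threshold;
    }

    public function train(array $messages, array $labels): void
    {
        $labels = array_values($labels);
        foreach (array_values($messages) as $i => $comment) {
            $this->filter->learn($comment->getBody(), $labels[$i]);
        }
    }

    public function classify($message): bool
    {
        // Turn the filter's continuous score into the comparator's boolean.
        return $this->filter->score($message->getBody()) >= $this->threshold;
    }
}

$adapter = new KeywordFilterAdapter(new KeywordFilter(), 0.5);
$adapter->train(
    [new BlogComment('Cheap pills, buy now'), new BlogComment('Nice post, thanks')],
    [true, false]
);
var_dump($adapter->classify(new BlogComment('buy cheap pills')));
var_dump($adapter->classify(new BlogComment('thanks for the post')));
```

All the translation work (message objects to text, scores to booleans) lives in the adapter, so neither `BlogComment` nor `KeywordFilter` had to change.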

The Classifier interface will probably change soon. With the current implementation of the Cross-Validation runner, it is not possible to change the default parameters of the classifier under test. The only workaround is inheritance, but creating a subclass for every combination of parameters one may want to try would be infeasible.
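The following sketches one possible direction. It is an illustration only, not taken from the project. The runner would receive a factory (any PHP callable) instead of a classifier instance. It could then build a fresh, untrained classifier with the desired parameters for each fold, with no subclassing. The names `Classifier`, `ExclamationClassifier` and `crossValidate()` are hypothetical. Folds are assigned round-robin, and accuracy is used as the per-fold score purely for brevity.

```php
<?php
declare(strict_types=1);

// Hypothetical classifier contract (an assumption, not the project's API).
interface Classifier
{
    public function train(array $messages, array $labels): void;
    public function classify($message): bool;
}

/** Toy rule-based classifier with one tunable parameter (hypothetical). */
final class ExclamationClassifier implements Classifier
{
    private $minBangs;

    public function __construct(int $minBangs)
    {
        $this->minBangs = $minBangs;
    }

    public function train(array $messages, array $labels): void
    {
        // Rule-based: nothing to learn from the training folds.
    }

    public function classify($message): bool
    {
        return substr_count($message, '!') >= $this->minBangs;
    }
}

/**
 * k-fold cross-validation that asks $makeClassifier for a fresh instance per
 * fold, so the caller chooses the parameters instead of subclassing.
 *
 * @return float[] accuracy on each held-out fold
 */
function crossValidate(callable $makeClassifier, array $messages, array $labels, int $k): array
{
    $messages = array_values($messages);
    $labels = array_values($labels);
    if ($k < 2 || $k > count($messages)) {
        throw new InvalidArgumentException('k must be between 2 and the number of messages.');
    }

    $accuracies = [];
    for ($fold = 0; $fold < $k; $fold++) {
        $trainX = $trainY = $testX = $testY = [];
        foreach ($messages as $i => $message) {
            // Round-robin assignment: message $i is held out in fold $i % $k.
            if ($i % $k === $fold) {
                $testX[] = $message;
                $testY[] = $labels[$i];
            } else {
                $trainX[] = $message;
                $trainY[] = $labels[$i];
            }
        }

        $classifier = $makeClassifier(); // new, untrained instance for this fold
        $classifier->train($trainX, $trainY);

        $correct = 0;
        foreach ($testX as $j => $message) {
            if ($classifier->classify($message) === $testY[$j]) {
                $correct++;
            }
        }
        $accuracies[] = (float) $correct / count($testX);
    }
    return $accuracies;
}

$messages = ['Win!!!', 'Great post!', 'FREE!!! money!!', 'lunch?', 'Act now!!!', 'ok thanks'];
$labels   = [true, false, true, false, true, false];

foreach ([1, 3] as $minBangs) {
    $scores = crossValidate(function () use ($minBangs): Classifier {
        return new ExclamationClassifier($minBangs);
    }, $messages, $labels, 3);
    printf("minBangs=%d mean accuracy=%.2f\n", $minBangs, array_sum($scores) / count($scores));
}
```

Trying another parameter combination is then just a matter of passing a different closure, rather than writing a new subclass.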

Interface

Right now it can only be used through scripts like CVExampleComparator.php and ExampleComparator.php. The DefaultFormatter class returns a simple representation of the results, but you can write your own formatter (implementing the ResultsFormatter interface or not) to present the RunnerResults in whatever format you like.
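As an example of a custom formatter, the following sketch renders results as CSV rows. `ResultsFormatter` and `RunnerResults` are names from this project, but their contents here are stand-ins. The `format()` signature and the confusion-matrix fields are assumptions, and `CsvFormatter` is hypothetical.

```php
<?php
declare(strict_types=1);

/** Stand-in for the project's RunnerResults; these fields are assumptions. */
final class RunnerResults
{
    public $classifierName;
    public $truePositives;
    public $falsePositives;
    public $trueNegatives;
    public $falseNegatives;

    public function __construct(string $classifierName, int $tp, int $fp, int $tn, int $fn)
    {
        $this->classifierName = $classifierName;
        $this->truePositives = $tp;
        $this->falsePositives = $fp;
        $this->trueNegatives = $tn;
        $this->falseNegatives = $fn;
    }
}

/** Stand-in for the project's ResultsFormatter; this signature is assumed. */
interface ResultsFormatter
{
    public function format(RunnerResults $results): string;
}

/** Hypothetical formatter that emits one CSV row per run. */
final class CsvFormatter implements ResultsFormatter
{
    public function header(): string
    {
        return "classifier,tp,fp,tn,fn,false_positive_rate,false_negative_rate,precision\n";
    }

    public function format(RunnerResults $results): string
    {
        $r = $results;
        return sprintf(
            "%s,%d,%d,%d,%d,%s,%s,%s\n",
            self::quote($r->classifierName),
            $r->truePositives,
            $r->falsePositives,
            $r->trueNegatives,
            $r->falseNegatives,
            self::ratio($r->falsePositives, $r->falsePositives + $r->trueNegatives),
            self::ratio($r->falseNegatives, $r->falseNegatives + $r->truePositives),
            self::ratio($r->truePositives, $r->truePositives + $r->falsePositives)
        );
    }

    /** Four decimals, or an empty cell when the ratio is undefined. */
    private static function ratio(int $num, int $den): string
    {
        return $den === 0 ? '' : number_format($num / $den, 4, '.', '');
    }

    /** CSV-quote a text field, doubling any embedded quotes. */
    private static function quote(string $field): string
    {
        return '"' . str_replace('"', '""', $field) . '"';
    }
}

// Made-up counts, only to show the output shape.
$formatter = new CsvFormatter();
echo $formatter->header();
echo $formatter->format(new RunnerResults('ClassifierA', 90, 2, 98, 10));
```

Keeping the metric arithmetic inside the formatter means the same results object can be printed as plain text, CSV or anything else without touching the runner.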
