
simple_bayes's Issues

Universal save/load function

To save the Bayes filter somewhere other than the file system, e.g. in a database or key-value store, I think a universal save/load function would be great.

My suggestion is to adapt the storage behavior so that the save function returns a {:ok, pid, data} tuple. In case of the filesystem storage, which doesn't need to return data, the returned tuple is {:ok, pid, nil}.

The load function then needs to accept the encoded data.
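A rough sketch of what that could look like, in case it helps the discussion. The module names `StorageSketch` and `InMemoryStorageSketch` are made up for illustration and are not part of simple_bayes; the sketch also assumes the classifier state lives in an Agent behind the pid.

```elixir
defmodule StorageSketch do
  # Hypothetical behaviour illustrating the proposal; not the simple_bayes API.
  # save/2 always returns a three-element tuple; adapters that persist the
  # data themselves (e.g. file system) would return nil as the third element.
  @callback save(pid :: pid(), opts :: keyword()) :: {:ok, pid(), binary() | nil}
  @callback load(data :: binary() | nil, opts :: keyword()) :: {:ok, pid()}
end

defmodule InMemoryStorageSketch do
  @behaviour StorageSketch

  @impl true
  def save(pid, _opts) do
    # Assumption: the pid is an Agent holding the trained state.
    # Serialise it so the caller can put it in any database or key-value store.
    data = pid |> Agent.get(& &1) |> :erlang.term_to_binary()
    {:ok, pid, data}
  end

  @impl true
  def load(data, _opts) do
    # Rebuild the state from the encoded data and start a fresh Agent with it.
    state = :erlang.binary_to_term(data)
    Agent.start_link(fn -> state end)
  end
end
```

With this shape the caller decides where `data` ends up, and the file system adapter can keep writing to disk and simply return `{:ok, pid, nil}`.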

What do you think?

Additional Bayes algorithms

First, many thanks for your great Bayes library!

As an enhancement and to handle different use cases, it would be great to have the ability to select the Bayes algorithm to use.

I would suggest these two additional algorithms:

  • Binarized Multinomial Naive Bayes: useful when word frequencies don't play a key role in the classification, e.g. sentiment analysis.
  • Bernoulli Naive Bayes: useful when the absence of a particular word matters, e.g. spam or adult content filtering.
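To make the difference between the variants concrete, here is a minimal sketch of how each one might turn a tokenised document into features. `FeatureSketch` and its functions are hypothetical names for illustration only, not anything the library provides.

```elixir
defmodule FeatureSketch do
  # Plain multinomial: every occurrence of a word counts.
  def multinomial(tokens), do: Enum.frequencies(tokens)

  # Binarized multinomial: each distinct word counts at most once per document.
  def binarized(tokens), do: tokens |> Enum.uniq() |> Enum.frequencies()

  # Bernoulli: one presence flag per vocabulary word, so words that are
  # absent from the document are represented explicitly as false.
  def bernoulli(tokens, vocabulary) do
    present = MapSet.new(tokens)
    Map.new(vocabulary, fn word -> {word, MapSet.member?(present, word)} end)
  end
end
```

For the tokens `["good", "good", "film"]`, the multinomial variant counts "good" twice, the binarized variant counts it once, and the Bernoulli variant additionally records every vocabulary word that did not appear.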

What do you think?

Remove classify_one in favour of an option

This is minor, but having both classify_one and classify seems unnecessary. There could just be a :top option (or a better name) to specify how many categories to return:

bayes |> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet.", top: 3) # classify 3

bayes |> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet.") # classify all

bayes |> SimpleBayes.classify("Maybe green maybe red but definitely round and sweet.", top: 1) # classify 1, rather than a separate function

Being generic about the number of results to return seems handy, and it would trim down the special casing.
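A minimal sketch of how such an option could be handled internally. `TopSketch.take_top/2` is a hypothetical helper, and it assumes the classifier already produces a list of `{category, score}` pairs sorted best-first.

```elixir
defmodule TopSketch do
  # Hypothetical helper, not part of simple_bayes.
  # `ranked` is assumed to be a best-first list of {category, score} pairs.
  def take_top(ranked, opts \\ []) do
    case Keyword.get(opts, :top) do
      # No :top option given: return every category.
      nil -> ranked
      # Otherwise keep only the first n results.
      n when is_integer(n) and n > 0 -> Enum.take(ranked, n)
    end
  end
end
```

One design question this leaves open is whether `top: 1` should return a one-element list or the bare result, since callers of a separate single-result function may expect the latter.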

Improved encoding/decoding?

Heya!

Loving this library, except I've encountered an unfortunate pathological case.

I have trained a classifier on roughly 288,000 labeled texts. There are 5 distinct labels, and each text is about 5 words long.

Here's how it's configured:

model:          :bernoulli,
storage:        :file_system,
file_path:      @storage,
default_weight: 1.0,
smoothing:      0.0,
stem:           false,
stop_words:     []

I then persisted this trained model to disk. It's roughly 11 MB when stored. Then I attempted to load it back into memory...

After 10 (!) minutes I received this error:

** (ArithmeticError) bad argument in arithmetic expression
    (stdlib) :math.pow(7318, 24158)
             lib/simple_bayes/classifier/models/bernoulli.ex:7: SimpleBayes.Classifier.Model.Bernoulli.probability_of/3
             lib/simple_bayes/classifier/probability.ex:37: anonymous fn/4 in SimpleBayes.Classifier.Probability.for_collection/3
    (elixir) lib/map.ex:114: Map.do_new_transform/3
             lib/simple_bayes/classifier.ex:13: SimpleBayes.Classifier.classify/3

Ouch.

So, there are two things here:

  1. There's an issue preventing me from reloading the trained classifier.
  2. Loading is painfully slow.

When I have the time, I'll explore the reasons behind (1). For now, I'll see how far dets can get me.
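For what it's worth, the trace shows `:math.pow/2` being asked for a value far beyond what a float can hold. A common way around this kind of overflow is to work with logarithms instead of raw powers. The sketch below only illustrates that idea; `LogSpaceSketch.log_pow/2` is a hypothetical helper, and whether the Bernoulli model can be restructured this way is an assumption I haven't checked against the library code.

```elixir
defmodule LogSpaceSketch do
  # Hypothetical helper, not part of simple_bayes.
  # Returns log(base^exponent) without ever forming base^exponent,
  # using the identity log(b^e) = e * log(b).
  def log_pow(base, exponent) when base > 0 do
    exponent * :math.log(base)
  end
end

# :math.pow(7318, 24158) raises ArithmeticError, as in the trace above,
# while the log-space version stays within float range:
LogSpaceSketch.log_pow(7318, 24158)
```

Products of probabilities then become sums of logs, and the categories can still be ranked by comparing the log values directly.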
