vishalbelsare / smrt

This project forked from tgsmith61591/smrt

Handle class imbalance intelligently by using variational auto-encoders to generate synthetic observations of your minority class.

License: BSD 3-Clause "New" or "Revised" License

Shell 3.45% Python 96.55%

Synthetic Minority Reconstruction Technique (SMRT)

Handle your class imbalance more intelligently by using SMOTE's younger, more sophisticated cousin

Installation

Installation is easy. After cloning the project onto your machine and installing the required dependencies, simply use the setup.py file:

$ git clone https://github.com/tgsmith61591/smrt.git
$ cd smrt
$ python setup.py install

About

SMRT (Synthetic Minority Reconstruction Technique) is the new SMOTE (Synthetic Minority Oversampling TEchnique). Using variational auto-encoders, SMRT learns the latent factors that best reconstruct the observations in each minority class, and then generates synthetic observations until each minority class is represented at a user-defined ratio relative to the majority class size.
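To make the "user-defined ratio" concrete, the following is a minimal sketch of the bookkeeping involved: given the class labels and a target ratio, it computes how many synthetic observations each minority class would need. The function name `synthetic_counts` and the parameter `balance_ratio` are hypothetical and chosen for illustration; they are not taken from the smrt API, and the actual package may compute this differently.

```python
import math
from collections import Counter


def synthetic_counts(y, balance_ratio=0.5):
    """Return {label: n_to_generate} for every under-represented class.

    Hypothetical helper (not part of smrt): each class is topped up
    until it holds at least ``balance_ratio * majority_size`` observations.
    """
    if not 0.0 < balance_ratio <= 1.0:
        raise ValueError("balance_ratio must be in (0, 1]")

    counts = Counter(y)
    majority_size = max(counts.values())
    target = math.ceil(balance_ratio * majority_size)

    # Only classes below the target need synthetic observations.
    return {label: target - n for label, n in counts.items() if n < target}


# Roughly the 1:100 imbalance used in the MNIST example: 10 zeros, 1000 ones.
labels = [0] * 10 + [1] * 1000
print(synthetic_counts(labels, balance_ratio=0.5))
```

With a ratio of 0.5, the minority class must reach 500 observations, so 490 synthetic zeros would be required in this toy case.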

SMRT avoids one of SMOTE's greatest risks. SMOTE draws random minority observations and synthesizes new points from each one's k-nearest neighbors, so it may select a "border point," an observation very close to the decision boundary. The resulting synthetic observations can lie too close to the decision boundary for reliable classification, which can degrade an estimator's performance. SMRT avoids this risk implicitly, as the VariationalAutoencoder learns a distribution that generalizes to the lowest-error (i.e., most archetypal) observations.
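The border-point risk follows from how SMOTE builds a synthetic observation. The sketch below is a simplified, standard-library-only illustration of the SMOTE interpolation step, not the code used in this project or in any particular SMOTE library; the function name `smote_sample` is hypothetical. Because the base observation is drawn uniformly at random, a point near the decision boundary is as likely to be chosen as any other, and the synthetic point is then placed on the segment between it and one of its neighbors.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Generate one synthetic point by SMOTE-style interpolation.

    Illustrative sketch: picks a random minority observation, picks one
    of its k nearest minority neighbors, and returns a random point on
    the line segment between the two.
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority observations")

    # Any observation can be the base point, including border points.
    base = rng.choice(minority)

    # Rank the remaining observations by Euclidean distance to the base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbor = rng.choice(others[:k])

    gap = rng.random()  # interpolation weight in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote_sample(minority, k=2, rng=random.Random(0)))
```

Every synthetic point stays within the convex hull of the minority observations, but nothing in the procedure steers it away from the boundary with the majority class.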

See the paper for more in-depth reference.

Example

The SMRT example is an IPython notebook with reproducible code and data. It takes an imbalanced variant of the MNIST dataset, balances it with both SMOTE and SMRT, and compares the results. Several of the resulting images from each method are shown below. Even visually, it's evident that SMRT better synthesizes data that resembles the input data.

Original:

The MNIST dataset was amended to contain only zeros and ones in an unbalanced ratio (~1:100, respectively). The top row shows the original MNIST images, the second row shows the SMRT-generated images, and the bottom row shows the SMOTE-generated images:
[Image: original, SMRT-generated, and SMOTE-generated MNIST digits]

Notes
