opfimb's Introduction

OpfImb

OPFImb is an Optimum-Path Forest-based library for handling imbalanced datasets, a frequent problem in classification tasks. It was designed to be simple and easy to use for anyone familiar with existing frameworks that tackle the same issue. The library provides functions for either oversampling or undersampling imbalanced datasets, with a range of variants designed for specific aspects of the data distribution under analysis.

For oversampling, synthetic samples are drawn from a Gaussian distribution whose mean and covariance are computed from the samples within each cluster of the minority class; the clusters are generated using the Unsupervised Optimum-Path Forest (OPF) model. For undersampling, supervised OPF learning is employed to assign a score to each training sample that correctly conquers an instance of the testing set. Training samples with zero or negative scores are candidates for removal from the training set.
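The Gaussian sampling step described above can be sketched roughly as follows. This is not OPFImb's actual code: the function name `gaussian_oversample`, its parameters, and the rule that spreads new samples across clusters in proportion to cluster size are assumptions made for illustration. The cluster labels are taken as given (in OPFImb they would come from unsupervised OPF), and NumPy is assumed to be installed.

```python
import numpy as np


def gaussian_oversample(X_minority, cluster_ids, n_new, seed=0):
    """Draw n_new synthetic samples from one Gaussian per minority cluster.

    Hypothetical helper: cluster_ids is assumed to come from some clustering
    of the minority class (unsupervised OPF in OPFImb).
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n_features = X_minority.shape[1]

    # Assumption: new samples are split across clusters proportionally to size.
    clusters, sizes = np.unique(cluster_ids, return_counts=True)
    counts = rng.multinomial(n_new, sizes / sizes.sum())

    synthetic = [np.empty((0, n_features))]
    for cluster, count in zip(clusters, counts):
        if count == 0:
            continue
        members = X_minority[cluster_ids == cluster]
        mean = members.mean(axis=0)
        if len(members) > 1:
            # Sample covariance of the cluster (features are columns).
            cov = np.atleast_2d(np.cov(members, rowvar=False))
        else:
            # A single-sample cluster has no spread; replicate its point.
            cov = np.zeros((n_features, n_features))
        synthetic.append(rng.multivariate_normal(mean, cov, size=count))
    return np.vstack(synthetic)
```

Calling `gaussian_oversample(X, ids, 20)` on a minority-class matrix `X` with cluster labels `ids` returns an array with 20 rows and as many columns as `X`.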

The following methods are currently available in OPFImb:

Oversampling:

  • O2PF: it represents the standard Oversampling Optimum-Path Forest method;
  • O2PF_RI: O2PF Radius Interpolation;
  • O2PF_MI: O2PF Mean Interpolation;
  • O2PF_P: O2PF Prototype;
  • O2PF_WI: O2PF Weight Interpolation.

Undersampling:

  • OPF-US: represents the standard Undersampling Optimum-Path Forest method. Removes low-ranked samples from the majority class until the dataset is balanced;
  • OPF-US1: removes samples from the majority class with negative scores;
  • OPF-US2: removes samples from the majority class with scores lower than or equal to zero;
  • OPF-US3: removes all samples with negative scores.
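The four removal rules above can be illustrated with a small sketch that takes the per-sample scores as already computed. The function `undersample` and its arguments are hypothetical and not part of OPFImb. For the standard OPF-US rule, "balanced" is assumed here to mean that the majority class is reduced to the size of the largest remaining class, with ties between equal scores broken by sample order.

```python
from collections import Counter


def undersample(labels, scores, majority, variant):
    """Return the indices of the training samples kept by a variant.

    Hypothetical helper: scores are assumed to be precomputed by the
    supervised OPF scoring step.
    """
    indices = range(len(labels))
    if variant == "OPF-US1":
        # Drop majority-class samples with negative scores.
        return [i for i in indices
                if not (labels[i] == majority and scores[i] < 0)]
    if variant == "OPF-US2":
        # Drop majority-class samples with scores lower than or equal to zero.
        return [i for i in indices
                if not (labels[i] == majority and scores[i] <= 0)]
    if variant == "OPF-US3":
        # Drop every sample with a negative score, whatever its class.
        return [i for i in indices if scores[i] >= 0]
    if variant == "OPF-US":
        # Drop the lowest-scored majority samples until the classes balance.
        counts = Counter(labels)
        others = [n for label, n in counts.items() if label != majority]
        target = max(others) if others else counts[majority]
        ranked = sorted((i for i in indices if labels[i] == majority),
                        key=lambda i: scores[i])
        n_remove = max(counts[majority] - target, 0)
        removed = set(ranked[:n_remove])
        return [i for i in indices if i not in removed]
    raise ValueError(f"unknown variant: {variant}")
```

For example, with labels `[0, 0, 0, 0, 1, 1]`, scores `[-1, 0, 2, 3, -2, 1]` and majority class `0`, OPF-US1 keeps indices `[1, 2, 3, 4, 5]`, while OPF-US3 also drops the negatively scored minority sample and keeps `[1, 2, 3, 5]`.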

Besides the methods above, OPFImb provides three hybrid approaches that first apply an undersampling method and then oversample with the standard O2PF. These hybrid methods are as follows:

  • OPF-US1-O2PF: undersampling by using OPF-US1 followed by oversampling performed by O2PF;
  • OPF-US2-O2PF: undersampling by using OPF-US2 followed by oversampling performed by O2PF;
  • OPF-US3-O2PF: undersampling by using OPF-US3 followed by oversampling performed by O2PF.

Examples:

  • Oversampling;
    • python oversampling.py
    • python oversampling_example_simple.py data/vertebral_column/1/train.txt 20
  • Undersampling;
    • python undersampling.py
    • python undersampling_example_simple.py data/vertebral_column/1/train.txt data/vertebral_column/1/valid.txt
  • Hybrid;
    • python hybrid.py
    • python hybrid_example_simple.py data/vertebral_column/1/train.txt 20 data/vertebral_column/1/valid.txt

Results:

  • result_folder/OpfImb_variant_name/dataset_name/fold_number/pred.txt
    • testing set predicted labels over fold
  • result_folder/OpfImb_variant_name/dataset_name/fold_number/results.txt
    • testing set results over fold. Follows the pattern: best_k,accuracy,recall,f1,execution_time
  • result_folder/OpfImb_variant_name/dataset_name/fold_number/validation.txt
    • validation set results over fold, where each line denotes an evaluated k_max. Follows the pattern: best_k,accuracy,recall,f1,execution_time
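A result line in the pattern above could be read with a small helper like the one below. This is only a sketch: the names `FoldResult`, `parse_result_line` and `read_results` are hypothetical, and it assumes that the files contain plain comma-separated numbers with no header row and that best_k is an integer.

```python
from typing import NamedTuple


class FoldResult(NamedTuple):
    best_k: int
    accuracy: float
    recall: float
    f1: float
    execution_time: float


def parse_result_line(line):
    """Parse one 'best_k,accuracy,recall,f1,execution_time' line."""
    best_k, accuracy, recall, f1, execution_time = line.strip().split(",")
    # int(float(...)) also accepts a k written as '20.0' (an assumption).
    return FoldResult(int(float(best_k)), float(accuracy), float(recall),
                      float(f1), float(execution_time))


def read_results(path):
    """Read every non-empty line of a results.txt or validation.txt file."""
    with open(path) as handle:
        return [parse_result_line(line) for line in handle if line.strip()]
```

A `results.txt` file would then yield a single `FoldResult`, while a `validation.txt` file would yield one entry per evaluated k_max.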

OPF code inspired by the OPFYTHON lib:

Paper describing the code:

  • Handling Imbalanced Datasets Through Optimum-Path Forest (Submitted to journal TPAMI).

opfimb's People

Contributors

danilojodas, leandropassosjr
