Deterministic Sampling Ensemble


Deterministic Sampling Ensemble diagram

Deterministic Sampling diagram


Experiment 1 - Evaluating the best sampling method

Experiment files:

Methods:

  • DSE - Deterministic Sampling Ensemble

Base classifiers:

Data streams:

  • Generators:
  • Concept drift:
    • sudden
    • incremental
  • Objects: 15 000
  • Features: 10
  • Imbalance Ratio: 10%
  • Noise: 10%
  • Random samples: 333

Results:

Results of combining Random Under Sampling with oversampling methods. Darker is better; the best value is bold and underscored.

Results of combining SVMSMOTE with undersampling methods. Darker is better; the best value is bold and underscored.

Results of combining NCR with oversampling methods. Darker is better; the best value is bold and underscored.


Experiment 2 - Evaluating the best balance ratio parameter

Files:

Methods:

  • DSE - Deterministic Sampling Ensemble

Base classifiers:

Data streams:

  • Generators:
  • Concept drift:
    • sudden
    • incremental
  • Objects: 15 000
  • Features: 10
  • Imbalance Ratio: 10%
  • Noise: 10%
  • Random samples: 333

Results:

Balance parameter setup experiment. Darker is better; the best value is bold and underscored.


Experiment 3 - Evaluating performance on data streams with different noise ratios

Files:

Methods:

Base classifiers:

Data streams:

  • Generator: stream-learn
  • Concept drift: incremental
  • Objects: 10 000
  • Features: 10
  • Imbalance Ratio: 10%
  • Noise: 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%
  • Random samples: 111, 222, 333, 444, 555

Results:

Selected mean results from noise experiments


Experiment 4 - Evaluating performance on data streams with different balance ratios

Files:

Base classifiers:

Methods:

Data streams:

  • Generator: stream-learn
  • Concept drift: incremental
  • Objects: 10 000
  • Features: 10
  • Imbalance Ratio: 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%
  • Noise: 10%
  • Random samples: 111, 222, 333, 444, 555

Results:

Selected mean results from noise and balance experiments


Experiment 5 - Main evaluation (synthetic data)

Files:

Base classifiers:

Methods:

Data streams:

  • Generators:
  • Concept drifts:
    • 1 sudden
    • 1 incremental
    • 5 sudden
    • 5 incremental
  • Objects: 100 000
  • Features: 10
  • Imbalance Ratio: 10%, 20%, 30%
  • Noise: 0%, 10%
  • Random samples: 111, 222
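The parameter lists above define a grid of stream configurations. The sketch below enumerates that grid in plain Python. It assumes that "Random samples" are the random seeds used to generate the streams; `stream_configs` and the dictionary keys are hypothetical names chosen for illustration, not identifiers from the experiment files. The generator itself is not modelled here, so the count is per generator.

```python
from itertools import product

# Values copied from the "Data streams" list above.
DRIFTS = [(1, "sudden"), (1, "incremental"), (5, "sudden"), (5, "incremental")]
IMBALANCE_RATIOS = [0.10, 0.20, 0.30]
NOISE_LEVELS = [0.00, 0.10]
RANDOM_STATES = [111, 222]  # assumption: "Random samples" are random seeds
N_OBJECTS = 100_000
N_FEATURES = 10


def stream_configs():
    """Yield one dict per stream configuration (hypothetical helper)."""
    for (n_drifts, drift_type), ratio, noise, seed in product(
        DRIFTS, IMBALANCE_RATIOS, NOISE_LEVELS, RANDOM_STATES
    ):
        yield {
            "n_drifts": n_drifts,
            "drift_type": drift_type,
            "imbalance_ratio": ratio,
            "noise": noise,
            "random_state": seed,
            "n_objects": N_OBJECTS,
            "n_features": N_FEATURES,
        }


configs = list(stream_configs())
# 4 drift settings * 3 imbalance ratios * 2 noise levels * 2 seeds
print(len(configs))
```

Under these assumptions the grid contains 48 configurations per generator.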

Results:

Wilcoxon pair rank-sum tests for synthetic data streams with incremental concept drift. The dashed vertical line is the critical value at a significance level of 0.05 (green – win, yellow – tie, red – loss)

Wilcoxon pair rank-sum tests for synthetic data streams with sudden concept drift. The dashed vertical line is the critical value at a significance level of 0.05 (green – win, yellow – tie, red – loss)


Experiment 5 - Main evaluation (real data)

Files:

Base classifiers:

Methods:

Data streams:

Results:

F-score metric over the data chunks for covtypeNorm-1-2vsAll data stream with SVM base classifier

F-score metric over the data chunks for poker-lsn-1-2vsAll data stream with SVM base classifier
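The plots above track the F-score chunk by chunk. As a minimal sketch of how such a curve can be computed, the code below evaluates the standard F1 score on consecutive chunks of labels. The chunk size, the toy labels, and the choice of `1` as the positive (minority) class are assumptions made for illustration; the experiment's own evaluation code may differ.

```python
def f_score(y_true, y_pred, positive=1):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: define the score as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f_score_per_chunk(y_true, y_pred, chunk_size):
    """Split the stream into consecutive chunks and score each one."""
    return [
        f_score(y_true[i:i + chunk_size], y_pred[i:i + chunk_size])
        for i in range(0, len(y_true), chunk_size)
    ]


# Toy labels, invented for illustration only (not experiment data).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
scores = f_score_per_chunk(y_true, y_pred, chunk_size=4)
print(scores)  # [0.5, 1.0]
```

Plotting `scores` against the chunk index gives a curve of the same kind as the figures above.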
