Coder Social home page Coder Social logo

capita_selecta's Introduction

READ ME

In context of online learning data comes in streams and hereby, one of the biggest challenges that a model faces is to be able preserve its optimality. Real-world data often comes from non-stationary environments and evolves over time, leading to a change in its distribution. This phenomenon is called concept drift. In certain cases, concept drift might be particularly disruptive to the optimality of Machine Learning systems. Once the model becomes sub-optimal, it needs to be re-configured. For an improved re-configuration process, understanding and quantifying drift is essential. This research is focused on analysing and quantifying drift using Hellinger Distance Based Drift Detection. Extending previous work of a fellow student Thomas Boot, class drift is analyzed using posterior class probabilities and the effect of window size to the accuracy of drift detection and measurement is explored. According to the experiment results, an abrupt drift classification method is proposed according to the magnitude of the detected drift. Code developed for this project can be used as a library for drift analysis and classification with small improvements.

Requirements

This repository uses Python 3.9. Dependencies required by this project can be installed via

pip install -r requirements.txt

Usage

  • main.py is the main function used during development. Parameters can be changed from the user-defined variables block inside the main.py script.
  • Extracting the /data folder from Cagla_Sozen_data.zip is sufficient for setting the default data path.
  • Plots generated will be outputted in the /out folder. To change the location, HDDDM_run (function run_hdddm) script should be changed.
  • Custom datasets with drift can be generated using the GenerateDatasets script. Currently it contains the generation code for all the datasets used for the experiments.

Details

  • /src contains the core classes and scripts containing the functions required for the project.
    • Discretize.py contains a class for instantiating a Discretizer with the selected method.
    • Distance.py contains the distance functions, Hellinger Distance, Total Variation Distance, KL_divergence, Jensen Shannon Divergence.
    • HDDDM_alternative_approach.py contains an implementation of the original HDDM algorithm as proposed by Ditzler and Polikar [1] with added different approaches [2], [3].
    • HDDDM_run script contains the function to run HDDDM with the selected parameters.
    • ProbabilityTypes.py contains the enumerated types of Approaches (Probabilities) [2], [3].
    • util.py contains utility functions
  • /out is the default output folder for the plots generated.
  • /data contains the data to be used as .csv files

References

[1] G. Ditzler and R. Polikar. Hellinger distance based drift detection for nonstationary environments. In 2011 IEEE Symposium on Computational Intelligence in Dynamic and Uncertain Environments (CIDUE), pp. 41โ€“48, 2011. doi: 10.1109/CIDUE.2011.5948491

[2] J. Sarnelle, A. Sanchez, R. Capo, J. Haas, and R. Polikar. Quantifying the limited and gradual concept drift assumption. In 2015 International Joint Conference on Neural Networks, IJCNN 2015, Proceedings of the International Joint Conference on Neural Networks. Institute of Electrical and Electronics Engineers Inc., United States, Sept. 2015. International Joint Conference on Neural Networks, IJCNN 2015 ; Conference date: 12-07-2015 Through 17-07-2015. doi: 10.1109/IJCNN.2015.7280850

[3] G. Webb, L. Lee, B. Goethals, and F. Petitjean. Analyzing concept drift and shift from sample data. Data Mining and Knowledge Discovery, 32, 09 2018. doi: 10.1007/s10618-018-0554-1

capita_selecta's People

Contributors

caglasozen avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.