Coder Social home page Coder Social logo

solvingoutliersproblem's Introduction

OUTLIERS

An outlier is a single data point that goes far outside the average value of a group of statistics. Outliers may be exceptions that stand outside individual samples of populations as well. In a more general context, an outlier is an individual that is markedly different from the norm in some respect.

Titanic Dataset

image

This is the boxplot of the "age" variable :

image

As seen there are outliers in this dataset.

How to find outlier thresholds ?

Formula:

image

image

The formula functionalized and applied.

Now, we can grab the outliers and take a look.

How to Solve the Outlier Problem?

-Trimming (Deletion)

Outliers can be deleted. We have implemented it but this is not recommended as it will often result in data loss.

-Imputation

As with the methods of dealing with missing data, the method of assigning values ​​can be preferred instead of outliers. It is more advantageous than the problems caused by data loss in deletion operations. The values to be assigned instead of outliers can be representative statistics such as mean, median, mode, or any fixed value.

-Data suppression (re-assignment with thresholds)

Data suppression refers to the various methods or restrictions that are applied to ACS estimates to limit the disclosure of information about individual respondents and to reduce the number of estimates with unacceptable levels of statistical reliability.

Multivariate Outlier Analysis: Local Outlier Factor (LOF)

When we looked at the variables separately, we detected outliers. So, if we look at the variables together, can we get outlier variables? For example, if a person was married 3 times at the age of 18.

Being 18 years old or getting married 3 times are not problems, but being 18 years old and married 3 times can be an outlier.

Applied to the "LocalOutlierFactor" dataset. If the values close to -1, it indicates that it is INLIER.

Elbow Method

image

A graph was created according to the threshold values, and when we examined the graph, the point where the slope change was the hardest was determined. The determined slope change point was chosen as threshold.

The individual variables may appear as outliers, but we found outliers depending on the situation between the variables.

Note : If working with tree methods, these values should not be changed.

You can find the mentioned titles and explanations in detail in SolvingOutliersProblem.py

solvingoutliersproblem's People

Contributors

baranylcn avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.