c-bata / outlier-utils

Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs test.

Home Page: https://pypi.python.org/pypi/outlier-utils

License: MIT License

outlier-utils's Introduction

outlier-utils

Requirements

Overview

Both the two-sided and the one-sided versions of the test are supported. The former extracts outliers from both ends of the dataset, whereas the latter only considers min or max outliers. The test is applied repeatedly, removing one outlier at a time until no more can be found in the dataset. The output is flexible enough to match several use cases: by default the outlier-free data is returned, but the test can also return the outliers themselves or their indices in the original dataset.

Examples

  • Two-sided Grubbs test with a Pandas series input
>>> from outliers import smirnov_grubbs as grubbs
>>> import pandas as pd
>>> data = pd.Series([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
1     8
2     9
3    10
4     9
dtype: int64
  • Two-sided Grubbs test with a NumPy array input
>>> import numpy as np
>>> data = np.array([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
array([ 8,  9, 10,  9])
  • One-sided (min) test returning outlier indices
>>> grubbs.min_test_indices([8, 9, 10, 1, 9], alpha=0.05)
[3]
  • One-sided (max) tests returning outliers
>>> grubbs.max_test_outliers([8, 9, 10, 1, 9], alpha=0.05)
[]
>>> grubbs.max_test_outliers([8, 9, 10, 50, 9], alpha=0.05)
[50]
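
The iterative removal described in the Overview can be sketched independently of the library. The following is a minimal illustration, not the library's actual implementation: the function names `grubbs_critical_value` and `remove_outliers_two_sided` are hypothetical, the critical value uses the standard two-sided Grubbs formula based on the Student t distribution, and the sample standard deviation (ddof=1) is an assumption of this sketch.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n, alpha):
    """Two-sided Grubbs critical value for a sample of size n (hypothetical helper)."""
    # Upper critical value of the t distribution with n - 2 degrees of freedom.
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t ** 2 / (n - 2 + t ** 2))


def remove_outliers_two_sided(values, alpha=0.05):
    """Drop the most extreme point repeatedly while it exceeds the critical value."""
    data = np.asarray(values, dtype=float)
    while data.size > 2:
        deviations = np.abs(data - data.mean())
        position = int(deviations.argmax())
        std = data.std(ddof=1)  # sample standard deviation (assumption of this sketch)
        if std == 0:
            break  # all remaining values are identical
        g = deviations[position] / std
        if g <= grubbs_critical_value(data.size, alpha):
            break  # the most extreme point is not an outlier
        data = np.delete(data, position)
    return data


cleaned = remove_outliers_two_sided([1, 8, 9, 10, 9])
cleaned_max = remove_outliers_two_sided([8, 9, 10, 50, 9])
```

Under these assumptions the sketch should agree with the examples above: the value 1 is dropped from the first dataset and 50 from the second, leaving 8, 9, 10, 9 in both cases.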

License

This software is licensed under the MIT License.

outlier-utils's Issues

G-value miscalculated

In "def _test_once(self, data, alpha):" the statistic is defined as:

$G=value/data.std()$

where "value" is the largest absolute deviation from the mean, that is, the distance between the value being evaluated and the mean:

$value = abs(data - data.mean()).max()$

These steps match the procedure described on the NIST site. To check them, take the dataset [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57].

outlier-utils gives a G-value of "2.6392", against "2.4687" on the NIST site. So what is the problem?
The problem is the way the std is calculated: because the standard deviation is estimated from a sample, data.std(ddof=1) must be used.

The std that NIST uses can be recovered from:
$data.std()=value/G$
So NIST uses "15.8525" as the std, against "14.8287" in outlier-utils.

The first equation should therefore be:
$G=value/data.std(ddof=1)$
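
The two G-values quoted in this issue can be checked with a few lines of NumPy. This is a small verification sketch, not code from the library; the variable names are illustrative. Note that `numpy.ndarray.std` defaults to ddof=0 while `pandas.Series.std` defaults to ddof=1, so the discrepancy concerns NumPy (or list) inputs.

```python
import numpy as np

data = np.array([199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57])

# Largest absolute deviation from the mean (the "value" in the issue).
value = np.abs(data - data.mean()).max()

# Population standard deviation (NumPy default, ddof=0).
g_population = value / data.std()

# Sample standard deviation (ddof=1), as in the NIST example.
g_sample = value / data.std(ddof=1)

print(f"G with ddof=0: {g_population:.4f}")
print(f"G with ddof=1: {g_sample:.4f}")
```

The first value should be close to the 2.6392 reported for outlier-utils and the second close to the 2.4687 quoted from NIST, up to rounding in the last digit.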

KeyError: 0 when running example

data = pd.Series([1, 8, 9, 10, 9])
grubbs.test(data, alpha=0.05)

Gives

KeyError                                  Traceback (most recent call last)
<ipython-input-5-1838b39dbcff> in <module>
      1 data = pd.Series([1, 8, 9, 10, 9])
----> 2 grubbs.test(data, alpha=0.05)

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in test(data, alpha)
    241 
    242 def test(data, alpha=DEFAULT_ALPHA):
--> 243     return two_sided_test(data, alpha)

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in two_sided_test(data, alpha)
    205 
    206 def two_sided_test(data, alpha=DEFAULT_ALPHA):
--> 207     return _two_sided_test(data, alpha, OutputType.DATA)
    208 
    209 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _two_sided_test(data, alpha, output_type)
    193 
    194 def _two_sided_test(data, alpha, output_type):
--> 195     return _test(TwoSidedGrubbsTest, data, alpha, output_type)
    196 
    197 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _test(test_class, data, alpha, output_type)
    189 
    190 def _test(test_class, data, alpha, output_type):
--> 191     return test_class(data).run(alpha, output_type=output_type)
    192 
    193 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in run(self, alpha, output_type)
    120 
    121         while True:
--> 122             outlier_index = self._test_once(data, alpha)
    123             if outlier_index is None:
    124                 break

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _test_once(self, data, alpha)
    101         :return: the index of the outlier if one if found; None otherwise
    102         """
--> 103         target_index, value = self._target(data)
    104 
    105         g = value / data.std()

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _target(self, data)
    152         relative_values = abs(data - data.mean())
    153         index = relative_values.argmax()
--> 154         value = relative_values[index]
    155         return index, value
    156 

~\miniconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~\miniconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4402         k = self._convert_scalar_indexer(k, kind="getitem")
   4403         try:
-> 4404             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4405         except KeyError as e1:
   4406             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0
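
One plausible reading of the traceback, not confirmed by the maintainers: in recent pandas versions `Series.argmax()` returns a position, while `relative_values[index]` on an integer-indexed Series looks up a label. Once the first outlier (label 0) has been dropped, position 0 no longer corresponds to any label. The sketch below reproduces that state by hand; dropping label 0 manually stands in for the library's first iteration and is an assumption.

```python
import pandas as pd

# State assumed after the first iteration removed the outlier at label 0.
data = pd.Series([1, 8, 9, 10, 9]).drop(0)
relative_values = abs(data - data.mean())

position = relative_values.argmax()  # positional index of the maximum
label = relative_values.idxmax()     # index label of the maximum

# Label-based lookup with a position fails because label 0 is gone.
try:
    relative_values[position]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Either of these lookups is unambiguous.
value_by_position = relative_values.iloc[position]
value_by_label = relative_values.loc[label]
```

Using `idxmax()` with `.loc`, or `argmax()` with `.iloc`, keeps the lookup consistent regardless of which labels remain in the Series.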

build error: is CHANGES.rst missing when installing via pip?

There was no problem when building from the zip.

$ sudo -H pip install outlier-utils
Collecting outlier-utils
Using cached outlier_utils-0.0.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/tmp/pip-build-gmoeex/outlier-utils/setup.py", line 8, in <module>
    CHANGES = open(os.path.join(BASE_PATH, 'CHANGES.rst')).read()
IOError: [Errno 2] No such file or directory: '/private/tmp/pip-build-gmoeex/outlier-utils/CHANGES.rst'
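
The traceback suggests that CHANGES.rst was not shipped in the source distribution. One possible workaround, sketched here under that assumption, is for setup.py to tolerate the missing file; the helper name `read_optional` is hypothetical and is not part of the project. Listing the file in MANIFEST.in (`include CHANGES.rst`) would be the packaging-side alternative.

```python
import os


def read_optional(base_path, filename):
    """Return the file's text, or an empty string if the file is absent."""
    path = os.path.join(base_path, filename)
    if not os.path.exists(path):
        return ""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# In a setup.py this might replace the failing open(...).read() call:
# CHANGES = read_optional(BASE_PATH, 'CHANGES.rst')
```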
