c-bata / outlier-utils

Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs test.

Home Page: https://pypi.python.org/pypi/outlier-utils

License: MIT License

Languages: Python 100.00%
Topics: python, outliers, statistics

outlier-utils's Issues

Build error: is CHANGES.rst missing when installing via pip?

There was no problem when building from the zip archive.

$ sudo -H pip install outlier-utils
Collecting outlier-utils
Using cached outlier_utils-0.0.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/tmp/pip-build-gmoeex/outlier-utils/setup.py", line 8, in <module>
CHANGES = open(os.path.join(BASE_PATH, 'CHANGES.rst')).read()
IOError: [Errno 2] No such file or directory: '/private/tmp/pip-build-gmoeex/outlier-utils/CHANGES.rst'
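
The traceback suggests that CHANGES.rst is not shipped in the source distribution that pip downloads. The usual remedy is to list the file in MANIFEST.in; independently of that, setup.py could tolerate a missing file. The following is only a sketch of that second idea, and the helper name read_optional is hypothetical, not part of outlier-utils:

```python
import os


def read_optional(base_path, filename):
    """Return the text of base_path/filename, or '' if the file is absent."""
    path = os.path.join(base_path, filename)
    if not os.path.exists(path):
        return ""
    with open(path) as f:
        return f.read()


# Hypothetical use inside setup.py (names taken from the traceback above):
# BASE_PATH = os.path.dirname(os.path.abspath(__file__))
# CHANGES = read_optional(BASE_PATH, "CHANGES.rst")
```

With this approach the long description simply omits the changelog when the file is not present, instead of aborting the install.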

G-value miscalculated

In "def _test_once(self, data, alpha):", G is defined as:

$G=value/data.std()$

where "value" is the largest absolute deviation from the mean, that is, the distance between the mean and the value being evaluated:

$value = abs(data - data.mean()).max()$

This step is the same as the one described on the NIST site. To evaluate it, we work with the data [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57].

The G-value from outlier-utils is "2.6392", against "2.4687" on the NIST site. So what is the problem?
The problem is the way the std is calculated: because the std is being estimated from a sample, you must use data.std(ddof=1).

To recover the std that NIST uses, rearrange the first equation:
$data.std()=value/G$
Therefore, NIST uses "15.8525" as the std, against "14.8287" in outlier-utils.

So the first equation should actually be:
$G=value/data.std(ddof=1)$
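
The difference can be checked with a minimal sketch that uses only the Python standard library. It assumes that statistics.pstdev (divides by n) stands in for std(ddof=0) and statistics.stdev (divides by n - 1) stands in for std(ddof=1); the variable names are illustrative:

```python
import statistics

data = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]

mean = statistics.mean(data)
# Largest absolute deviation from the mean ("value" in the equations above)
value = max(abs(x - mean) for x in data)

# Population standard deviation: divides by n, like std(ddof=0)
g_population = value / statistics.pstdev(data)
# Sample standard deviation: divides by n - 1, like std(ddof=1)
g_sample = value / statistics.stdev(data)

print(f"G with population std: {g_population:.4f}")
print(f"G with sample std:     {g_sample:.4f}")
```

The first figure should land close to the outlier-utils result (about 2.64) and the second close to the NIST result (about 2.47).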

KeyError: 0 when running example

import pandas as pd
from outliers import smirnov_grubbs as grubbs

data = pd.Series([1, 8, 9, 10, 9])
grubbs.test(data, alpha=0.05)

This gives:

KeyError                                  Traceback (most recent call last)
<ipython-input-5-1838b39dbcff> in <module>
      1 data = pd.Series([1, 8, 9, 10, 9])
----> 2 grubbs.test(data, alpha=0.05)

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in test(data, alpha)
    241 
    242 def test(data, alpha=DEFAULT_ALPHA):
--> 243     return two_sided_test(data, alpha)

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in two_sided_test(data, alpha)
    205 
    206 def two_sided_test(data, alpha=DEFAULT_ALPHA):
--> 207     return _two_sided_test(data, alpha, OutputType.DATA)
    208 
    209 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _two_sided_test(data, alpha, output_type)
    193 
    194 def _two_sided_test(data, alpha, output_type):
--> 195     return _test(TwoSidedGrubbsTest, data, alpha, output_type)
    196 
    197 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _test(test_class, data, alpha, output_type)
    189 
    190 def _test(test_class, data, alpha, output_type):
--> 191     return test_class(data).run(alpha, output_type=output_type)
    192 
    193 

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in run(self, alpha, output_type)
    120 
    121         while True:
--> 122             outlier_index = self._test_once(data, alpha)
    123             if outlier_index is None:
    124                 break

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _test_once(self, data, alpha)
    101         :return: the index of the outlier if one if found; None otherwise
    102         """
--> 103         target_index, value = self._target(data)
    104 
    105         g = value / data.std()

~\miniconda3\lib\site-packages\outliers\smirnov_grubbs.py in _target(self, data)
    152         relative_values = abs(data - data.mean())
    153         index = relative_values.argmax()
--> 154         value = relative_values[index]
    155         return index, value
    156 

~\miniconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~\miniconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4402         k = self._convert_scalar_indexer(k, kind="getitem")
   4403         try:
-> 4404             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4405         except KeyError as e1:
   4406             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0
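
A possible explanation, offered as an assumption rather than a confirmed diagnosis: in recent pandas versions Series.argmax() returns an integer position, while relative_values[index] on an integer-indexed Series looks up a label. If the first outlier (label 0) has already been removed, position 0 no longer matches any label. The sketch below reproduces that situation with pandas alone; the drop step is a hypothetical stand-in for whatever the library does after its first iteration:

```python
import pandas as pd

data = pd.Series([1, 8, 9, 10, 9])

# Hypothetical stand-in for the library's removal step:
# drop the first outlier by label, leaving index labels 1, 2, 3, 4.
remaining = data.drop(0)

relative_values = abs(remaining - remaining.mean())
position = relative_values.argmax()  # integer position, not an index label

try:
    relative_values[position]  # label-based lookup; label 0 no longer exists
except KeyError as exc:
    print("KeyError:", exc)

# Two lookups that stay consistent with the index:
by_position = relative_values.iloc[position]  # positional access
label = relative_values.idxmax()              # label of the maximum
by_label = relative_values[label]             # label-based access
```

Under this assumption, using idxmax() or .iloc in _target would avoid the error.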
