luminol's Introduction

luminol


Overview

Luminol is a lightweight Python library for time series data analysis. The two major functionalities it supports are anomaly detection and correlation. It can be used to investigate possible causes of anomalies. You collect time series data and Luminol can:

  • Given a time series, detect whether the data contains any anomaly, and return the time window in which the anomaly happened, the timestamp at which the anomaly reaches its peak severity, and a score indicating how severe the anomaly is compared to others in the time series.
  • Given two time series, find their correlation coefficient. Since the correlation mechanism allows some shift room, you can correlate two peaks that are slightly apart in time.

Luminol is configurable in a sense that you can choose which specific algorithm you want to use for anomaly detection or correlation. In addition, the library does not rely on any predefined threshold on the values of a time series. Instead, it assigns each data point an anomaly score and identifies anomalies using the scores.

By using the library, we can establish a logic flow for root cause analysis. For example, suppose there is a spike in network latency:

  • Anomaly detection discovers the spike in the network latency time series.
  • Get the anomaly period of the spike, and correlate it with other system metrics (GC, IO, CPU, etc.) in the same time range.
  • Get a ranked list of correlated metrics; the root cause candidates are likely to be at the top.

Investigating possible ways to automate root cause analysis is one of the main reasons we developed this library, and it will be a fundamental part of the future work.
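The ranking step in this flow can be sketched in plain Python with an ordinary Pearson coefficient (a simplified stand-in for Luminol's Correlator; the metric names and data below are illustrative):

```python
from math import sqrt

def pearson(xs, ys):
    # plain Pearson correlation coefficient between two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# a latency spike and candidate system metrics over the same window (toy data)
latency = [1, 1, 9, 8, 1, 1]
metrics = {
    "gc": [0, 0, 5, 5, 0, 0],  # moves with the spike
    "io": [2, 2, 2, 2, 2, 3],  # unrelated to the spike
}

# rank candidates by correlation with the anomalous series;
# likely root causes come first
ranked = sorted(metrics, key=lambda m: pearson(latency, metrics[m]), reverse=True)
print(ranked)  # ['gc', 'io']
```

In the real flow, the correlation would be restricted to the anomaly period returned by the detector rather than the whole series.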


Installation

Make sure you have Python, pip, and numpy, then install directly through pip:

pip install luminol

The most up-to-date version of the library is 0.4.


Quick Start

This is a quick start guide for using luminol for time series analysis.

  1. Import the library:

import luminol

  2. Conduct anomaly detection on a single time series ts:

detector = luminol.anomaly_detector.AnomalyDetector(ts)
anomalies = detector.get_anomalies()

  3. If there is an anomaly, correlate the first anomaly period with a secondary time series ts2:

if anomalies:
    time_period = anomalies[0].get_time_window()
    correlator = luminol.correlator.Correlator(ts, ts2, time_period)

  4. Print the correlation coefficient:

print(correlator.get_correlation_result().coefficient)

These are very simple uses of luminol. For information about parameter types, return types, and optional parameters, please refer to the API section.


Modules

Modules in Luminol are customized classes developed for better data representation: Anomaly, CorrelationResult and TimeSeries.

Anomaly

class luminol.modules.anomaly.Anomaly
It contains these attributes:

self.start_timestamp: # epoch seconds representing the start of the anomaly period.
self.end_timestamp: # epoch seconds representing the end of the anomaly period.
self.anomaly_score: # a score indicating how severe this anomaly is.
self.exact_timestamp: # epoch seconds indicating when the anomaly reaches its peak severity.

It has these public methods:

  • get_time_window(): returns a tuple (start_timestamp, end_timestamp).

CorrelationResult

class luminol.modules.correlation_result.CorrelationResult
It contains these attributes:

self.coefficient: # correlation coefficient.
self.shift: # the amount of shift needed to get the above coefficient.
self.shifted_coefficient: # a correlation coefficient with shift taken into account.

TimeSeries

class luminol.modules.time_series.TimeSeries

__init__(self, series)
  • series(dict): timestamp -> value

It has various handy methods for manipulating time series, including the generators iterkeys, itervalues, and iteritems. It also supports binary operations such as add and subtract. Please refer to the code and inline comments for more information.
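As an illustration of the timestamp -> value representation (a plain-dict stand-in, not luminol's actual implementation), an element-wise binary operation over shared timestamps might look like:

```python
# Two TimeSeries-style mappings: epoch timestamp -> value (toy data).
a = {0: 1.0, 1: 2.0, 2: 3.0}
b = {0: 0.5, 1: 0.5, 2: 0.5}

# One plausible element-wise addition over the shared timestamps,
# analogous to what a TimeSeries binary operation provides.
summed = {t: a[t] + b[t] for t in sorted(a.keys() & b.keys())}
print(summed)  # {0: 1.5, 1: 2.5, 2: 3.5}
```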


API

The library contains two classes: AnomalyDetector and Correlator, and there are two sets of APIs, one corresponding to each class. There are also customized modules for better data representation. The Modules section in this documentation may provide useful information as you walk through the APIs.

AnomalyDetector

class luminol.anomaly_detector.AnomalyDetector

__init__(self, time_series, baseline_time_series=None, score_only=False, score_threshold=None,
         score_percentile_threshold=None, algorithm_name=None, algorithm_params=None,
         refine_algorithm_name=None, refine_algorithm_params=None)
  • time_series: The metric you want to conduct anomaly detection on. It can have one of the following three types:
1. string: # path to a csv file
2. dict: # timestamp -> value
3. luminol.modules.time_series.TimeSeries
  • baseline_time_series: an optional baseline time series of one of the types mentioned above.
  • score_only(bool): if asserted, only anomaly scores for the time series will be computed; anomaly periods will not be identified.
  • score_threshold: if passed, anomaly scores above this value will be identified as anomalies. It overrides score_percentile_threshold.
  • score_percentile_threshold: if passed, anomaly scores above this percentile will be identified as anomalies. It cannot override score_threshold.
  • algorithm_name(string): if passed, the specified algorithm will be used to compute anomaly scores.
  • algorithm_params(dict): additional parameters for the algorithm specified by algorithm_name.
  • refine_algorithm_name(string): if passed, the specified algorithm will be used to compute the timestamp of peak severity within each anomaly period.
  • refine_algorithm_params(dict): additional parameters for the algorithm specified by refine_algorithm_name.
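To make the two threshold parameters concrete, here is a sketch (not luminol's internal logic) of flagging scores above a percentile cutoff, using the nearest-rank method:

```python
# Anomaly scores keyed by timestamp (toy data).
scores = {0: 0.0, 1: 0.9, 2: 1.6, 3: 2.1, 4: 1.7, 5: 2.9, 6: 1.2, 7: 0.9, 8: 0.7}

def above_percentile(scores, pct):
    # nearest-rank percentile cutoff over the score values
    ordered = sorted(scores.values())
    cutoff = ordered[min(len(ordered) - 1, int(len(ordered) * pct / 100))]
    # timestamps whose score exceeds the cutoff are flagged as anomalous
    return {t for t, s in scores.items() if s > cutoff}

print(above_percentile(scores, 80))  # {5}
```

With score_threshold the cutoff would instead be the fixed value you pass in, which is why it takes precedence over the percentile variant.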

Available algorithms and their additional parameters are:

1.  'bitmap_detector': # behaves well for huge data sets, and it is the default detector.
    {
      'precision'(4): # how many sections to categorize values,
      'lag_window_size'(2% of the series length): # lagging window size,
      'future_window_size'(2% of the series length): # future window size,
      'chunk_size'(2): # chunk size.
    }
2.  'default_detector': # used when other algorithms fail; not meant to be used explicitly.
3.  'derivative_detector': # meant to be used when abrupt changes of value are of main interest.
    {
      'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages
                                # of derivatives.
    }
4.  'exp_avg_detector': # meant to be used when values are in a roughly stationary range.
                        # and it is the default refine algorithm.
    {
      'smoothing factor'(0.2): # smoothing factor used to compute exponential moving averages.
      'lag_window_size'(20% of the series length): # lagging window size.
      'use_lag_window'(False): # if asserted, a lagging window of size lag_window_size will be used.
    }
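The idea behind 'exp_avg_detector' can be sketched in a few lines: score each point by its deviation from a running exponential moving average with the given smoothing factor (a simplified illustration, not the library's exact scoring):

```python
def exp_avg_scores(values, smoothing_factor=0.2):
    # score each point by its deviation from a running exponential moving average
    scores, ema = [], values[0]
    for v in values:
        scores.append(abs(v - ema))
        ema = smoothing_factor * v + (1 - smoothing_factor) * ema
    return scores

values = [1, 1, 1, 10, 1, 1]
scores = exp_avg_scores(values)
print(scores)  # the spike at index 3 gets the largest score, 9.0
```

This also hints at why the method suits roughly stationary data: the moving average tracks a stable baseline, so sudden departures from it stand out.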

The meanings of some of the parameters above may seem vague; the inline comments in the algorithm source code provide further detail.

The AnomalyDetector class has the following public methods:

  • get_all_scores(): returns an anomaly score time series of type TimeSeries.
  • get_anomalies(): returns a list of Anomaly objects.

Correlator

class luminol.correlator.Correlator

__init__(self, time_series_a, time_series_b, time_period=None, use_anomaly_score=False,
         algorithm_name=None, algorithm_params=None)
  • time_series_a: a time series, for its type, please refer to time_series for AnomalyDetector above.
  • time_series_b: a time series, for its type, please refer to time_series for AnomalyDetector above.
  • time_period(tuple): a time period where to correlate the two time series.
  • use_anomaly_score(bool): if asserted, the anomaly scores of the time series will be used to compute correlation coefficient instead of the original data in the time series.
  • algorithm_name: if passed, the specific algorithm will be used to calculate correlation coefficient.
  • algorithm_params: any additional parameters for the algorithm specified by algorithm_name.

Available algorithms and their additional parameters are:

1.  'cross_correlator': # when correlating two time series, it shifts the series around so that
                        # it can catch spikes that are slightly apart in time.
    {
      'max_shift_seconds'(60): # maximal allowed shift room in seconds,
      'shift_impact'(0.05): # weight of shift in the shifted coefficient.
    }
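The shifting behavior can be sketched in plain Python: try every shift within an allowed room, compute a plain Pearson coefficient for each alignment, and keep the best one (an illustration of the idea, not luminol's implementation; here max_shift counts data points rather than seconds):

```python
from math import sqrt

def pearson(xs, ys):
    # plain Pearson correlation coefficient between two equal-length lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def best_shifted_correlation(a, b, max_shift=2):
    # try shifting b by -max_shift..max_shift and keep the best coefficient
    best = (0.0, 0)
    for shift in range(-max_shift, max_shift + 1):
        xs = [a[i] for i in range(len(a)) if 0 <= i + shift < len(b)]
        ys = [b[i + shift] for i in range(len(a)) if 0 <= i + shift < len(b)]
        if len(xs) > 1:
            coef = pearson(xs, ys)
            if coef > best[0]:
                best = (coef, shift)
    return best

a = [0, 0, 1, 5, 1, 0, 0]  # spike at index 3
b = [0, 0, 0, 1, 5, 1, 0]  # same spike, one step later
coef, shift = best_shifted_correlation(a, b)
print(coef, shift)  # near-perfect correlation at shift 1
```

The shift_impact weight then penalizes the coefficient for larger shifts, producing the shifted_coefficient reported in CorrelationResult.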

The Correlator class has the following public methods:

  • get_correlation_result(): returns a CorrelationResult object.
  • is_correlated(threshold=0.7): if the coefficient is above the passed-in threshold, returns a CorrelationResult object; otherwise returns False.

Example

  1. Calculate anomaly scores.
from luminol.anomaly_detector import AnomalyDetector

ts = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}

my_detector = AnomalyDetector(ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

""" Output:
0 0.0
1 0.873128250131
2 1.57163085024
3 2.13633686334
4 1.70906949067
5 2.90541813415
6 1.17154110935
7 0.937232887479
8 0.749786309983
"""
  2. Correlate ts1 with ts2 on every anomaly.
from luminol.anomaly_detector import AnomalyDetector
from luminol.correlator import Correlator

ts1 = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}
ts2 = {0: 0, 1: 0.5, 2: 1, 3: 0.5, 4: 1, 5: 0, 6: 1, 7: 1, 8: 1}

my_detector = AnomalyDetector(ts1, score_threshold=1.5)
score = my_detector.get_all_scores()
anomalies = my_detector.get_anomalies()
for a in anomalies:
    time_period = a.get_time_window()
    my_correlator = Correlator(ts1, ts2, time_period)
    if my_correlator.is_correlated(threshold=0.8):
        print("ts2 correlates with ts1 at time period (%d, %d)" % time_period)

""" Output:
ts2 correlates with ts1 at time period (2, 5)
"""

Contributing

Clone the source and install the package and dev requirements:

pip install -r requirements.txt
pip install pytest pytest-cov pylama

Tests and linting run with:

python -m pytest --cov=src/luminol/ src/luminol/tests/
python -m pylama -i E501 src/luminol/

luminol's People

Contributors

brennv, earthgecko, riteshmaheshwari, skhode, tabaaway, vicky002

luminol's Issues

Package Definition

Hi,

This is not really an issue, just a couple of questions. The example code that calculates the anomaly scores, e.g.:

from luminol.anomaly_detector import AnomalyDetector

ts = {0: 0, 1: 0.5, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}

my_detector = AnomalyDetector(ts)
score = my_detector.get_all_scores()
for timestamp, value in score.iteritems():
    print(timestamp, value)

Does it calculate the scores as they come, like real-time anomaly detection, instead of looking at previous values? Is there also a way to tune the parameters of the above code, like the window size and chunk size? If so, how?

Thank you very much.

ts2 ??

Hi! For the secondary time_series, for example, I had a list of market prices for my variable ts, what would be a good ts2?

Thanks!

Calculating anomaly score for multivariate data set.

I have been using Luminol to calculate anomaly scores for univariate data sets (timestamp & value) and getting good results. Now I want to move to multivariate data sets (timestamp & value 1 & value 2 & .... & value N) and produce a single anomaly score based on all values. I am finding it hard to proceed with this problem statement. Is there a way I can apply Luminol to this problem, or could you suggest how to proceed?

Thank you.

Python 3.6 doesn't run your examples

Looks like support for Python 3 is not completely developed yet. With Python 3.6 the examples you have don't run:

from luminol.anomaly_detector import AnomalyDetector

What's the easy fix for this?

Issue when importing modules

Hello,

When I try to import the modules, I got some error messages here,
>>> from luminol.correlator import Correlator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/luminol/correlator.py", line 19, in <module>
    from luminol.anomaly_detector import AnomalyDetector
  File "/usr/local/lib/python3.4/dist-packages/luminol/anomaly_detector.py", line 19, in <module>
    from luminol.algorithms.anomaly_detector_algorithms.all import anomaly_detector_algorithms
  File "/usr/local/lib/python3.4/dist-packages/luminol/algorithms/anomaly_detector_algorithms/all.py", line 12, in <module>
    from luminol.algorithms.anomaly_detector_algorithms import *
  File "/usr/local/lib/python3.4/dist-packages/luminol/algorithms/anomaly_detector_algorithms/default_detector.py", line 12, in <module>
    from exp_avg_detector import ExpAvgDetector
ImportError: No module named 'exp_avg_detector'

I think there is some issue with the module exp_avg_detector.

Could you please help fix that?

Thanks a lot!

Sophie

Installation error in Alpine

I am trying to install Luminol in Alpine, but it's throwing an error while installing numpy. Is it possible to install Luminol in Alpine with Python 3.6?

Add test runner

Could we add a test runner like travis to make it easier to contribute? #11

Citing luminol

Hi,

I would like to cite luminol for an academic publication, but was unable to find a list of authors other than "Naarad Developers". Would it be possible to let me know how you would you like me to cite the package?

Thank you very much in advance.

Best wishes,
Alex

Python3 roadmap

Continuing the discussion from #15. I think adding travis and getting tests passing would be a good start. That way we can watch follow-on PRs pass or fail.

  • fix tests #20
  • add travis #11
  • fix spacing #25
  • clean up #27
  • add Python 3 support #28
  • maybe more pep8 fun #29 and/or refactor tests
  • update readme #30 and bump version #31
  • pypi release

Pandas Support

I am interested in your module, however I noticed that it doesn't support Pandas dataframes out of the box. Would you mind explaining the reasoning behind this? Also - I could potentially add this capability. However, I am trying to understand any potential pitfalls.

Thanks in advance. Great work.

incorrect normalize method

Unless there is a specific reason for normalizing with only the max value, the normalize function should either be fixed or renamed, since dividing by the max alone is not what is generally accepted as normalization.

Although this normalize method does return values between 0 and 1 for an all-positive set, it does not normalize the data if there are negative values in the set. This could skew correlations, and since the average is then applied, normalize is arguably being calculated incorrectly.

without pip install, can I use it?

I am trying to use it inside our restricted environment. Is there a way I can download the package and run it, following your instructions, in our DEV environment?

error in AnomalyDetector instantiation

I'm trying to run the Quick Start example, and in the very first command that instantiates a detector I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The command run is

detector = anomaly_detector.AnomalyDetector(ts)

Could this be caused by a change in the Pandas library?

I installed the last version of luminol, 0.3.1, with pip and I'm using Python 2.7.11 from the Anaconda distribution version 4.1.0 on Kubuntu 15.10.

This is the full traceback of the error:

<class 'pandas.core.series.Series'>
Traceback (most recent call last):
  File "open_heat_treatments.py", line 97, in <module>
    main()
  File "open_heat_treatments.py", line 90, in main
    detector = anomaly_detector.AnomalyDetector(ts)
  File "/home/dp/anaconda2/lib/python2.7/site-packages/luminol/anomaly_detector.py", line 44, in __init__
    self.time_series = self._load(time_series)
  File "/home/dp/anaconda2/lib/python2.7/site-packages/luminol/anomaly_detector.py", line 69, in _load
    if not time_series:
  File "/home/dp/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 892, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How to set the new parameter values to the algorithm

I would like to modify the below parameters in my algorithm.

'precision'(4): # how many sections to categorize values,
'lag_window_size'(2% of the series length): # lagging window size,
'future_window_size'(2% of the series length): # future window size,
'chunk_size'(2): # chunk size.

How to pack and pass the above parameters in "algorithm_params" variable.

Thanks in advance.

0.0 Anomaly Score for larger dataSet-bitmap_detector

Dear Team,
I am using luminol on a larger dataset of almost 25k rows. With the "bitmap_detector" algorithm, the anomaly scores for the first 200 and the last 200 values remain 0.0 even if there are differences within the input values.
If we pass only those 200 values, we receive nonzero anomaly scores. Please respond asap.

Thanks,
Prabhat

Latest fix/release is not published to pip

Hello,

Is the repo still being maintained? I noticed that you have a fix for module 'numpy' has no attribute 'asscalar' but it has not been published. Can you please publish the latest?

Thank you.

Timestamp format

Hi
I'm looking for toolkits for time series anomaly detection, and I think this could help me, but I didn't understand how exactly to work with luminol. What is the data or input format? Can someone provide a simple example? For example, I have a .csv file with values and date/time and I want to detect anomalies. What format should I use for the input?

Warn user on automatic modify of algorithm or parameters

Please throw a warning message to the user when automatically modifying parameters or algorithms. Doing this silently makes it extremely difficult to debug and fine-tune.

def _sanity_check(self):
    """
    Check if there are enough data points.
    """
    windows = self.lag_window_size + self.future_window_size
    if (not self.lag_window_size or not self.future_window_size
            or self.time_series_length < windows
            or windows < DEFAULT_BITMAP_MINIMAL_POINTS_IN_WINDOWS):
        raise exceptions.NotEnoughDataPoints
    # If window size is too big, too many data points will be assigned a score of 0 in the first lag window
    # and the last future window.
    if self.lag_window_size > DEFAULT_BITMAP_MAXIMAL_POINTS_IN_WINDOWS:
        self.lag_window_size = DEFAULT_BITMAP_MAXIMAL_POINTS_IN_WINDOWS
    if self.future_window_size > DEFAULT_BITMAP_MAXIMAL_POINTS_IN_WINDOWS:
        self.future_window_size = DEFAULT_BITMAP_MAXIMAL_POINTS_IN_WINDOWS

def _detect(self, score_only):
    """
    Detect anomaly periods.
    :param bool score_only: if true, only anomaly scores are computed.
    """
    try:
        algorithm = self.algorithm(**self.algorithm_params)
        self.anom_scores = algorithm.run()
    except exceptions.NotEnoughDataPoints:
        algorithm = anomaly_detector_algorithms['default_detector'](self.time_series)
        self.threshold = self.threshold or ANOMALY_THRESHOLD['default_detector']
        self.anom_scores = algorithm.run()
    if not score_only:
        self._detect_anomalies()

Refined vs unrefined anomaly?

In the constants section it says:

# Indicate which algorithm to use to calculate anomaly scores.
ANOMALY_DETECTOR_ALGORITHM = 'bitmap_detector'

# Indicate which algorithm to use to get refined maximal score within each anomaly.
ANOMALY_DETECTOR_REFINE_ALGORITHM = 'exp_avg_detector'

What does this mean?

Example Run

Hello, first of all sorry about my ignorance on the subject; I'm new to Python and to programming in general.

I'm really having trouble running the code. I can't find the part of the code where I should change the parameters and the directory to load my personal data...

Could anybody give me a little help to make the anomaly detection example work with the Luminol library?

Thanks in advance

Materials to read about anomaly detection

Not really a code related question, but more of a methods question.
Is there some material on the basic concepts that were used to develop the luminol package? For example, what's the basic idea behind detecting the anomaly, how to interpret the score, and how does the algorithm handle seasonality and trend in the data? Should we make the time series stationary before using it, or how does the package manage to work with non-stationary time series?

Thanks

Regressing / smoothing input time-series based on anomalies

Is there a way to objectively regress / normalize discrete points in the original time series (ts) based on the anomaly score time series (spikes), which are essentially "weights"? I basically want to use the anomaly detector as a smoothing mask. Does this exist currently?

detector = anomaly_detector.AnomalyDetector(ts)
spikes = detector.get_all_scores().values

Cross Correlation question

Hello Developers,

I just found that luminol uses cross correlation to calculate the coefficient. Could you please explain what "shift" means? If shift = -2, does that mean ts2, shifted by 2 time periods, correlates with ts1?

One more question, within cross correlation method, is there any way to verify if it is positive correlation or negative correlation?

latest version (0.4) not published to pypi?

Hello! I see that all of @brennv 's PRs have been merged in (see issue #22 ), and the package version has been incremented here in the repo, but PyPi has not yet been updated to v0.4.

@RiteshMaheshwari , could I ask you for one last favor: Publish the latest version of luminol to PyPi, so we can reap the benefits of all those recent commits? Again, if you're not the person to tag / nag, please point me in the right direction. Thanks for your help!

Example 1 (put anomaly scores in a list) is giving an error

getting ValueError: (22, 'Invalid argument')
in line: t_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(timestamp))

Actually, the error is due to the value of timestamp provided as an argument to time.localtime(timestamp).

using SAR-device.sdb.await.csv provided in luminol/demo/src/static/data/.

Using Python 2.7. Please Help.

Where are the affiliation metrics?

According to Huet et al. (10.1145/3534678.3539339), the source code corresponding to the computation of the proposed affiliation metrics can be found in this repository, yet I have not been able to find it. Could someone confirm that these metrics are indeed present and, if so, where?

error in diff_percent_threshold.py

The code in the enumerator should be baseline_value = self.baseline_time_series[timestamp] instead of baseline_value = self.baseline_time_series[i];
otherwise it raises a "timestamp does not exist in time series object" exception.

problem with import

I have installed the package via pip (using Python 2.7.12).
When I import the luminol module, it lacks all of the basic functions. I've attached a screenshot.
Can you help me?

Real-world dataset for testing

Thanks for developing the package. I am wondering whether there is any real-world dataset for testing the anomaly detection as well as the correlation on a large volume of data using the package, for evaluating both efficiency and effectiveness.

Streaming Data

Is it possible to use this library when data is not offline? I am dealing with streaming data and trying to figure out how to use this library for it.
