
minqi824 / adbench

Stars: 777 | Watchers: 15 | Forks: 123 | Size: 2.06 GB

Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.

License: BSD 2-Clause "Simplified" License

Python 100.00%
anomaly-detection benchmark data-mining deep-learning machine-learning outlier-detection semi-supervised-learning supervised-learning unsupervised-learning data-sicence

adbench's Introduction

Glad to see you! 👋 Welcome to my profile (Minqi Jiang, 江敏祺 in Chinese):

I'm a third-year PhD candidate at Shanghai University of Finance and Economics (SUFE), where I work with my PhD advisor Songqiao Han. I am glad to be a member of the SUFE AI Lab (led by Professor Hailiang Huang). Currently, anomaly detection (aka outlier detection) is my major research direction, and I'm also interested in NLP and quantitative investment, as listed below:

Title | Research Direction | Conference/Journal | Paper | Code
ADGym: Design Choices for Deep Anomaly Detection | Anomaly Detection | NeurIPS 2023 | 📄 | 💻
Anomaly Detection with Score Distribution Discrimination | Anomaly Detection | KDD 2023 | 📄 | 💻
ADBench: Anomaly detection benchmark | Anomaly Detection | NeurIPS 2022 | 📄 | 💻
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | NLP | IJCAI Workshop@LLM | 📄 | 💻
An Improved Stacking Framework for Predicting Stock Price Index Direction | Quantitative Investment | Economic Computation & Economic Cybernetics Studies & Research | 📄 | -
An improved Stacking framework for stock index prediction by leveraging tree-based ensemble models and deep learning algorithms | Quantitative Investment | Physica A: Statistical Mechanics and its Applications | 📄 | -
An extended regularized Kalman filter based on Genetic Algorithm: Application to dynamic asset pricing models | Quantitative Investment | The Quarterly Review of Economics and Finance | 📄 | -



adbench's People

Contributors

braudocc, minqi824, xiyanghu, yzhao062


adbench's Issues

Dataset Source/Link

Thanks for the great work! Would it be possible to provide the link or source of each dataset so we can learn more about them? Thanks a lot.

Dependency installation

Should we add a setup.py to ensure that all the dependencies are installed?

I needed to install it.

Platform

Is there going to be a platform where I can evaluate my method?

Include ELKI for some 20+ additional algorithms

ELKI, which can easily be invoked from command line, provides many additional algorithms missing from this benchmark, such as:

  • DB Outlier
  • HilOut
  • KNNDD (not the same as KNN Outlier)
  • KNNSOS
  • KNN-weight
  • Local Isolation Coefficient
  • ODIN
  • Reference-Based outlier detection
  • SOS
  • HySort OD
  • ALOCI
  • INFLO
  • KDEOS
  • LDF
  • LDOF
  • LOCI
  • LoOP
  • SimplifiedLOF
  • VarianceOfVolume
  • ABOD / FastABOD, LB-ABOD
  • IDOS
  • ISOS
  • LID
  • GLOSH for HDBSCAN*
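To illustrate what one of the listed algorithms computes, here is a minimal NumPy sketch of the KNN-weight score (Angiulli and Pizzuti): the outlier score of a point is the sum of its distances to its k nearest neighbours. It is not ELKI's implementation. The function name and the default k=5 are illustrative assumptions. The sketch excludes each point from its own neighbourhood, and ELKI's convention for counting k may differ.

```python
import numpy as np


def knn_weight_scores(X, k=5):
    """KNN-weight outlier score: sum of distances to the k nearest neighbours.

    Hypothetical helper for illustration; uses a dense O(n^2) distance matrix,
    so it is only suitable for small data sets.
    """
    X = np.asarray(X, dtype=float)
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < n_samples")
    # pairwise Euclidean distances
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # a point is not its own neighbour
    np.fill_diagonal(dist, np.inf)
    # k smallest distances per row, summed: larger score = more outlying
    nearest = np.sort(dist, axis=1)[:, :k]
    return nearest.sum(axis=1)
```

On a Gaussian cluster with one far-away point appended, that point receives the largest score. Replacing the sum with the k-th distance alone gives the plain kNN outlier score, which is why the two are listed as different algorithms.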

In other cases, it may be desirable to compare the performance of different implementations:

  • Isolation Forest
  • kNN Outlier
  • LOF
  • COF
  • CBLOF

This matters because one implementation can sometimes perform better than another.
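A minimal harness for such a comparison is sketched below, assuming NumPy only. The two functions `knn_outlier_sort` and `knn_outlier_partition` are hypothetical stand-ins for two implementations of the same kNN outlier score (distance to the k-th nearest neighbour). `roc_auc` is a small Mann-Whitney based ROC AUC, so the example does not depend on scikit-learn.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores get average ranks."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # 1-based average rank
        i = j + 1
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def _pairwise(X):
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-distance
    return dist


def knn_outlier_sort(X, k=5):
    """Implementation A: full sort, then take the k-th smallest distance."""
    return np.sort(_pairwise(X), axis=1)[:, k - 1]


def knn_outlier_partition(X, k=5):
    """Implementation B: partial sort with np.partition (same score, less work)."""
    return np.partition(_pairwise(X), k - 1, axis=1)[:, k - 1]
```

With real implementations (for example, an ELKI run and a PyOD run on the same data), you would load both score vectors and compare their AUCs in the same way. Differences can come from details such as tie handling or whether the query point counts as its own neighbour.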

copula function error in some datasets

I am getting errors when generating synthetic dependency anomalies for multiple datasets. I found this remark in data_generator.py: "# we found that copula function may occur error in some datasets". How did you overcome this issue? The dependency anomalies fail to generate.
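One possible fallback for data sets where copula fitting fails is sketched below. It swaps in a simpler technique and is not ADBench's copula-based generator. Each feature of the normal data is resampled independently, so every column keeps its empirical marginal distribution while the dependency structure between columns is destroyed. The function name is hypothetical.

```python
import numpy as np


def dependency_anomalies_by_permutation(X_normal, n_anomalies, seed=None):
    """Generate 'dependency' anomalies by sampling each feature independently.

    Hypothetical fallback, not ADBench's method: marginals are preserved
    (values are drawn from each column), joint dependencies are broken.
    """
    rng = np.random.default_rng(seed)
    X_normal = np.asarray(X_normal, dtype=float)
    n, d = X_normal.shape
    # an independent row index for every (anomaly, feature) pair
    idx = rng.integers(0, n, size=(n_anomalies, d))
    # element [i, j] is X_normal[idx[i, j], j]
    return X_normal[idx, np.arange(d)]
```

For two perfectly correlated features, the generated samples have near-zero correlation. Because the marginals are untouched, a detector can only flag them by modelling the joint structure, which is the property dependency anomalies are meant to test.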

The BaseADDataset cannot be imported

Hello, when I run deepsad.py, I get an error that BaseADDataset cannot be imported, although two weeks ago it could be imported. Could the problem be that a package version is outdated?
Thank you for your help!

ADBench with custom data

Hello, I am trying to replicate the demo notebook using a different open-source dataset, but I am getting a KeyError. I have tried changing the data and reducing the number of rows. Please help.
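The screenshots are not available here, so the cause cannot be confirmed. One plausible source of a KeyError is a dataset dict that lacks the 'X' or 'y' keys used in the demo code in the "Error in model fitting" issue on this page. The helper below is a hypothetical sketch that packs and validates your arrays in that shape before they are passed to the pipeline. It assumes 0 = normal and 1 = anomaly.

```python
import numpy as np


def to_adbench_dataset(features, labels):
    """Pack arrays into a {'X': ..., 'y': ...} dict (hypothetical helper).

    Raises ValueError early instead of letting a malformed dataset fail later.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels).ravel()
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got shape {X.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    if not np.isin(y, [0, 1]).all():
        raise ValueError("y must be binary: 0 = normal, 1 = anomaly")
    return {"X": X, "y": y.astype(int)}
```

With a pandas DataFrame, you could call it as `to_adbench_dataset(df.drop(columns="label").to_numpy(), df["label"].to_numpy())`, where `"label"` is a placeholder for your own target column.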


Error in model fitting

Hello guys!
Super amazing job! Thank you.
I have tried the first examples, but some don't run correctly. Could you help me, please?
Thank you so much.

CODE:

# customized model on ADBench's datasets
from adbench.run import RunPipeline
from adbench.baseline.Customized.run import Customized

# notice that you should specify the corresponding category of your customized AD algorithm
# for example, here we use Logistic Regression as customized clf, which belongs to the supervised algorithm
# for your own algorithm, you can realize the same usage as other baselines by modifying the fit.py, model.py, and run.py files in the adbench/baseline/Customized
pipeline = RunPipeline(suffix='ADBench', parallel='supervise', realistic_synthetic_mode=None, noise_type=None)
results = pipeline.run(clf=Customized)

# customized model on customized dataset
import numpy as np
dataset = {}
dataset['X'] = np.random.randn(1000, 20)
dataset['y'] = np.random.choice([0, 1], 1000)
results = pipeline.run(dataset=dataset, clf=Customized)
print(results)

KIND OF REPETITIVE OUTPUT:

generating duplicate samples for dataset 39_vertebral...
current noise type: None
{'Samples': 1000, 'Features': 6, 'Anomalies': 138, 'Anomalies Ratio(%)': 13.8}
Error in model fitting. Model:Customized, Error: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class 'adbench.baseline.Customized.model.LR'> with constructor (self, *args, **kwargs) doesn't  follow this convention.
Current experiment parameters: ('39_vertebral', 1.0, 2), model: Customized, metrics: {'aucroc': nan, 'aucpr': nan}, fitting time: None, inference time: None

python 3.10.11
pyod = 1.0.0
Mac M2, macOS Ventura 13

I found that it probably has to do with how the parameters are fed, but I really don't think this could be the solution in this case:
https://stackoverflow.com/questions/40025406/inherit-from-scikit-learns-lassocv-model

Thank you again for your help
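The error message comes from scikit-learn's estimator convention. `BaseEstimator` inspects the `__init__` signature to discover hyperparameters. It ignores `**kwargs` but raises on `*args`, so a constructor written as `(self, *args, **kwargs)` fails. The stdlib-only sketch below mirrors that check so the behaviour can be seen without scikit-learn. `LRVarargs` and `LRExplicit` are hypothetical classes, and the parameter names are illustrative. A fix along these lines would mean editing `LR` in adbench/baseline/Customized/model.py so that it lists its parameters explicitly.

```python
import inspect


def get_param_names(cls):
    """Mirror of scikit-learn's constructor check (illustrative re-implementation).

    **kwargs is skipped; *args is rejected with a RuntimeError.
    """
    sig = inspect.signature(cls.__init__)
    params = [
        p for p in sig.parameters.values()
        if p.name != "self" and p.kind != p.VAR_KEYWORD
    ]
    for p in params:
        if p.kind == p.VAR_POSITIONAL:
            raise RuntimeError(
                f"estimators should specify their parameters explicitly; "
                f"{cls.__name__} with constructor {sig} doesn't follow this convention"
            )
    return sorted(p.name for p in params)


class LRVarargs:
    # hypothetical: same constructor shape as the failing LR in the log above
    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs


class LRExplicit:
    # hypothetical fix: every hyperparameter is named and stored unchanged
    def __init__(self, C=1.0, max_iter=100, random_state=None):
        self.C = C
        self.max_iter = max_iter
        self.random_state = random_state
```

`get_param_names(LRExplicit)` returns `['C', 'max_iter', 'random_state']`, while `LRVarargs` raises. Storing each argument under the same attribute name also matters, because scikit-learn's `get_params` reads them back by name.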

fatal: early EOF fatal: fetch-pack: invalid index-pack output

Error when downloading the model:
PS D:\PyCharm> git clone https://github.com/Minqi824/ADBench.git
Cloning into 'ADBench'...
remote: Enumerating objects: 1074, done.
remote: Counting objects: 100% (189/189), done.
remote: Compressing objects: 100% (94/94), done.
error: RPC failed; curl 18 HTTP/2 stream 5 was reset
error: 995 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

CV in ADBench

Hi,

I am a little new to anomaly detection, but I was curious about the right way to do cross-validation when using ADBench, since the training and test samples are already split by the data generator. An easy way would be to concatenate the training and test sets and then put them in the CV loop, but is there a cleaner way?
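If you do concatenate the splits, stratifying the folds keeps the (usually small) anomaly ratio roughly equal in every fold. A minimal NumPy sketch follows. The generator name is hypothetical. It assumes you have concatenated arrays such as `X = np.concatenate([X_train, X_test])` and `y = np.concatenate([y_train, y_test])`, where those array names are assumptions about the data generator's output. scikit-learn's `StratifiedKFold` provides the same behaviour if you prefer a library class.

```python
import numpy as np


def stratified_kfold_indices(y, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the anomaly ratio per fold.

    Hypothetical helper: each class is shuffled and dealt round-robin
    across folds, so every fold gets its share of normals and anomalies.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        rng.shuffle(members)
        fold_of[members] = np.arange(len(members)) % n_splits
    for k in range(n_splits):
        test_idx = np.flatnonzero(fold_of == k)
        train_idx = np.flatnonzero(fold_of != k)
        yield train_idx, test_idx
```

Inside the loop you would fit on `X[train_idx]` and score `X[test_idx]`. For unsupervised detectors, you might additionally drop the anomalies from the training fold, depending on the setting you want to evaluate.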

Dependency issue

The requirements.txt file pins PyOD to version 1.0.0 but does not restrict any of the other libraries. However, the newest versions of scikit-learn and TensorFlow throw errors for some models (LODA and DeepSVDD, for example). You should either restrict scikit-learn and TensorFlow to earlier versions or use the newest version of PyOD. As it stands, writing the requirements.txt for my own project that uses ADBench is very annoying. This is related to issue yzhao062/pyod#406.
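Until the pins are settled, a quick way to see whether an environment matches the exact `==` pins in a requirements file is sketched below using `importlib.metadata` (Python 3.8+). The function names are hypothetical. Only the PyOD 1.0.0 pin comes from this issue. The parser deliberately handles just `name==version` lines, ignoring comments and environment markers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Return {package: pinned_version} for 'name==x.y.z' lines (hypothetical helper)."""
    pins = {}
    for raw in requirements_text.splitlines():
        # drop comments and environment markers such as '; python_version < "3.11"'
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, pinned = (part.strip() for part in line.split("==", 1))
            pins[name] = pinned
    return pins


def check_pins(pins, get_version=version):
    """Map each pinned package to (pinned, installed or None, matches)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed, installed == pinned)
    return report
```

For example, `check_pins(parse_pins(open("requirements.txt").read()))` reports whether the installed PyOD is 1.0.0. Unpinned packages such as scikit-learn and TensorFlow are not checked, which is exactly the gap the issue describes.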

Data set choice: pay attention to using unbalanced data

Data sets with 50% anomalies are not anomaly detection!

More data sets does not mean more meaningful results, because "garbage in, garbage out".
One of the big problems with current anomaly detection research is that we do not use good data sets to evaluate results; hence methods sometimes work only by chance, and little systematic benefit is observable because the data sets are not properly labeled as anomalies.
I am by now convinced that from most of the commonly used data sets, you cannot draw meaningful conclusions because of unsuitable labeling.
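A simple screening step in the spirit of this issue is to compute each data set's anomaly ratio (the quantity ADBench logs as 'Anomalies Ratio(%)') and set aside data sets above a cutoff. The sketch below assumes labels of 0 = normal and 1 = anomaly. The 10% default cutoff is an arbitrary illustration, not a value recommended by the issue or by ADBench, and the function names are hypothetical.

```python
def anomaly_ratio(labels):
    """Fraction of samples labelled 1 (anomaly)."""
    labels = list(labels)
    if not labels:
        raise ValueError("empty label list")
    return sum(1 for v in labels if v == 1) / len(labels)


def filter_by_anomaly_ratio(datasets, max_ratio=0.1):
    """Split {name: labels} into (kept, dropped) dicts of {name: ratio}.

    Hypothetical helper; max_ratio is an illustrative cutoff.
    """
    kept, dropped = {}, {}
    for name, labels in datasets.items():
        ratio = anomaly_ratio(labels)
        (kept if ratio <= max_ratio else dropped)[name] = ratio
    return kept, dropped
```

A ratio filter only addresses class balance. It cannot detect the labeling problems the issue also raises, which require inspecting what the "anomaly" class actually represents.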
