PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et al. (2020, https://arxiv.org/abs/2009.02040).

License: MIT License

Python 99.04% Shell 0.96%
attention anomaly-detection gnn 2021 graph-neural-networks graph-attention-networks pytorch deep-learning time-series mtad-gat

mtad-gat-pytorch's Introduction

ℹ️ This repo is not under active maintenance. PRs are however very welcome!
Thanks to our contributors:


Our implementation of MTAD-GAT: Multivariate Time-series Anomaly Detection (MTAD) via Graph Attention Networks (GAT) by Zhao et al. (2020).

  • This repo includes a complete framework for multivariate anomaly detection, using a model that is heavily inspired by MTAD-GAT.
  • Our work does not serve to reproduce the original results in the paper.
  • 📧 For contact, feel free to use [email protected]

❗ Key Notes

  • By default we use the recently proposed GATv2, but include the option to use the standard GAT
  • Instead of using a Variational Auto-Encoder (VAE) as the Reconstruction Model, we use a GRU-based decoder.
  • We provide implementations of the following thresholding methods, but their parameters should be customized to different datasets:
    • peaks-over-threshold (POT) as in the MTAD-GAT paper
    • thresholding method proposed by Hundman et al.
    • brute-force method that searches through "all" possible thresholds and picks the one that gives the highest F1 score
    • All methods are applied, and their respective results are reported together for comparison.
  • Parts of our code should be credited to the following:
    • OmniAnomaly for preprocessing and evaluation methods and an implementation of POT
    • TelemAnom for plotting methods and thresholding method
    • pyGAT by Diego Antognini for inspiration on GAT-related methods
    • Their respective licences are included in licences.
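As an illustration of the brute-force option listed above, here is a minimal sketch of a threshold search that maximizes F1. The function names, the evenly spaced candidate grid and the `n_steps` parameter are assumptions made for this example and are not taken from the repo's code.

```python
def f1_score(pred, actual):
    """F1 score for two equally long lists of booleans."""
    tp = sum(p and a for p, a in zip(pred, actual))
    fp = sum(p and not a for p, a in zip(pred, actual))
    fn = sum((not p) and a for p, a in zip(pred, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels, n_steps=100):
    """Try evenly spaced thresholds between min and max score, keep the best F1."""
    lo, hi = min(scores), max(scores)
    best_thr, best_f1 = lo, -1.0
    for step in range(n_steps + 1):
        thr = lo + (hi - lo) * step / n_steps
        pred = [s > thr for s in scores]  # higher score = more anomalous
        f1 = f1_score(pred, labels)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr, best_f1


# Toy anomaly scores with two labeled anomalous points
scores = [0.1, 0.2, 0.15, 0.9, 0.85, 0.2, 0.1]
labels = [False, False, False, True, True, False, False]
thr, f1 = best_f1_threshold(scores, labels)
```

Because the search uses the ground-truth labels, the resulting F1 is best read as an upper bound for a given set of scores rather than as a threshold that could be chosen in deployment.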

⚡ Getting Started

To clone the repo:

git clone https://github.com/ML4ITS/mtad-gat-pytorch.git && cd mtad-gat-pytorch

Get data:

cd datasets && wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip &&
cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv &&
rm -rf 2018-05-19_15.00.10 && cd .. && cd ..

This downloads the MSL and SMAP datasets. The SMD dataset is already included in the repo. We refer to TelemAnom and OmniAnomaly for detailed information regarding these three datasets.

Install dependencies (virtualenv is recommended):

pip install -r requirements.txt 

Preprocess the data:

python preprocess.py --dataset <dataset>

where <dataset> is one of MSL, SMAP or SMD.

To train:

 python train.py --dataset <dataset>

where <dataset> is one of msl, smap or smd (upper-case also works). If training on SMD, one should specify which machine using the --group argument.

You can change the default configuration by adding more arguments. All arguments can be found in args.py. Some examples:

  • Training machine-1-1 of SMD for 10 epochs, using a lookback (window size) of 150:
python train.py --dataset smd --group 1-1 --lookback 150 --epochs 10 
  • Training MSL for 10 epochs, using standard GAT instead of GATv2 (which is the default), and a validation split of 0.2:
python train.py --dataset msl --epochs 10 --use_gatv2 False --val_split 0.2
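Flags such as --use_gatv2 False pass the boolean as a string. A plain type=bool would not work for this, since bool("False") is True in Python. The sketch below shows one common way to parse such flags with argparse. The str2bool helper is an assumption for illustration, and the actual args.py may handle this differently.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans ('True', 'false', '1', ...) to bool."""
    if isinstance(value, bool):
        return value
    if value.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if value.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--use_gatv2", type=str2bool, default=True)
parser.add_argument("--val_split", type=float, default=0.1)
parser.add_argument("--epochs", type=int, default=30)

# Equivalent to: python train.py --epochs 10 --use_gatv2 False --val_split 0.2
args = parser.parse_args(["--epochs", "10", "--use_gatv2", "False", "--val_split", "0.2"])
```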

⚙️ Default configuration:

Default parameters can be found in args.py.

Data params:

--dataset='SMD' --group='1-1' --lookback=100 --normalize=True

Model params:

--kernel_size=7 --use_gatv2=True --feat_gat_embed_dim=None --time_gat_embed_dim=None
--gru_n_layers=1 --gru_hid_dim=150 --fc_n_layers=3 --fc_hid_dim=150 --recon_n_layers=1
--recon_hid_dim=150 --alpha=0.2

Train params:

--epochs=30 --val_split=0.1 --bs=256 --init_lr=1e-3 --shuffle_dataset=True --dropout=0.3
--use_cuda=True --print_every=1 --log_tensorboard=True

Anomaly Predictor params:

--save_scores=True --load_scores=False --gamma=1 --level=None --q=1e-3 --dynamic_pot=False
--use_mov_av=False

👀 Output and visualization results

Outputs are saved in output/<dataset>/<ID> (where the current datetime is used as ID) and include:

  • summary.txt: performance on test set (precision, recall, F1, etc.)
  • config.txt: the configuration used for model, training, etc.
  • train/test.pkl: saved forecasts, reconstructions, actual, thresholds, etc.
  • train/test_scores.npy: anomaly scores
  • train/validation_losses.png: plots of train and validation loss during training
  • model.pt: model parameters of the trained model

This repo includes example outputs for MSL, SMAP and SMD machine 1-1.

result_visualizer.ipynb is a Jupyter notebook for visualizing results. To launch the notebook:

jupyter notebook result_visualizer.ipynb

Predicted anomalies are visualized using a blue rectangle.
Actual (true) anomalies are visualized using a red rectangle.
Thus, correctly predicted anomalies are visualized by a purple (blue + red) rectangle.
Some examples:

[Figures: SMD test set (feature 0); SMD train set (feature 0)]

[Figure: example from SMAP test set]

[Figure: example from MSL test set (note that one anomaly segment is not detected)]

🧬 Model Overview

[Figure: model overview, adapted from Zhao et al. (2020)]

  1. The raw input data is preprocessed, and then a 1-D convolution is applied in the temporal dimension in order to smooth the data and alleviate possible noise effects.
  2. The output of the 1-D convolution module is processed by two parallel graph attention layers, one feature-oriented and one time-oriented, in order to capture dependencies among features and timestamps, respectively.
  3. The outputs of the 1-D convolution module and the two GAT modules are concatenated and fed to a GRU layer, to capture longer sequential patterns.
  4. The output of the GRU layer is fed into a forecasting model and a reconstruction model, to get a prediction for the next timestamp as well as a reconstruction of the input sequence.
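To make the two targets in the last step concrete, the sketch below cuts a series into sliding windows of length lookback. Each window is the model input (and the reconstruction target), and the row that follows it is the forecasting target. The function name and the list-based data layout are assumptions for illustration only.

```python
def sliding_windows(series, lookback):
    """Return (window, target) pairs.

    window: `lookback` consecutive rows (model input and reconstruction target)
    target: the row right after the window (forecasting target)
    """
    samples = []
    for start in range(len(series) - lookback):
        window = series[start:start + lookback]
        target = series[start + lookback]
        samples.append((window, target))
    return samples


# Toy series: 6 timestamps, 2 features per timestamp
series = [[float(t), float(10 * t)] for t in range(6)]
samples = sliding_windows(series, lookback=3)
```

With 6 timestamps and a lookback of 3, this yields 3 samples; the first window covers timestamps 0 to 2 and its forecasting target is timestamp 3.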

📖 GAT layers

Below we visualize how the two GAT layers view the input as a complete graph.

[Figures: feature-oriented GAT layer (left); time-oriented GAT layer (right)]

Left: The feature-oriented GAT layer views the input data as a complete graph where each node represents the values of one feature across all timestamps in the sliding window.

Right: The time-oriented GAT layer views the input data as a complete graph in which each node represents the values for all features at a specific timestamp.
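The two views described above differ only in what counts as a node. A minimal sketch, using a toy window with n = 4 timestamps and k = 3 features (the variable names are illustrative, and counting self-pairs as edges is an assumption):

```python
# One sliding window: rows are timestamps, columns are features
window = [
    [1.0, 10.0, 100.0],  # timestamp 0
    [2.0, 20.0, 200.0],  # timestamp 1
    [3.0, 30.0, 300.0],  # timestamp 2
    [4.0, 40.0, 400.0],  # timestamp 3
]

# Time-oriented view: one node per timestamp, each holding all k feature values
time_nodes = window

# Feature-oriented view: one node per feature, each holding its n values in the window
feature_nodes = [list(column) for column in zip(*window)]


def complete_graph_edges(num_nodes):
    """All ordered node pairs, self-pairs included (assumed here)."""
    return [(i, j) for i in range(num_nodes) for j in range(num_nodes)]


feature_edges = complete_graph_edges(len(feature_nodes))  # k * k pairs
time_edges = complete_graph_edges(len(time_nodes))        # n * n pairs
```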

📖 GATv2

Recently, Brody et al. (2021) proposed GATv2, a modified version of the standard GAT.

They argue that the original GAT can only compute a restricted kind of attention (which they refer to as static) where the ranking of attended nodes is unconditioned on the query node. That is, the ranking of attention weights is global for all nodes in the graph, a property which the authors claim severely hinders the expressiveness of the GAT. To address this, they introduce a simple fix by modifying the order of operations, and propose GATv2, a dynamic attention variant that is strictly more expressive than GAT. We refer to the paper for further reading. The difference between GAT and GATv2 is depicted below:

[Figure: comparison of GAT and GATv2]
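The static versus dynamic distinction can be checked on a toy example. The sketch below uses scalar node features and hand-picked weights (all values are assumptions chosen for illustration, not the repo's parameters). GAT applies the nonlinearity after the attention vector, e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j]), while GATv2 applies the attention vector after the nonlinearity, e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j]).

```python
def leaky_relu(x, slope=0.2):
    return x if x >= 0 else slope * x


def gat_score(h_i, h_j, w=1.0, a=(1.0, 1.0)):
    # GAT: LeakyReLU(a^T [W h_i || W h_j]); the nonlinearity comes last
    return leaky_relu(a[0] * w * h_i + a[1] * w * h_j)


def gatv2_score(h_i, h_j, W=((1.0, 1.0), (-1.0, 1.0)), a=(1.0, -1.0)):
    # GATv2: a^T LeakyReLU(W [h_i || h_j]); the attention vector comes last
    hidden = [leaky_relu(row[0] * h_i + row[1] * h_j) for row in W]
    return a[0] * hidden[0] + a[1] * hidden[1]


def key_ranking(score_fn, nodes, i):
    """Indices of all nodes, ordered from highest to lowest score for query node i."""
    return tuple(sorted(range(len(nodes)), key=lambda j: -score_fn(nodes[i], nodes[j])))


nodes = [-2.0, -1.0, 1.0, 2.0]
gat_rankings = {key_ranking(gat_score, nodes, i) for i in range(len(nodes))}
gatv2_rankings = {key_ranking(gatv2_score, nodes, i) for i in range(len(nodes))}
```

With these weights, every query node produces the same ranking under the GAT scoring, because LeakyReLU is monotonic and the query only shifts its argument. Under the GATv2 scoring, the query with value 2.0 ranks the nodes in the opposite order from the query with value -2.0.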

mtad-gat-pytorch's People

Contributors: axeloh, srigas, wqvaale

mtad-gat-pytorch's Issues

Embedding vector dimension issue in the paper

The dimension of the input data is n×k. After the convolution, the dimension of the data must have changed. However, according to the paper, the dimension is unchanged after the convolution and is still n×k, and the result is then sent to the GAT layers.
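One way the shape can stay n×k is a 1-D convolution along the time axis with "same" padding and as many output channels as input features. The sketch below only checks the length arithmetic, using the standard output-length formula for a 1-D convolution with dilation 1. That this repo's convolution layer is configured this way is an assumption; lookback=100 and kernel_size=7 are the README defaults.

```python
def conv1d_output_length(n, kernel_size, padding, stride=1):
    """Output length of a 1-D convolution (dilation 1) over a sequence of length n."""
    return (n + 2 * padding - (kernel_size - 1) - 1) // stride + 1


n, kernel_size = 100, 7                # README defaults: lookback=100, kernel_size=7
same_padding = (kernel_size - 1) // 2  # 3 values padded on each side

length_with_padding = conv1d_output_length(n, kernel_size, same_padding)
length_without_padding = conv1d_output_length(n, kernel_size, padding=0)
```

With padding of (kernel_size - 1) / 2 on each side the window length is preserved, whereas without padding it shrinks by kernel_size - 1. The feature dimension k stays the same whenever the number of output channels is set to k.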

tqdm is not imported in prediction.py + request

Dear authors,

First of all, I would like to thank you for publicly offering your implementation of the method from Zhao et al. (2020) on GitHub. This material greatly helps me in conducting my master's thesis research into unsupervised methods for multivariate time series anomaly detection.

I have run into one problem: an error while executing your code for the SMAP dataset. Training goes well up until predicting and calculating the anomaly scores from line 167 in train.py onwards. Here the predict_anomalies function on line 118 calls the get_score function from prediction.py, which on line 51 raises "NameError: name 'tqdm' is not defined". This is probably due to the missing tqdm import statement at the beginning of the prediction.py file.

I hope this feedback is of value to you.
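If the missing import is indeed the cause, as suggested above, the fix would be a single import line at the top of prediction.py. The sketch below adds a fallback for environments where tqdm is not installed. The fallback is an addition made for this example and not something the repo is known to contain.

```python
try:
    from tqdm import tqdm  # progress bar around iterables
except ImportError:
    # Fallback (assumption for this sketch): iterate without a progress bar
    def tqdm(iterable, **kwargs):
        return iterable


doubled = [value * 2 for value in tqdm(range(3))]
```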

Apart from mentioning this problem, I have another question. Is it possible to load different datasets into your implementation, or is it hard to do this due to hardcoding practices for example? My intention is to load a custom multivariate time series dataset and evaluate the performance of this method. The dataset comprises several .csv files, each having data of one of multiple IoT sensors where columns constitute the multivariate features and rows the time dimension. In order to transform this data to your data format requirements, I will condense these files into one big file where the multivariate sensor features of all sensors are combined in the columns. Could you recommend a way to load this file into your implementation? I was thinking of adapting the preprocess.py file to do so. Perhaps you could add the option to load custom datasets in the future.

In any case, thank you very much for your efforts.
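As a rough sketch of the condensing step described in this issue, the code below merges several per-sensor CSV files into one matrix whose rows are timestamps and whose columns are the features of all sensors. It assumes that every file has a header row, numeric values only, and already aligned timestamps; the function names are hypothetical, and in-memory strings stand in for real files. How the result is wired into preprocess.py is not covered here.

```python
import csv
import io


def read_sensor_csv(handle):
    """Read one sensor file: the first row is the header, the rest are float values."""
    reader = csv.reader(handle)
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader]
    return header, rows


def combine_sensors(sensors):
    """Concatenate the feature columns of all sensors, timestamp by timestamp."""
    lengths = {len(rows) for _, rows in sensors.values()}
    if len(lengths) != 1:
        raise ValueError("All sensors must share the same number of timestamps")
    columns = []
    combined = [[] for _ in range(lengths.pop())]
    for name, (header, rows) in sensors.items():
        columns += [f"{name}.{column}" for column in header]
        for t, row in enumerate(rows):
            combined[t].extend(row)
    return columns, combined


# In-memory stand-ins for two sensor files
sensor_a = io.StringIO("temp,humidity\n20.5,0.31\n20.7,0.30\n")
sensor_b = io.StringIO("vibration\n0.01\n0.02\n")
sensors = {"a": read_sensor_csv(sensor_a), "b": read_sensor_csv(sensor_b)}
columns, matrix = combine_sensors(sensors)
```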

Running repo on custom data

Hi! Just wondering if there is a way to run this architecture on a custom dataset? I can run it on the datasets provided in the README, but I would like to check how this model works on my own custom multi-time-series dataset.

Thank you in advance

out_dim or n_features

self.recon_model = ReconstructionModel(window_size, gru_hid_dim, recon_hid_dim, out_dim, recon_n_layers, dropout)

Should out_dim be changed to n_features here, to match the shape of the input x, given that the loss is calculated by MSELoss(recons, x)?

The parameter of `adjust_predicts()`

Thank you for your excellent work!
I don't understand the adjust_predicts() function when I was reading the source code.

In the adjust_predicts() function, the comment indicates that

threshold (float): The threshold of anomaly score. A point is labeled as "anomaly" if its score is [[lower than]] the threshold.

But when you preprocess the data, in the preprocess.py file

for anomaly in anomalies:
      label[anomaly[0] : anomaly[1] + 1] = True

My question is: why is a point labeled as an anomaly in adjust_predicts() if its score is lower than the threshold, when the label of an anomaly is set to True during data preprocessing? In my opinion, an anomalous point is one whose score is higher than the threshold.

Thanks for your time. Hope you can answer my question.

Some question about the param target_dims

I have a question about this parameter. MTAD-GAT is a multivariate time series model that uses separate modules to capture time dependencies and feature dependencies. If only one dimension is used for training and testing, it degenerates into a univariate time series model. What is the use of the feature-dependency module in this situation?

msl and smap dataset preprocess gives error

(venv) PS C:\Users\hp\PycharmProjects\mtad-gat-pytorch> python preprocess.py --dataset SMAP
SMAP test_label (427617,)
Traceback (most recent call last):
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 96, in <module>
load_data(ds)
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 89, in load_data
concatenate_and_save(c)
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\preprocess.py", line 81, in concatenate_and_save
temp = np.load(path.join(dataset_folder, category, filename + ".npy"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\hp\PycharmProjects\mtad-gat-pytorch\venv\Lib\site-packages\numpy\lib\npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/data\train\A-1.npy'

Loss and some slice of the output tensors become NAN

When train.py is rerun after a while, the use_bias parameter causes NaNs in the outputs of the attention layers. The NaNs occur after the bias is added to the tensor. Do you have an explanation, or a solution other than setting use_bias to False?

The issue with the dataset

Hello! Thank you very much for your work. I have a question. If I'm working with multivariate time series data, for example, 800 timestamps with 10 features each, do I only need one .npy file? I would greatly appreciate it if you could answer my question.

Multiple inconsistent training results

First of all, thank you very much for your code! It has been very helpful to me. I have one question: taking the 1-1 dataset of SMD as an example, why does the F1 score on the test set differ significantly across multiple training and testing runs? Sometimes F1 is around 0.7, and sometimes it reaches 0.8. This way, I cannot judge the true performance of the model at all. Why is this, and how should I solve it?
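Run-to-run variation of this kind is common when weight initialization, dropout, and data shuffling are not seeded. A minimal sketch of fixing the seeds follows; the set_seed helper is hypothetical and not part of the repo, and NumPy and PyTorch are seeded only if they are installed. Even with fixed seeds, some GPU operations may remain non-deterministic, so reporting the mean and spread of F1 over several seeds is usually more informative than a single run.

```python
import random


def set_seed(seed):
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    except ImportError:
        pass


set_seed(0)
first_draw = random.random()
set_seed(0)
second_draw = random.random()  # identical to first_draw after reseeding
```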

about adjust_predicts() ,please!!!

First, thanks for making this repo public. I have learned a lot from the issues and from your replies.
I have seen this snippet many times:

for i in range(len(predict)):
    if any(actual[max(i, 0) : i + 1]) and predict[i] and not anomaly_state:
        anomaly_state = True
        anomaly_count += 1
        for j in range(i, 0, -1):
            if not actual[j]:
                break
            else:
                if not predict[j]:
                    predict[j] = True
                    latency += 1
    elif not actual[i]:
        anomaly_state = False
    if anomaly_state:
        predict[i] = True

It's part of adjust_predicts. I am very curious: what is its purpose?
And how is the latency computed?
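Read literally, the quoted loop performs a segment-level adjustment: once any point inside a true anomaly segment is predicted as anomalous, every point of that segment is marked as detected. The latency counter counts the points between the start of the segment and the first detection, i.e. the points that had to be filled in backwards. The sketch below is a simplified, self-contained restatement of that loop for illustration; the function name and return values are assumptions, and it is not the repo's exact function.

```python
def point_adjust(predict, actual):
    """Mark whole true-anomaly segments as detected once one of their points is predicted."""
    predict = list(predict)  # work on a copy
    anomaly_state = False
    anomaly_count = 0
    latency = 0
    for i in range(len(predict)):
        if actual[i] and predict[i] and not anomaly_state:
            anomaly_state = True
            anomaly_count += 1
            # Walk back to the segment start and fill in the missed points.
            # This mirrors the quoted loop, which never visits index 0.
            for j in range(i, 0, -1):
                if not actual[j]:
                    break
                if not predict[j]:
                    predict[j] = True
                    latency += 1
        elif not actual[i]:
            anomaly_state = False
        if anomaly_state:
            predict[i] = True  # fill in the rest of the detected segment
    return predict, latency, anomaly_count


T, F = True, False
actual = [F, T, T, T, F, F, T, T, F]
predict = [F, F, T, F, F, F, F, F, F]
adjusted, latency, count = point_adjust(predict, actual)
```

In this example the first segment (indices 1 to 3) is detected at index 2, so the whole segment is marked and the latency is 1. The second segment (indices 6 and 7) has no predicted point and stays undetected.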

attention layer

h = self.sigmoid(torch.matmul(attention, x))

I found that the attention weights alpha are multiplied with the original x instead of W * x, which differs from a graph convolution network. Could you explain the reason? Thanks very much!

FC layer out_dim not matching RECOn layer in_dim

I have run into a logic issue at the point where the Forecast layer output goes into the Reconstruction layer.

import torch
import torch.nn as nn

gru_n_layers = 1
in_dim = 150
out_dim = 8  # Output dimension of the FC layer
forecast_n_layers = 1
forecast_hid_dim = 150
n_layers = 1
hid_dim = 150
window_size = 20

# Assumption: x_ stands for the Forecast layer output, shape (batch, out_dim); the batch size of 4 is hypothetical
x_ = torch.randn(4, out_dim)

dropout = 0.0 if n_layers == 1 else dropout
rnn = nn.GRU(in_dim, hid_dim, n_layers, batch_first=True, dropout=dropout)
print(rnn)

fc = nn.Linear(hid_dim, out_dim)
print(fc)

h_final_end = x_
print(h_final_end.shape)
h_final_end_rep = h_final_end.repeat_interleave(window_size, dim=1).view(x_.size(0), window_size, -1)
print(h_final_end_rep.shape)
decoder_out, _ = rnn(h_final_end_rep)  # fails here: the GRU expects a last dimension of 150, got 8
print(decoder_out.shape)
out = fc(decoder_out)
print(out.shape)

The Forecast layer output dimension is 8, referring to n_features, and the Reconstruction layer input dimension is 150, referring to gru_hid_dim.

Running this separately in a notebook, I got the error:
RuntimeError: input.size(-1) must be equal to input_size. Expected 150, got 8

I must have missed something :(

The reason why use shuffle in time-series data

Hi. Thanks for your wonderful work!

I'm curious why shuffle=True is the default option in the implementation below, given that the data is time-series data.

def create_data_loaders(train_dataset, batch_size, val_split=0.1, shuffle=True, test_dataset=None):

Is there any reason to shuffle time-series data?
(Or can the time-oriented GAT layer still capture temporal features when the data is shuffled?)
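A point that may help with this question: when a series is first cut into sliding windows and the shuffling is applied to the order of the windows, the timestamps inside each window keep their original order. The sketch below illustrates this under that assumption; whether the repo's data loader shuffles exactly this way is not confirmed here.

```python
import random


def make_windows(series, lookback):
    """All contiguous windows of length `lookback`."""
    return [series[start:start + lookback] for start in range(len(series) - lookback + 1)]


series = list(range(10))  # stand-in for 10 consecutive timestamps
windows = make_windows(series, lookback=4)

# Shuffle the ORDER of the windows, as a shuffling data loader would do with sample indices
rng = random.Random(0)
order = list(range(len(windows)))
rng.shuffle(order)
shuffled = [windows[index] for index in order]

# Every window is still internally ordered in time
still_ordered = all(window == sorted(window) for window in shuffled)
```

Under this assumption, shuffling only changes which windows end up together in a batch, not the temporal structure that the time-oriented layer sees within a window.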

A Question about the implementation.

Thanks for making this repo public. I have some questions after reading your code.

recon_loss = torch.sqrt(self.recon_criterion(x, recons))

The paper used a VAE-like method for the reconstructor, but you simply used MSE like a naive autoencoder. I wonder whether this is for the stability of optimization. In my case, I tried to sample at every timestep like the LSTM-VAE model, and sometimes the loss just became NaN.

When computing 'e' in the FeatureAttentionLayer, the output of MTAD_GAT 'predictions', 'recons' are all nan, and training is not possible due to the presence of nan.

Dear authors,

Thank you for uploading this code. I am a beginner in multivariate time series anomaly detection and this has been very helpful in my research. I have read and understood your code, but the output is always NaN when training, and I am sure that the input data is normal.

Therefore, I printed the result of each step in forward() in mtad_gat.py. The problem appears after the feature_gat() layer.

So I stepped into feature_gat(): after e = torch.matmul(a_input, self.a).squeeze(3), some NaNs appear, as shown in the figure. After the softmax there are more NaNs; usually one column is NaN.

How can I solve this problem? I also tried adjusting batch_size and look_back, but nothing works.
[Screenshot: Capture]

Environment:

  • linux
  • cpu/gpu
  • torch1.10.0 cpu only/torch1.10.0+cu111

About data cleaning

Thank you for your wonderful work! I don't seem to see the data cleaning part in this repo, i.e. the spectral residual step and the replacement of abnormal data. Have you implemented it?

The purpose of `adjust_anomaly_scores`

Thanks for your brilliant work. I would like to know the purpose of adjust_anomaly_scores. Thanks for your time.

 # Remove errors for time steps when transition to new channel (as this will be impossible for model to predict)
    if dataset.upper() not in ['SMAP', 'MSL']:
        return scores

    adjusted_scores = scores.copy()
    if is_train:
        md = pd.read_csv(f'./datasets/data/{dataset.lower()}_train_md.csv')
    else:
        md = pd.read_csv('./datasets/data/labeled_anomalies.csv')
        md = md[md['spacecraft'] == dataset.upper()]

    md = md[md['chan_id'] != 'P-2']

    # Sort values by channel
    md = md.sort_values(by=['chan_id'])

    # Getting the cumulative start index for each channel
    sep_cuma = np.cumsum(md['num_values'].values) - lookback
......................

Exception: Dataset ".\DATASETS\DATA\SMAP_TRAIN_MD.CSV" not available.

(env) PS C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch> python.exe .\train.py --dataset .\datasets\data\smap_train_md.csv
{'dataset': '.\DATASETS\DATA\SMAP_TRAIN_MD.CSV', 'group': '1-1', 'lookback': 100, 'normalize': True, 'spec_res': False, 'kernel_size': 7, 'use_gatv2': True, 'feat_gat_embed_dim': None, 'time_gat_embed_dim': None, 'gru_n_layers': 1, 'gru_hid_dim':
150, 'fc_n_layers': 3, 'fc_hid_dim': 150, 'recon_n_layers': 1, 'recon_hid_dim': 150, 'alpha': 0.2, 'epochs': 30, 'val_split': 0.1, 'bs': 256, 'init_lr': 0.001, 'shuffle_dataset': True, 'dropout': 0.3, 'use_cuda': True, 'print_every': 1, 'log_tensorboard': True, 'scale_scores': False, 'use_mov_av': False, 'gamma': 1, 'level': None, 'q': None, 'dynamic_pot': False, 'comment': ''}
Traceback (most recent call last):
File "C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch\train.py", line 43, in <module>
raise Exception(f'Dataset "{dataset}" not available.')
Exception: Dataset ".\DATASETS\DATA\SMAP_TRAIN_MD.CSV" not available.
(env) PS C:\Users\dheiver\Desktop\Nova pasta\mtad-gat-pytorch>

some question about model and the result

Hi Axel,
I have some questions about the repo:

  1. I read the OmniAnomaly code and found that it uses 25/55 as the out_dim for the MSL and SMAP. I also opened the MSL data, and only the first dimension has values. So can I consider MSL and SMAP to be univariate time series? Almost every paper says these datasets are multivariate, which confuses me.
  2. I use the repo as the baseline for my research. I found that you replaced the VAE decoder with a GRU. Did you try the original VAE decoder? I ran some experiments but could not achieve the results of the original paper. If you tried the VAE, could you achieve the results of the original paper?

I would appreciate it if you could reply as soon as possible.

losses are always nan

Hi, hope you are fine.
Thanks for this wonderful work.
I tried training with MSL, and SMD, and my losses are always nan.
Moreover, I also tried GDN repo, and I found that there is a difference in MSL data as compared to this repo.
Thanks for any help.

Regards,
Ali

Evaluation Code

How can I just evaluate trained model?

Is there any method or function just for evaluation?

about the example output,please!!!

The code is written very well and looks very comfortable. Thank you very much. However, I have a question. I trained using the default args on the SMD dataset, and the results obtained were quite different from the "example output". Is it possible that the training parameters for the two are different? If so, could you provide the parameters for the "example output"? Thank you very much.

about gat_layer

Thank you for your excellent work.
I don't understand something about the GAT layer. Your graph attention is implemented with the function make_attention_input, but it seems that it just copies and concatenates x (v) in various ways. I can't understand how this part implements graph attention. Can you explain it in detail?
In addition, if I want to build a graph that is not fully connected (each node has a fixed number of edges), is this possible?
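For graphs that are not fully connected, a common general technique is to mask the attention scores with an adjacency matrix before the softmax, so that each node only attends to its neighbours. The sketch below shows the idea in plain Python. It is an illustration of that technique and not an option the repo is known to provide; the function name, scores and adjacency matrix are made up for the example.

```python
import math


def masked_softmax(scores, adjacency):
    """Row-wise softmax restricted to pairs (i, j) where adjacency[i][j] is truthy."""
    weights = []
    for score_row, adj_row in zip(scores, adjacency):
        allowed = [s for s, keep in zip(score_row, adj_row) if keep]
        if not allowed:
            # Node without neighbours: no attention mass at all
            weights.append([0.0] * len(score_row))
            continue
        peak = max(allowed)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(score_row, adj_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Raw attention scores for 3 nodes, and a graph where each node has 2 edges
scores = [[0.5, 1.0, -0.3], [0.2, 0.1, 0.9], [1.5, -1.0, 0.0]]
adjacency = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
attention = masked_softmax(scores, adjacency)
```

Each row of attention sums to 1 over the allowed neighbours, and masked pairs receive exactly zero weight.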
