wenjiedu / pypots

A Python toolbox/library for reality-centric machine/deep learning and data mining on partially-observed time series with PyTorch, including SOTA neural network models for science tasks of imputation, classification, clustering, forecasting & anomaly detection on incomplete (irregularly-sampled) multivariate time series with NaN missing values/data

Home Page: https://pypots.com

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

time-series missing-data missing-values machine-learning partially-observed-time-series time-series-analysis imputation classification forecasting clustering

pypots's Introduction

🤙 Contact info:

👋 Hi, I'm Wenjie Du (杜文杰 in Chinese). My research focuses on modeling time series with machine learning, especially partially-observed time series (POTS), namely, incomplete time series with missing values, a.k.a. irregularly-sampled time series. I strongly advocate open-source and reproducible research, and I always devote myself to building my work into valuable real-world applications. The Unix philosophy "Do one thing and do it well" is also my life philosophy, and I always strive to walk my talk. My research goal is to model this non-trivial and kaleidoscopic world with machine learning to make it a better place for everyone. It would be my honor if my work could help you in any way.

🤔 POTS is ubiquitous in the real world and is vital to deploying AI in industry. However, it still lacks attention from academia and lacks a dedicated toolkit, even in a community as vast as Python's. Therefore, to facilitate the work of researchers and engineers on POTS, I'm leading the PyPOTS Research Team (pypots.com) to build a comprehensive Python toolkit ecosystem for POTS modeling, including data preprocessing, neural net training, and benchmarking. Stars🌟 on our repos are of course very welcome if you like what we're trying to achieve with PyPOTS.

💬 I'm open to questions related to my research and always try my best to help others. I love questioning myself and I never stop. If you have questions for discussion or are interested in collaboration, please feel free to drop me an email or ping me on LinkedIn/WeChat/Slack (contact info is at the top) 😃 You can follow me on Google Scholar and GitHub to get notified of our latest publications and open-source projects. Note that I'm very glad to help review papers related to my research, but ONLY open-source ones with readable code.

❤️ If you enjoy what I do, you can fund me and become a sponsor. And I assure you that every penny from sponsorships will be used to support impactful open-science research.

😊 Thank you for reading my profile. Feel free to contact me if you'd like to trigger discussions.


pypots's People

Contributors

Stargazers

Watchers

Forkers

jyotirmayaijaradar lodrantl lyzl2010 stpingi maciejskrabski the-black-coat alice202108 steveliu91 objectin jackyin68 xiaohan2013 the-blackcoat cleancoindev licj1 oswaldxia tdl77 samfallahian wsgan001 joshuawe lazograf oguiza carrotsniper terragord7 manu87ds daniyalk20 tazbiulhassan nirey10 xzscode demstalferez mmtmr skpalu youngyang-sjtu ronghaogu mekongdelta-mind lumisong yhzhu99 saisumanv minhozju augustjw dragonfly2023h leeohalloran vemuribv tkexchange shitoudidi cracer colesussmeier kiettuan1792001 marekb-sci justin0388 newkeyto ahmaddroobi99 icuraslw hit636 gugababa wx0chan victoeywilly linglongqian bbridgecn lll-hll shubhampachori12110095 hervwan tingwei161803

pypots's Issues

Running tests locally fails if host has less than 2 cuda devices

Issue description

When developing locally the tests in tests/test_tranining_on_multi_gpus.py fail on a single-gpu host.

Possible fix: skip the tests in this module altogether if the host does not have at least 2 CUDA-enabled devices.
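The proposed fix could be sketched with unittest's skip decorator. This is a minimal illustration and not the project's actual test module: the helper name cuda_device_count and the test class name are hypothetical, and the helper falls back to 0 when torch is not importable.

```python
import unittest


def cuda_device_count() -> int:
    """Return the number of visible CUDA devices, or 0 if torch is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


# Hypothetical test class: the whole class is skipped on hosts with fewer
# than 2 CUDA devices, so single-GPU and CPU-only machines report a skip
# instead of an error.
@unittest.skipIf(cuda_device_count() < 2, "requires at least 2 CUDA devices")
class TestTrainingOnMultiGPUs(unittest.TestCase):
    def test_devices_available(self):
        self.assertGreaterEqual(cuda_device_count(), 2)
```

With pytest, a module-level pytestmark = pytest.mark.skipif(...) would achieve the same effect for every test in the file.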

Please release package source on PyPI

Currently only a .whl file is available on PyPI. Please release both the .whl file and the source (*.tar.gz) file on PyPI.

can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

PS C:\Users\Lyc\Downloads\PyPOTS-main\PyPOTS-main> & C:/Users/Lyc/AppData/Local/Programs/Python/Python39/python.exe c:/Users/Lyc/Downloads/PyPOTS-main/PyPOTS-main/pypots/tests/test_imputation.py
Running test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
epoch 0: training loss 1.2366, validating loss 0.4201
epoch 1: training loss 0.8974, validating loss 0.3540
epoch 2: training loss 0.7426, validating loss 0.2919
epoch 3: training loss 0.6147, validating loss 0.2414
epoch 4: training loss 0.5411, validating loss 0.2157
ERunning test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
epoch 0: training loss 1.2054, validating loss 0.4022
epoch 1: training loss 0.8631, validating loss 0.3399
epoch 2: training loss 0.7204, validating loss 0.2863
epoch 3: training loss 0.5995, validating loss 0.2399
epoch 4: training loss 0.5325, validating loss 0.2123
ERunning test cases for LOCF...
LOCF test_MAE: 0.17510570872656786
.Running test cases for LOCF...
.Running test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
epoch 0: training loss 0.9181, validating loss 0.2936
epoch 1: training loss 0.6287, validating loss 0.2303
epoch 2: training loss 0.5345, validating loss 0.2086
epoch 3: training loss 0.4735, validating loss 0.1895
epoch 4: training loss 0.4224, validating loss 0.1744
ERunning test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
epoch 0: training loss 0.7823, validating loss 0.2779
epoch 1: training loss 0.5015, validating loss 0.2250
epoch 2: training loss 0.4418, validating loss 0.2097
epoch 3: training loss 0.4119, validating loss 0.1994
epoch 4: training loss 0.3866, validating loss 0.1815
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.7715, validating loss 0.2843
epoch 1: training loss 0.4861, validating loss 0.2271
epoch 2: training loss 0.4176, validating loss 0.2077
epoch 3: training loss 0.3822, validating loss 0.2005
epoch 4: training loss 0.3592, validating loss 0.1961
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.8033, validating loss 0.2910
epoch 1: training loss 0.4856, validating loss 0.2345
epoch 2: training loss 0.4282, validating loss 0.2157
epoch 3: training loss 0.3882, validating loss 0.2051
epoch 4: training loss 0.3599, validating loss 0.1942
E

ERROR: test_impute (__main__.TestBRITS)

Traceback (most recent call last):
File "c:\Users\Lyc\Downloads\PyPOTS-main\PyPOTS-main\pypots\tests\test_imputation.py", line 125, in setUp
self.brits.fit(self.train_X, self.val_X)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\brits.py", line 504, in fit
self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\base.py", line 142, in _train_model
if np.equal(self.best_loss, float('inf')):
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_tensor.py", line 732, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

======================================================================
ERROR: test_parameters (__main__.TestBRITS)

======================================================================
ERROR: test_impute (__main__.TestSAITS)

Traceback (most recent call last):
File "c:\Users\Lyc\Downloads\PyPOTS-main\PyPOTS-main\pypots\tests\test_imputation.py", line 45, in setUp
self.saits.fit(self.train_X, self.val_X)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\saits.py", line 170, in fit
self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\base.py", line 142, in _train_model
if np.equal(self.best_loss, float('inf')):
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_tensor.py", line 732, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

======================================================================
ERROR: test_parameters (__main__.TestSAITS)

======================================================================
ERROR: test_impute (__main__.TestTransformer)

Traceback (most recent call last):
File "c:\Users\Lyc\Downloads\PyPOTS-main\PyPOTS-main\pypots\tests\test_imputation.py", line 89, in setUp
self.transformer.fit(self.train_X, self.val_X)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\transformer.py", line 256, in fit
self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\pypots\imputation\base.py", line 142, in _train_model
if np.equal(self.best_loss, float('inf')):
File "C:\Users\Lyc\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\_tensor.py", line 732, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

======================================================================
ERROR: test_parameters (__main__.TestTransformer)

Ran 8 tests in 176.311s

FAILED (errors=6)
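The tracebacks above all fail at np.equal(self.best_loss, float('inf')), where best_loss is a tensor living on cuda:0. One possible workaround, sketched here under the assumption that best_loss is either a plain number or a single-element tensor, is to convert it to a Python float before comparing. The helper names as_float and training_never_improved are hypothetical and not part of PyPOTS.

```python
import math


def as_float(loss):
    """Return a plain Python float for a number or a single-element tensor."""
    # torch.Tensor.item() copies a single-element tensor to host memory,
    # so it works no matter which device holds the tensor.
    if hasattr(loss, "item"):
        return float(loss.item())
    return float(loss)


def training_never_improved(best_loss):
    """True if best_loss still holds its initial value of infinity."""
    return math.isinf(as_float(best_loss))
```

Because the comparison happens on a host-side float, NumPy never needs to convert a CUDA tensor into an array.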

name 'MessagePassing' is not defined pypots0.0.9

File "/home/ubuntu/miniconda3/envs/pypots/lib/python3.8/site-packages/pypots/classification/__init__.py", line 10, in <module>
from pypots.classification.raindrop import Raindrop
File "/home/ubuntu/miniconda3/envs/pypots/lib/python3.8/site-packages/pypots/classification/raindrop.py", line 83, in <module>
class ObservationPropagation(MessagePassing):
NameError: name 'MessagePassing' is not defined
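One plausible reading of this traceback is that an optional import of torch_geometric failed quietly and the class definition then referenced a name that was never bound. That is an assumption, since the issue does not show the import code. Under that assumption, a common guard pattern is to bind a placeholder base class that raises a clear error only when the model is actually instantiated:

```python
try:
    # Optional dependency; only needed by graph-based models.
    from torch_geometric.nn.conv import MessagePassing
except ImportError as error:
    _import_error = error  # keep a reference; `error` is unbound after the block

    class MessagePassing:
        """Placeholder so that importing this module never raises NameError."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "torch_geometric is required for this model but could not be imported"
            ) from _import_error


# Illustrative subclass; the body of the real class is not reproduced here.
class ObservationPropagation(MessagePassing):
    pass
```

With this guard, importing the package succeeds without torch_geometric, and the user sees an ImportError naming the missing dependency only when creating the model.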

Parallel training on multi GPUs

1. Feature description

Enable training PyPOTS NN models on multiple CUDA devices in parallel.

Parallel training on multiple GPUs for acceleration is useful, and this feature is on our list, but without priority. Mainly because:

If your dataset is very large, PyPOTS provides a lazy-loading strategy that loads only the necessary data samples during training. Simply using multiple GPU devices for training cannot ease the memory load, because your data still has to be loaded into RAM first before being distributed to the GPUs;
Unlike LLMs, neural network models for time series modeling are usually not large. A single GPU can accelerate training to a good speed. So far, you can even run all models in PyPOTS on a laptop CPU at an acceptable training and inference speed, especially since laptops nowadays generally have at least 4 cores. I'm not saying training on multiple GPUs is useless. In some extreme cases, it can be very helpful;
Recently, this feature was requested by a member of our community who is using PyPOTS to train a GRU-D model for a POTS classification task. The training takes too much time (even after increasing the value of num_workers), and the machine has 4 GPUs that cannot be used for parallel training to speed it up. Therefore, I'm considering adding parallel training in the following release. I implemented it with DataParallel, but PyTorch suggests using DistributedDataParallel https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#comparison-between-dataparallel-and-distributeddataparallel. As I mentioned above, I think this is not a necessary feature, so I am postponing the redesign.

2. Motivation

Speed up the training process.

3. Your contribution

Will make a PR to add this feature.

`Attributes` heading of classes in the docs is not showing correctly

1. System Info

Refer to https://docs.pypots.com

2. Information

The official example scripts
My own created scripts

3. Reproduction

There's no Attributes heading displayed.

4. Expected behavior

Same problem in this issue ivadomed/ivadomed#315

The greeting workflow is not working

1. System Info

https://github.com/WenjieDu/PyPOTS/actions/runs/5541742029/jobs/10117529692

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://github.com/WenjieDu/PyPOTS/actions/runs/5541742029/jobs/10117529692

4. Expected behavior

Running failed.

Add templates for contributors to add new models

1. Feature description

All models have been separated into a single package in PR #86 for better standardization and easier management. I think we need templates to guide our contributors to add their models.

2. Motivation

This can make it less complicated for contributors to integrate their models into PyPOTS.

3. Your contribution

Will make a PR.

If I have another training set, how can I use your models to train on it?

Issue description

Can I get results using only your models, in a form like outputs = model(X)?

BRITS imputation test fails on cuda device mismatch

Hi,
I get the following error when running the imputation tests at commit 6dcc894 on the dev branch.

py3.9_cuda11.3_cudnn8.2.0_0

$ python -m pytest tests/test_imputation.py

./tests/test_imputation.py::TestBRITS::test_parameters Failed with Error: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
  File ".../unittest/case.py", line 59, in testPartExecutor
    yield
  File ".../unittest/case.py", line 588, in run
    self._callSetUp()
  File ".../unittest/case.py", line 547, in _callSetUp
    self.setUp()
  File ".../PyPOTS/pypots/tests/test_imputation.py", line 98, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "/PyPOTS/pypots/imputation/brits.py", line 504, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "/PyPOTS/pypots/imputation/base.py", line 154, in _train_model
    if np.equal(self.best_loss, float("inf")):
  File ".../lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

PyPOTS needs a tool like `pypots-cli env`

Feature description

pypots-cli should include an environment tool to help users and developers easily install dependencies.

Motivation

It could be very useful, considering the situation I reported in discussion #58 that torch_geometric and related dependencies may be hard for our users to install. Hence, with pypots-cli env, running a simple command like pypots-cli env --install optional should help install the dependencies listed in setup.cfg, including torch_geometric.

Your contribution

Will create a PR to add this feature and link it with this issue.
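The proposed subcommand could look roughly like the argparse sketch below. Everything here is hypothetical: the DEPENDENCY_GROUPS mapping stands in for whatever setup.cfg declares, and build_parser and pip_command are illustrative names, not part of pypots-cli.

```python
import argparse
import sys

# Hypothetical dependency groups; a real tool would read them from setup.cfg.
DEPENDENCY_GROUPS = {
    "optional": ["torch_geometric"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts `pypots-cli env --install <group>`."""
    parser = argparse.ArgumentParser(prog="pypots-cli")
    subparsers = parser.add_subparsers(dest="command", required=True)
    env = subparsers.add_parser("env", help="manage the PyPOTS environment")
    env.add_argument(
        "--install",
        choices=sorted(DEPENDENCY_GROUPS),
        required=True,
        help="dependency group to install",
    )
    return parser


def pip_command(group: str) -> list:
    """Return the pip invocation that installs one dependency group."""
    return [sys.executable, "-m", "pip", "install", *DEPENDENCY_GROUPS[group]]
```

A real command would pass the list returned by pip_command to subprocess.run; building the command separately keeps the argument parsing easy to test without installing anything.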

Adding functions to process time-series datasets with sliding-window and save into h5 files

1. Feature description

We need some utilities to help with data processing:

adding a sliding-window function to process 2-D [n_steps, n_features] datasets into 3-D [n_samples, n_steps, n_features] datasets for model input;
adding a data saving function to save datasets (organized as dictionaries) into hdf5 files for the data lazy-loading strategy;
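The first utility can be illustrated with a small pure-Python sketch. It assumes the 2-D dataset is a sequence of rows (one row of features per time step); the function name sliding_window and its stride parameter are illustrative, not the API that was eventually added. Saving to HDF5 is not covered here.

```python
def sliding_window(series, window_len, stride=None):
    """Split [n_steps, n_features] data into [n_samples, window_len, n_features].

    `stride` defaults to `window_len`, which yields non-overlapping windows.
    """
    stride = window_len if stride is None else stride
    if window_len <= 0 or stride <= 0:
        raise ValueError("window_len and stride must be positive")
    last_start = len(series) - window_len
    # Trailing steps that do not fill a whole window are dropped.
    return [
        [list(row) for row in series[start:start + window_len]]
        for start in range(0, last_start + 1, stride)
    ]
```

For example, 5 time steps with window_len=2 give 2 non-overlapping samples, or 4 overlapping samples with stride=1.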

2. Motivation

For ease of usage.

3. Your contribution

Will create a PR to add them.

Log file output

1. Feature description

Allow training logs to be output to a file

2. Motivation

Adding the functionality to output logs to a file would be better

3. Your contribution

I will try to help
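With Python's standard logging module, writing training logs to a file in addition to the console takes one extra handler. The sketch below is generic and does not reflect how PyPOTS configures its logger; the function name get_training_logger and the format string are assumptions.

```python
import logging


def get_training_logger(log_path, name="training"):
    """Return a logger that writes to both the console and `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        console_handler = logging.StreamHandler()
        file_handler = logging.FileHandler(log_path, encoding="utf-8")
        for handler in (console_handler, file_handler):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Calling logger.info("epoch 0: training loss 1.2366") then prints the message and appends it to the log file.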

The footer on the docs home page does not appear properly

1. System Info

It's on the remote https://docs.pypots.com, deployed by ReadTheDocs.

2. Information

The official example scripts
My own created scripts

3. Reproduction

No need to operate.

4. Expected behavior

View this error on https://docs.pypots.com. The footer does not appear at the bottom of the page, but on the right side.

error when trying to train raindrop classification on multiple gpu

1. System Info + Information

system info: torch 2.0.1, pypots 0.1.1 - gpu: 8x RTX 4090
problem: when training the Raindrop model as usual, I wanted to make use of all my GPUs. I did everything as in the documentation, but got the following error after changing the device argument to a list.
Thanks a lot!

2. Reproduction

raindrop = Raindrop(
    n_steps = X.shape[1],
    n_features = X.shape[2],
    ...
    num_workers = 8,
    ...
    device = ['cuda:0', 'cuda:1'],
    ...
)

4. Expected behavior

no error

Separate each model into a single package

1. Feature description

PyPOTS should probably separate each model into its own package. Taking SAITS as an example, its package structure should look like this:

├── pypots
│	├── imputation
│	│	├── saits
│	│	│	├── __init__.py
│	│	│	├── dataset.py
│	│	│	├── model.py
│	│	│	└── module.py

model.py includes the main model/algorithm and the wrapper exposed to users;
module.py includes layers and modules for the main model/algorithm if necessary;
dataset.py includes specifically-designed class Dataset for this model's data processing;

2. Motivation

To make the library more standardized, and for easier management.

3. Your contribution

Will create a PR to finish this.

Numpy is not available error

Discussed in #31

Originally posted by lauredecaudin January 25, 2023
Hello !

I tried to use the package with the code found in the README,

import numpy as np
from sklearn.preprocessing import StandardScaler
from pypots.data import load_specific_dataset, mcar, masked_fill
from pypots.imputation import SAITS
from pypots.utils.metrics import cal_mae
# Data preprocessing. Tedious, but PyPOTS can help. 🤓
data = load_specific_dataset('physionet_2012')  # PyPOTS will automatically download and extract it.
X = data['X']
num_samples = len(X['RecordID'].unique())
X = X.drop('RecordID', axis = 1)
X = StandardScaler().fit_transform(X.to_numpy())
X = X.reshape(num_samples, 48, -1)
X_intact, X, missing_mask, indicating_mask = mcar(X, 0.1) # hold out 10% observed values as ground truth
X = masked_fill(X, 1 - missing_mask, np.nan)
# Model training. This is PyPOTS showtime. 💪
saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, d_inner=128, n_head=4, d_k=64, d_v=64, dropout=0.1, epochs=10)
saits.fit(X)  # train the model. Here I use the whole dataset as the training set, because ground truth is not visible to the model.
imputation = saits.impute(X)  # impute the originally-missing values and artificially-missing values
mae = cal_mae(imputation, X_intact, indicating_mask)

And I get this error :

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<command-282728034361361> in <module>
     15 # Model training. This is PyPOTS showtime. 💪
     16 saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, d_inner=128, n_head=4, d_k=64, d_v=64, dropout=0.1, epochs=10)
---> 17 saits.fit(X)  # train the model. Here I use the whole dataset as the training set, because ground truth is not visible to the model.
     18 imputation = saits.impute(X)  # impute the originally-missing values and artificially-missing values
     19 mae = cal_mae(imputation, X_intact, indicating_mask)

/databricks/python/lib/python3.8/site-packages/pypots/imputation/saits.py in fit(self, train_X, val_X)
    216 
    217     def fit(self, train_X, val_X=None):
--> 218         train_X = self.check_input(self.n_steps, self.n_features, train_X)
    219         if val_X is not None:
    220             val_X = self.check_input(self.n_steps, self.n_features, val_X)

/databricks/python/lib/python3.8/site-packages/pypots/base.py in check_input(self, expected_n_steps, expected_n_features, X, y, out_dtype)
     76                 X = torch.tensor(X).to(self.device)
     77             elif is_array:
---> 78                 X = torch.from_numpy(X).to(self.device)
     79             else:  # is tensor
     80                 X = X.to(self.device)

RuntimeError: Numpy is not available

Does anyone know what's happening?

I use Databricks with a 10.4 LTS cluster https://docs.databricks.com/release-notes/runtime/10.4.html
I installed PyPOTS with pip install pypots

Thanks
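"RuntimeError: Numpy is not available" is raised by torch.from_numpy when the installed torch build cannot use the installed NumPy, which often points to a version mismatch between the two packages. That diagnosis is an assumption here, since the issue does not list the installed versions. A quick first step is to record which versions are actually present; the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages=("numpy", "torch", "pypots")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing installed_versions() on the cluster and attaching the result to the report makes it easier to tell whether NumPy was upgraded or downgraded when pypots was installed.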

Docs footer on page `References` has a problem

Issue description

The footer display problem mentioned in #80 has appeared again. The problem seems to be related to docutils and sphinxcontrib-bibtex. The current versions are Sphinx==6.2.1, docutils==0.19, sphinxcontrib-bibtex==2.5.0, furo==2023.3.27.

Refer to https://docs.pypots.com/en/test/references.html to see the misplaced footer, which should be at the bottom of the page but currently appears on the right.

PyPI auto release workflow failed

1. System Info

https://github.com/WenjieDu/PyPOTS/actions/runs/4993196983/jobs/8941963801

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://github.com/WenjieDu/PyPOTS/actions/runs/4993196983/jobs/8941963801

4. Expected behavior

Failed with missing dependencies.

How can I customize my own dataset to fit PyPOTS SOTA imputation models?

1. Feature description

I want to run PyPOTS SOTA models on my own dataset.

2. Motivation

I have a multivariate dataset and want to check how PyPOTS models perform on it for data imputation.

3. Your contribution

None so far

Docs building failed

1. System Info

Docs building failed on ReadTheDocs. https://readthedocs.org/projects/pypots/builds/20424594/

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://readthedocs.org/projects/pypots/builds/20424594/

4. Expected behavior

Building failed.

Separate optimizers and models

1. Feature description

To make the framework more usable, we should separate models and optimizers, for example:

saits = SAITS(
    n_steps=48,
    n_features=37,
    n_layers=3,
    ...
    optimizer=pypots.optim.Adam(),
    ...
)

2. Motivation

Such a design can give users more options, and make PyPOTS framework more powerful. For example, we can add more functionalities into pypots.optim.Optimizer classes, like lr scheduler.
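The design relies on an optimizer object that can be created before the model's parameters exist and bound to them later. A minimal sketch of that idea follows. The class DeferredOptimizer and its init_optimizer method are illustrative names chosen for this sketch, not a description of the pypots.optim implementation.

```python
class DeferredOptimizer:
    """Hold optimizer hyperparameters until the model's parameters exist."""

    def __init__(self, factory, **hyperparameters):
        # `factory` is any callable with the signature factory(params, **kwargs),
        # for example torch.optim.Adam.
        self.factory = factory
        self.hyperparameters = hyperparameters
        self.optimizer = None

    def init_optimizer(self, params):
        """Create the underlying optimizer once the parameters are available."""
        self.optimizer = self.factory(params, **self.hyperparameters)
        return self.optimizer
```

A user could then write DeferredOptimizer(torch.optim.Adam, lr=1e-3) when building the model, and the model wrapper would call init_optimizer(self.model.parameters()) internally. Extra features such as a learning-rate scheduler could be attached to the same wrapper.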

3. Your contribution

Will make a PR to implement it.

TSDB has adjusted its API in v0.1

1. System Info

PyPOTS v0.1.1

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://github.com/WenjieDu/PyPOTS/actions/runs/5803636573

4. Expected behavior

ImportError: cannot import name '_download_and_extract' from 'tsdb.data_processing'

GPU enabled model raises Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0

Hello,
great library, but using gpu enabled machine results in errors.

pypots version = 0.0.6 (the one available in PyPI)

code to replicate problem:

import unittest
from pypots.tests.test_imputation import TestBRITS, TestLOCF, TestSAITS, TestTransformer
from pypots import __version__


if __name__ == "__main__":
    print(__version__)
    unittest.main()

results:

0.0.6
Running test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
ERunning test cases for BRITS...
Model initialized successfully. Number of the trainable parameters: 580976
ERunning test cases for LOCF...
LOCF test_MAE: 0.1712224306027283
.Running test cases for LOCF...
.Running test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0
ERunning test cases for SAITS...
Model initialized successfully. Number of the trainable parameters: 1332704
Exception: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.7681, validating loss 0.2941
epoch 1: training loss 0.4731, validating loss 0.2395
epoch 2: training loss 0.4235, validating loss 0.2069
epoch 3: training loss 0.3781, validating loss 0.1914
epoch 4: training loss 0.3530, validating loss 0.1837
ERunning test cases for Transformer...
Model initialized successfully. Number of the trainable parameters: 666122
epoch 0: training loss 0.7826, validating loss 0.2820
epoch 1: training loss 0.4687, validating loss 0.2352
epoch 2: training loss 0.4188, validating loss 0.2132
epoch 3: training loss 0.3857, validating loss 0.1977
epoch 4: training loss 0.3604, validating loss 0.1945
E
======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestBRITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 99, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/brits.py", line 494, in fit
    training_set = DatasetForBRITS(train_X)  # time_gaps is necessary for BRITS
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 62, in __init__
    forward_delta = parse_delta(forward_missing_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 36, in parse_delta
    delta.append(torch.ones(1, n_features) + (1 - m_mask[step]) * delta[-1])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestBRITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 99, in setUp
    self.brits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/brits.py", line 494, in fit
    training_set = DatasetForBRITS(train_X)  # time_gaps is necessary for BRITS
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 62, in __init__
    forward_delta = parse_delta(forward_missing_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/data/dataset_for_brits.py", line 36, in parse_delta
    delta.append(torch.ones(1, n_features) + (1 - m_mask[step]) * delta[-1])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestSAITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 83, in _train_model
    results = self.model.forward(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 95, in forward
    imputed_data, [X_tilde_1, X_tilde_2, X_tilde_3] = self.impute(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 62, in impute
    enc_output, _ = encoder_layer(enc_output)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 122, in forward
    enc_output, attn_weights = self.slf_attn(enc_input, enc_input, enc_input, attn_mask=mask_time)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 72, in forward
    v, attn_weights = self.attention(q, k, v, attn_mask)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 32, in forward
    attn = attn.masked_fill(attn_mask == 1, -1e9)
RuntimeError: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 35, in setUp
    self.saits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 171, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 123, in _train_model
    raise RuntimeError('Training got interrupted. Model was not get trained. Please try fit() again.')
RuntimeError: Training got interrupted. Model was not get trained. Please try fit() again.

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestSAITS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 83, in _train_model
    results = self.model.forward(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 95, in forward
    imputed_data, [X_tilde_1, X_tilde_2, X_tilde_3] = self.impute(inputs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 62, in impute
    enc_output, _ = encoder_layer(enc_output)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 122, in forward
    enc_output, attn_weights = self.slf_attn(enc_input, enc_input, enc_input, attn_mask=mask_time)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 72, in forward
    v, attn_weights = self.attention(q, k, v, attn_mask)
  File "mydirs(...)/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 32, in forward
    attn = attn.masked_fill(attn_mask == 1, -1e9)
RuntimeError: expected self and mask to be on the same device, but got mask on cpu and self on cuda:0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 35, in setUp
    self.saits.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/saits.py", line 171, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 123, in _train_model
    raise RuntimeError('Training got interrupted. Model was not get trained. Please try fit() again.')
RuntimeError: Training got interrupted. Model was not get trained. Please try fit() again.

======================================================================
ERROR: test_impute (pypots.tests.test_imputation.TestTransformer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 68, in setUp
    self.transformer.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 257, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 129, in _train_model
    if np.equal(self.best_loss, float('inf')):
  File "mydirs(...)/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

======================================================================
ERROR: test_parameters (pypots.tests.test_imputation.TestTransformer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "mydirs(...)/python3.9/site-packages/pypots/tests/test_imputation.py", line 68, in setUp
    self.transformer.fit(self.train_X, self.val_X)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/transformer.py", line 257, in fit
    self._train_model(training_loader, val_loader, val_X_intact, val_X_indicating_mask)
  File "mydirs(...)/python3.9/site-packages/pypots/imputation/base.py", line 129, in _train_model
    if np.equal(self.best_loss, float('inf')):
  File "mydirs(...)/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

----------------------------------------------------------------------
Ran 8 tests in 20.239s

FAILED (errors=6)

I suspect that .to(device) is called on the data too early. You may also need to pass the device parameter explicitly when creating new tensors (e.g. torch.ones in parse_delta).
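To make the suspicion concrete, here is a minimal sketch, not the actual PyPOTS code. The helpers `make_ones_like_input` and `is_untrained` are hypothetical names. It shows two habits that usually avoid both errors in the tracebacks above: creating new tensors on the input's device, and turning a loss tensor into a Python float before comparing it with infinity.

```python
import math

import torch


def make_ones_like_input(x: torch.Tensor) -> torch.Tensor:
    """Create a tensor of ones on the same device and with the same dtype as x."""
    # Passing device=x.device avoids CPU/GPU mismatches in later operations.
    return torch.ones(x.shape, dtype=x.dtype, device=x.device)


def is_untrained(best_loss) -> bool:
    """Return True if best_loss still holds its initial value of infinity."""
    # float() on a one-element tensor copies the value to the host,
    # so no NumPy conversion of a CUDA tensor is attempted.
    return math.isinf(float(best_loss))


# Falls back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.zeros(2, 3, device=device)
mask = make_ones_like_input(x)
scores = x.masked_fill(mask == 1, -1e9)  # mask and x share a device
```

Whether parse_delta and _train_model need exactly these changes is an assumption; the sketch only shows the general pattern.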

Best regards!

Add git hooks for linting before code committing or push

Feature description

We run a code-linting workflow on pushes. Although we already have a lint-checking tool in pypots-cli dev --lint-code, I find that I sometimes forget to run it and commit code with lint errors. This causes unnecessary trouble, such as modifying the code and pushing again. Hence, the PyPOTS project should have git hooks to catch this; tools like pre-commit may be useful.
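As a rough illustration of the idea, a git hook can be any executable script, including a Python one. The sketch below is an assumption about how such a hook could look, not what the project adopted: only the pypots-cli dev --lint-code command comes from this issue, and the function name `run_lint_hook` is hypothetical.

```python
import subprocess

# Lint command taken from the issue text; everything else is an assumption.
LINT_COMMAND = ["pypots-cli", "dev", "--lint-code"]


def run_lint_hook(runner=subprocess.run) -> int:
    """Run the linter and return an exit code (non-zero blocks the commit)."""
    try:
        result = runner(LINT_COMMAND)
    except FileNotFoundError:
        print("pypots-cli not found; install PyPOTS before committing.")
        return 1
    if result.returncode != 0:
        print("Linting failed; fix the reported problems and commit again.")
    return result.returncode


# In the actual hook file (e.g. .git/hooks/pre-commit, made executable),
# finish with: raise SystemExit(run_lint_hook())
```

The `runner` argument exists only so the function can be exercised without invoking the real command. A real hook file would also need a shebang line such as `#!/usr/bin/env python3`.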

Motivation

See the description above. We don't want to leave all checking and testing to CI in the cloud; we also need more standardized rules for local development.

Your contribution

Will create a PR and link it to this issue.

Variable sequence length

1. Feature description

To enable variable 'sequence length' of the input data.

2. Motivation

Some of the input training data are composed of multiple concatenated time series of different lengths.
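Until such a feature exists, one possible workaround is to pad shorter series with NaN rows up to the longest length, so that the padding is treated as missing data. The sketch below uses plain Python lists and a hypothetical helper `pad_with_nan`. Whether a given model copes well with long runs of all-NaN steps is an open assumption.

```python
import math


def pad_with_nan(series_list, n_features):
    """Pad each series (a list of time steps) with NaN rows up to the longest length."""
    max_len = max(len(series) for series in series_list)
    padded = []
    for series in series_list:
        n_missing_steps = max_len - len(series)
        # Copy the observed rows, then append all-NaN rows for the missing steps.
        rows = [list(step) for step in series]
        rows += [[math.nan] * n_features for _ in range(n_missing_steps)]
        padded.append(rows)
    return padded


short = [[1.0, 2.0], [3.0, 4.0]]
long = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
batch = pad_with_nan([short, long], n_features=2)  # both series now have 3 steps
```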

3. Your contribution

I will try to help

Too low classification metrics when using classification models

1. System Info

According to feedback from our community, the reported classification metrics (e.g. PR-AUC, ROC-AUC) are too low. This shouldn't happen and must be a bug.

2. Information

The official example scripts
My own created scripts

3. Reproduction

A figure from our PyPOTS user.

4. Expected behavior

PyPOTS reports very low ROC-AUC, around 0.5.

Daily testing continuously failed with MacOS Python 3.7

Issue description

https://github.com/WenjieDu/PyPOTS/actions/runs/5238494634/jobs/9457435056

Question about imputation

Hi!

In the example you provided, the following code is used to impute the originally-missing values and artificially-missing values

imputation = saits.impute(X)

After I converted imputation to DataFrame, I compared it with the original data and found that there was a big difference. Did I make a mistake? What should I do?

I used the following code to compare the original data and the imputation:

data['X'].head(5)

pd.DataFrame(imputation.reshape(-1, 37)).head(5)

The stale workflow is not working

1. System Info

The stale workflow we currently have here is not working https://github.com/WenjieDu/PyPOTS/blob/1bbb7dbd092fdce20b663f4087bfa39a6129e163/.github/stale.yml. We need to update and enable it to run.

2. Information

The official example scripts
My own created scripts

3. Reproduction

The workflow does not appear in Actions.

4. Expected behavior

Not running.

Adding new dataset

1. Feature description

Hi,

The data is from coastal base stations that monitor vessel traffic. Irregularly sampled and with missing values. Automatic identification system data in this case contains 8 columns and approximately 10 000 samples per vessel. There are approximately 200 vessels per .csv file. I have a dozen of them for the area in Norway near Ålesund.
ais_20201118.zip

2. Motivation

This could be a fun playground for data imputation and prediction.

3. Your contribution

I believe I could help with coding, but this would be my first pull request. I'm trying to implement the BRITS model for data imputation, so at the moment, I'm preparing it for training.

Add access for explainability (TimeSHAP)

1. Feature description

Changes to the classify() and forward() methods to make the models compatible with TimeSHAP and other XAI methods.

2. Motivation

I would like to be able to understand which timesteps and features were most relevant for a particular prediction.

3. Your contribution

I don't know enough about how the models are structured to feel confident in changing the api.

learning-rate and pretrained model of SAITS

Hello, Wenjie,

I tried PyPOTS, and it's awesome! But I have the following questions:
(1) While training the SAITS model, I found that the recommended learning rate in the ‘PhysioNet2012_SAITS_best.ini’ file is lr = 0.00068277455043675505. Is there a good method for finding such a learning rate? (I only know to set round values such as 0.001 or 0.0001.)
(2) Would it be possible to release the pretrained state_dict .pth files of SAITS(base) and SAITS? When training on my custom dataset, I ran into an early-stop problem within 100 epochs, so I want to see whether the same problem occurs on PhysioNet2012 with epochs = 10000.
Alternatively, the training log files of SAITS(base) and SAITS would be helpful!
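On question (1): non-round values like this usually come from an automated hyperparameter search rather than a manual choice. Whether that is how this particular value was obtained is an assumption. A minimal sketch of one common ingredient, sampling learning rates log-uniformly, with a hypothetical helper `sample_log_uniform`:

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the candidates are reproducible
# Five candidate learning rates between 1e-4 and 1e-2.
candidates = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(5)]
```

Each candidate would then be used for a training run, keeping the one with the best validation metric.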

Thank you very much for your reply !

Code coverage is using wrong source directory

1. System Info

https://coveralls.io/builds/59841278 reports 96% coverage, but it is measured on tests rather than pypots.

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://coveralls.io/builds/59841278

4. Expected behavior

Using package tests rather than pypots as the source to report a 96% coverage.

Early stop

Wenjie,

I tried PyPOTS on the Beijing Air Quality dataset. For dataset preparation, I followed gene_UCI_BeijingAirQuality_dataset. The following is my PyPOTS setup.

saits_base = SAITS(seq_len=seq_len, n_features=132, 
                   n_layers=2,  # num of group-inner layers
                   d_model=256, # model hidden dim
                   d_inner=128, # hidden size of feed forward layer
                   n_head=4, # head num of self-attention
                   d_k=64, d_v=64, # key dim, value dim
                   dropout=0, 
                   epochs=200,
                   patience=30,
                   batch_size=32,
                   weight_decay=1e-5,
                   ORT_weight=1,
                   MIT_weight=1,
                  )

saits_base.fit(train_set_X)

PyPOTS stops earlier than the epochs specified (stops around epoch 80), without triggering either print('Exceeded the training patience. Terminating the training procedure...') or print('Finished all training epochs.').

epoch 0: training loss 0.9637 
epoch 1: training loss 0.6161 
epoch 2: training loss 0.5177 
epoch 3: training loss 0.4783 
epoch 4: training loss 0.4489 
...
epoch 73: training loss 0.2462 
epoch 74: training loss 0.2460 
epoch 75: training loss 0.2480 
epoch 76: training loss 0.2452 
epoch 77: training loss 0.2452 
epoch 78: training loss 0.2458 
epoch 79: training loss 0.2449 
epoch 80: training loss 0.2423 
epoch 81: training loss 0.2425 
epoch 82: training loss 0.2443 
epoch 83: training loss 0.2403 
epoch 84: training loss 0.2406

Then I evaluate the model performance (not knowing why the model stops early) on test_set as

test_set_mae = cal_mae(test_set_imputation, test_set_X_intact, test_set_indicating_mask)
0.21866121846582318

I have a few questions:

What could be the cause for the early stop?
In addition, is there any object in saits_base that stores the loss history?
Does the function cal_mae calculate the same MAE in your paper? For this Beijing air quality case, I should be able to tune the hyperparameter to get the test_set_mae down to around 0.146?
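For reference on the first two questions, here is a generic sketch of patience-based early stopping that also keeps a loss history. It is not PyPOTS's implementation, and `train_with_patience` is a hypothetical name. It only illustrates what stopping after a number of epochs without improvement means and how a history could be recorded.

```python
def train_with_patience(epoch_losses, patience):
    """Consume per-epoch losses; stop after `patience` epochs without improvement."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    history = []  # every loss seen, so it can be inspected after training
    stopped_early = False
    for loss in epoch_losses:
        history.append(loss)
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_early = True
            break
    return {
        "stopped_early": stopped_early,
        "last_epoch": len(history) - 1,
        "best_loss": best_loss,
        "history": history,
    }


# The loss stops improving after epoch 2, so training halts at epoch 5.
result = train_with_patience([0.9, 0.6, 0.5, 0.55, 0.52, 0.51, 0.4], patience=3)
```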

Thank you,
Haochen

Should add Pull-Request template

1. Feature description

PyPOTS needs a PR template.

2. Motivation

To make PRs to PyPOTS standardized. Also, a template can guide our contributors.

3. Your contribution

Will make a PR and link to this issue.

Adding data having same TimeStamp but different source

Issue description

Hi Wenjie,

I want to prepare a dataset in which records share the same timestamps but come from different sources. Could you please let me know how to do it?

model giving same output prediction

Issue description

hello,
I have the problem that my trained model gives me the same output no matter what the input is.
Is it a problem with the data, or maybe with the model or how I trained it?
Here are the labels and their counts in the dataset:

Now my problem is that when I use the classify feature on my test dataset, the only label Raindrop predicts is 0:

I have only trained the model for a few epochs to check it, because training is pricey :). Could that cause this issue, or should I change the data preprocessing?

Conda-forge failed building on Azure

Issue description

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=706772&view=logs&j=656edd35-690f-5c53-9ba3-09c10d0bea97&t=e5c8ab1d-8ff9-5cae-b332-e15ae582ed2d&l=680

Add test cases for `pypots-cli` tool

Issue description

At this point, there are only a few test cases for pypots-cli. We need more to ensure its quality.

[Feature request] Is it possible to "warm-up" the transformer?

Thank you for creating this wonderful resource! This is an amazing and useful tool!

Regarding SAITS, is it possible to pass a learning rate scheduler, rather than a fixed learning rate, for the transformer to pre-train?

I ask this because I compared the outputs of training for 100 epochs vs 1000 epochs. The loss continues to decrease, but the error on holdout timepoints does not change between 100 and 1000 epochs. Strangely, the prediction (after 100 and 1000 epochs) is less accurate than linear interpolation...! I wonder whether the transformer has too many parameters and needs some help learning initially.
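To clarify what a warm-up means here, this is a small sketch of a linear warm-up schedule in plain Python. The name `warmup_lr` and the choice to hold the rate constant afterwards are assumptions for illustration; it says nothing about what SAITS in PyPOTS currently accepts.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linearly increase the learning rate to base_lr over warmup_steps, then hold it."""
    if warmup_steps <= 0:
        return base_lr
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Learning rate for the first six steps with a four-step warm-up.
schedule = [warmup_lr(step, base_lr=1e-3, warmup_steps=4) for step in range(6)]
```

In PyTorch, a scaling function of this kind can be wrapped with torch.optim.lr_scheduler.LambdaLR, which multiplies the optimizer's initial learning rate by the factor returned for each step.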

Autocorrelation

Issue description

Which parameters in SAITS help to improve autocorrelation modelling? Thanks :)

Some problems in the demo

Question description

I tried the demo shown in the README, but there seem to be some problems when running the model. How can I fix it? Thank you!

Enabling users to customize loss function

1. Feature description

Add a feature to enable users to specify their own loss functions, which should be callable python functions.

2. Motivation

Currently the loss functions in PyPOTS models are fixed. This is reasonable, because reproducing algorithms and models exactly is an important part of the PyPOTS project: they should stay as close as possible to the descriptions in the original papers.

However, users from time to time have to specify the loss function for better optimization. For example, in some scenarios, users use MAE as their evaluation metric to assess the imputation accuracy, while some other users with their applications prefer to use MSE to evaluate the final imputation results. From the perspective of helping our users get better results, we should add such a feature.
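A sketch of what such a feature could look like follows. The `evaluate` hook and its `loss_fn` argument are hypothetical, not an existing PyPOTS API. The two loss functions are masked versions of MAE and MSE, computed only over positions where the mask is 1.

```python
def masked_mae(predictions, targets, mask):
    """Mean absolute error over positions where mask == 1."""
    total = sum(abs(p - t) * m for p, t, m in zip(predictions, targets, mask))
    return total / max(sum(mask), 1)


def masked_mse(predictions, targets, mask):
    """Mean squared error over positions where mask == 1."""
    total = sum((p - t) ** 2 * m for p, t, m in zip(predictions, targets, mask))
    return total / max(sum(mask), 1)


def evaluate(predictions, targets, mask, loss_fn=masked_mae):
    """Hypothetical hook: any callable with this signature can be plugged in."""
    if not callable(loss_fn):
        raise TypeError("loss_fn must be callable")
    return loss_fn(predictions, targets, mask)


preds, targets, mask = [1.0, 2.0, 5.0], [1.0, 4.0, 0.0], [1, 1, 0]
mae = evaluate(preds, targets, mask)              # default loss
mse = evaluate(preds, targets, mask, masked_mse)  # user-supplied loss
```

Keeping the paper's loss as the default would preserve reproducibility while still letting users swap in their own callable.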

3. Your contribution

Will create a PR to finish it.

Enable auto-publishing to PyPI

To make publishing a new version on PyPI easy, this tedious operation should be automated.

Daily testing failed

1. System Info

https://github.com/WenjieDu/PyPOTS/actions/runs/4898810658

2. Information

The official example scripts
My own created scripts

3. Reproduction

https://github.com/WenjieDu/PyPOTS/actions/runs/4898810658

4. Expected behavior

Failed on the step Install other dependencies, got error ModuleNotFoundError: No module named 'torch' but torch has already been installed in the previous step.

Badge of PyPI downloads is invalid

Issue description

The usage in the quick-start examples cannot run properly

1. System Info

https://docs.pypots.com/en/latest/examples.html#quick-start-examples

2. Information

The official example scripts
My own created scripts

3. Reproduction

Copying and pasting the example fails to run.

4. Expected behavior

The parameter n_head of SAITS should be n_heads.

CI-testing failed because of `protobuf`

Issue description

I notice that some jobs are failing in our CI-testing workflow, for example, https://github.com/WenjieDu/PyPOTS/actions/runs/5064524248/jobs/9092539446.

/usr/share/miniconda/envs/pypots-test/lib/python3.8/site-packages/google/protobuf/descriptor.py:51: in <module>
    from google.protobuf.pyext import _message
E   ImportError: /usr/share/miniconda/envs/pypots-test/lib/python3.8/site-packages/google/protobuf/pyext/_message.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6google8protobuf2io17SafeDoubleToFloatEd

I have investigated this error, and it has nothing to do with the code in PyPOTS. The cause is that conda installs a newer version of protobuf in these failed jobs, which is incompatible with other dependencies. It can be solved directly by pinning protobuf to version 4.21.12, and I have tested this.

The starter tutorial cannot run well. AttributeError: `load_load` function does not exist.

1. System Info

Latest pypots (v0.1.1)
python 3.10
Debian 11

2. Information

The official example scripts
My own created scripts

3. Reproduction

Simply run the Quick-start Examples

4. Expected behavior

I get this error:

Traceback (most recent call last):
  File "/home/zhuyh/projects/ImputeEHR/pypots.py", line 46, in <module>
    saits.load_load("examples/saits/manually_saved_saits_model")
AttributeError: 'SAITS' object has no attribute 'load_load'
